Loop over a list of strings and increment letter count in a corresponding sublist

I have a 2D list as follows:



counts = {{"A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L", "K",
   "M", "F", "P", "S", "T", "W", "Y", "V"},
  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, ...};


The first sub-list consists of a heading, and the following sub-lists contain counts, initialized at zero.



I need to loop over another list, sequences, that contains strings plus a heading, and access the corresponding sub-list in counts to increment the appropriate letter count.



For example, take a string from sequences:




MKTIIALSYILCLVFAQKLPGNDNSTATLCLGHHAVPNGTIVKTITNDQIEVTNATELVQSSSTGEICDSPHQILDGKNCTLIDALLGDPQCDGFQNKKWDLFVERSKAYSNCYPYDVPDYASLRSLVASSGTLEFNNESFNWTGVTQNGTSSACIRRSKNSFFSRLNWLTHLNFKYPALNVTMPNNEQFDKLYIWGVHHPGTDKDQIFLYAQASGRITVSTKRSQQTVSPNIGSRPRVRNIPSRISIYWTIVKPGDILLINSTGNLIAPRGYFKIRSGKSSIMRSDAPIGKCNSECITPNGSIPNDKPFQNVNRITYGACPRYVKQNTLKLATGMRNVPEKQTRGIFGAIAGFIENGWEGMVDGWYGFRHQNSEGRGQAADLKSTQAAIDQINGKLNRLIGKTNEKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFEKTKKQLRENAEDMGNGCFKIYHKCDNACIGSIRNGTYDHDVYRDEALNNRFQIKGVELKSGYKDWILWISFAISCFLLCVALLGFIMWACQKGNIRCNICI




Its corresponding sub-list in counts would be incremented to {31, 27, 45, 30, 18, 27, 25, 42, 11, 48, 44, 37, 8, 23, 20, 41, 34, 11, 19, 25}.



I obtained this via StringCount[sequences[[1]], #] & /@ counts[[1]] but am struggling to scale this code, and to make it update the sub-lists in counts instead of returning a new list.










list-manipulation numerics counting






edited 59 mins ago

























asked 1 hour ago









briennakh

  • This works at counting, but if I map it over all sequences as Transpose@Tally@Characters@# & /@ sequences it will output multiple headings + counts.
    – briennakh
    32 mins ago










3 Answers
accepted






You can use LetterCounts as follows:



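letters = {"A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L",
   "K", "M", "F", "P", "S", "T", "W", "Y", "V"};
(* 10 random test strings of 100 capital letters each *)
sequences = StringJoin /@ RandomChoice[Capitalize@Alphabet[], {10, 100}];
(* look up each letter's count; letters absent from a string become 0 *)
lcs = letters /. LetterCounts /@ sequences /. Thread[letters -> 0];
(* heading row followed by one row of counts per sequence *)
counts = Join[{letters}, lcs];
counts // Grid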
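
To see what the replacement chain in lcs does, here is a minimal sketch on two short, made-up strings (toyLetters, toySequences and the other names below are illustrative, not from the original post). LetterCounts returns one association per string; the first /. looks each letter up in that association, and the second /. turns any letter that was not found, and so is still a string, into 0.

```mathematica
toyLetters = {"A", "R", "N"};
toySequences = {"AARN", "NNA"};   (* "R" does not occur in the second string *)

(* one association of letter counts per string *)
assocs = LetterCounts /@ toySequences;

(* look up each letter; letters absent from a string stay as strings ... *)
partial = toyLetters /. assocs;

(* ... and are then replaced by 0 *)
toyResult = partial /. Thread[toyLetters -> 0]
(* {{2, 1, 1}, {1, 0, 2}} *)
```

Each row of the result follows the order of toyLetters, which is why it can be joined directly underneath the heading row.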


answered 48 mins ago









kglr

  • I like the pretty output!
    – briennakh
    39 mins ago
















sequences = {"MKTIIALSYILCLVFAQKLPGNDNSTATLCLGHHAVPNGTIVKTITNDQIEVTNATELVQSSSTGEIC
DSPHQILDGKNCTLIDALLGDPQCDGFQNKKWDLFVERSKAYSNCYPYDVPDYASLRSLVASSGTLEFNN
ESFNWTGVTQNGTSSACIRRSKNSFFSRLNWLTHLNFKYPALNVTMPNNEQFDKLYIWGVHHPGTDKDQI
FLYAQASGRITVSTKRSQQTVSPNIGSRPRVRNIPSRISIYWTIVKPGDILLINSTGNLIAPRGYFKIRS
GKSSIMRSDAPIGKCNSECITPNGSIPNDKPFQNVNRITYGACPRYVKQNTLKLATGMRNVPEKQTRGIF
GAIAGFIENGWEGMVDGWYGFRHQNSEGRGQAADLKSTQAAIDQINGKLNRLIGKTNEKFHQIEKEFSEV
EGRIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFEKTKKQLRENAEDMGNGCFKIYHK
CDNACIGSIRNGTYDHDVYRDEALNNRFQIKGVELKSGYKDWILWISFAISCFLLCVALLGFIMWACQKG
NIRCNICI"};

counts = {{"A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L", "K",
   "M", "F", "P", "S", "T", "W", "Y", "V"}, {0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};


and the code:



new = Values[
(CharacterCounts /@ sequences)[[All, First@counts]]
];

counts[[2 ;;]] += new;
counts



{{"A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L", "K", "M",
  "F", "P", "S", "T", "W", "Y", "V"}, {31, 27, 45, 30, 18, 27, 25, 42,
  11, 48, 44, 37, 8, 23, 20, 41, 34, 11, 19, 25}}
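
One caveat worth keeping in mind: if a letter never appears in one of the sequences, its key is absent from that sequence's CharacterCounts association, and extracting it with Part gives a Missing[...] entry rather than 0. A minimal sketch of one possible workaround, assuming the same heading-plus-rows layout as above, is to use Lookup with a default of 0. The toy data and the names toyCounts and toySequences are made up for illustration.

```mathematica
toyCounts = {{"A", "R", "N"}, {0, 0, 0}, {0, 0, 0}};
toySequences = {"AARN", "NNA"};   (* "R" is absent from the second string *)

(* Lookup[assoc, keys, default] returns the default for absent keys *)
new = Lookup[CharacterCounts[#], First[toyCounts], 0] & /@ toySequences;

(* add the new counts to the existing rows, leaving the heading row alone *)
toyCounts[[2 ;;]] += new;
toyCounts
(* {{"A", "R", "N"}, {2, 1, 1}, {1, 0, 2}} *)
```

Because the update uses +=, running it again with further sequences accumulates onto the existing rows instead of rebuilding the list.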







answered 47 mins ago









Kuba♦

  • Thank you, this works as well!
    – briennakh
    35 mins ago










  • But I would need to change the code to accommodate a list of sequences.
    – briennakh
    30 mins ago










  • @briennakh it should work in case of longer counts and sequences. If not please add examples to work with to your question
    – Kuba♦
    21 mins ago










  • This is also much faster than kglr's solution (see my post for timing examples).
    – Henrik Schumacher
    15 mins ago
















I can propose two things that speed up the letter counting tremendously:



1.) Use ToCharacterCode to convert your strings to packed arrays of integers.



2.) Use a compiled function for additive matrix assembly.



Additive assembly of each row can be obtained with this little function.



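cAssembleRow = Compile[{{a, _Integer, 1}, {max, _Integer}},
  Block[{b},
   (* start from a zero vector with one slot per possible index *)
   b = Table[0, {max}];
   (* increment the slot addressed by each entry of a *)
   Do[b[[Compile`GetElement[a, i]]]++, {i, 1, Length[a]}];
   b
   ],
  CompilationTarget -> "C",
  RuntimeAttributes -> {Listable},
  Parallelization -> True,
  RuntimeOptions -> "Speed"
  ];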
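
For readers who want to check what the compiled function computes, here is an uncompiled reference version (assembleRowReference is a hypothetical name introduced only for this illustration). It is a sketch of the same additive assembly: every entry of a is treated as a position in a zero vector of length max, and that position is incremented once per occurrence.

```mathematica
assembleRowReference[a_List, max_Integer] :=
 Module[{b = ConstantArray[0, max]},
  Do[b[[i]]++, {i, a}];   (* i runs over the entries of a *)
  b
  ]

assembleRowReference[{1, 3, 3, 2}, 4]
(* {1, 1, 2, 0} *)
```

The compiled version does the same work per row; the Listable attribute is what lets it be applied to a whole matrix of indices at once.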


Borrowing a bit of code from kglr but cranking up the number of strings and their length:

sequences = StringJoin /@ RandomChoice[Capitalize@Alphabet[], {1000, 1000}];
letters = {"A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V"};


Here is how kglr's and Kuba's very elegant solutions perform:

lcs = letters /. LetterCounts /@ sequences /. Thread[letters -> 0]; // RepeatedTiming // First
lcs2 = Values[(CharacterCounts /@ sequences)[[All, letters]]]; // RepeatedTiming // First



3.65



0.076




My version is a bit more clunky, but it does the job several times faster:



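(* offset that maps the character code of "A" to position 1 *)
i0 = ToCharacterCode["A"][[1]] - 1;
(* column positions of the requested letters within A..Z *)
letterpos = ToCharacterCode[StringJoin[letters]] - i0;

lcs3 = cAssembleRow[ToCharacterCode[sequences] - i0, 26][[All, letterpos]]; // RepeatedTiming // First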



0.0090
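
The index arithmetic can be checked on a small case. ToCharacterCode maps "A" to 65, so subtracting i0 = 64 sends "A" through "Z" to positions 1 through 26, which is why 26 is passed as the vector length; letterpos then picks out the columns for the requested letters in their original order. A small sketch with made-up inputs (toyLetters is an illustrative name):

```mathematica
i0 = ToCharacterCode["A"][[1]] - 1            (* 64 *)

toyLetters = {"A", "R", "N"};
ToCharacterCode[StringJoin[toyLetters]] - i0  (* {1, 18, 14} *)

(* a list of equal-length strings becomes a matrix of positions *)
ToCharacterCode[{"AB", "ZA"}] - i0            (* {{1, 2}, {26, 1}} *)
```

This mapping assumes the strings contain only the capital letters A to Z; any other character would produce a position outside 1 to 26.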




When every letter occurs in each element of sequences, all three results are equal:



lcs == lcs2 == lcs3



True








edited 1 min ago

























answered 20 mins ago









Henrik Schumacher

  • Henrik, if some letters have 0 count in some sequences, Kuba's lcs will have Missing[KeyAbsent] instead of 0; so some additional processing is needed.
    – kglr
    5 mins ago











