How can I find a more efficient solution to mapping characters to digits?

Given:



strings = {"send", "more", "money"};
letterMap = Flatten[Characters[strings]] // DeleteDuplicates
(*Example: {"s", "e", "n", "d", "m", "o", "r", "y"}*)
numberMap = RandomSample[Range[0, 9], 8]
(*Example: {7, 3, 6, 9, 5, 0, 4, 1}*)


I would like to convert these strings to integers as determined by the mapping specified by letterMap and numberMap.



For example, "send" should be equal to 7369 because s->7, e->3, n->6, d->9.



I have done this three different ways, but none of my solutions are computationally efficient enough. Can you find a more efficient method?



From Fastest to Slowest:



Using Associations and Lookup (Method 2):



StringToNumber2[str_String, letters_List, numbers_List] :=
(
Clear[rules, assoc];
rules = Thread[Rule[letters, numbers]];
assoc = Association[rules];
FromDigits[Lookup[assoc, #] & /@ Characters[str]]
)


Using Rules and ReplaceAll (can be improved a little using Thread like above) (Method 0):



StringToNumber0[str_String, letterMap_List, numberMap_List] :=
(
Clear[rules];
rules = Rule @@@ Partition[Riffle[letterMap, numberMap], 2];
FromDigits[Characters[str]] /. rules
)


Using Position and Part (Method 1):



StringToNumber1[str_String, letterMap_List, numberMap_List] :=
(
FromDigits[
numberMap[[Flatten[Position[letterMap, #]]]] & /@ Characters[str] //
Flatten]
)


Expected output, given inputs above:



{7369, 5043, 50631}


Most importantly, do you have any tips for improving the efficiency of Mathematica code when faced with problems like this?



(Update after receiving wonderful feedback!)
Henrik's Compiled SparseArray method was definitely the quickest, although I'm wondering how much improvement of other methods could be realized by utilizing Compile!



Below was how I measured the performance of each method:



(*See how long the algorithms take to analyze 100,000 pieces of data*)
Column[{
  Table[StringToNumber0[strings, letters, numbers], 100000]; // AbsoluteTiming,
  Table[StringToNumber1[strings, letters, numbers], 100000]; // AbsoluteTiming,
  Table[StringToNumber2[strings, letters, numbers], 100000]; // AbsoluteTiming,
  Table[StringToNumber3[strings, letters, numbers], 100000]; // AbsoluteTiming,
  Table[StringToNumber4[strings, letters, numbers], 100000]; // AbsoluteTiming
}]


Where Method 3 = Carl's method, and Method 4 = Henrik's method.




{4.79445, Null}, (*Method 0*)
{8.75677, Null}, (*Method 1*)
{2.04422, Null}, (*Method 2*)
{1.75599, Null}, (*Method 3*)
{0.900393, Null}, (*Method 4*)







  • Using //AbsoluteTiming suggests StringToNumber2 is the fastest. But I'm curious about 2 things: (1) Do you need to convert back - assuming this is some sort of secret decoder ring method, and (2) Does it matter if there are more than 8 (or especially more than 9) unique characters?
    – JimB
    Aug 13 at 3:10










  • @JimB, 1) No converting back needed, 2) The particular problem dictates that there will be at most 10 unique characters with a numeric value ranging from 0..9.
    – tjm167us
    Aug 13 at 3:51










  • Your test timings are about 20 times slower than mine, probably because you don't give a list of strings as the first argument of StringToNumber3 and StringToNumber4.
    – Carl Woll
    Aug 14 at 4:54










  • @CarlWoll, I do give a list of strings, but only a list of 3! The problem requires that the number mapping change on every call, while the letter map and the list of three input strings stay constant. Given that, are there any optimizations you can think of?
    – tjm167us
    Aug 14 at 13:30














Question asked Aug 13 at 2:16 by tjm167us, edited Aug 14 at 3:35











2 Answers
Accepted answer (score 9)










Update



The following approach is much faster when the number of words is small and the number of mappings is large:



stringsToNumberMappings[words_List, letterMap_, samples_] := Module[
  {len = Length @ letterMap, lrule, wvector},

  lrule = Thread @ Rule[letterMap, Range @ len];

  wvector = Transpose @ Normal @ Table[
    SparseArray[
      Thread @ Rule[Characters[w] /. lrule, 10^Range[StringLength[w] - 1, 0, -1]],
      len
    ],
    {w, words}
  ];

  samples . wvector
]
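The reason a single dot product suffices can be written out as a short derivation. The notation here is an assumption for illustration and does not appear in the answer: $c_1,\dots,c_L$ are the letters of a word $w$, $p(c)$ is the position of letter $c$ in letterMap, $d=(d_1,\dots,d_n)$ is one number map, and $v_w$ is the column of wvector belonging to $w$.

```latex
N_w(d) \;=\; \sum_{j=1}^{L} 10^{\,L-j}\, d_{p(c_j)}
       \;=\; \sum_{i=1}^{n} d_i\, v_{w,i},
\qquad
v_{w,i} \;=\; \sum_{j \,:\, p(c_j)=i} 10^{\,L-j}.
```

For "send" with the letter order s, e, n, d, m, o, r, y this gives $v = (1000, 100, 10, 1, 0, 0, 0, 0)$, and with $d = (7, 3, 6, 9, 5, 0, 4, 1)$ the product is $7369$. Because $v_w$ depends only on the words and the letter map, it can be computed once and reused for every number map. When a letter occurs more than once in a word, its weight must be the sum over all of its positions; none of the three example words repeats a letter, so that case does not arise here.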


Comparison with previous answer:



words = {"send", "more", "money"};
letterMap = Flatten[Characters[words]] // DeleteDuplicates;
samples = Table[RandomSample[0 ;; 9, 8], 10^5];

r1 = stringsToNumberMappings[words, letterMap, samples]; //AbsoluteTiming
r2 = StringsToNumbers[words, letterMap, #]& /@ samples; //AbsoluteTiming

r1===r2



{0.010555, Null}



{2.2923, Null}



True




Original answer



I assume the computational inefficiency you're encountering comes from performing your replacement on lots of words? In that case, you could exploit the fact that StringReplace operates on a whole list of strings in a single call:



StringsToNumbers[str_List, letters_List, numbers_List] := With[
  {d = Thread[letters -> ToString /@ numbers]},

  FromDigits /@ StringReplace[str, d]
]


Simple example:



words = {"send", "more", "money"};
StringsToNumbers[words, letterMap, {7, 3, 6, 9, 5, 0, 4, 1}]



{7369, 5043, 50631}




Bigger example with a million words:



big = RandomChoice[words, 10^6];
StringsToNumbers[big, letterMap, {7, 3, 6, 9, 5, 0, 4, 1}]; //AbsoluteTiming



{0.908794, Null}








  • Curiously enough, stringsToNumberMappings yielded worse results (5.7s for 100,000 iterations). The problem requires a stringList of three words, a letterMap, and a numberMap. The letterMap consists of the unique characters in the stringList (there must be 10 or fewer), and the numberMap is a list of n numbers, where n is the number of unique characters in stringList (Length[letterMap]). This mapping is performed with the stringList and letterMap constant, with the number map varying through all permutations of n numbers. To approximate this, I used Table to perform 100,000 calculations.
    – tjm167us
    Aug 16 at 0:43










  • @tjm167us You're comparing the wrong thing. If permutations is your list of 100000 permutations, then you should use stringsToNumberMappings[words, letterMap, permutations] and not Table[stringsToNumberMappings[words, letterMap, p], {p, permutations}]. I did not optimize the speed of creating wvector because there are only 3 words, and I only need to create wvector once. If you use Table, though, wvector gets created 100000 times, which is slow.
    – Carl Woll
    Aug 16 at 1:09
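The point of the comment above, building the weight matrix once and then converting every number map in a single product, might be sketched as follows. This is an illustrative variant rather than the answer's exact code: the names pos, wordWeights, weights and allMaps are assumptions, and using Permutations to enumerate the number maps is a guess at what "all permutations of n numbers" means in the asker's setting.

```mathematica
(* Sketch only: pos, wordWeights, weights and allMaps are illustrative names *)
words = {"send", "more", "money"};
letterMap = DeleteDuplicates[Flatten[Characters[words]]];
n = Length[letterMap];
pos = AssociationThread[letterMap -> Range[n]];  (* letter -> column index *)

(* place-value weight of each letter in one word; repeated letters accumulate *)
wordWeights[w_String] := Module[{v = ConstantArray[0, n], chars = Characters[w]},
  Do[v[[pos[chars[[j]]]]] += 10^(Length[chars] - j), {j, Length[chars]}];
  v];

(* n x 3 matrix, built once because the words and letterMap never change *)
weights = Transpose[wordWeights /@ words];

(* every assignment of distinct digits to the n letters, one per row *)
allMaps = Permutations[Range[0, 9], {n}];

(* a single matrix product converts all number maps at once *)
values = allMaps . weights;
```

Each row of values holds the three integers for the corresponding row of allMaps. With 8 letters allMaps has 10!/2! = 1814400 rows, so memory use is noticeable but the conversion itself is one packed matrix product.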

















Answer (score 7)













Recently, I found out for myself that ToCharacterCode is very efficient in transforming strings to numbers. In particular, ToCharacterCode turns strings into packed arrays, which is very good for performance.



Here is the preparation (stealing a bit from Carl Woll).



SeedRandom[123];
words = {"send", "more", "money"};
letterMap = DeleteDuplicates@Flatten[Characters[words]];
numberMap = RandomSample[Range[0, 9], 8];
big = RandomChoice[words, 10^6];
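A quick way to look at what ToCharacterCode produces for the example words is sketched below. The variable name codes is only illustrative. The three words have different lengths, so the result is a list of three separate integer vectors, and the check is applied to each vector individually.

```mathematica
words = {"send", "more", "money"};

(* one vector of character codes per word *)
codes = ToCharacterCode[words]
(* {{115, 101, 110, 100}, {109, 111, 114, 101}, {109, 111, 110, 101, 121}} *)

(* test each vector for packed storage *)
Developer`PackedArrayQ /@ codes
```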


We create a packed array as a lookup table via SparseArray and perform the actual lookup, together with the conversion to numbers, in one go with the following compiled function:



lookup = Compile[{{lookuptable, _Integer, 1}, {idx, _Integer, 1}},
  Sum[10^(Length[idx] - i) Compile`GetElement[lookuptable,
    Compile`GetElement[idx, i]], {i, 1, Length[idx]}],
  CompilationTarget -> "C",
  RuntimeAttributes -> {Listable},
  Parallelization -> True,
  RuntimeOptions -> "Speed"
];

StringsToNumbers2[strings_, letterMap_, numberMap_] := lookup[
  Normal[SparseArray[ToCharacterCode[letterMap] -> numberMap]],
  ToCharacterCode[strings]
];
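To see what the lookup table contains, here is a small uncompiled illustration of the same idea, using the example mapping from the question. The name table is an assumption for illustration. It is meant to show the data flow only and will not match the speed of the compiled version.

```mathematica
letterMap = {"s", "e", "n", "d", "m", "o", "r", "y"};
numberMap = {7, 3, 6, 9, 5, 0, 4, 1};

(* dense vector indexed by character code; slots for unused codes hold 0 *)
table = Normal[SparseArray[ToCharacterCode[letterMap] -> numberMap]];

(* character codes act directly as positions in the table *)
table[[ToCharacterCode["send"]]]
(* {7, 3, 6, 9} *)

FromDigits[table[[#]]] & /@ ToCharacterCode[{"send", "more", "money"}]
(* {7369, 5043, 50631} *)
```

The compiled lookup function performs the same indexing and folds the digits into an integer inside the Sum, so no intermediate digit list is created.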


On my machine, this method is approximately twice as fast as using StringReplace and FromDigits:



result1 = StringsToNumbers[big, letterMap, numberMap]; // RepeatedTiming // First
result2 = StringsToNumbers2[big, letterMap, numberMap]; // RepeatedTiming // First
result1 === result2



0.810



0.40



True







share|improve this answer






















    Your Answer




    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "387"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f179936%2fhow-can-i-find-a-more-efficient-solution-to-mapping-characters-to-digits%23new-answer', 'question_page');

    );

    Post as a guest






























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    9
    down vote



    accepted










    Update



    The following approach is much faster when the number of words is small and the number of mappings is large:



    stringsToNumberMappings[words_List, letterMap_, samples_]:=Module[

    len = Length @ letterMap, lrule, wvector
    ,

    lrule = Thread @ Rule[letterMap, Range@len];

    wvector = Transpose @ Normal @ Table[
    SparseArray[
    Thread @ Rule[Characters[w] /. lrule, 10^Range[StringLength[w]-1, 0, -1]],
    len
    ],
    w, words
    ];

    samples . wvector
    ]


    Comparison with previous answer:



    words = "send", "more", "money";
    letterMap = Flatten[Characters[words]] // DeleteDuplicates;
    samples = Table[RandomSample[0 ;; 9, 8], 10^5];

    r1 = stringsToNumberMappings[words, letterMap, samples]; //AbsoluteTiming
    r2 = StringsToNumbers[words, letterMap, #]& /@ samples; //AbsoluteTiming

    r1===r2



    0.010555, Null



    2.2923, Null



    True




    Original answer



    I assume the computational inefficiency you're encountering is performing your replacement on lots of words? In that case, you could make use of the Listable attribute of StringReplace:



    StringsToNumbers[str_List, letters_List, numbers_List] := With[
    d = Thread[letters -> ToString /@ numbers],

    FromDigits /@ StringReplace[str, d]
    ]


    Simple example:



    words = "send", "more", "money";
    StringsToNumbers[words, letterMap, 7, 3, 6, 9, 5, 0, 4, 1]



    7369, 5043, 50631




    Bigger example with a million words:



    big = RandomChoice[words, 10^6];
    StringsToNumbers[big, letterMap, 7, 3, 6, 9, 5, 0, 4, 1]; //AbsoluteTiming



    0.908794, Null







    share|improve this answer






















    • Curiously enough, stringsToNumberMappings yielded worse results (5.7s for 100,000 iterations). The problem requires a stringList of three words, a letterMap, and a numberMap. The letterMap are the unique characters in the stringList (must be 10 or less), and the numberMap is a list of n numbers, were n is the number of unique characters in stringList (Length[letterMap]). This mapping is performed with the stringList and letterMap constant, with the number map varying through all permutations of n numbers. To approximate this, I used Table to perform 100,000 calculations.
      – tjm167us
      Aug 16 at 0:43










    • @tjm167us You're comparing the wrong thing. If permutations is your list of 100000 permutations, then you should use stringsToNumberMappings[words, letterMap, permutations] and not Table[stringsToNumberMappings[words, letterMap, p], p, permutations]. I did not optimize the speed of creating wvector because there are only 3 words, and I only need to create wvector once. If you use Table, though, wvector gets created 100000 times, which is slow.
      – Carl Woll
      Aug 16 at 1:09














    up vote
    9
    down vote



    accepted










    Update



    The following approach is much faster when the number of words is small and the number of mappings is large:



    stringsToNumberMappings[words_List, letterMap_, samples_]:=Module[

    len = Length @ letterMap, lrule, wvector
    ,

    lrule = Thread @ Rule[letterMap, Range@len];

    wvector = Transpose @ Normal @ Table[
    SparseArray[
    Thread @ Rule[Characters[w] /. lrule, 10^Range[StringLength[w]-1, 0, -1]],
    len
    ],
    w, words
    ];

    samples . wvector
    ]


    Comparison with previous answer:



    words = "send", "more", "money";
    letterMap = Flatten[Characters[words]] // DeleteDuplicates;
    samples = Table[RandomSample[0 ;; 9, 8], 10^5];

    r1 = stringsToNumberMappings[words, letterMap, samples]; //AbsoluteTiming
    r2 = StringsToNumbers[words, letterMap, #]& /@ samples; //AbsoluteTiming

    r1===r2



    0.010555, Null



    2.2923, Null



    True




    Original answer



    I assume the computational inefficiency you're encountering is performing your replacement on lots of words? In that case, you could make use of the Listable attribute of StringReplace:



    StringsToNumbers[str_List, letters_List, numbers_List] := With[
    d = Thread[letters -> ToString /@ numbers],

    FromDigits /@ StringReplace[str, d]
    ]


    Simple example:



    words = "send", "more", "money";
    StringsToNumbers[words, letterMap, 7, 3, 6, 9, 5, 0, 4, 1]



    7369, 5043, 50631




    Bigger example with a million words:



    big = RandomChoice[words, 10^6];
    StringsToNumbers[big, letterMap, 7, 3, 6, 9, 5, 0, 4, 1]; //AbsoluteTiming



    0.908794, Null







    share|improve this answer






















    • Curiously enough, stringsToNumberMappings yielded worse results (5.7s for 100,000 iterations). The problem requires a stringList of three words, a letterMap, and a numberMap. The letterMap are the unique characters in the stringList (must be 10 or less), and the numberMap is a list of n numbers, were n is the number of unique characters in stringList (Length[letterMap]). This mapping is performed with the stringList and letterMap constant, with the number map varying through all permutations of n numbers. To approximate this, I used Table to perform 100,000 calculations.
      – tjm167us
      Aug 16 at 0:43










    • @tjm167us You're comparing the wrong thing. If permutations is your list of 100000 permutations, then you should use stringsToNumberMappings[words, letterMap, permutations] and not Table[stringsToNumberMappings[words, letterMap, p], p, permutations]. I did not optimize the speed of creating wvector because there are only 3 words, and I only need to create wvector once. If you use Table, though, wvector gets created 100000 times, which is slow.
      – Carl Woll
      Aug 16 at 1:09












    up vote
    9
    down vote



    accepted







    up vote
    9
    down vote



    accepted






    Update



    The following approach is much faster when the number of words is small and the number of mappings is large:



    stringsToNumberMappings[words_List, letterMap_, samples_]:=Module[

    len = Length @ letterMap, lrule, wvector
    ,

    lrule = Thread @ Rule[letterMap, Range@len];

    wvector = Transpose @ Normal @ Table[
    SparseArray[
    Thread @ Rule[Characters[w] /. lrule, 10^Range[StringLength[w]-1, 0, -1]],
    len
    ],
    w, words
    ];

    samples . wvector
    ]


    Comparison with previous answer:



    words = "send", "more", "money";
    letterMap = Flatten[Characters[words]] // DeleteDuplicates;
    samples = Table[RandomSample[0 ;; 9, 8], 10^5];

    r1 = stringsToNumberMappings[words, letterMap, samples]; //AbsoluteTiming
    r2 = StringsToNumbers[words, letterMap, #]& /@ samples; //AbsoluteTiming

    r1===r2



    0.010555, Null



    2.2923, Null



    True




    Original answer



    I assume the computational inefficiency you're encountering is performing your replacement on lots of words? In that case, you could make use of the Listable attribute of StringReplace:



    StringsToNumbers[str_List, letters_List, numbers_List] := With[
    d = Thread[letters -> ToString /@ numbers],

    FromDigits /@ StringReplace[str, d]
    ]


    Simple example:



    words = "send", "more", "money";
    StringsToNumbers[words, letterMap, 7, 3, 6, 9, 5, 0, 4, 1]



    7369, 5043, 50631




    Bigger example with a million words:



    big = RandomChoice[words, 10^6];
    StringsToNumbers[big, letterMap, 7, 3, 6, 9, 5, 0, 4, 1]; //AbsoluteTiming



    0.908794, Null







    share|improve this answer














    Update



    The following approach is much faster when the number of words is small and the number of mappings is large:



    stringsToNumberMappings[words_List, letterMap_, samples_]:=Module[

    len = Length @ letterMap, lrule, wvector
    ,

    lrule = Thread @ Rule[letterMap, Range@len];

    wvector = Transpose @ Normal @ Table[
    SparseArray[
    Thread @ Rule[Characters[w] /. lrule, 10^Range[StringLength[w]-1, 0, -1]],
    len
    ],
    w, words
    ];

    samples . wvector
    ]


    Comparison with previous answer:



    words = "send", "more", "money";
    letterMap = Flatten[Characters[words]] // DeleteDuplicates;
    samples = Table[RandomSample[0 ;; 9, 8], 10^5];

    r1 = stringsToNumberMappings[words, letterMap, samples]; //AbsoluteTiming
    r2 = StringsToNumbers[words, letterMap, #]& /@ samples; //AbsoluteTiming

    r1===r2



    0.010555, Null



    2.2923, Null



    True




    Original answer



    I assume the computational inefficiency you're encountering is performing your replacement on lots of words? In that case, you could make use of the Listable attribute of StringReplace:



    StringsToNumbers[str_List, letters_List, numbers_List] := With[
    d = Thread[letters -> ToString /@ numbers],

    FromDigits /@ StringReplace[str, d]
    ]


    Simple example:



    words = {"send", "more", "money"};
    StringsToNumbers[words, letterMap, {7, 3, 6, 9, 5, 0, 4, 1}]



    {7369, 5043, 50631}




    Bigger example with a million words:



    big = RandomChoice[words, 10^6];
    StringsToNumbers[big, letterMap, {7, 3, 6, 9, 5, 0, 4, 1}]; // AbsoluteTiming



    {0.908794, Null}
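As a small illustration of the two stages inside StringsToNumbers, the sketch below shows the intermediate digit strings before FromDigits is applied. The word "dens" and the four-letter rule list are made up for this example.

```mathematica
(* letter -> digit-character rules for a four-letter example *)
rules = Thread[{"s", "e", "n", "d"} -> ToString /@ {7, 3, 6, 9}];

(* StringReplace handles the whole list of words in one call *)
StringReplace[{"send", "dens"}, rules]
(* {"7369", "9367"} *)

(* FromDigits then turns each digit string into an integer *)
FromDigits /@ StringReplace[{"send", "dens"}, rules]
(* {7369, 9367} *)
```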
















    edited Aug 14 at 19:38

























    answered Aug 13 at 3:11









    Carl Woll












    • Curiously enough, stringsToNumberMappings yielded worse results (5.7s for 100,000 iterations). The problem requires a stringList of three words, a letterMap, and a numberMap. The letterMap holds the unique characters in the stringList (there must be 10 or fewer), and the numberMap is a list of n numbers, where n is the number of unique characters in stringList (Length[letterMap]). This mapping is performed with the stringList and letterMap held constant, with the numberMap varying through all permutations of n numbers. To approximate this, I used Table to perform 100,000 calculations.
      – tjm167us
      Aug 16 at 0:43










    • @tjm167us You're comparing the wrong thing. If permutations is your list of 100000 permutations, then you should use stringsToNumberMappings[words, letterMap, permutations] and not Table[stringsToNumberMappings[words, letterMap, p], {p, permutations}]. I did not optimize the speed of creating wvector because there are only 3 words, and I only need to create wvector once. If you use Table, though, wvector gets created 100000 times, which is slow.
      – Carl Woll
      Aug 16 at 1:09


























    up vote
    7
    down vote













    Recently, I found out for myself that ToCharacterCode is very efficient at transforming strings to numbers. In particular, ToCharacterCode turns strings into packed arrays, which is very good for performance.



    Here is the preparation (stealing a bit from Carl Woll).



    SeedRandom[123];
    words = {"send", "more", "money"};
    letterMap = DeleteDuplicates@Flatten[Characters[words]];
    numberMap = RandomSample[Range[0, 9], 8];
    big = RandomChoice[words, 10^6];


    We create a packed array as a lookup table via SparseArray and perform the actual lookup together with the conversion to numbers in one go with the following compiled function:



    lookup = Compile[{{lookuptable, _Integer, 1}, {idx, _Integer, 1}},
      (* idx holds the character codes of one word; sum digit * place value *)
      Sum[
        10^(Length[idx] - i) Compile`GetElement[lookuptable, Compile`GetElement[idx, i]],
        {i, 1, Length[idx]}
      ],
      CompilationTarget -> "C",
      RuntimeAttributes -> {Listable},
      Parallelization -> True,
      RuntimeOptions -> "Speed"
    ];

    StringsToNumbers2[strings_, letterMap_, numberMap_] := lookup[
      (* dense table indexed by character code *)
      Normal[SparseArray[ToCharacterCode[letterMap] -> numberMap]],
      ToCharacterCode[strings]
    ];


    On my machine, this method is approximately twice as fast as using StringReplace and FromDigits:



    result1 = StringsToNumbers[big, letterMap, numberMap]; // RepeatedTiming // First
    result2 = StringsToNumbers2[big, letterMap, numberMap]; // RepeatedTiming // First
    result1 === result2



    0.810



    0.40



    True
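For readers who want to see what the lookup table contains, here is an uncompiled sketch of the same idea for one word. It uses fixed example digits from the question instead of the seeded random numberMap, and FromDigits in place of the explicit sum, so it illustrates the lookup step rather than reproducing the compiled function.

```mathematica
letterMap = {"s", "e", "n", "d", "m", "o", "r", "y"};
numberMap = {7, 3, 6, 9, 5, 0, 4, 1};

(* dense table indexed by character code; codes that are not in letterMap hold 0 *)
table = Normal[SparseArray[ToCharacterCode[letterMap] -> numberMap]];

ToCharacterCode["send"]
(* {115, 101, 110, 100} *)

(* indexing the table with the character codes gives the digits of the word *)
FromDigits[table[[ToCharacterCode["send"]]]]
(* 7369 *)
```

The table only needs to be as long as the largest character code in letterMap, so for lowercase ASCII letters it stays very small.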















        edited Aug 13 at 7:58

























        answered Aug 13 at 7:17









        Henrik Schumacher




























             
