How can I find a more efficient solution to mapping characters to digits?
Given:
    strings = {"send", "more", "money"};
    letterMap = Flatten[Characters[strings]] // DeleteDuplicates
    (*Example: {"s", "e", "n", "d", "m", "o", "r", "y"}*)
    numberMap = RandomSample[Range[0, 9], 8]
    (*Example: {7, 3, 6, 9, 5, 0, 4, 1}*)
I would like to convert these strings to integers as determined by the mapping specified by letterMap and numberMap.
For example, "send" should be equal to 7369 because s->7, e->3, n->6, d->9.
I have done this three different ways, but none of my solutions are computationally efficient enough. Can you find a more efficient method?
From Fastest to Slowest:
Using Associations and Lookup (Method 2):
    StringToNumber2[str_String, letters_List, numbers_List] :=
     (
      Clear[rules, assoc];
      (* letter -> digit rules, stored as an association for fast lookup *)
      rules = Thread[Rule[letters, numbers]];
      assoc = Association[rules];
      FromDigits[Lookup[assoc, #] & /@ Characters[str]]
     )
Using Rules and ReplaceAll (can be improved a little using Thread like above) (Method 0):
    StringToNumber0[str_String, letterMap_List, numberMap_List] :=
     (
      Clear[rules];
      (* pair each letter with its digit, then turn every pair into a rule *)
      rules = Rule @@@ Partition[Riffle[letterMap, numberMap], 2];
      FromDigits[Characters[str]] /. rules
     )
Using Position and Part (Method 1):
    StringToNumber1[str_String, letterMap_List, numberMap_List] :=
     (
      (* find each character's position in letterMap, then take that digit *)
      FromDigits[
       numberMap[[Flatten[Position[letterMap, #]]]] & /@ Characters[str] //
        Flatten]
     )
Expected output, given inputs above:
    {7369, 5043, 50631}
Most importantly, do you have any tips for improving the efficiency of Mathematica code when faced with problems like this?
(Update after receiving wonderful feedback!)
Henrik's Compiled SparseArray method was definitely the quickest, although I'm wondering how much improvement of other methods could be realized by utilizing Compile!
Below was how I measured the performance of each method:
    (*See how long the algorithms take to analyze 100,000 pieces of data*)
    Column[{
      Table[StringToNumber0[strings, letters, numbers], 100000]; // AbsoluteTiming,
      Table[StringToNumber1[strings, letters, numbers], 100000]; // AbsoluteTiming,
      Table[StringToNumber2[strings, letters, numbers], 100000]; // AbsoluteTiming,
      Table[StringToNumber3[strings, letters, numbers], 100000]; // AbsoluteTiming,
      Table[StringToNumber4[strings, letters, numbers], 100000]; // AbsoluteTiming
    }]
Where Method 3 = Carl's method, and Method 4 = Henrik's method.
    {4.79445, Null}   (*Method 0*)
    {8.75677, Null}   (*Method 1*)
    {2.04422, Null}   (*Method 2*)
    {1.75599, Null}   (*Method 3*)
    {0.900393, Null}  (*Method 4*)
performance-tuning functional-style
Using // AbsoluteTiming suggests StringToNumber2 is the fastest. But I'm curious about 2 things: (1) Do you need to convert back, assuming this is some sort of secret decoder ring method, and (2) does it matter if there are more than 8 (or especially more than 9) unique characters?
– JimB
Aug 13 at 3:10
@JimB, 1) No converting back needed. 2) The particular problem dictates that there will be at most 10 unique characters, each with a numeric value in the range 0..9.
– tjm167us
Aug 13 at 3:51
Your test timings are about 20 times slower than mine, probably because you don't give a list of strings as the first argument of StringToNumber3 and StringToNumber4.
– Carl Woll
Aug 14 at 4:54
@CarlWoll, I do give a list of strings, but only a list of 3! The problem requires that the number mapping changes while the letter map and the input strings (always a list of three) stay constant. Given that, are there any optimizations you can think of?
– tjm167us
Aug 14 at 13:30
Question asked Aug 13 at 2:16 by tjm167us; edited Aug 14 at 3:35.
2 Answers

Accepted answer (9 votes)
Update
The following approach is much faster when the number of words is small and the number of mappings is large:
    stringsToNumberMappings[words_List, letterMap_, samples_] := Module[
      {len = Length @ letterMap, lrule, wvector},
      (* map each letter to its position in letterMap *)
      lrule = Thread @ Rule[letterMap, Range @ len];
      (* one column per word, holding the power of 10 carried by each letter *)
      wvector = Transpose @ Normal @ Table[
        SparseArray[
          Thread @ Rule[Characters[w] /. lrule, 10^Range[StringLength[w] - 1, 0, -1]],
          len
        ],
        {w, words}
      ];
      samples . wvector
    ]
Comparison with previous answer:
    words = {"send", "more", "money"};
    letterMap = Flatten[Characters[words]] // DeleteDuplicates;
    samples = Table[RandomSample[0 ;; 9, 8], 10^5];
    r1 = stringsToNumberMappings[words, letterMap, samples]; // AbsoluteTiming
    r2 = StringsToNumbers[words, letterMap, #] & /@ samples; // AbsoluteTiming
    r1 === r2
    {0.010555, Null}
    {2.2923, Null}
    True
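To see why a single Dot is enough here, it may help to look at what one column of wvector holds. The following is a minimal sketch for the word "send" only, using the example mapping from the question; the names positions and weights are illustrative and do not appear in the answer.

```mathematica
letterMap = {"s", "e", "n", "d", "m", "o", "r", "y"};
numberMap = {7, 3, 6, 9, 5, 0, 4, 1};

(* position of each letter of "send" within letterMap: {1, 2, 3, 4} *)
positions = Characters["send"] /. Thread[letterMap -> Range[Length[letterMap]]];

(* place-value weights, one slot per letter of letterMap *)
weights = Normal[SparseArray[Thread[positions -> {1000, 100, 10, 1}], Length[letterMap]]]
(* {1000, 100, 10, 1, 0, 0, 0, 0} *)

(* the number for "send" is the dot product of the digits with the weights *)
numberMap . weights
(* 7*1000 + 3*100 + 6*10 + 9*1 = 7369 *)
```

The weights depend only on the words and on letterMap, so they can be computed once; every additional numberMap then costs only a dot product, which is presumably where the speedup comes from.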
Original answer
I assume the computational inefficiency you're encountering is performing your replacement on lots of words? In that case, you could make use of the Listable attribute of StringReplace:
    StringsToNumbers[str_List, letters_List, numbers_List] := With[
      {d = Thread[letters -> ToString /@ numbers]},
      FromDigits /@ StringReplace[str, d]
    ]
Simple example:
    words = {"send", "more", "money"};
    StringsToNumbers[words, letterMap, {7, 3, 6, 9, 5, 0, 4, 1}]
    {7369, 5043, 50631}
Bigger example with a million words:
    big = RandomChoice[words, 10^6];
    StringsToNumbers[big, letterMap, {7, 3, 6, 9, 5, 0, 4, 1}]; // AbsoluteTiming
    {0.908794, Null}
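The two stages of the StringReplace approach can be inspected separately. This is a small sketch with the example mapping from the question; rules and digitStrings are names chosen here for illustration.

```mathematica
letterMap = {"s", "e", "n", "d", "m", "o", "r", "y"};
rules = Thread[letterMap -> ToString /@ {7, 3, 6, 9, 5, 0, 4, 1}];

(* a single call rewrites every word in the list *)
digitStrings = StringReplace[{"send", "more", "money"}, rules]
(* {"7369", "5043", "50631"} *)

(* FromDigits accepts a string of digits *)
FromDigits /@ digitStrings
(* {7369, 5043, 50631} *)
```

The digits have to be converted with ToString first because StringReplace needs string replacements, not integers.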
Curiously enough, stringsToNumberMappings yielded worse results (5.7 s for 100,000 iterations). The problem requires a stringList of three words, a letterMap, and a numberMap. The letterMap is the list of unique characters in the stringList (at most 10), and the numberMap is a list of n numbers, where n is the number of unique characters in stringList (Length[letterMap]). The mapping is performed with the stringList and letterMap held constant while the numberMap varies through all permutations of n numbers. To approximate this, I used Table to perform 100,000 calculations.
– tjm167us
Aug 16 at 0:43
@tjm167us You're comparing the wrong thing. If permutations is your list of 100000 permutations, then you should use stringsToNumberMappings[words, letterMap, permutations] and not Table[stringsToNumberMappings[words, letterMap, p], {p, permutations}]. I did not optimize the speed of creating wvector because there are only 3 words, and I only need to create wvector once. If you use Table, though, wvector gets created 100000 times, which is slow.
– Carl Woll
Aug 16 at 1:09
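The batch usage described in this comment could look roughly like the sketch below. It assumes the mappings of interest are all ordered selections of 8 distinct digits; index, wordWeights and weightMatrix are hypothetical helper names, and the weight matrix is built in the same spirit as wvector rather than copied from the answer.

```mathematica
words = {"send", "more", "money"};
letterMap = DeleteDuplicates[Flatten[Characters[words]]];
index = AssociationThread[letterMap -> Range[Length[letterMap]]];

(* place-value weights of one word over the letters of letterMap *)
wordWeights[w_String] := Normal[SparseArray[
    Thread[Lookup[index, Characters[w]] -> 10^Range[StringLength[w] - 1, 0, -1]],
    Length[letterMap]]];

(* built a single time: one column per word *)
weightMatrix = Transpose[wordWeights /@ words];

(* every ordered choice of 8 distinct digits: 10!/2! = 1814400 rows *)
permutations = Permutations[Range[0, 9], {Length[letterMap]}];

(* one matrix product converts all mappings at once *)
results = permutations . weightMatrix;
Dimensions[results]
(* {1814400, 3} *)
```

Wrapping the whole construction in Table would rebuild weightMatrix for every row, which is the overhead the comment points out.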

Second answer (7 votes)
Recently, I found out for myself that ToCharacterCode is very efficient at transforming strings to numbers. In particular, ToCharacterCode turns strings into packed arrays, which is very good for performance.
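As a quick illustration of what ToCharacterCode returns (a sketch, not part of the timing code; codes is an illustrative name):

```mathematica
codes = ToCharacterCode["send"]
(* {115, 101, 110, 100} *)

(* reports whether the result is stored as a packed array *)
Developer`PackedArrayQ[codes]

(* a list of strings gives one list of codes per string *)
ToCharacterCode[{"send", "more"}]
(* {{115, 101, 110, 100}, {109, 111, 114, 101}} *)
```

Because the codes are plain machine integers, they can be passed directly to a compiled function as an integer vector.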
Here is the preparation (stealing a bit from Carl Woll).
    SeedRandom[123];
    words = {"send", "more", "money"};
    letterMap = DeleteDuplicates@Flatten[Characters[words]];
    numberMap = RandomSample[Range[0, 9], 8];
    big = RandomChoice[words, 10^6];
We create a packed array as a lookup table via SparseArray and perform the actual lookup, together with the conversion to numbers, in one go with the following compiled function:
    lookup = Compile[{{lookuptable, _Integer, 1}, {idx, _Integer, 1}},
      (* digit of the i-th character times its power of 10, summed over the word *)
      Sum[10^(Length[idx] - i) Compile`GetElement[lookuptable,
        Compile`GetElement[idx, i]], {i, 1, Length[idx]}],
      CompilationTarget -> "C",
      RuntimeAttributes -> {Listable},
      Parallelization -> True,
      RuntimeOptions -> "Speed"
    ];
    StringsToNumbers2[strings_, letterMap_, numberMap_] := lookup[
      Normal[SparseArray[ToCharacterCode[letterMap] -> numberMap]],
      ToCharacterCode[strings]
    ];
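To make the lookup table concrete, here is a reduced sketch with only two letters ("s" -> 7 and "e" -> 3). It uses ordinary Part instead of the compiled function, so it shows the idea rather than the fast path; table is an illustrative name.

```mathematica
(* character codes: "s" is 115, "e" is 101 *)
table = Normal[SparseArray[ToCharacterCode[{"s", "e"}] -> {7, 3}]];

Length[table]
(* 115: the table reaches up to the largest character code used *)

table[[{101, 115}]]
(* {3, 7} *)

(* look up the codes of "see" and combine the digits *)
FromDigits[table[[ToCharacterCode["see"]]]]
(* 733 *)
```

The table is indexed directly by character code, so each character costs one array access instead of a rule or association lookup.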
On my machine, this method is approximately twice as fast as using StringReplace and FromDigits:
    result1 = StringsToNumbers[big, letterMap, numberMap]; // RepeatedTiming // First
    result2 = StringsToNumbers2[big, letterMap, numberMap]; // RepeatedTiming // First
    result1 === result2
    0.810
    0.40
    True
Accepted answer: answered Aug 13 at 3:11 by Carl Woll; edited Aug 14 at 19:38.
Curiously enough, stringsToNumberMappings yielded worse results (5.7s for 100,000 iterations). The problem requires a stringList of three words, a letterMap, and a numberMap. The letterMap are the unique characters in the stringList (must be 10 or less), and the numberMap is a list of n numbers, were n is the number of unique characters in stringList (Length[letterMap]). This mapping is performed with the stringList and letterMap constant, with the number map varying through all permutations of n numbers. To approximate this, I used Table to perform 100,000 calculations.
– tjm167us
Aug 16 at 0:43
@tjm167us You're comparing the wrong thing. Ifpermutations
is your list of 100000 permutations, then you should usestringsToNumberMappings[words, letterMap, permutations]
and notTable[stringsToNumberMappings[words, letterMap, p], p, permutations]
. I did not optimize the speed of creatingwvector
because there are only 3 words, and I only need to createwvector
once. If you useTable
, though,wvector
gets created 100000 times, which is slow.
– Carl Woll
Aug 16 at 1:09
add a comment |Â
Curiously enough, stringsToNumberMappings yielded worse results (5.7s for 100,000 iterations). The problem requires a stringList of three words, a letterMap, and a numberMap. The letterMap are the unique characters in the stringList (must be 10 or less), and the numberMap is a list of n numbers, were n is the number of unique characters in stringList (Length[letterMap]). This mapping is performed with the stringList and letterMap constant, with the number map varying through all permutations of n numbers. To approximate this, I used Table to perform 100,000 calculations.
– tjm167us
Aug 16 at 0:43
@tjm167us You're comparing the wrong thing. Ifpermutations
is your list of 100000 permutations, then you should usestringsToNumberMappings[words, letterMap, permutations]
and notTable[stringsToNumberMappings[words, letterMap, p], p, permutations]
. I did not optimize the speed of creatingwvector
because there are only 3 words, and I only need to createwvector
once. If you useTable
, though,wvector
gets created 100000 times, which is slow.
– Carl Woll
Aug 16 at 1:09
Curiously enough, stringsToNumberMappings yielded worse results (5.7s for 100,000 iterations). The problem requires a stringList of three words, a letterMap, and a numberMap. The letterMap are the unique characters in the stringList (must be 10 or less), and the numberMap is a list of n numbers, were n is the number of unique characters in stringList (Length[letterMap]). This mapping is performed with the stringList and letterMap constant, with the number map varying through all permutations of n numbers. To approximate this, I used Table to perform 100,000 calculations.
– tjm167us
Aug 16 at 0:43
Curiously enough, stringsToNumberMappings yielded worse results (5.7s for 100,000 iterations). The problem requires a stringList of three words, a letterMap, and a numberMap. The letterMap are the unique characters in the stringList (must be 10 or less), and the numberMap is a list of n numbers, were n is the number of unique characters in stringList (Length[letterMap]). This mapping is performed with the stringList and letterMap constant, with the number map varying through all permutations of n numbers. To approximate this, I used Table to perform 100,000 calculations.
– tjm167us
Aug 16 at 0:43
@tjm167us You're comparing the wrong thing. If
permutations
is your list of 100000 permutations, then you should use stringsToNumberMappings[words, letterMap, permutations]
and not Table[stringsToNumberMappings[words, letterMap, p], p, permutations]
. I did not optimize the speed of creating wvector
because there are only 3 words, and I only need to create wvector
once. If you use Table
, though, wvector
gets created 100000 times, which is slow.– Carl Woll
Aug 16 at 1:09
@tjm167us You're comparing the wrong thing. If
permutations
is your list of 100000 permutations, then you should use stringsToNumberMappings[words, letterMap, permutations]
and not Table[stringsToNumberMappings[words, letterMap, p], p, permutations]
. I did not optimize the speed of creating wvector
because there are only 3 words, and I only need to create wvector
once. If you use Table
, though, wvector
gets created 100000 times, which is slow.– Carl Woll
Aug 16 at 1:09
add a comment |Â
up vote
7
down vote
Recently, I found out for myself that ToCharacterCode
is very efficient in transforming strings to numbers. In particular, ToCharacterCode
turns strings into a packed arrays which is very good for performance.
Here the preparation (stealing a bit from Carl Woll).
SeedRandom[123];
words = "send", "more", "money";
letterMap = DeleteDuplicates@Flatten[Characters[strings]];
numberMap = RandomSample[Range[0, 9], 8];
big = RandomChoice[words, 10^6];
We create a packed array as a lookup table via SparseArray and perform the actual lookup, together with the conversion to numbers, in one go with the following compiled function:
lookup = Compile[{{lookuptable, _Integer, 1}, {idx, _Integer, 1}},
   Sum[10^(Length[idx] - i) Compile`GetElement[lookuptable,
      Compile`GetElement[idx, i]], {i, 1, Length[idx]}],
   CompilationTarget -> "C",
   RuntimeAttributes -> {Listable},
   Parallelization -> True,
   RuntimeOptions -> "Speed"
   ];
StringsToNumbers2[strings_, letterMap_, numberMap_] := lookup[
   Normal[SparseArray[ToCharacterCode[letterMap] -> numberMap]],
   ToCharacterCode[strings]
   ];
On my machine, the compiled method is approximately twice as fast as using StringReplace and FromDigits:
result1 = StringsToNumbers[big, letterMap, numberMap]; // RepeatedTiming // First
result2 = StringsToNumbers2[big, letterMap, numberMap]; // RepeatedTiming // First
result1 === result2
0.810
0.40
True
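The lookup-table idea used inside the compiled function can also be tried uncompiled, which may help in seeing what the table contains. This is only a sketch: it fixes numberMap to the example mapping from the question instead of a random sample, and it will not match the speed of the compiled version.

```mathematica
letterMap = {"s", "e", "n", "d", "m", "o", "r", "y"};
numberMap = {7, 3, 6, 9, 5, 0, 4, 1};  (* example mapping from the question *)

(* Dense table indexed by character code: position 115 ("s") holds 7, etc.;
   positions that belong to no letter in letterMap hold 0 *)
table = Normal[SparseArray[ToCharacterCode[letterMap] -> numberMap]];

Length[table]
(* 121, the character code of "y", the largest code in letterMap *)

(* Look up the digit for every character code, then assemble the number *)
FromDigits[table[[#]]] & /@ ToCharacterCode[{"send", "more", "money"}]
(* {7369, 5043, 50631} *)
```

The table is sized by the largest character code rather than by the number of letters, so lookups are plain Part accesses with no searching.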
edited Aug 13 at 7:58
answered Aug 13 at 7:17


Henrik Schumacher
Using // AbsoluteTiming suggests StringToNumber2 is the fastest. But I'm curious about 2 things: (1) Do you need to convert back, assuming this is some sort of secret decoder ring method, and (2) Does it matter if there are more than 8 (or especially more than 9) unique characters?
– JimB
Aug 13 at 3:10
@JimB, 1) No converting back needed, 2) The particular problem dictates that there will be at most 10 unique characters with a numeric value ranging from 0..9.
– tjm167us
Aug 13 at 3:51
Your test timings are about 20 times slower than mine, probably because you don't give a list of strings as the first argument of StringToNumber3 and StringToNumber4.
– Carl Woll
Aug 14 at 4:54
@CarlWoll, I do give a list of strings, but only a list of 3! The problem requires that the number mapping change while the letter map and the input strings (always a list of 3) stay constant. Given that, are there any optimizations you can think of?
– tjm167us
Aug 14 at 13:30