Check if a character string is not random
Background
Let's say we have an alphabet of A, B, C, D. We then look through some data and find a "word" such as DDDDDDDDCDDDDDD. The chance of finding this at random seems low to me, whereas BABDCABCDACDBACD looks much more like a random string.
Question
How should I check whether the strings I encounter are not random?
I tried some things in R, e.g., encoding the letters numerically and then comparing these to permutations. But encoding beforehand is quite cumbersome; there is likely a more direct approach for this.
text-mining randomness
edited 7 mins ago
gung♦
asked 1 hour ago
CodeNoob
What exactly do you mean by "random" in this context?
– gung♦
8 mins ago
Note that questions that are only about how to do something w/ given software are generally off topic here. This Q doesn't really have anything to do w/ R, however.
– gung♦
7 mins ago
2 Answers
"the chance of finding this random seems low to me whereas finding BABDCABCDACDBACD seems less random."
Why would that be? If the overall proportion of each letter A...D is 0.25, and the letters are drawn independently of one another, then all words of a given length are exactly equally probable, however repetitive or mixed they look. If the overall proportions of the letters differ, then of course the probabilities of generating the two words might be different.
You can try to find "low complexity" words, for example words with an especially high proportion of one letter. You could use the Shannon entropy, as suggested in the other answer, and in biological sequence analysis there are many other approaches. But there is no test for "randomness" as such: without further assumptions or knowledge about what you are actually analyzing, the term "randomness" makes no sense.
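To make the point about equal probabilities concrete, here is a minimal Python sketch, assuming letters are drawn independently from fixed per-letter probabilities. The function name `word_probability` and the `skewed` proportions are hypothetical, chosen only for illustration, and the second word is trimmed to 15 letters so that both words have the same length.

```python
from math import prod

def word_probability(word, letter_probs):
    """Probability of generating exactly `word` when each letter is drawn
    independently with the given per-letter probabilities."""
    return prod(letter_probs[ch] for ch in word)

# Equal proportions, as in the first scenario described above.
uniform = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
# Hypothetical unequal proportions (an assumption, not taken from the question).
skewed = {"A": 0.05, "B": 0.05, "C": 0.05, "D": 0.85}

repetitive = "DDDDDDDDCDDDDDD"
# Trim the second word to the same length so the comparison is like for like.
mixed = "BABDCABCDACDBACD"[:len(repetitive)]

# Under equal proportions the two probabilities coincide.
print(word_probability(repetitive, uniform), word_probability(mixed, uniform))
# Under the skewed proportions the D-heavy word is far more probable.
print(word_probability(repetitive, skewed), word_probability(mixed, skewed))
```

Under the uniform model the probability depends only on the word length, which is why the composition of the word tells you nothing until you specify what the alternative to "random" is.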
answered 9 mins ago
January
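You could try the Shannon entropy:
$$
H = -\sum_{i=1}^{k} P_i \log_2(P_i)
$$
where $P_i = \frac{c_i}{n}$, $c_i$ is the count of the $i$-th distinct letter in the word, $k$ is the number of distinct letters that occur in the word, and $n = |\mathrm{word}|$ is the word length.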
For the first word you have $H \approx 0.35$ bits. For the second word you have $H = 2$ bits, the maximum possible for a four-letter alphabet.
If the entropy is high, you could think of the word as more random than another word with lower entropy.
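As a rough illustration, the entropy above can be computed directly on the character string, with no numeric encoding of the letters. This is a minimal Python sketch; the function name `shannon_entropy` is my own choice, not something from the question.

```python
from collections import Counter
from math import log2

def shannon_entropy(word):
    """Shannon entropy (in bits) of the letter frequencies within `word`."""
    n = len(word)
    counts = Counter(word)  # c_i for each distinct letter that occurs
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(round(shannon_entropy("DDDDDDDDCDDDDDD"), 2))   # 0.35
print(round(shannon_entropy("BABDCABCDACDBACD"), 2))  # 2.0
```

Note that this measure depends only on the letter counts, not on their order, so "AAAABBBB" and "ABABABAB" get the same value.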
edited 39 secs ago
gung♦
answered 21 mins ago
Edvrsoft