Check if a character string is not random

Background

Let's say we have an alphabet of A, B, C, D. We then look through some data and find a "word" such as DDDDDDDDCDDDDDD; the chance of finding this random seems low to me whereas finding BABDCABCDACDBACD seems less random.



Question

How should I check whether the strings I encounter are not random?



I tried some things in R, e.g., encoding the letters numerically and then comparing these to permutations. But encoding beforehand is quite cumbersome; there is likely a more direct approach for this.







Tags: text-mining, randomness






edited 7 mins ago by gung♦
asked 1 hour ago by CodeNoob















  • What exactly do you mean by "random" in this context?
    – gung♦
    8 mins ago










  • Note that questions that are only about how to do something w/ given software are generally off topic here. This Q doesn't really have anything to do w/ R, however.
    – gung♦
    7 mins ago


























2 Answers

















"the chance of finding this random seems low to me whereas finding BABDCABCDACDBACD seems less random."

Why would that be? If the overall proportion of each letter A...D is 0.25, and the letters are independent of one another, then any specific word of length $n$ has probability $0.25^n$, so two words of the same length are exactly equally probable. If the overall proportions of the letters differ, then of course the probabilities of generating the two words may differ.

You can try to find "low complexity" words, for example words with an especially high proportion of one letter (you could use the Shannon information, as suggested in the other response, and in biological sequence analysis there are many other approaches). But there is no test for "randomness" as such: without further assumptions or knowledge about what you are actually analyzing, the term "randomness" makes no sense.
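
To make the equal-probability point concrete, here is a minimal Python sketch, assuming letters are drawn independently from fixed proportions. The function name `word_probability`, the `skewed` proportions, and the truncation of the second word to 15 letters (so that both words have the same length) are illustrative assumptions, not something taken from the question.

```python
from math import prod


def word_probability(word, letter_probs):
    """Probability of generating exactly `word` when each letter is drawn
    independently with the proportions given in `letter_probs`."""
    return prod(letter_probs[ch] for ch in word)


# Two hypothetical models: equal proportions, and proportions dominated by D.
uniform = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
skewed = {"A": 0.05, "B": 0.05, "C": 0.05, "D": 0.85}

word_1 = "DDDDDDDDCDDDDDD"        # 15 letters
word_2 = "BABDCABCDACDBACD"[:15]  # truncated to 15 letters for a fair comparison

for name, model in [("uniform", uniform), ("skewed", skewed)]:
    p1 = word_probability(word_1, model)
    p2 = word_probability(word_2, model)
    print(f"{name}: P(word_1) = {p1:.3e}, P(word_2) = {p2:.3e}")
```

Under the uniform model the two 15-letter words come out equally probable; under the skewed proportions the D-heavy word is by far the more probable one. Which word looks "surprising" therefore depends entirely on the model you assume.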







answered 9 mins ago by January






















You could try Shannon information:
$$
H = -\sum_{i=1}^{k} P_i \log_2(P_i)
$$

where the sum runs over the $k$ distinct letters in the word, $P_i = \frac{c_i}{n}$, $c_i$ is the count of the $i$-th letter in the word, and $n = |\mathrm{word}|$ is the word length.

For the first word you have $H \approx 0.35$. For the second word you have $H = 2$.

A word with higher entropy can be thought of as more random than a word with lower entropy.
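
A minimal Python sketch of this calculation, assuming the probabilities are estimated from the letter counts within each word. The function name `shannon_entropy` is an illustrative choice; no numeric encoding of the letters is needed.

```python
from collections import Counter
from math import log2


def shannon_entropy(word):
    """Shannon entropy (in bits) of the letter frequencies within `word`."""
    n = len(word)
    counts = Counter(word)  # count of each distinct letter
    return -sum((c / n) * log2(c / n) for c in counts.values())


word_1 = "DDDDDDDDCDDDDDD"
word_2 = "BABDCABCDACDBACD"

print(round(shannon_entropy(word_1), 2))
print(round(shannon_entropy(word_2), 2))
```

This should give roughly 0.35 bits for the first word and 2 bits for the second. Note that the measure depends only on the letter counts, not on their order, so a word such as AAAABBBBCCCCDDDD also scores 2.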







edited 39 secs ago by gung♦
answered 21 mins ago by Edvrsoft



















