Sorting values and grepping the best score (highest number)

4 votes, 1 favorite

I have a file that looks like this:



 7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
8 C00000002 score: -39.520 nathvy = 49 nconfs = 3129
9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
10 C00000002 score: -38.454 nathvy = 49 nconfs = 9473
11 C00000004 score: -37.704 nathvy = 24 nconfs = 156
12 C00000001 score: -37.558 nathvy = 41 nconfs = 51
2 C00000002 score: -48.649 nathvy = 49 nconfs = 3878
3 C00000001 score: -44.988 nathvy = 41 nconfs = 1988
4 C00000002 score: -42.674 nathvy = 49 nconfs = 6740
5 C00000002 score: -42.453 nathvy = 49 nconfs = 4553
6 C00000002 score: -41.829 nathvy = 49 nconfs = 7559


The second column holds IDs that are not sorted, and some of them repeat, for example C00000001. Each line has a different number after score: (most often a negative one).



What I would like to do is:



1) Read the second column (the unsorted IDs) and, for each ID, always keep the first line in which it appears. For C00000001, that is the line with score: -37.558.

2) Once only unique IDs remain, sort the lines by the number after score:, so that the most negative number comes first and the most positive one comes last.

The output should have the same structure as the input file.







  • The first score that appears for C00000001 is -37.558. Or is the order defined by the first column?
    – Melebius
    Sep 4 at 5:45










  • Oh, thanks Melebius, my fault, I will edit it now. I wrote the number with the highest score for this particular ID. So, in the first step we don't look at the score; we just pick the first line in which each ID appears, and then order those lines by the number after score:, from most negative to most positive.
    – djordje
    Sep 4 at 5:49














Question asked Sep 4 at 5:37 by djordje, edited Sep 4 at 6:03 by Ravexina


3 Answers

















8 votes, accepted










$ sort -k2,2 -u < filename | sort -k4,4n

7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
12 C00000001 score: -37.558 nathvy = 41 nconfs = 51



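Explanation:

  1. sort -k2,2 -u: sorts the lines by the second column only and keeps one line per ID. GNU sort leaves lines with equal keys in input order here, so the line that survives is the first one in the file for that ID. (POSIX only says that one line from each set of equal keys is kept, so other sort implementations may choose differently.)

  2. sort -k4,4n: sorts the remaining lines numerically by score in ascending order, so the most negative score comes first; there is no need for -r to reverse it.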
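
The two stages can also be run separately to see what each one contributes. This is a minimal sketch, assuming GNU sort; the temporary file and the variable names stage1 and stage2 are illustrative and not part of the answer. The -s flag is an addition that requests a stable sort, so the reliance on input order is spelled out.

```shell
#!/bin/sh
# Sample data copied from the question, written to a temporary file.
scores=$(mktemp)
cat > "$scores" <<'EOF'
7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
8 C00000002 score: -39.520 nathvy = 49 nconfs = 3129
9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
10 C00000002 score: -38.454 nathvy = 49 nconfs = 9473
11 C00000004 score: -37.704 nathvy = 24 nconfs = 156
12 C00000001 score: -37.558 nathvy = 41 nconfs = 51
2 C00000002 score: -48.649 nathvy = 49 nconfs = 3878
3 C00000001 score: -44.988 nathvy = 41 nconfs = 1988
4 C00000002 score: -42.674 nathvy = 49 nconfs = 6740
5 C00000002 score: -42.453 nathvy = 49 nconfs = 4553
6 C00000002 score: -41.829 nathvy = 49 nconfs = 7559
EOF

# Stage 1: one line per ID, ordered by ID.
# -s (stable) keeps lines with equal keys in input order.
stage1=$(sort -s -k2,2 -u "$scores")
printf 'after stage 1:\n%s\n' "$stage1"

# Stage 2: numeric sort on the score field, most negative first.
stage2=$(printf '%s\n' "$stage1" | sort -k4,4n)
printf 'after stage 2:\n%s\n' "$stage2"

rm -f "$scores"
```

After stage 1 the lines are ordered by ID (12, 7, 9 in the first column); stage 2 reorders them by score (7, 9, 12).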



























  • You should use angle brackets for the filename: <filename>. At first, I thought it was a sorting option. See docopt.org, for example.
    – Melebius
    Sep 4 at 6:11







  • Sure, I'll try to keep it in mind ;). But have you seen this?
    – Ravexina
    Sep 4 at 6:15











  • ... or rather a variable reference like $filename, as the angle brackets are a confusing syntax for shell scripts.
    – Grzegorz Oledzki
    Sep 4 at 8:27










  • @Thor I saw your comment the first time you posted it, but I was not able to get your suggestion to work in any form. However, I updated my command yesterday to sort -k4,4n, which is enough to get the highest value in this situation.
    – Ravexina
    Sep 5 at 7:28


















1 vote













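With GNU awk 4.0 or later:

$ gawk '
!($2 in line) { line[$2] = $0; score[$2] = $4 }  # first line and score per ID
END {
  PROCINFO["sorted_in"] = "@val_num_asc"         # traverse score[] by ascending value
  for (id in score) print line[id]
}
' file
7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
12 C00000001 score: -37.558 nathvy = 41 nconfs = 51

The score is stored in its own array because "@val_num_asc" orders elements by their numeric value, and the numeric value of a whole line would be its leading number (the first column) rather than the score.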
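
If gawk is not available, a similar result can be had with any POSIX awk by doing only the de-duplication in awk and leaving the ordering to sort. The following is a minimal sketch, assuming the ID is in column 2 and the score in column 4 as in the question; the function name first_per_id and the temporary file are made up for illustration.

```shell
#!/bin/sh
# first_per_id FILE: hypothetical helper name, used here only for illustration.
# Prints the first line seen for each ID (column 2), ordered by score (column 4).
first_per_id() {
    # seen[$2]++ evaluates to 0 the first time an ID appears, so the
    # pattern !seen[$2]++ is true only for the first line of each ID,
    # and awk prints that line by default.
    awk '!seen[$2]++' "$1" | sort -k4,4n
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
8 C00000002 score: -39.520 nathvy = 49 nconfs = 3129
9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
10 C00000002 score: -38.454 nathvy = 49 nconfs = 9473
11 C00000004 score: -37.704 nathvy = 24 nconfs = 156
12 C00000001 score: -37.558 nathvy = 41 nconfs = 51
2 C00000002 score: -48.649 nathvy = 49 nconfs = 3878
3 C00000001 score: -44.988 nathvy = 41 nconfs = 1988
4 C00000002 score: -42.674 nathvy = 49 nconfs = 6740
5 C00000002 score: -42.453 nathvy = 49 nconfs = 4553
6 C00000002 score: -41.829 nathvy = 49 nconfs = 7559
EOF

result=$(first_per_id "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because awk reads the file from top to bottom, "first" here means first in input order and does not depend on how a particular sort implementation handles -u.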
































    0 votes













    Here is an additional one-line command that can easily be adapted (tmp is the input file):

    for row in $(awk '{print $2}' tmp | sort -u); do grep -- "$row" tmp | head -n 1; done | sort -k4,4n

    7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
    9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
    12 C00000001 score: -37.558 nathvy = 41 nconfs = 51
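
The final sort has to compare the score column as numbers. A text comparison only happens to give the right order when all scores are negative and have the same width. The sketch below uses made-up scores with mixed signs and widths (they are not from the question) to show the difference; LC_ALL=C is set so that the byte order and the decimal point are predictable.

```shell
#!/bin/sh
# Made-up scores with mixed signs and widths (not from the question).
data=$(mktemp)
cat > "$data" <<'EOF'
1 A score: -41.156
2 B score: 5.2
3 C score: -3.5
4 D score: 12.0
EOF

# Text comparison of everything from column 4 to the end of the line.
as_text=$(LC_ALL=C sort -k4 "$data" | awk '{print $2}' | tr -d '\n')

# Numeric comparison of column 4 only.
as_number=$(LC_ALL=C sort -k4,4n "$data" | awk '{print $2}' | tr -d '\n')

printf 'text order:    %s\n' "$as_text"     # CADB
printf 'numeric order: %s\n' "$as_number"   # ACBD
rm -f "$data"
```

Only the numeric order (A, C, B, D) runs from the most negative score to the most positive one.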





Accepted answer (sort -k2,2 -u | sort -k4,4n): answered Sep 4 at 5:58 by Ravexina, edited Sep 4 at 13:43

gawk answer: answered Sep 4 at 11:45 by steeldriver

Shell loop answer: answered Sep 4 at 12:34 by user2832190