Sorting values and grepping the best score (highest number)

Score: 4

I have a file that looks like this:

 7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
 8 C00000002 score: -39.520 nathvy = 49 nconfs = 3129
 9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
 10 C00000002 score: -38.454 nathvy = 49 nconfs = 9473
 11 C00000004 score: -37.704 nathvy = 24 nconfs = 156
 12 C00000001 score: -37.558 nathvy = 41 nconfs = 51
 2 C00000002 score: -48.649 nathvy = 49 nconfs = 3878
 3 C00000001 score: -44.988 nathvy = 41 nconfs = 1988
 4 C00000002 score: -42.674 nathvy = 49 nconfs = 6740
 5 C00000002 score: -42.453 nathvy = 49 nconfs = 4553
 6 C00000002 score: -41.829 nathvy = 49 nconfs = 7559

The second column contains IDs, which are not sorted; some of them repeat (C00000001, for example). Each line has a different number after score: (most often a negative one).

What I would like to do is:

1) Read the second column (the unsorted IDs) and always pick the first line in which each ID appears. For C00000001, that is the line with score: -37.558.

2) Once only unique IDs remain, sort the lines by the number after score:, so that the most negative number comes first and the most positive one comes last.

The output should have the same structure as the input file.

edited Sep 4 at 6:03

Ravexina

27.3k146594

asked Sep 4 at 5:37

djordje

1068

The first score that appears for C00000001 is -37.558. Or is the order defined by the first column?
– Melebius
Sep 4 at 5:45

Oh, thanks Melebius, my fault, I will edit it now. I wrote the number with the highest score for this particular ID. So in the first step we don't look at the score; we just pick the first line for each unique ID, and then order those lines by the number after score:, from most negative to most positive.
– djordje
Sep 4 at 5:49


3 Answers

Score: 8 (accepted)

$ sort -k2,2 -u < filename | sort -k4,4n

7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
12 C00000001 score: -37.558 nathvy = 41 nconfs = 51

Explanation:

sort -k2,2 -u: sorts the lines on the second column only and, for lines whose IDs compare equal, keeps just the first one in input order (this is how GNU sort behaves; add -s to request a stable sort explicitly).

sort -k4,4n: sorts numerically on the score in ascending order, so the most negative score comes first and there is no need for -r to reverse it.
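A minimal sketch of the same pipeline, run on the sample data from the question. The variable name sample_data and the function name first_per_id_by_score are assumptions made for illustration, and -s and LC_ALL=C are additions so that the result does not depend on tie-breaking or locale settings.

```shell
#!/bin/sh
# Sample lines copied from the question (variable name is illustrative).
sample_data=$(cat <<'EOF'
 7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
 8 C00000002 score: -39.520 nathvy = 49 nconfs = 3129
 9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
 10 C00000002 score: -38.454 nathvy = 49 nconfs = 9473
 11 C00000004 score: -37.704 nathvy = 24 nconfs = 156
 12 C00000001 score: -37.558 nathvy = 41 nconfs = 51
 2 C00000002 score: -48.649 nathvy = 49 nconfs = 3878
 3 C00000001 score: -44.988 nathvy = 41 nconfs = 1988
 4 C00000002 score: -42.674 nathvy = 49 nconfs = 6740
 5 C00000002 score: -42.453 nathvy = 49 nconfs = 4553
 6 C00000002 score: -41.829 nathvy = 49 nconfs = 7559
EOF
)

# Hypothetical helper: reads lines on stdin, keeps the first line per ID
# (field 2), then orders the survivors by score (field 4), ascending.
first_per_id_by_score() {
    # -s keeps input order among equal IDs, -u drops all but the first.
    LC_ALL=C sort -s -u -k2,2 | LC_ALL=C sort -k4,4n
}

printf '%s\n' "$sample_data" | first_per_id_by_score
```

Reading from stdin keeps the helper usable both with a file (first_per_id_by_score < filename) and at the end of a longer pipeline.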

edited Sep 4 at 13:43

answered Sep 4 at 5:58

Ravexina

27.3k146594

You should use angle brackets for filename: <filename>. At first, I thought it was a sorting option. See docopt.org, for example.
– Melebius
Sep 4 at 6:11

Sure, I'll try to keep it in mind ;). but have you seen this?
– Ravexina
Sep 4 at 6:15

... or rather a variable reference like $filename, as the angle brackets are a confusing syntax for shell scripts.
– Grzegorz Oledzki
Sep 4 at 8:27

@Thor I saw your comment when you first posted it, but I was not able to get your suggestion to work in any form. However, I updated my command yesterday to sort -k4,4n, and that is enough to get the highest value in this situation.
– Ravexina
Sep 5 at 7:28

Score: 1

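With GNU awk 4.0 or later:

$ gawk '
 # keep the first line per ID and remember its score for ordering
 !($2 in line) { line[$2] = $0; score[$2] = $4 }
 END { PROCINFO["sorted_in"] = "@val_num_asc"; for (id in score) print line[id] }
 ' file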
 7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
 9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
 12 C00000001 score: -37.558 nathvy = 41 nconfs = 51
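Where gawk is not available, a similar result can be sketched with POSIX awk and sort: the common !seen[$2]++ idiom keeps the first line per ID, and the ordering is left to sort. The names sample_data and first_seen_sorted are illustrative assumptions, and the sample is a shortened copy of the question's data.

```shell
#!/bin/sh
# Shortened copy of the question's sample data (variable name is illustrative).
sample_data=$(cat <<'EOF'
 7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
 9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
 11 C00000004 score: -37.704 nathvy = 24 nconfs = 156
 12 C00000001 score: -37.558 nathvy = 41 nconfs = 51
 2 C00000002 score: -48.649 nathvy = 49 nconfs = 3878
 3 C00000001 score: -44.988 nathvy = 41 nconfs = 1988
EOF
)

# Hypothetical helper: reads lines on stdin.
first_seen_sorted() {
    # seen[$2]++ is 0 (false) the first time an ID occurs, so the negated
    # pattern is true only for the first line of each ID; awk prints it.
    awk '!seen[$2]++' | LC_ALL=C sort -k4,4n
}

printf '%s\n' "$sample_data" | first_seen_sorted
```

Because awk only filters here, each line passes through unchanged, so the output keeps the structure of the input.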

answered Sep 4 at 11:45

steeldriver

62.8k1197165

Score: 0

Here is an additional one-line command that is easy to adapt:

for row in $(awk '{print $2}' tmp | sort -u); do grep -F -- "$row" tmp | head -n 1; done | sort -k4,4n

7 C00000002 score: -41.156 nathvy = 49 nconfs = 2251
9 C00000004 score: -38.928 nathvy = 24 nconfs = 150
12 C00000001 score: -37.558 nathvy = 41 nconfs = 51
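A numeric key matters as soon as the scores are not all negative numbers of the same width. The sketch below compares a numeric sort on field 4 with a reversed text sort on the same field. The three input lines are made up for illustration (they are not from the question), and the function names are hypothetical.

```shell
#!/bin/sh
# Made-up lines (NOT from the question): one score is positive and one
# is a shorter negative number.
sample_data=$(cat <<'EOF'
 1 C00000009 score: -9.500 nathvy = 10 nconfs = 5
 2 C00000008 score: -41.156 nathvy = 49 nconfs = 2251
 3 C00000007 score: 3.200 nathvy = 12 nconfs = 7
EOF
)

# Numeric, ascending: the most negative score comes first.
by_score_numeric() {
    LC_ALL=C sort -k4,4n
}

# Reversed text comparison from field 4 to the end of the line.
by_score_text_reversed() {
    LC_ALL=C sort -r -k4
}

echo "numeric:"
printf '%s\n' "$sample_data" | by_score_numeric
echo "reversed text:"
printf '%s\n' "$sample_data" | by_score_text_reversed
```

With this input the numeric sort lists -41.156 first, while the reversed text sort puts the positive score 3.200 at the top, which is not the order the question asks for.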

answered Sep 4 at 12:34

user2832190
