Ordering a string by the count of substrings?
I have a long list of numbers like this:
1234-212-22-11153782-0114232192380
8807698823332-6756-234-14-09867378
45323-14-221-238372635363-43676256
62736373-9983-23-234-8863345637388
. . . .
. . . .
I would like to do two things:
1) Order the segments of each line by the number of digits in each segment, shortest first. The output should look like this:
22-212-1234-11153782-0114232192380
14-234-6756-09867378-8807698823332
14-221-45323-43676256-238372635363
23-234-9983-62736373-8863345637388
2) Find the number of digits in each segment of every line, in the same sorted order. The output should be:
2-3-4-8-13
2-3-4-8-13
2-3-5-8-12
2-3-4-8-13
In this example the first, second and third segments of each output line have mostly the same lengths, but they could be different.
text-processing sort
edited 4 mins ago
Goro
asked 1 hour ago
marco
Are fields of identical length possible? How should these be arranged?
– RudiC
1 hour ago
This looks like homework. Are you allowed to use something besides bash, e.g. python?
– Hermann
1 hour ago
You mention Linux; can we assume a GNU/Linux environment for solutions?
– Jeff Schaller
1 hour ago
Not sure why this would be homework. Could be log output from a game or a piece of custom hardware or anything really.
– pipe
10 mins ago
@marco. Super nice question!! :-)
– Goro
3 mins ago
3 Answers
How about
$ perl -F'-' -lane 'print join "-", sort { length $a <=> length $b } @F' file
22-212-1234-11153782-0114232192380
14-234-6756-09867378-8807698823332
14-221-45323-43676256-238372635363
23-234-9983-62736373-8863345637388
and
$ perl -F'-' -lane 'print join "-", map { length $_ } sort { length $a <=> length $b } @F' file
2-3-4-8-13
2-3-4-8-13
2-3-5-8-12
2-3-4-8-13
answered 37 mins ago
steeldriver
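For readers without Perl at hand, the same idea can be sketched in Python. This is only an illustration in a different language, not steeldriver's method, and the function names `sort_segments` and `segment_lengths` are made up for the example. Python's `sorted` is stable, so segments of equal length keep their original relative order.

```python
def sort_segments(line, sep="-"):
    """Return the line with its segments ordered by length, shortest first."""
    # sorted() is stable: segments of equal length keep their input order.
    return sep.join(sorted(line.split(sep), key=len))


def segment_lengths(line, sep="-"):
    """Return the segment lengths in ascending order, joined by the separator."""
    return sep.join(str(n) for n in sorted(len(s) for s in line.split(sep)))


if __name__ == "__main__":
    sample = "1234-212-22-11153782-0114232192380"
    print(sort_segments(sample))    # 22-212-1234-11153782-0114232192380
    print(segment_lengths(sample))  # 2-3-4-8-13
```

Both functions take one line at a time, so they can be applied to each line of the input file in a simple loop.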
GNU awk can sort, so the trickiest part is deciding how to separate the two desired outputs; this script generates both results, and you can decide if you'd like them somewhere other than hard-coded output files:
function compare_length(i1, v1, i2, v2) {
  return (length(v1) - length(v2));
}
BEGIN {
  PROCINFO["sorted_in"]="compare_length"
  FS="-"
}
{
  split($0, elements);
  asort(elements, sorted_elements, "compare_length");
  reordered="";
  lengths="";
  for (element in sorted_elements) {
    reordered=(reordered == "" ? "" : reordered FS) sorted_elements[element];
    lengths=(lengths == "" ? "" : lengths FS) length(sorted_elements[element]);
  }
  print reordered > "reordered.out";
  print lengths > "lengths.out";
}
answered 33 mins ago


Jeff Schaller
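The "two separate outputs" design can also be sketched in Python, assuming the caller decides where each result goes. The helper name `write_results` and its parameters are hypothetical; it writes to any two text streams, so real files, `sys.stdout`, or in-memory buffers all work.

```python
import io


def write_results(lines, reordered_out, lengths_out, sep="-"):
    """Write reordered lines and their segment lengths to two text streams."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Stable sort by segment length, shortest first.
        segments = sorted(line.split(sep), key=len)
        reordered_out.write(sep.join(segments) + "\n")
        lengths_out.write(sep.join(str(len(s)) for s in segments) + "\n")


if __name__ == "__main__":
    data = ["45323-14-221-238372635363-43676256\n"]
    reordered, lengths = io.StringIO(), io.StringIO()
    write_results(data, reordered, lengths)
    print(reordered.getvalue(), end="")  # 14-221-45323-43676256-238372635363
    print(lengths.getvalue(), end="")    # 2-3-5-8-12
```

Passing the streams in, rather than hard-coding file names, keeps the choice of destination out of the sorting logic.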
How far would this get you:
awk -F- '                       # set "-" as the field separator
{
  for (i=1; i<=NF; i++) {
    L = length($i)              # for every single field, calc its length
    T[L] = $i                   # and populate the T array with length as index
    if (L>MX) MX = L            # keep max length
  }
  $0 = ""                       # empty line
  for (i=1; i<=MX; i++) {
    if (i in T) {
      $0 = $0 OFS T[i]          # append each existing T element to the line, separated by "-"
      C = C OFS i               # keep the field lengths in separate variable C
    }
  }
  print substr ($0, 2) "\t" substr (C, 2)   # print the line and the field lengths, eliminating each first char
  C = MX = ""                   # reset working variables
  split ("", T)                 # delete T array
}
' OFS=- file
22-212-1234-11153782-0114232192380 2-3-4-8-13
14-234-6756-09867378-8807698823332 2-3-4-8-13
14-221-45323-43676256-238372635363 2-3-5-8-12
23-234-9983-62736373-8863345637388 2-3-4-8-13
You may want to split the printout into two result files.
edited 22 mins ago
terdon♦
answered 51 mins ago
RudiC
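RudiC's comment under the question asks whether fields of identical length are possible. Because the script above indexes `T` by field length, a second field of the same length within one line replaces the first. The Python sketch below only illustrates that behaviour and one possible way around it; the function names and the sample input are made up for the example.

```python
from collections import defaultdict


def by_length_index(line, sep="-"):
    """Index fields by length: a later field overwrites an earlier one of equal length."""
    table = {}
    for field in line.split(sep):
        table[len(field)] = field
    return sep.join(table[n] for n in sorted(table))


def by_length_buckets(line, sep="-"):
    """Keep every field by collecting equal-length fields in a list per length."""
    buckets = defaultdict(list)
    for field in line.split(sep):
        buckets[len(field)].append(field)
    return sep.join(f for n in sorted(buckets) for f in buckets[n])


if __name__ == "__main__":
    line = "123-45-678-9"           # hypothetical input with two 3-digit fields
    print(by_length_index(line))    # 9-45-678      (the field 123 is lost)
    print(by_length_buckets(line))  # 9-45-123-678  (both 3-digit fields kept)
```

If every line is guaranteed to have segments of distinct lengths, as in the sample data, the simpler length-indexed approach is enough.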