making summary of sentences
I have data and I want to summarize its sentences to draw conclusions. The example below is not related to my real data; it only clarifies the idea so that I can replicate it.
Employee Suzie signed one time.
Employee Dan signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzie signed one time.
Employee Harold signed one time.
Employee Sebastian signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzan signed one time.
I want to make a summary of these sentences, like this:
Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)
I played with awk, but it seems very hard to do this with it. Then I tried sed, but that didn't work either; sed seems to be just for finding and changing things. Could you please help me with this task?
shell text-processing awk sed
edited 19 mins ago
Jeff Schaller
asked 29 mins ago
Zolo
Is "one" the only count possible, or can there be more events in a single line?
– RudiC
16 mins ago
3 Answers
7 votes, accepted
The general approach would be
$ awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%s signed %d time(s)\n", name, count[name])
}' <file
Harold signed 1 time(s)
Dan signed 1 time(s)
Sebastian signed 1 time(s)
Suzie signed 4 time(s)
Jordan signed 2 time(s)
Suzan signed 1 time(s)
I.e., use an associative array/hash to store the number of times that a particular name is seen. In the END block, iterate over all the names and print out the summary for each.
For slightly nicer formatting, change the %s placeholder in the printf() call to something like %-10s to reserve 10 characters for the names (left-justified).
$ awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%-10s signed %d time(s)\n", name, count[name])
}' <file
Harold     signed 1 time(s)
Dan        signed 1 time(s)
Sebastian  signed 1 time(s)
Suzie      signed 4 time(s)
Jordan     signed 2 time(s)
Suzan      signed 1 time(s)
More fiddling around with the output (because I'm bored):
$ awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%-10s signed %d time%s\n", name, count[name],
               count[name] > 1 ? "s" : "" )
}' <file
Harold     signed 1 time
Dan        signed 1 time
Sebastian  signed 1 time
Suzie      signed 4 times
Jordan     signed 2 times
Suzan      signed 1 time
answered 22 mins ago, edited 11 mins ago
Kusalananda
Why bored? Thank you very much for such a great answer. I had lost all hope of doing this in awk. Is there any way to order the final result by name?
– Zolo
9 mins ago
@Zolo The simplest way to sort the result by name in this case would be to pipe the output through sort: awk ... <file | sort
– Kusalananda
7 mins ago
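As a minimal sketch of that suggestion, the counting program and sort can be combined in one small function. The function name summarize_sorted and the here-document sample are only illustrative assumptions; the awk program itself is the one from the answer.

```shell
#!/bin/sh
# summarize_sorted is a hypothetical helper name, not part of the answer.
# It counts how often each name (field 2) occurs and sorts the summary by name.
summarize_sorted() {
    awk '{ count[$2]++ }
    END {
        for (name in count)
            printf("%s signed %d time(s)\n", name, count[name])
    }' "$@" | sort
}

# With no file arguments, awk reads standard input.
summarize_sorted <<'EOF'
Employee Suzie signed one time.
Employee Dan signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
EOF
```

Because every output line starts with the name, a plain sort is enough here; sorting by count instead would need a key option such as sort -k3,3n.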
6 votes
This is a job for awk. You need an array[index] to do it:
awk 'NF { name[$2]++ } END { for (each in name) print each " signed " name[each] " time(s)" }' file
Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)
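NF is there to skip blank lines: used as a pattern, it is true only for lines that contain at least one field.
The data is stored in the index and value of the array: each name is an index and its count is the value, so a value is referenced through the corresponding index.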
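To illustrate the NF point, here is a small sketch comparing the program with and without the pattern. The helper names count_all and count_nonblank, the bracketed output format, and the sample input are assumptions made for this illustration only.

```shell
#!/bin/sh
# Hypothetical helpers for comparison; both read the lines on standard input.
count_all() {
    # No pattern: every line is counted, so a blank line adds an entry
    # whose index is the empty string.
    awk '{ name[$2]++ } END { for (each in name) print "[" each "] " name[each] }'
}

count_nonblank() {
    # NF is 0 on a blank line, so the pattern is false and the line is skipped.
    awk 'NF { name[$2]++ } END { for (each in name) print "[" each "] " name[each] }'
}

# Made-up sample with one blank line between two records.
printf 'Employee Suzie signed one time.\n\nEmployee Dan signed one time.\n' | count_all
printf 'Employee Suzie signed one time.\n\nEmployee Dan signed one time.\n' | count_nonblank
```

The first call reports an extra entry with an empty name between the brackets; the second reports only the two real names. The order of the lines is not guaranteed, since awk does not define the iteration order of for (each in name).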
answered 23 mins ago
Goro
Thank you for the help
– Zolo
8 mins ago
1 vote
While awk uses an associative array, which is limited by the amount of memory you have, you could do the following instead:
sort -k2,2 infile | uniq -c
or, to format the output the way you want:
sort -k2,2 infile | uniq -c | awk '{ print $3, "signed", $1, "time(s)" }'
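Note that uniq -c compares entire lines, so the pipeline above relies on all lines for a given name being identical. A sketch of a variant that isolates the name before sorting follows, assuming the name is always the second field; count_names is a hypothetical helper name.

```shell
#!/bin/sh
# count_names is a hypothetical helper; it reads the named files or standard input.
count_names() {
    # Keep only the name, sort so equal names are adjacent, count each run,
    # then turn uniq's "count name" output into the requested sentence.
    awk 'NF { print $2 }' "$@" |
        sort |
        uniq -c |
        awk '{ print $2, "signed", $1, "time(s)" }'
}

count_names <<'EOF'
Employee Suzie signed one time.
Employee Dan signed one time.
Employee Suzie signed one time.
EOF
```

In this variant the final awk reads the count from $1 and the name from $2, because uniq -c now sees lines that contain only the name.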
answered 8 mins ago
sddgob