making summary of sentences

I have data and I want to summarize sentences to generate conclusions. The example below is not my real data; it only illustrates the idea so that I can replicate it.

Employee Suzie signed one time.
Employee Dan signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzie signed one time.
Employee Harold signed one time.
Employee Sebastian signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzan signed one time.

I want to make a summary of these sentences, like this:

Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)

I played with awk, but it seems very hard to do with it. Then I tried sed, but that didn't work either; sed seems to be only for finding and changing things. Could you please help me with this task?

asked 29 mins ago by Zolo (new contributor)

edited 19 mins ago by Jeff Schaller

Is "one" the only count possible, or can there be more events in a single line?
– RudiC
16 mins ago

shell text-processing awk sed

3 Answers
7 votes, accepted

The general approach would be

$ awk '{ count[$2]++ }
  END {
    for (name in count)
      printf("%s signed %d time(s)\n", name, count[name])
  }' <file
Harold signed 1 time(s)
Dan signed 1 time(s)
Sebastian signed 1 time(s)
Suzie signed 4 time(s)
Jordan signed 2 time(s)
Suzan signed 1 time(s)

I.e., use an associative array/hash to store the number of times that a particular name is seen. In the END block, iterate over all the names and print out the summary for each.

For slightly nicer formatting, change the %s placeholder in the printf() call to something like %-10s to reserve 10 characters for the names (left-justified).

$ awk '{ count[$2]++ }
  END {
    for (name in count)
      printf("%-10s signed %d time(s)\n", name, count[name])
  }' <file
Harold     signed 1 time(s)
Dan        signed 1 time(s)
Sebastian  signed 1 time(s)
Suzie      signed 4 time(s)
Jordan     signed 2 time(s)
Suzan      signed 1 time(s)

More fiddling around with the output (because I'm bored):

$ awk '{ count[$2]++ }
  END {
    for (name in count)
      printf("%-10s signed %d time%s\n", name, count[name],
             count[name] > 1 ? "s" : "")
  }' <file
Harold     signed 1 time
Dan        signed 1 time
Sebastian  signed 1 time
Suzie      signed 4 times
Jordan     signed 2 times
Suzan      signed 1 time

answered 22 mins ago by Kusalananda

edited 11 mins ago

Why bored? Thank you very much for such a great answer. I had lost all hope of doing it in awk. Is there any way to order the final result by name?
– Zolo
9 mins ago

@Zolo The simplest way to sort the result by name in this case would be to pipe the output through sort: awk ... <file | sort.
– Kusalananda
7 mins ago
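The sort suggestion from the comment above can be sketched as follows. This is only an illustration: the function name summarize_sorted is made up for this example, the sample input is generated with printf rather than read from the asker's file, and LC_ALL=C is an added assumption so that the ordering does not depend on the locale.

```shell
# summarize_sorted: hypothetical helper name, not part of the original answer.
# Reads sentences on standard input, counts how often each name (field 2)
# occurs, and prints one summary line per name, sorted by name.
summarize_sorted() {
    awk '{ count[$2]++ }
         END {
             for (name in count)
                 printf("%s signed %d time(s)\n", name, count[name])
         }' | LC_ALL=C sort   # LC_ALL=C: assumed here for a predictable byte order
}

# printf repeats its format string once per argument, which recreates the
# sample sentences from the question.
printf 'Employee %s signed one time.\n' \
    Suzie Dan Jordan Suzie Suzie Harold Sebastian Jordan Suzie Suzan |
    summarize_sorted
```

Since every output line starts with the name, a plain sort on the whole line is enough to order the summary by name.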

6 votes

This job is for awk. You need an array[index] to do it:

awk 'NF { name[$2]++ } END { for (each in name) print each " signed " name[each] " time(s)" }' file

Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)

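The NF pattern skips blank lines: a blank line has zero fields, so NF is 0 (false) and the counting action does not run for it.
The data is stored in the array as index/value pairs: each name is an index, and the value stored at that index is the number of times the name was seen.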
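To see what the NF pattern does, here is a small sketch that feeds the same awk program input containing blank lines. The wrapper name count_names and the two-record sample input are assumptions made for this illustration only.

```shell
# count_names: hypothetical wrapper around the one-liner from this answer.
count_names() {
    awk 'NF { name[$2]++ }
         END { for (each in name) print each " signed " name[each] " time(s)" }'
}

# Two records separated by blank lines. Because of the NF pattern, the blank
# lines are skipped; without it they would be counted under an empty name.
printf 'Employee Dan signed one time.\n\nEmployee Dan signed one time.\n\n' |
    count_names
# prints: Dan signed 2 time(s)
```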

answered 23 mins ago by Goro

Thank you for the help
– Zolo
8 mins ago

1 vote

Because awk uses an associative array here, it is limited by the amount of memory you have. You could do the following instead:

sort -k2,2 infile | uniq -c

or, to format the output the way you want:

sort -k2,2 infile | uniq -c | awk '{ print $3, "signed", $1, "time(s)" }'
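Note that uniq -c compares whole lines, so the pipeline above relies on every sentence for a given name being identical. A possible variation, not from the original answer, is to cut each line down to the name before sorting. The helper name count_by_name, the NF guard against blank lines, and LC_ALL=C are all assumptions of this sketch.

```shell
# count_by_name: hypothetical helper, not part of the original answer.
# Extract the name (field 2) first, so that uniq -c groups on the name alone
# rather than on the full text of each line.
count_by_name() {
    awk 'NF { print $2 }' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2, "signed", $1, "time(s)" }'
}

printf 'Employee %s signed one time.\n' Suzie Dan Jordan Suzie | count_by_name
```

After the first awk, uniq -c emits the count followed by the name, so the final awk reads the name from $2 rather than $3.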

answered 8 mins ago by sddgob
