making summary of sentences

6 votes












I have some data, and I want to summarize its sentences to draw conclusions. The example below is not my real data; it only illustrates the idea so that I can replicate it.



Employee Suzie signed one time.
Employee Dan signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzie signed one time.
Employee Harold signed one time.
Employee Sebastian signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzan signed one time.


I want to make a summary of these sentences, like this:



Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)


I experimented with awk, but it seemed very hard to do. Then I tried sed, but that didn't work either; it seems sed is only for finding and changing things. Could you please help me with this task?










  • Is "one" the only count possible, or can there be more events in a single line?
    – RudiC
    16 mins ago














shell text-processing awk sed






asked 29 mins ago by Zolo
edited 19 mins ago by Jeff Schaller




3 Answers

















7 votes, accepted










The general approach would be:



$ awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%s signed %d time(s)\n", name, count[name])
}' <file
Harold signed 1 time(s)
Dan signed 1 time(s)
Sebastian signed 1 time(s)
Suzie signed 4 time(s)
Jordan signed 2 time(s)
Suzan signed 1 time(s)


I.e., use an associative array/hash to store the number of times that a particular name is seen. In the END block, iterate over all the names and print out the summary for each.



For slightly nicer formatting, change the %s placeholder in the printf() call to something like %-10s to reserve 10 characters for the names (left-justified).



$ awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%-10s signed %d time(s)\n", name, count[name])
}' <file
Harold     signed 1 time(s)
Dan        signed 1 time(s)
Sebastian  signed 1 time(s)
Suzie      signed 4 time(s)
Jordan     signed 2 time(s)
Suzan      signed 1 time(s)
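
If some names could be longer than 10 characters, the column width might be computed from the data rather than hard-coded. The following is only a sketch under the same assumption as above (the name is the second field); the function name summarize_aligned and the variables w and fmt are made up for this example. It builds the printf format string at run time and pipes through sort because the order of a for-in loop in awk is unspecified.

```shell
# Hypothetical helper: reads sentences on stdin, prints an aligned summary.
summarize_aligned() {
    awk '
        NF {                                    # skip empty lines
            count[$2]++
            if (length($2) > w) w = length($2)  # remember the longest name
        }
        END {
            fmt = "%-" w "s signed %d time(s)\n"  # e.g. "%-9s signed ..."
            for (name in count)
                printf(fmt, name, count[name])
        }' | sort
}

# Small made-up sample; the real input would come from a file.
printf 'Employee Suzie signed one time.\nEmployee Sebastian signed one time.\nEmployee Suzie signed one time.\n' |
    summarize_aligned
```

With this sample the names are padded to the width of "Sebastian", the longest name seen.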


More fiddling around with the output (because I'm bored):



$ awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%-10s signed %d time%s\n", name, count[name],
               count[name] > 1 ? "s" : "")
}' <file
Harold     signed 1 time
Dan        signed 1 time
Sebastian  signed 1 time
Suzie      signed 4 times
Jordan     signed 2 times
Suzan      signed 1 time





  • Why bored? Thank you very much for such a great answer. I had lost all hope of doing it in awk. Is there any way to order the final result by name?
    – Zolo
    9 mins ago

  • @Zolo The simplest way to sort the result by name in this case would be to pipe the output through sort: awk ... <file | sort.
    – Kusalananda
    7 mins ago
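
The pipe-through-sort suggestion could look roughly like the sketch below. It wraps the awk program from the answer in a function (summarize_sorted is a made-up name) that reads standard input, so the example is self-contained. Plain sort compares whole lines, which here amounts to ordering by name because the name comes first on each output line.

```shell
# Hypothetical wrapper around the awk program above; reads sentences on stdin.
summarize_sorted() {
    awk '{ count[$2]++ }
    END {
        for (name in count)
            printf("%s signed %d time(s)\n", name, count[name])
    }' | sort
}

# Made-up sample input supplied through a here-document.
summarize_sorted <<'EOF'
Employee Suzie signed one time.
Employee Dan signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
EOF
```

To order by count instead, replacing sort with sort -k3,3nr would sort numerically on the third field, largest first.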

















6 votes













This is a job for awk. You need an associative array (array[index]) to do it:



awk 'NF { name[$2]++ } END { for (each in name) print each " signed " name[each] " time(s)" }' file

Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)


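The NF pattern skips blank lines: NF is the number of fields on the current line, which is zero (false) for an empty line.
The data is stored in the indexes and values of the array: each name is an index, and its count is the value referenced by that index.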
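
To see what the NF guard changes, the sketch below counts the distinct array indexes with and without it. Both function names and the sample text are made up for this illustration. Without the guard, an empty line has an empty $2, so it is counted under the empty index and would show up as a summary line with no name.

```shell
# Both helpers are hypothetical; each prints how many distinct indexes were stored.
keys_without_guard() {
    awk '{ name[$2]++ } END { for (k in name) n++; print n + 0 }'
}
keys_with_guard() {
    awk 'NF { name[$2]++ } END { for (k in name) n++; print n + 0 }'
}

# Made-up sample with an empty line between two sentences.
sample='Employee Suzie signed one time.

Employee Dan signed one time.'

printf '%s\n' "$sample" | keys_without_guard   # the empty line adds the index ""
printf '%s\n' "$sample" | keys_with_guard      # the empty line is skipped
```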






  • Thank you for the help
    – Zolo
    8 mins ago

















1 vote













Since awk uses an associative array, which is limited by the amount of memory you have, you could do the following instead:



sort -k2,2 infile | uniq -c


or, to format the output as you want:



sort -k2,2 infile | uniq -c | awk '{ print $3, "signed", $1, "time(s)" }'
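
Note that uniq -c compares whole lines, so the pipeline above relies on all lines for one employee being identical. If the rest of the sentence might vary, one option is to extract the name before sorting. This is a sketch only, not part of the answer; summarize_by_name is a made-up function name, and the third sample line is deliberately different to show the effect.

```shell
# Hypothetical helper: counts by name only, reading sentences on stdin.
summarize_by_name() {
    awk 'NF { print $2 }' |    # keep just the name field, skip empty lines
        sort |                 # bring identical names together
        uniq -c |              # prefix each name with its count
        awk '{ print $2, "signed", $1, "time(s)" }'
}

# Made-up sample; the last line differs from the first after the name.
summarize_by_name <<'EOF'
Employee Suzie signed one time.
Employee Dan signed one time.
Employee Suzie signed one time today.
EOF
```

Because only the names reach uniq, both Suzie lines are counted together even though the full sentences differ.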




Answer 1 (accepted, awk with count[$2]++): answered 22 mins ago by Kusalananda; edited 11 mins ago











Answer 2 (awk with the NF pattern): answered 23 mins ago by Goro











Answer 3 (sort and uniq -c): answered 8 mins ago by sddgob



















