Finding unique headers in a FASTA file using the Linux command line

I tried to use the following command:

uniq -u reference.fasta >> reference_uniq.fasta

I'd like a count of the unique headers.







fasta linux















asked 1 hour ago by crispr

Did you use python? It will be easier.
– tianhua liao
1 hour ago














3 Answers
If you just want the number of unique headers, you can do this:

grep '>' reference.fasta | sort | uniq -c | wc -l

If you want a list of the unique headers, you can do this:

grep '>' reference.fasta | sort | uniq

If you want a histogram of how many times each header occurs, you can do this:

grep '>' reference.fasta | sort | uniq -c | awk '{printf("%s\t%s\n", $1, $2)}'
















answered 1 hour ago by conchoecia
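As a rough illustration of the three pipelines in this answer, the sketch below builds a small throwaway FASTA file and runs each variant on it. The file name demo.fasta and its records are made up for the example. It assumes headers are lines that start with '>', so the pattern is anchored with ^, and the histogram step keeps the whole header line rather than only its first word.

```shell
#!/bin/sh
# Hypothetical sample data: three records, one header repeated.
workdir=$(mktemp -d)
fasta="$workdir/demo.fasta"
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq1\nTTAA\n' > "$fasta"

# Number of distinct headers (tr strips the padding some wc versions add).
n_unique=$(grep '^>' "$fasta" | sort | uniq | wc -l | tr -d ' ')

# List of distinct headers, one per line.
unique_headers=$(grep '^>' "$fasta" | sort | uniq)

# Histogram: count, a tab, then the full header line.
# uniq -c prints "<padding><count> <line>", so the count prefix is stripped
# from $0 after saving it.
histogram=$(grep '^>' "$fasta" | sort | uniq -c |
    awk '{count = $1; sub(/^ *[0-9]+ /, ""); printf("%s\t%s\n", count, $0)}')

echo "$n_unique"
echo "$unique_headers"
echo "$histogram"
```

To try it on real data, replace the printf line with fasta=reference.fasta.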

You can list each distinct header together with the number of times it occurs using a one-liner:

grep '>' reference.fasta | cut -d '>' -f 2 | sort | uniq -c | sort
















answered 1 hour ago by user3479780
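If the aim is to see which headers are duplicated, a small variation on this one-liner may help. The sketch below is only an illustration: the sample file and variable names are hypothetical. It ranks headers by count with sort -rn and uses uniq -d to print only the headers that occur more than once.

```shell
#!/bin/sh
# Hypothetical sample data: chrA appears three times, chrB twice, chrC once.
workdir=$(mktemp -d)
fasta="$workdir/demo.fasta"
printf '>chrA\nACGT\n>chrB\nGGCC\n>chrA\nTTAA\n>chrC\nCCGG\n>chrA\nAAAA\n>chrB\nTTTT\n' > "$fasta"

# Most frequent headers first. cut strips the leading '>' as in the one-liner;
# note that -f 2 would also truncate a header containing a second '>'.
ranked=$(grep '^>' "$fasta" | cut -d '>' -f 2 | sort | uniq -c | sort -rn)

# Only the headers that appear more than once
# (uniq -d prints one copy of each repeated line).
duplicated=$(grep '^>' "$fasta" | cut -d '>' -f 2 | sort | uniq -d)

# Name of the most frequent header (first word only).
top=$(echo "$ranked" | head -n 1 | awk '{print $2}')

echo "$ranked"
echo "$duplicated"
```

An empty result from the uniq -d pipeline would indicate that every header in the file is already unique.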

The uniq command expects sorted input, because it only compares adjacent lines. Interestingly, the sort command has a "unique" option, -u, which means uniq is not strictly needed. For the fastest processing, you can look for the '>' character at the start of lines with grep:

grep '^>' reference.fasta | sort -u > reference_headers_unique.fasta

To get the number of unique lines, pipe through wc -l:

grep '^>' reference.fasta | sort -u | wc -l

For more information about regular expressions, see here.
















answered 4 mins ago by gringer
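To make the point about sorted input concrete, here is a minimal sketch on a made-up three-record file (demo.fasta and its contents are assumptions for illustration). It compares the command from the question, run on the raw file, with the sorted pipelines. It also shows that uniq -u and sort -u answer different questions: the former keeps only lines that occur exactly once, the latter keeps one copy of every distinct line.

```shell
#!/bin/sh
# Hypothetical sample data: '>seq1' occurs twice, but never on adjacent lines.
workdir=$(mktemp -d)
fasta="$workdir/demo.fasta"
printf '>seq1\nACGT\n>seq2\nACGT\n>seq1\nTTAA\n' > "$fasta"

# uniq only compares neighbouring lines, so on the raw file nothing is dropped:
# the repeated header is not adjacent to itself, and sequence lines pass through too.
raw_uniq_u=$(uniq -u "$fasta" | wc -l | tr -d ' ')

# Distinct headers: every different header counted once.
distinct=$(grep '^>' "$fasta" | sort -u | wc -l | tr -d ' ')

# Headers that occur exactly once: what 'uniq -u' reports after sorting.
singletons=$(grep '^>' "$fasta" | sort | uniq -u | wc -l | tr -d ' ')

echo "uniq -u on raw file: $raw_uniq_u lines"
echo "distinct headers:    $distinct"
echo "headers seen once:   $singletons"
```

Which of the last two numbers is the "count of unique headers" depends on whether repeated headers should be counted once or excluded altogether.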
