Total reads aligning to each reference within a bam file
























I have two PCR amplicons that were multiplexed and sequenced on the Oxford Nanopore MinION.

I aligned the FASTQ reads with minimap2 against a reference file containing both amplicon sequences, and generated a BAM file that I have viewed in IGV.

I am looking for a way to generate some simple summary statistics.

In particular, is there a way to extract from the BAM file the total number of FASTQ reads aligning to each amplicon reference?










Tags: alignment, bam, nanopore, minion






asked 4 hours ago by CM3
edited 4 hours ago by Llopis

3 Answers






This can be done using:

    samtools flagstat your_bam_file






answered 4 hours ago by Ammar Sabir Cheema

• I think that flagstat counts the number of alignments in the BAM file. – Peter Menzel, 3 hours ago
















The one-liner below should work better for long reads than samtools flagstat, because it counts only the primary alignment of each read, and samtools flagstat doesn't seem to calculate some statistics for long reads. I have also never seen samtools flagstat report statistics on a per-reference basis, but I would be curious to hear if it can.

The -F 2304 filter removes secondary (256) and supplementary (2048) alignments, so a read with some alignment to both amplicon references is counted once, at its best (primary) alignment. This might give a more accurate idea of how many reads of each amplicon are in the library.

    samtools view -F 2304 myfile.bam | awk -F $'\t' '{a[$1, $3]++} END {for (i in a) {split(i, sep, SUBSEP); print sep[1], sep[2], a[i]}}' | uniq | awk '{print($2)}' | sort | uniq -c | sort -k1 -nr

Adapted from this.







answered 4 hours ago by conchoecia
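
As a rough illustration of what the -F 2304 filter does, the following Python sketch applies the same flag mask to SAM-format text lines (such as the output of samtools view) and tallies the remaining records per reference. It is only a sketch under stated assumptions: the function name, the reference names amplicon_A and amplicon_B, and the example records are all made up for illustration. Unmapped reads are not removed by this mask and appear under the reference name "*".

```python
from collections import Counter

SECONDARY = 0x100       # 256: secondary alignment
SUPPLEMENTARY = 0x800   # 2048: supplementary alignment
EXCLUDE = SECONDARY | SUPPLEMENTARY  # 2304, the mask given to `samtools view -F`


def count_primary_per_reference(sam_lines):
    """Count records per reference name, skipping secondary/supplementary ones.

    `sam_lines` is any iterable of SAM-format text lines.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        rname = fields[2]       # column 3: reference name ("*" if unmapped)
        if flag & EXCLUDE:
            continue  # same effect as -F 2304
        counts[rname] += 1
    return counts


# Hypothetical records, invented purely for illustration.
EXAMPLE_SAM = [
    "@SQ\tSN:amplicon_A\tLN:1200",
    "read1\t0\tamplicon_A\t1\t60\t100M\t*\t0\t0\t*\t*",
    "read1\t2048\tamplicon_B\t1\t60\t50M\t*\t0\t0\t*\t*",   # supplementary
    "read2\t16\tamplicon_B\t1\t60\t100M\t*\t0\t0\t*\t*",
    "read2\t256\tamplicon_A\t1\t0\t100M\t*\t0\t0\t*\t*",    # secondary
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",                  # unmapped
]

for reference, n in sorted(count_primary_per_reference(EXAMPLE_SAM).items()):
    print(reference, n)
```

Because each read keeps exactly one record after the filter, the per-reference tallies correspond to reads rather than alignments.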

The quick way to get the number of alignments on each reference is:

    samtools idxstats my_bam.bam

The count for each reference is in column 3. As has been pointed out, though, this gives the total number of alignments per reference, not the total number of reads, since each read might give rise to more than one alignment. That said, I do tend to use this, as I'm generally after a rough approximation rather than an accurate number.

In theory, only one alignment for each read should be marked as primary, so the following should give you what you need quickly and with low memory usage:

    samtools view -bF 2304 my_bam.bam > primary_only.bam
    samtools index primary_only.bam
    samtools idxstats primary_only.bam






answered 1 hour ago by Ian Sudbery
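
For turning the samtools idxstats output into summary statistics, a small Python sketch such as the one below may help. It assumes the usual four tab-separated columns (reference name, reference length, mapped count, unmapped count) and skips the final "*" row that collects unplaced reads. The function names, reference names, and counts are hypothetical and chosen only for illustration.

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` text into {reference_name: mapped_count}."""
    mapped = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, n_mapped, _n_unmapped = line.split("\t")
        if name == "*":
            continue  # row for reads not placed on any reference
        mapped[name] = int(n_mapped)
    return mapped


def fractions(mapped):
    """Return each reference's share of all mapped records."""
    total = sum(mapped.values())
    return {name: (n / total if total else 0.0) for name, n in mapped.items()}


# Invented example output, not real data.
EXAMPLE_IDXSTATS = (
    "amplicon_A\t1200\t750\t0\n"
    "amplicon_B\t900\t250\t0\n"
    "*\t0\t0\t40\n"
)

counts = parse_idxstats(EXAMPLE_IDXSTATS)
for name, share in fractions(counts).items():
    print(f"{name}\t{counts[name]}\t{share:.1%}")
```

In practice the text could come from running samtools idxstats on primary_only.bam, for example via subprocess.run with capture_output=True and text=True.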
