Bash scripting FastQC for multiple fastq files in multiple directories

I am completely new to bioinformatics, so I'm looking to learn how to do this.

I have multiple directories with fastq files, e.g. 10 time-series directories, each with treatment and control directories, each of which contains rep1, rep2 and rep3 directories.

For example: T9/Infected/Rep1/*.fastq.gz.

I'm looking to create a loop that runs FastQC on each fastq file, instead of having to submit a separate job for each directory.

I would then like to write the FastQC output either to a single directory or, if possible, to a directory corresponding to each rep, e.g. rep1 results go into a folder called rep1, and so on.










migrated from stackoverflow.com 3 hours ago


This question came from our site for professional and enthusiast programmers.


















      fastq quality-control bash fastqc






edited 2 hours ago by Bioathlete
asked 7 hours ago by Ryan Carter











          3 Answers
Example dir structure:

$ find FastQC/
FastQC/
FastQC/T9
FastQC/T9/Infected
FastQC/T9/Infected/Rep1
FastQC/T9/Infected/Rep1/test11.fastq.gz
FastQC/T9/Infected/Rep1/test1.fastq.gz
FastQC/T9/Infected/Rep2
FastQC/T9/Infected/Rep2/test2.fastq.gz
FastQC/T9/Infected/Rep3
FastQC/T9/Infected/Rep3/test3.fastq.gz

If I understand correctly, you need to run some job on every *.fastq.gz file. You can do something like this (my example job is a gzip integrity test; replace it with your own job):

Rookie (breaks on filenames containing whitespace):

$ find FastQC/ -type f -name "*.fastq.gz" | xargs gzip -tv
FastQC/T9/Infected/Rep1/test11.fastq.gz: OK
FastQC/T9/Infected/Rep1/test1.fastq.gz: OK
FastQC/T9/Infected/Rep2/test2.fastq.gz: OK
FastQC/T9/Infected/Rep3/test3.fastq.gz: OK

Solid:

$ find FastQC/ -type f -name "*.fastq.gz" -print0 | xargs -0 -I {} gzip -tv {}
FastQC/T9/Infected/Rep1/test11.fastq.gz: OK
FastQC/T9/Infected/Rep1/test1.fastq.gz: OK
FastQC/T9/Infected/Rep2/test2.fastq.gz: OK
FastQC/T9/Infected/Rep3/test3.fastq.gz: OK

1. find locates the files named *.fastq.gz and prints them separated by NUL bytes (to support unusual characters such as spaces in filenames)

2. xargs substitutes each filename for {} and passes it to gzip -tv

If you want to copy the files into a single heap folder:

$ find FastQC/ -type f -name "*.fastq.gz" -print0 | xargs -0 -I {} cp -pv {} FastQC_heap/
`FastQC/T9/Infected/Rep1/test11.fastq.gz' -> `FastQC_heap/test11.fastq.gz'
`FastQC/T9/Infected/Rep1/test1.fastq.gz' -> `FastQC_heap/test1.fastq.gz'
`FastQC/T9/Infected/Rep2/test2.fastq.gz' -> `FastQC_heap/test2.fastq.gz'
`FastQC/T9/Infected/Rep3/test3.fastq.gz' -> `FastQC_heap/test3.fastq.gz'






answered 6 hours ago by Kubator
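
The find approach above could be adapted to FastQC itself roughly as in the following minimal sketch. It assumes bash, that fastqc is on the PATH, and that mirroring the input tree under a separate output root is acceptable, so results for T9/Infected/Rep1 land in an output folder that also ends in T9/Infected/Rep1. The function name run_fastqc_tree, the FASTQC override variable and the example paths are hypothetical and not part of the answer; --outdir is FastQC's option for choosing the output directory, which must already exist when FastQC starts.

```shell
# Sketch only: run FastQC on every *.fastq.gz below an input root and
# write each report into a matching sub-directory of an output root.
# run_fastqc_tree and the FASTQC variable are illustrative names.
run_fastqc_tree() {
    local in_root="${1%/}" out_root="$2"   # drop a trailing slash from the input root
    local fq rel out_dir
    # -print0 / read -d '' keep filenames with spaces or newlines intact
    find "$in_root" -type f -name '*.fastq.gz' -print0 |
        while IFS= read -r -d '' fq; do
            rel=${fq#"$in_root"/}                 # e.g. T9/Infected/Rep1/x.fastq.gz
            out_dir=$out_root/$(dirname "$rel")   # e.g. <out_root>/T9/Infected/Rep1
            mkdir -p "$out_dir"                   # FastQC does not create it
            # </dev/null stops the command from consuming the file list on stdin
            "${FASTQC:-fastqc}" --outdir "$out_dir" "$fq" < /dev/null
        done
}

# Example call (paths are placeholders):
#   run_fastqc_tree FastQC/ FastQC_results
```

To collect everything in one folder instead, the same loop could pass a fixed directory to --outdir, with the caveat that files sharing a base name in different Rep folders would then overwrite each other's reports.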



























You have a lot of files, so you are going to have a lot of FastQC output. This is a big-data situation, and when you've got a lot of data it's best to find a way to condense it so that you can see it all in one place. MultiQC aggregates FastQC results: it searches a directory tree for FastQC output on its own (no loops required) and, best of all, produces a single interactive report with all of your quality results in one place. Note that MultiQC reads existing FastQC output rather than running FastQC on the fastq files itself, so FastQC still has to be run first. While this doesn't answer your exact question, it does solve the problem that you are looking to solve.

After installation:

pip install multiqc

it's a one-liner to run:

multiqc /toplevel/path/to/fastqs

Provide it with the top-level directory and it'll search the sub-directories for the files it needs.

Your report will look something like this. Note that you can collect much more than just FastQC stats if needed, but it is also perfectly suited to FastQC-only results, as in your scenario.
[Image: MultiQC screencap]







answered 2 hours ago by kohlkopf
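
As a small illustration of the MultiQC step, the sketch below first checks that FastQC output exists and only then builds the combined report. It assumes bash and that multiqc is on the PATH; the function name summarise_fastqc, the MULTIQC override variable and the example paths are hypothetical. It relies on FastQC naming its archives *_fastqc.zip and on MultiQC's --outdir option for the report location.

```shell
# Sketch only: build one MultiQC report from existing FastQC results.
# summarise_fastqc and the MULTIQC variable are illustrative names.
summarise_fastqc() {
    local qc_root="$1" report_dir="$2" n
    # MultiQC parses FastQC output; it does not read raw fastq files
    n=$(find "$qc_root" -type f -name '*_fastqc.zip' | wc -l)
    if [ "$n" -eq 0 ]; then
        echo "no *_fastqc.zip files under $qc_root; run FastQC first" >&2
        return 1
    fi
    mkdir -p "$report_dir"
    "${MULTIQC:-multiqc}" --outdir "$report_dir" "$qc_root"
}

# Example call (paths are placeholders):
#   summarise_fastqc FastQC_results multiqc_report
```

The early return is a design choice for this sketch: it turns "forgot to run FastQC" into an explicit message instead of an empty report.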





















MultiQC kind of glosses over some important information, like the exact adapters and duplicated sequences in a library. If you plan to spend big $$ on sequencing a library, it is better to look at both the MultiQC report and the actual FastQC HTML reports to get a better idea of any error modes.

Going off of @Kubator's answer, I noticed that there was no command to run fastqc.

Here's a simple one-liner to run fastqc in parallel on all of your fastq files. The -j 25 runs up to 25 jobs at a time. Change 25 to however many threads you want/have for max speed.

# Run fastqc on everything in parallel.
> find ../reads/ -name '*.fastq.gz' | awk '{printf("fastqc %s\n", $0)}' | parallel -j 25 --verbose
# Move all the fastqc output files to the current directory ./
> find ../reads/ -name '*fastqc.*' | xargs -I '{}' mv '{}' ./

These files might be output by multiqc, anyway!







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered 26 mins ago









                              conchoecia












                              • +1, but note that this will break in the unlikely case where any of the fastq file names contain whitespace. A safer version is find . -name '*.fastq.gz' | awk '{printf("fastqc \"%s\"\n", $0)}' but that still fails in the (even more unlikely) case where a file name contains a newline. This should work for anything (but requires a version of find with -printf, like GNU find): find . -name '*.fastq.gz' -printf 'fastqc "%p"\n' | parallel -j 25 --verbose.
                                – terdon♦
                                9 mins ago
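The whitespace problem raised in this comment can also be sidestepped by never building command strings at all and letting find hand the file names straight to a small shell loop. The sketch below does that and, as the question asked, writes each report into an output folder that mirrors the input layout (for example T9/Infected/Rep1). The function name run_fastqc_mirrored and the FASTQC override variable are hypothetical, and this version runs FastQC on one file at a time rather than in parallel.

```shell
#!/bin/sh
# Sketch: run FastQC on every *.fastq.gz and mirror the input directory
# layout under an output root, so Rep1 reports land in .../Rep1, etc.
# Hypothetical names: run_fastqc_mirrored, FASTQC.

run_fastqc_mirrored() (
    READS_DIR=${1%/}              # strip one trailing slash, if present
    OUT_ROOT=$2
    FASTQC_BIN=${FASTQC:-fastqc}
    export READS_DIR OUT_ROOT FASTQC_BIN

    # find passes file names as arguments, so spaces in names are not an issue.
    find "$READS_DIR" -type f -name '*.fastq.gz' -exec sh -c '
        for f do
            rel=${f#"$READS_DIR"/}                  # path relative to the reads dir
            dest=$OUT_ROOT/$(dirname -- "$rel")     # mirrored output directory
            mkdir -p "$dest" && "$FASTQC_BIN" -o "$dest" "$f"
        done
    ' sh {} +
)
```

For instance, run_fastqc_mirrored . ./fastqc_results would put the reports for T9/Infected/Rep1/*.fastq.gz into ./fastqc_results/T9/Infected/Rep1.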

































                               
