Bash scripting FastQC for multiple fastq files in multiple directories
I am completely new to bioinformatics, so I'm looking to learn how to do this.
I have multiple directories with fastq files, e.g. 10 directories (one per time point), each with Treatment and Control directories, each with rep1, rep2 and rep3.
For example: T9/Infected/Rep1/*.fastq.gz.
I'm looking to create a loop to run FastQC on each fastq file instead of having to submit a separate job for each directory.
Then to either output the FastQC data to a single directory or, if possible, to a directory corresponding to each rep - e.g. rep1 results go into a folder called rep1 and so on.
fastq quality-control bash fastqc
migrated from stackoverflow.com 3 hours ago
asked 7 hours ago by Ryan Carter, edited 2 hours ago by Bioathlete
3 Answers
Example dir structure:
$ find FastQC/
FastQC/
FastQC/T9
FastQC/T9/Infected
FastQC/T9/Infected/Rep1
FastQC/T9/Infected/Rep1/test11.fastq.gz
FastQC/T9/Infected/Rep1/test1.fastq.gz
FastQC/T9/Infected/Rep2
FastQC/T9/Infected/Rep2/test2.fastq.gz
FastQC/T9/Infected/Rep3
FastQC/T9/Infected/Rep3/test3.fastq.gz
If I understood well, you need to run some job on every *.fastq.gz file. Then you can do something like this (my example job is a gzip test; replace it with your job):
Rookie:
$ find FastQC/ -type f -name "*.fastq.gz" | xargs gzip -tv
FastQC/T9/Infected/Rep1/test11.fastq.gz: OK
FastQC/T9/Infected/Rep1/test1.fastq.gz: OK
FastQC/T9/Infected/Rep2/test2.fastq.gz: OK
FastQC/T9/Infected/Rep3/test3.fastq.gz: OK
Solid:
$ find FastQC/ -type f -name "*.fastq.gz" -print0 | xargs -0 -I{} gzip -tv {}
FastQC/T9/Infected/Rep1/test11.fastq.gz: OK
FastQC/T9/Infected/Rep1/test1.fastq.gz: OK
FastQC/T9/Infected/Rep2/test2.fastq.gz: OK
FastQC/T9/Infected/Rep3/test3.fastq.gz: OK
- find finds files named *.fastq.gz and outputs them zero-byte-delimited (to support weird characters like spaces in filenames)
- xargs substitutes each file path for {} and passes it to gzip -tv
If you want to copy all the files into one heap folder:
$ find FastQC/ -type f -name "*.fastq.gz" -print0 | xargs -0 -I{} cp -pv {} FastQC_heap/
`FastQC/T9/Infected/Rep1/test11.fastq.gz' -> `FastQC_heap/test11.fastq.gz'
`FastQC/T9/Infected/Rep1/test1.fastq.gz' -> `FastQC_heap/test1.fastq.gz'
`FastQC/T9/Infected/Rep2/test2.fastq.gz' -> `FastQC_heap/test2.fastq.gz'
`FastQC/T9/Infected/Rep3/test3.fastq.gz' -> `FastQC_heap/test3.fastq.gz'
answered 6 hours ago by Kubator
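Applying the same pattern to FastQC itself, and to the per-rep output request in the question: below is a minimal sketch, assuming fastqc is on your PATH and that its standard -o/--outdir option is used (the output directory must already exist). FastQC_results is a hypothetical results root; the input layout is the example structure above.

#!/usr/bin/env bash
# Run FastQC on every *.fastq.gz under FastQC/ and write each report into a
# results tree that mirrors the rep directory of the input file.
set -euo pipefail

find FastQC/ -type f -name "*.fastq.gz" -print0 |
while IFS= read -r -d '' fq; do
    # e.g. fq = FastQC/T9/Infected/Rep1/test1.fastq.gz
    outdir="FastQC_results/$(dirname "${fq#FastQC/}")"   # -> FastQC_results/T9/Infected/Rep1
    mkdir -p "$outdir"                                    # FastQC needs the output dir to exist
    fastqc -o "$outdir" "$fq"
done

If you drop the -o option, FastQC writes its report next to each input file, which also gives per-rep results with no extra bookkeeping.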
You have a lot of files. You are going to have a lot of fastQC output. This is a big data situation, and when you've got a lot of data it's best to find a way to condense it so that you can see it all in one place. MultiQC is a sort of wrapper for fastQC and has the benefits of being easier to run on multiple fastq files (no loops required), and best of all, produces a single interactive file with all of your quality results in one place. While this doesn't answer your exact question, it does solve the problem that you are looking to solve.
After installation:
pip install multiqc
it's a one-liner to run:
multiqc /toplevel/path/to/fastqs
Provide it with the most top-level directory and it'll search for the files it needs in the sub-directories.
Your report will look something like this. Note that you can collect much more than just FastQC stats on the fastq files, if needed, but it is perfectly suited to FastQC results alone, as in your scenario.
answered 2 hours ago by kohlkopf
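The two tools also combine naturally - run FastQC over everything first, then let MultiQC sweep up the results. A sketch assuming both tools are installed and using the example directory names from the previous answer; multiqc_report is a hypothetical output directory passed to MultiQC's -o option:

# 1. Run FastQC on all reads; each report lands next to its fastq file.
$ find FastQC/ -type f -name "*.fastq.gz" -print0 | xargs -0 fastqc

# 2. Aggregate every FastQC report into one interactive HTML page.
$ multiqc FastQC/ -o multiqc_report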
multiqc kind of glazes over some important information, like the exact adapters and duplicated sequences in a library. If you plan to spend big $$ for sequencing a library, it is better to look at both the multiqc report and the actual fastqc html report to get a better idea of any error modes.
Going off of @Kubator's answer, I noticed that there was no command to run fastqc.
Here's a simple one-liner to run fastqc in parallel on all of your fastq files. The -j 25 tells parallel to run 25 jobs at once; change 25 to however many threads you want/have for max speed.
# Run fastqc on everything in parallel.
$ find ../reads/ -name '*.fastq.gz' | awk '{printf("fastqc %s\n", $0)}' | parallel -j 25 --verbose
# Move all the fastqc output files to the current directory ./
$ find ../reads/ -name '*fastqc.*' | xargs -I {} mv {} ./
These files might be output by multiqc, anyway!
answered 26 mins ago by conchoecia

Comment from terdon (9 mins ago): +1, but note that this will break in the unlikely case where any of the fastq file names contain whitespace. A safer version is find . -name '*.fastq.gz' | awk '{printf("fastqc \"%s\"\n", $0)}' but that still fails in the (even more unlikely) case where a file name contains a newline. This should work for anything (but requires a version of find with -printf, like GNU find): find . -name '*.fastq.gz' -printf '"%p"\n' | parallel -j 25 --verbose.
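For completeness, a fully filename-safe variant of the one-liner above - a sketch assuming GNU parallel, which can read null-delimited input with -0/--null and substitutes each path for {}:

$ find ../reads/ -name '*.fastq.gz' -print0 | parallel -0 -j 25 --verbose fastqc {}

Because no command line is built by string concatenation here, spaces and even newlines in file names are handled.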