Finding unique headers in a FASTA file using the Linux command line
I tried to use the following command:
uniq -u reference.fasta >> reference_uniq.fasta
I'd like a count of the unique headers.
fasta linux
asked 1 hour ago by crispr (new contributor)
Did you try Python? It would be easier.
- tianhua liao, 1 hour ago
3 Answers
If you just want the number of unique headers, you can do this:
grep '^>' reference.fasta | sort | uniq | wc -l
If you want a list of the unique headers, you can do this:
grep '^>' reference.fasta | sort | uniq
If you want a histogram of how many times each header occurs, you can do this:
grep '^>' reference.fasta | sort | uniq -c | awk '{printf("%s\t%s\n", $1, $2)}'
Note that $2 holds only the first whitespace-delimited word of each header, so any description after the sequence ID is dropped from the output.
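As a quick sanity check of the counting command above, here is a minimal sketch that wraps it in a shell function and runs it on a tiny FASTA file. The function name count_unique_headers and the sample records are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper: print the number of distinct header lines in a FASTA file.
count_unique_headers() {
    # Keep only header lines, sort so duplicates become adjacent,
    # collapse them with uniq, then count what is left.
    # tr strips the padding that some wc implementations add.
    grep '^>' "$1" | sort | uniq | wc -l | tr -d ' '
}

# Made-up sample: three records, two of which share the header '>seq1'.
sample=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq1\nTTAA\n' > "$sample"

count_unique_headers "$sample"   # prints 2
rm -f "$sample"
```

Calling the function on reference.fasta instead of the sample gives the count asked for in the question.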
answered 1 hour ago by conchoecia
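You can list every header together with the number of times it occurs using a one-liner:
grep '^>' reference.fasta | cut -d '>' -f 2- | sort | uniq -c | sort -n
Headers with a count of 1 occur only once in the file.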
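The count listing above can also be narrowed to just the headers that appear exactly once, which seems to be what the uniq -u attempt in the question was after. This is only a sketch: the function name headers_seen_once and the sample records are made up for illustration.

```shell
# Hypothetical helper: print the headers that occur exactly once in a FASTA file.
headers_seen_once() {
    # Sorting makes identical headers adjacent; uniq -u then drops every
    # header that is repeated and keeps the ones seen a single time.
    grep '^>' "$1" | sort | uniq -u
}

# Made-up sample: '>seq1' is duplicated, '>seq2' is not.
sample=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq1\nTTAA\n' > "$sample"

headers_seen_once "$sample"   # prints >seq2
rm -f "$sample"
```

Appending | wc -l to the call gives the number of such headers, which differs from the number of distinct headers whenever duplicates exist.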
answered 1 hour ago by user3479780
The uniq command only collapses adjacent duplicate lines, so it expects sorted input. Run directly on reference.fasta, as in the question, it also compares sequence lines rather than just headers. The sort command has its own "unique" option, -u, which means uniq is not strictly needed. For fast processing, anchor the search to the start of each line with grep, since headers always begin with '>':
grep '^>' reference.fasta | sort -u > reference_headers_unique.fasta
To return the number of unique headers, pipe through wc -l:
grep '^>' reference.fasta | sort -u | wc -l
In the pattern, ^ is a regular-expression anchor that matches the start of a line.
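To see why sorting matters for uniq, here is a small sketch comparing uniq on unsorted input with sort -u. The three header lines and the variable names are made up for illustration.

```shell
# Made-up input: the header '>seq1' appears twice, but not on adjacent lines.
headers=$(printf '>seq1\n>seq2\n>seq1\n')

# Without sorting, uniq sees no adjacent repeats and keeps all three lines.
unsorted_count=$(printf '%s\n' "$headers" | uniq | wc -l | tr -d ' ')

# sort -u sorts and deduplicates in one step, leaving two distinct headers.
sorted_count=$(printf '%s\n' "$headers" | sort -u | wc -l | tr -d ' ')

echo "uniq alone: $unsorted_count, sort -u: $sorted_count"
# prints: uniq alone: 3, sort -u: 2
```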
answered 4 mins ago by gringer