Finding unique headers in a FASTA file using the Linux command line

I tried to use the following command

uniq -u reference.fasta >> reference_uniq.fasta

I'd like a count of the unique headers.

asked 1 hour ago

crispr

New contributor

Have you tried using Python? It would be easier.
- tianhua liao, 1 hour ago

Tags: fasta, linux

3 Answers

If you just want the number of unique headers, you can do this:

grep '>' reference.fasta | sort | uniq -c | wc -l
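A minimal, self-contained check of this pipeline, assuming bash and a tiny made-up FASTA written to a temporary file (the sample headers are hypothetical). It also shows that the -c flag does not change the count, because uniq -c still prints one line per distinct header:

```shell
#!/usr/bin/env bash
# Sketch: count distinct FASTA headers in a tiny, made-up file.
set -euo pipefail

# Hypothetical sample: three header lines, one of them repeated.
fasta=$(mktemp)
trap 'rm -f "$fasta"' EXIT
printf '%s\n' '>seq1 sample A' 'ACGT' '>seq2 sample B' 'GGCC' '>seq1 sample A' 'ACGT' > "$fasta"

# uniq -c prefixes a count but still emits one line per distinct header,
# so wc -l returns the same number with or without -c.
with_counts=$(grep '>' "$fasta" | sort | uniq -c | wc -l)
without_counts=$(grep '>' "$fasta" | sort | uniq | wc -l)

echo "distinct headers (uniq -c): $with_counts"
echo "distinct headers (uniq):    $without_counts"
```

Both counts come out as 2 here, since one of the three header lines is a duplicate.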

If you want a list of the unique headers, you can do this:

grep '>' reference.fasta | sort | uniq

If you want a histogram of how many times each header occurs, you can do this:

grep '>' reference.fasta | sort | uniq -c | awk '{printf("%s\t%s\n", $1, $2)}'
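A small sketch of the histogram command on invented data, assuming bash. awk needs braces around its action and backslashes in the \t and \n escapes; also note that $2 is only the first whitespace-separated word of each header, so longer descriptions are cut off:

```shell
#!/usr/bin/env bash
# Sketch: histogram of header occurrences as "count<TAB>header".
set -euo pipefail

# Hypothetical sample FASTA: '>seqA' appears three times, '>seqB' once.
fasta=$(mktemp)
trap 'rm -f "$fasta"' EXIT
printf '%s\n' '>seqA' 'ACGT' '>seqB' 'TTGA' '>seqA' 'ACGT' '>seqA' 'ACGA' > "$fasta"

# The awk action must sit inside braces, and \t / \n need their backslashes.
# uniq -c left-pads the count; awk's default field splitting skips that padding.
# $2 is only the first whitespace-separated word of the header line.
histogram=$(grep '>' "$fasta" | sort | uniq -c | awk '{printf("%s\t%s\n", $1, $2)}')
printf '%s\n' "$histogram"
```

This prints "3", a tab and ">seqA" on the first line, then "1", a tab and ">seqB".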

answered 1 hour ago

conchoecia

You can achieve your goal with a one-liner:

grep '>' reference.fasta | cut -d '>' -f 2 | sort | uniq -c | sort -n
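A sketch of this one-liner on a hypothetical FASTA, assuming bash. Sorting the final output with sort -n orders lines numerically by the count that uniq -c prints first, so the most frequent headers end up at the bottom; a plain sort compares the padded counts as text, which can behave differently depending on the locale:

```shell
#!/usr/bin/env bash
# Sketch: strip the leading '>' and list headers by how often they occur.
set -euo pipefail

# Hypothetical sample: chr1 once, chr2 three times, chr3 twice.
fasta=$(mktemp)
trap 'rm -f "$fasta"' EXIT
printf '%s\n' '>chr2' 'AC' '>chr1' 'GT' '>chr2' 'AC' '>chr3' 'TT' '>chr2' 'AC' '>chr3' 'TT' > "$fasta"

# cut -d '>' -f 2 drops the leading '>' (text after a second '>' in a header
# would also be cut off). sort -n then orders by the numeric count column.
ranked=$(grep '>' "$fasta" | cut -d '>' -f 2 | sort | uniq -c | sort -n)
printf '%s\n' "$ranked"
```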

answered 1 hour ago

user3479780

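The uniq command only collapses adjacent duplicate lines, so it expects sorted input. With -u it goes further and prints only lines that are not repeated at all, which is why the command in the question does not give one copy of each header. The sort command has its own "unique" option, -u, so uniq is not strictly needed. For the fastest processing, you can look for the '>' character at the start of lines with grep: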

grep '^>' reference.fasta | sort -u > reference_headers_unique.fasta

To get the number of unique headers, pipe the output through wc -l:

grep '^>' reference.fasta | sort -u | wc -l
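To illustrate the point about sorted input, here is a sketch comparing uniq, uniq -u and sort -u on a short, made-up list of header lines (bash assumed). Only sort -u gives one line per distinct header; uniq -u, as used in the question, keeps only lines that are not part of a run of repeats:

```shell
#!/usr/bin/env bash
# Sketch: compare uniq, uniq -u and sort -u on unsorted header lines.
# The header names are invented for illustration.
set -euo pipefail

headers=$(printf '%s\n' '>a' '>b' '>a' '>c' '>c' '>d' '>d')

# uniq on unsorted input collapses only adjacent repeats: '>a' is kept twice.
unsorted_uniq=$(printf '%s\n' "$headers" | uniq | wc -l)

# uniq -u prints only lines that are not part of an adjacent run of repeats,
# so '>c' and '>d' vanish while both non-adjacent '>a' lines survive.
uniq_u=$(printf '%s\n' "$headers" | uniq -u | wc -l)

# After sorting, uniq -u keeps just the headers that occur exactly once ('>b').
sorted_uniq_u=$(printf '%s\n' "$headers" | sort | uniq -u | wc -l)

# sort -u sorts and de-duplicates in one step: one line per distinct header.
sort_u=$(printf '%s\n' "$headers" | sort -u | wc -l)

echo "uniq: $unsorted_uniq, uniq -u: $uniq_u, sort | uniq -u: $sorted_uniq_u, sort -u: $sort_u"
```

On this input the four counts are 5, 3, 1 and 4; only the last one matches the four distinct headers.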

For more information about regular expressions, see here.

answered 4 mins ago

gringer