Collecting specific genome data from a file under a single title

I have genome data in a file genomes-seq.txt. The titles of the sequences begin with > followed by the genome name:

>genome.1
atcg
atcg
atcggtc

>genome.2
atct
tgcgtgctt
attttt

>genome.
sdkf
sdf;ksdf
sdlfkjdslc
edsfsfv

>genome.3
as;ldkhaskjd
asdkljdsl
asdkljasdk;l

>genome.4
ekjfhdhsa
dsfkjskajd
asdknasd


>genome.1
iruuwi
sdkljbh
sdfljnsdl

>genome.234
efijhusidh
siduhygfhuji

>genome.1
ljhdcj
sdljhsdil
fweusfhygc

I want to collect the data for genome.1 in one file so it looks like this:

>genome.1
atcg
atcggtc

iruuwi
sdkljbh
sdfljnsdl
ljhdcj
sdljhsdil
fweusfhygc

But every time I do it using sed I get:

>genome.1
atcg
atcg
atcggtc

>genome.1
iruuwi
sdkljbh
sdfljnsdl

>genome.1
ljhdcj
sdljhsdil
fweusfhygc

with multiple genome.1 headers. How can I do it correctly, so that on a large data set I don't need to remove all the repetitions?

edited 35 mins ago by Rui F Ribeiro

asked 1 hour ago by paul (new contributor)

Hi @paul, what is your sed command that you used?
– Goro
58 mins ago

I tried but it didn't work
– paul
54 mins ago

Show what you tried and we can help fix your errors.
– glenn jackman
24 mins ago

Tagged: bash

2 Answers

accepted

$ sed -n '/^>genome\.1$/,/^$/p' file | sed '2,${/^>genome\.1$/d;}'

>genome.1
atcg
atcggtc

iruuwi
sdkljbh
sdfljnsdl
ljhdcj
sdljhsdil
fweusfhygc

genome.1 is the keyword; change it depending on the list you would like to generate.
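Since the keyword has to be changed for each genome, the same idea can be wrapped in a small shell function. The following is only a minimal sketch, assuming a POSIX awk is available; the name `collect_genome` is hypothetical and not part of the original answer. It compares header lines as exact strings, so genome.1 does not also pick up genome.12, and it drops the blank separator lines between merged records.

```shell
#!/bin/sh
# collect_genome NAME FILE
# Print the header ">NAME" once, followed by the sequence lines of every
# record in FILE whose header is exactly ">NAME".
# Hypothetical helper, shown only as an illustration.
collect_genome() {
    awk -v hdr=">$1" '
        /^>/ {                          # a header line starts a new record
            keep = ($0 == hdr)          # exact string match on the header
            if (keep && !seen++) print  # print the header only the first time
            next
        }
        keep && NF                      # print non-blank lines of matching records
    ' "$2"
}
```

It could be called as `collect_genome genome.1 genomes-seq.txt > genome.1.txt`, where the output file name is just an example. Headers with trailing whitespace or carriage returns would not match the exact comparison.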

answered 52 mins ago by Goro (edited 17 mins ago)

With perl

perl -00 -ne 'if (/^>genome\.1\n/) {s/// if $. > 1; print}' file
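The one-liner reads the file in paragraph mode (`-00`), so each blank-line-separated record is handled as a unit, and `$. > 1` keeps the header only when the matching record is the first one in the file. A rough sketch of the same paragraph-mode idea in shell with awk follows, counting matches instead of records so the header survives even when the wanted genome is not first. The name `merge_records` is hypothetical, and this is an illustration under those assumptions rather than the answer's own method.

```shell
#!/bin/sh
# merge_records NAME FILE
# Paragraph-mode variant: read blank-line-separated records and print those
# whose first line is exactly ">NAME", dropping the header on repeats.
# Hypothetical helper, shown only to illustrate the technique.
merge_records() {
    awk -v hdr=">$1" '
        BEGIN { RS = ""; FS = "\n" }     # RS="" reads one paragraph per record
        $1 == hdr {
            start = (matched++ ? 2 : 1)  # skip the header after the first match
            for (i = start; i <= NF; i++) print $i
        }
    ' "$2"
}
```

With `FS` set to a newline, each field is one line of the record, so `$1` is the header and the loop prints the sequence lines.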

answered 20 mins ago by glenn jackman
