Adding more information to a string
I have a gtf file like this:
ChrI Coding_transcript gene 8451772 8509212 . - . gene_id "UMM-S589-0.12-gene-1"
ChrI Coding_transcript exon 8501974 8509212 . - . gene_id "UMM-S589-0.12-gene-1"
ChrI Coding_transcript exon 8491643 8501928 . - 0 gene_id "UMM-S589-0.12-gene-1"
I now want to add more information into column 9, to make it look like this:
ChrI Coding_transcript exon 8501974 8509212 . - . gene_id "UMM-S589-0.12-gene-1"; transcript_id "UMM-S589-0.12-gene-1", exon_id "1";
ChrI Coding_transcript exon 8491643 8501928 . - 0 gene_id "UMM-S589-0.12-gene-1";transcript_id "UMM-S589-0.12-gene-1", exon_id "2";
Does anyone know any simple command I can use to make this file? Thank you so much!
text-processing bioinformatics
How do we get the information for the 9th column? Is it static, or is there a condition? Please explain.
– SivaPrasath
Aug 24 at 18:20
Hi SivaPrasath, thank you for your comment. I don't have any extra information. As you can see, I want to add the info based on what is already present: transcript_id is the same as gene_id, and the exon number is the running count of "exon" in the 3rd column.
– filwy
Aug 24 at 18:23
I see 10 columns, with the 9th just saying "gene_id", but your example seems to append data after column 10. Also, it looks like you're trying to count exons and are only appending to exon lines. What you want is probably doable, but it's unclear what you're asking.
– Angelo
Aug 24 at 18:25
edited Aug 24 at 18:33
Jeff Schaller
asked Aug 24 at 18:14
filwy
1 Answer
accepted
Try this:
awk 'NF==10{print $0";transcript_id "$10", exon_id \""(++count[$3])"\";"} NF!=10{print $0}' file.gtf
Output:
ChrI Coding_transcript exon 8501974 8509212 . - . gene_id "UMM-S589-0.12-gene-1";transcript_id "UMM-S589-0.12-gene-1", exon_id "1";
ChrI Coding_transcript exon 8491643 8501928 . - 0 gene_id "UMM-S589-0.12-gene-1";transcript_id "UMM-S589-0.12-gene-1", exon_id "2";
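NF==10: check whether the number of fields is 10.
print $0: print the complete line.
transcript_id $10: reuse field 10, since it is the same as the gene_id value.
++count[$3]: print the occurrence count of the 3rd field (exon). The counter is keyed on that field's value, so a 10-field gene line is numbered as well.
NF!=10: just print the line.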
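If only the exon lines should be changed, as in the desired output in the question, a variant along these lines may help. It is a minimal sketch, not a tested pipeline: the function name add_exon_ids is made up for illustration, it assumes whitespace-separated lines with exactly 10 fields as in the sample, and it assumes that exon numbering should restart for each gene_id and follow the order of lines in the file.

```shell
# Hypothetical helper: reads GTF-like lines on stdin, writes the result to stdout.
add_exon_ids() {
  awk '
    # Change only exon lines that have exactly 10 whitespace-separated fields.
    # Field 10 is the quoted gene_id value and is reused as the transcript_id.
    $3 == "exon" && NF == 10 {
      n = ++count[$10]   # running exon number, kept separately per gene_id
      print $0 ";transcript_id " $10 ", exon_id \"" n "\";"
      next
    }
    # Every other line (gene lines, comments, ...) passes through unchanged.
    { print }
  '
}
```

It could be called as add_exon_ids < file.gtf > file_with_ids.gtf. Keying the counter on field 10 instead of field 3 is a guess at what is wanted once the file contains more than one gene.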
edited Aug 24 at 18:49
answered Aug 24 at 18:42
SivaPrasath