Adding more information to a string

I have a gtf file like this:



ChrI Coding_transcript gene 8451772 8509212 . - . gene_id "UMM-S589-0.12-gene-1"

ChrI Coding_transcript exon 8501974 8509212 . - . gene_id "UMM-S589-0.12-gene-1"

ChrI Coding_transcript exon 8491643 8501928 . - 0 gene_id "UMM-S589-0.12-gene-1"


I now want to add more information into column 9, to make it look like this:



ChrI Coding_transcript exon 8501974 8509212 . - . gene_id "UMM-S589-0.12-gene-1"; transcript_id "UMM-S589-0.12-gene-1", exon_id "1";

ChrI Coding_transcript exon 8491643 8501928 . - 0 gene_id "UMM-S589-0.12-gene-1";transcript_id "UMM-S589-0.12-gene-1", exon_id "2";


Does anyone know any simple command I can use to make this file? Thank you so much!







  • How do we get the information for the 9th column? Is it static, or please explain if there is any condition.
    – SivaPrasath
    Aug 24 at 18:20

  • Hi SivaPrasath: thank you for your comment. I don't have any extra information; as you can see, I want to add the info based on what is already present, i.e. gene_id and transcript_id are the same, and the exon number is the running count of "exon" in the 3rd column.
    – filwy
    Aug 24 at 18:23

  • I see 10 columns, with the 9th just saying "gene_id", but your example seems to append data after column 10. Also, it looks like you're trying to count exons, and you're only appending to exon lines. What you want is probably doable, but it's unclear what you're asking.
    – Angelo
    Aug 24 at 18:25















edited Aug 24 at 18:33 by Jeff Schaller
asked Aug 24 at 18:14 by filwy







1 Answer
accepted










Try this:



awk 'NF==10 {print $0 ";transcript_id " $10 ", exon_id \"" ++count[$3] "\";"} NF!=10 {print $0}' file.gtf


Output:



ChrI Coding_transcript exon 8501974 8509212 . - . gene_id "UMM-S589-0.12-gene-1";transcript_id "UMM-S589-0.12-gene-1", exon_id "1";

ChrI Coding_transcript exon 8491643 8501928 . - 0 gene_id "UMM-S589-0.12-gene-1";transcript_id "UMM-S589-0.12-gene-1", exon_id "2";



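  • NF==10 checks whether the line has 10 whitespace-separated fields.

    • print $0 prints the complete line.

    • transcript_id is set to $10, since it is the same as the gene_id value.

    • ++count[$3] increments and prints the number of times the value in the 3rd field (here exon) has been seen so far.

  • NF!=10 just prints the line unchanged.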
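One caveat with the NF==10 test: in the sample, the gene record also has 10 whitespace-separated fields, so it would be annotated too. The following is only a sketch of a stricter variant, not the answerer's command. It assumes that only lines whose 3rd field is exactly exon should be changed, that the quoted gene_id value is field 10, and that exons should be numbered separately for each gene_id. The function name annotate_exons is made up for illustration.

```shell
# Hypothetical helper: append transcript_id and exon_id to exon records only.
annotate_exons() {
  awk '
    # Rewrite only exon records that have exactly 10 whitespace-separated fields.
    $3 == "exon" && NF == 10 {
      # $10 holds the quoted gene_id value; keep one exon counter per gene.
      n = ++count[$10]
      print $0 ";transcript_id " $10 ", exon_id \"" n "\";"
      next
    }
    # Every other line (for example the gene record) is printed unchanged.
    { print }
  ' "$@"
}
```

It could be used as annotate_exons file.gtf > annotated.gtf. With the three sample lines from the question, the gene line passes through untouched and the two exon lines receive exon_id "1" and "2". Records whose attribute column already holds more than one key/value pair have more than 10 fields and are left as they are.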





edited Aug 24 at 18:49
answered Aug 24 at 18:42 by SivaPrasath



























             
