Delimit by space but ignore backslash space

    5678 
    testing,\ group
    [testing
    ip\ 5.6.7.8
    launch-wizard-1 0.0.0.0/0
    456dlkjfa
    1.2.3.4
    test 1.2.3.4/32 4.3.2.0/23 4.3.2.0/23
    default 4.3.2.0/23 4.3.2.0/23
    launch-wizard-2 0.0.0.0/0
    launch-wizard-3 0.0.0.0/0
    2.3.4.5/32

I would like to get the first column of the above, but the catch is that I need to treat "\ " (backslash space) as part of the column, so awk '{print $1}' should give me:

    5678
    testing,\ group
    [testing
    ip\ 5.6.7.8
    launch-wizard-1
    456dlkjfa
    1.2.3.4
    test
    default
    launch-wizard-2
    launch-wizard-3
    2.3.4.5/32

Tags: text-processing awk sed

asked 7 hours ago by GypsyCosmonaut, edited 6 hours ago by αғsнιη

4 Answers

With GNU awk (gawk) you can use some zero-length assertions like \< or \>:

    $ echo 'a\ b c' | gawk 'BEGIN{FS="\\> +"} {print $1}'
    a\ b

but unfortunately not the full-blown ones from perl or PCRE (e.g. (?<!\\), (?<=\w), etc.):

    $ echo 'a\ b, c' | perl -nle '@a=split /(?<!\\)\s+/, $_; print $a[0]'
    a\ b,

answered 5 hours ago by mosvy, edited 3 mins ago
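
Where neither gawk's word-boundary operators nor Perl's lookbehind are available, the same kind of split can be sketched as an explicit character scan in POSIX awk. The names escaped_field and esplit below are hypothetical, and the sketch assumes that fields are separated by plain spaces only and that a backslash always protects the character that follows it.

```shell
# Hypothetical helper: print field number $1 of every line read from stdin,
# splitting on runs of spaces that are not preceded by a backslash.
escaped_field() {
  awk -v n="$1" '
    BEGIN { n += 0 }   # force the requested field number to be numeric

    # Split s into arr[1..count] at unescaped spaces and return count.
    function esplit(s, arr,    i, c, cur, count, len) {
      count = 0; cur = ""; len = length(s)
      for (i = 1; i <= len; i++) {
        c = substr(s, i, 1)
        if (c == "\\" && i < len) {
          # Keep the backslash together with the character it escapes.
          cur = cur c substr(s, i + 1, 1)
          i++
        } else if (c == " ") {
          # An unescaped space ends the current field, if there is one.
          if (cur != "") { arr[++count] = cur; cur = "" }
        } else {
          cur = cur c
        }
      }
      if (cur != "") arr[++count] = cur
      return count
    }

    { nf = esplit($0, f); print (n <= nf ? f[n] : "") }
  '
}

printf '%s\n' 'a\ b, c' | escaped_field 1   # prints: a\ b,
```

Because the scan does not depend on what kind of character precedes the separator, it also copes with the comma case shown in the perl example, and it can return any column, for example escaped_field 2 < data_file.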






















You could substitute the escaped space with something else and convert it back again afterwards.

    sed 's/\\ /\\x20/g' data_file | awk '{ print $1; }' | sed 's/\\x20/\\ /g'

answered 6 hours ago by RoVo

• Only with sed: sed 's/\\ /\\x20/g;s/ .*//;s/\\x20/\\ /g' data_file
  – ctac_, 5 hours ago

• Or awk, using the default SUBSEP variable value of \034: awk '{gsub(/\\ /,SUBSEP,$0); val=$1; gsub(SUBSEP,"\\ ",val); print val}' file
  – glenn jackman, 3 hours ago
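
The placeholder idea from this answer and its comments can be sketched as a single sed invocation wrapped in a function. The name first_col_placeholder is hypothetical. The sketch assumes that the literal text \x20 never occurs in the input (it would come back as an escaped space) and that columns are separated by plain spaces. The extra step that strips leading spaces mirrors awk's default field splitting, which ignores leading blanks; s/ .*// on its own would return an empty string for an indented line.

```shell
# Hypothetical helper: first column of each stdin line via a placeholder.
# Assumption: the literal text \x20 does not appear in the input.
first_col_placeholder() {
  sed -e 's/\\ /\\x20/g' \
      -e 's/^ *//' \
      -e 's/ .*//' \
      -e 's/\\x20/\\ /g'
}

# Steps: hide escaped spaces, drop leading spaces, cut at the first
# remaining space, then restore the escaped spaces.
printf '%s\n' '  ip\ 5.6.7.8 extra' | first_col_placeholder   # prints: ip\ 5.6.7.8
```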











With just sed:

    sed -r 's/^((([^\]*\\ ){1,})?[^ ]*).*/\1/' infile

Or shorter:

    sed -r 's/^(([^\]*\\ )*[^ ]*).*/\1/' infile

The pattern (([^\]*\\ ){1,})?[^ ]* matches:

• [^\]*\\ : any run of characters other than a backslash, followed by a backslash and a space (note that \ does not need to be escaped inside a character class, but it does outside).

• ([^\]*\\ ){1,}: the above, one or more times.

• (([^\]*\\ ){1,})?: the (...)? makes that part optional; ([^\]*\\ ){0,} or ([^\]*\\ )* would work as well.

• ((([^\]*\\ ){1,})?[^ ]*): the optional part above followed by any run of non-space characters, captured as a group whose back-reference is \1.

• ((([^\]*\\ ){1,})?[^ ]*).*: the captured group (...) followed by everything else, .*.

The replacement part then just prints \1, which gives the output:

    5678
    testing,\ group
    [testing
    ip\ 5.6.7.8
    launch-wizard-1
    456dlkjfa
    1.2.3.4
    test
    default
    launch-wizard-2
    launch-wizard-3
    2.3.4.5/32

answered 6 hours ago by αғsнιη, edited 6 hours ago
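
One possible tightening of the shorter pattern, offered as a sketch rather than as part of the answer: as written, [^\]* may also run across unescaped spaces, so a line such as "a b\ c d" would yield "a b\ c" instead of "a". Excluding the space from that bracket expression avoids this. The function name first_col_sed is hypothetical, and -E is used here in place of -r on the assumption that the local sed accepts it (GNU and BSD sed do).

```shell
# Hypothetical helper: first column of each stdin line, where "\ " does
# not end the column. Variant of the shorter pattern with the space
# excluded from the first bracket expression.
first_col_sed() {
  # ([^ \\]*\\ )*  zero or more chunks of non-space, non-backslash
  #                characters, each ending in backslash + space
  # [^ ]*          the remaining non-space characters of the column
  sed -E 's/^(([^ \\]*\\ )*[^ ]*).*/\1/'
}

printf '%s\n' 'a b\ c d' 'testing,\ group x' | first_col_sed
# prints:
# a
# testing,\ group
```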




















With GNU grep or compatible:

    grep -Po '^(\\.|\S)*'

Or with ERE:

    grep -Eo '^(\\.|[^[:space:]])*'

answered 3 mins ago by Stéphane Chazelas
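
A small usage sketch of the ERE variant on lines modelled on the question's data. In the pattern, \\. consumes a backslash together with whatever character follows it, and [^[:space:]] consumes any other non-blank character, so matching stops at the first unescaped whitespace. The wrapper name first_col_grep is hypothetical, and the sketch assumes a grep that supports -o, as GNU grep does. Note that -o prints nothing for a line whose match is empty (for instance a line that starts with a space), whereas awk would print an empty line.

```shell
# Hypothetical wrapper around the ERE variant; reads lines on stdin.
first_col_grep() {
  grep -Eo '^(\\.|[^[:space:]])*'
}

# Sample lines modelled on the question's data.
printf '%s\n' 'testing,\ group' 'ip\ 5.6.7.8' 'launch-wizard-1 0.0.0.0/0' |
  first_col_grep
# prints:
# testing,\ group
# ip\ 5.6.7.8
# launch-wizard-1
```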



























                                         
