Treat tab as eight characters in grep regex

To count lines wider than 80 columns I currently use this:



    $ git grep -h -c -v '^.\{,80\}$' **/*.{c,h,pl,y} \
      | awk 'BEGIN { i=0 } { i+=$1 } END { printf ("%d\n", i) }'
44984


(-h courtesy of @stéphane-chazelas.)



Unfortunately, the repo uses tabs for indenting so the grep pattern
is inaccurate. Is there a way to have the regex treat tabs at the
standard width of 8 chars like wc -L does?



For the purpose of this question we may assume the contributors were disciplined enough to indent consistently (or that they have git commit hooks in lieu of discipline).



For performance reasons I’d prefer a solution that works inside
git-grep(1) or maybe another grep tool, without preprocessing
files.










asked 2 hours ago by phg; edited 1 hour ago by roaima




















3 Answers

















If we can assume, per your comment, that tab characters appear only at the beginning of lines, then we can enumerate the alternatives that add up to more than 80 columns:

• No tabs, at least 81 characters

• One tab, then at least 73 characters

• Two tabs, then at least 65 characters

• Etc.

The resulting mess is as follows, with your awk statement summing the per-file counts into a grand total:

    git grep -hcP '^(.{81,}|\t.{73,}|\t{2}.{65,}|\t{3}.{57,}|\t{4}.{49,}|\t{5}.{41,}|\t{6}.{33,}|\t{7}.{25,}|\t{8}.{17,}|\t{9}.{9,}|\t{10}.)' **/*.{c,h,pl,y} |
    awk '{ i+=$1 } END { printf ("%d\n", i) }'
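A minimal sketch of how the alternation could be generated rather than typed by hand, under the same assumption that tabs occur only as leading indentation. The helper name build_pattern and its two parameters (column limit and tab width) are hypothetical and not part of the answer; the result is meant for a PCRE-capable grep (-P).

```shell
# build_pattern [LIMIT] [TABWIDTH]
# Hypothetical helper: prints a PCRE matching lines wider than LIMIT columns,
# assuming tabs appear only at the start of a line.
build_pattern() {
    limit=${1:-80} tab=${2:-8}
    pattern= n=0
    # One alternative per possible number of leading tabs.
    while [ $((n * tab)) -le "$limit" ]; do
        rest=$((limit + 1 - n * tab))   # characters still needed after n tabs
        pattern="${pattern:+$pattern|}\\t{$n}.{$rest,}"
        n=$((n + 1))
    done
    printf '^(%s)\n' "$pattern"
}

# Show the generated pattern for the default 80 columns and 8-column tabs.
build_pattern 80 8
```

The output could then be passed to the command above as "$(build_pattern)" in place of the hand-written pattern. Lines with more leading tabs than the last alternative names are still matched, because . also matches a tab.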





answered 1 hour ago by roaima (accepted answer)






















Preprocess the files by piping them through expand. The expand utility converts tabs to spaces, using the standard tab stops at every 8th column.

    find . -type f \( -name '*.[chy]' -o -name '*.pl' \) -exec expand {} + |
    awk 'length > 80 { n++ } END { print n + 0 }'
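The same expand-then-measure idea can be wrapped in a small function so the limit is a parameter. This is only a sketch: count_long_lines is a hypothetical name, and it assumes ASCII text, so that awk's length equals the display width once tabs are expanded.

```shell
# count_long_lines MAX FILE...
# Hypothetical helper: expands tabs (stops every 8 columns) in the given
# files and prints how many lines are longer than MAX characters.
count_long_lines() {
    max=$1
    shift
    expand -- "$@" |
        awk -v max="$max" 'length > max { n++ } END { print n + 0 }'
}

# Demo on a temporary file: 8+73 columns, 8+72 columns, and 81 columns.
demo=$(mktemp)
printf '\t%073d\n\t%072d\n%081d\n' 0 0 0 > "$demo"
count_long_lines 80 "$demo"
rm -f "$demo"
```

The n + 0 in the END block makes awk print 0 rather than an empty line when no line exceeds the limit.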





answered 2 hours ago by Kusalananda




















GNU wc -L doesn't treat TABs as 8 characters; it treats TABs as they would be displayed in a terminal with TAB stops every 8 columns. wc -L also considers the display width of other characters (whether they're 0, 1 or 2 columns wide).

    $ printf 'abcde\t\n' | wc -L
    8

Here, you could use expand to expand those TABs to spaces:

    git grep -h '' ./**/*.{c,h,pl,y} | expand | grep -cE '.{81}'

That covers TABs but not zero-width or double-width characters. Note that the GNU implementation of expand currently doesn't expand TABs properly if there are multi-byte characters (let alone zero-width or double-width ones).

    $ printf 'ééééé\t\n' | wc -L
    8
    $ printf 'ééééé\t\n' | expand | wc -L
    11

Also note that ./**/*.{c,h,pl,y} would by default skip hidden files or files in hidden directories. As the brace expansion expands to several globs, you would also get errors (fatal with zsh or bash -O failglob) if any of those globs doesn't match.

With zsh, you'd use ./**/*.(c|h|pl|y)(D.) which is one glob, where D includes hidden files and . restricts to regular files.






answered 2 hours ago by Stéphane Chazelas; edited 1 hour ago











• “GNU wc -L doesn't treat TABs as 8 characters, it treats TABs as it would be displayed in a terminal with TAB stops every 8 columns.” You’re correct, of course, but when tabs are used only for indenting that boils down to the same thing. – phg, 2 hours ago

• @phg, not if they're mixed with spaces (like 3 spaces, one tab, 3 spaces at the start of a line gives a width of 11 not 16). – Stéphane Chazelas, 1 hour ago

• For the purpose of this question we may assume the contributors were disciplined enough to indent consistently (or that they have git commit hooks in lieu of discipline). – phg, 1 hour ago
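The mixed-indentation case raised in the comments can be checked with a short sketch. The function name display_width is hypothetical, and the sketch assumes ASCII input and tab stops every 8 columns.

```shell
# display_width: print the width of each input line after tab expansion.
# Hypothetical helper; only valid for single-byte, single-column characters.
display_width() {
    expand | awk '{ print length }'
}

# 3 spaces, one tab, 3 spaces: the tab only advances to column 8.
printf '   \t   \n' | display_width   # prints 11
```

The tab here contributes 5 columns rather than 8, which is why a per-tab fixed width only works when tabs are strictly leading.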















