Count lines wider than 80 columns, taking tabs correctly into account
























To count lines wider than 80 columns I currently use this:



$ git grep -h -c -v '^.{,80}$' **/*.{c,h,pl,y} \
|awk 'BEGIN { i=0 } { i+=$1 } END { printf ("%d\n", i) }'
44984


(-h courtesy of @stéphane-chazelas.)



Unfortunately, the repo uses tabs for indenting so the grep pattern
is inaccurate. Is there a way to have the regex treat tabs at the
standard width of 8 chars like wc -L does?
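To make the miscount concrete, here is a minimal sketch on a single made-up line rather than on the repository. The helper name `count_wide` and the sample line are assumptions for illustration, and `{0,80}` is used instead of `{,80}` only to stay within portable ERE syntax.

```shell
# One tab followed by 75 "x" characters: 76 characters long,
# but 8 + 75 = 83 columns wide with tab stops every 8 columns.
line=$(printf '\t%075d' 0 | tr 0 x)

# Count lines that do NOT fit the "at most 80 characters" pattern.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_wide() { grep -c -v -E '^.{0,80}$' || true; }

naive=$(printf '%s\n' "$line" | count_wide)
expanded=$(printf '%s\n' "$line" | expand | count_wide)
echo "naive: $naive, after expand: $expanded"
```

The character-based pattern accepts the line because it sees the tab as a single character, while the same line is rejected once the tab has been turned into spaces.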



For the purpose of this question we may assume the contributors were disciplined enough to indent consistently (or that they have git commit hooks in lieu of discipline).



For performance reasons I’d prefer a solution that works inside
git-grep(1) or maybe another grep tool, without preprocessing
files.










      grep






      asked 7 hours ago by phg, edited 17 mins ago by ilkkachu




















          3 Answers

















          Accepted answer










          If we can assume, per your comment, that tab characters appear only at the beginning of lines, then we can enumerate the alternatives that make a line wider than 80 columns.



          • No tabs, at least 81 characters

          • One tab, at least 73 characters

          • Two tabs, at least 65 characters

          • Etc.

          The resulting mess is as follows, with your awk statement summing the individual line counts to provide a grand total:



          git grep -hcP '^(.{81,}|\t.{73,}|\t{2}.{65,}|\t{3}.{57,}|\t{4}.{49,}|\t{5}.{41,}|\t{6}.{33,}|\t{7}.{25,}|\t{8}.{17,}|\t{9}.{9,}|\t{10}.)' **/*.{c,h,pl,y} |
          awk '{ i+=$1 } END { printf ("%d\n", i) }'
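The alternation follows a simple rule: after n leading tabs, the rest of the line needs at least 81 - 8n characters. As a rough sketch, the pattern can be generated in a loop instead of typed by hand. The helper names `xs` and `count_wide_leading_tabs` are made up for illustration, and plain `grep -E` with a literal tab stands in for `git grep -P` so the example runs outside a repository.

```shell
# Hypothetical helper: print n "x" characters (printf zero-pads, tr renames).
xs() { printf "%0${1}d" 0 | tr 0 x; }

# Build the alternation: no tabs needs 81+ characters, and each leading
# tab takes 8 columns, so n tabs need at least 81 - 8*n more characters.
tab=$(printf '\t')
pattern='.{81,}'
n=1
while [ "$n" -le 10 ]; do
  pattern="$pattern|${tab}{$n}.{$((81 - 8 * n)),}"
  n=$((n + 1))
done
pattern="^($pattern)"

# grep -c exits non-zero when the count is 0, hence the "|| true".
count_wide_leading_tabs() { grep -c -E "$pattern" || true; }

{
  printf '\t%s\n' "$(xs 72)"   # 8 + 72 = 80 columns: fits
  printf '\t%s\n' "$(xs 73)"   # 8 + 73 = 81 columns: too wide
  printf '%s\n' "$(xs 80)"     # 80 columns: fits
  printf '%s\n' "$(xs 81)"     # 81 columns: too wide
} | count_wide_leading_tabs
```

On this sample the count is 2. Like the hand-written pattern, the sketch is only accurate when tabs appear solely as leading indentation.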

























          • Note that git grep -P (at least with my 2.18.0 version on Debian here) doesn't work with multi-byte characters. For instance, it considers that © (common in source files) is 2 characters instead of one when encoded in UTF-8. It's OK with -E. You can work around it in UTF-8 locales by writing git grep -hcP '(*UTF8)...'
            – Stéphane Chazelas
            22 mins ago































          GNU wc -L doesn't treat TABs as 8 characters; it treats them as they would be displayed in a terminal with TAB stops every 8 columns, so a TAB has a "width" ranging from 1 to 8 columns depending on where it is found on the line. wc -L also considers the display width of other characters (whether they're 0, 1 or 2 columns wide).



          $ printf 'abcde\t\n' | wc -L
          8
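The tab-stop rule can be sketched in a few lines of awk. This is only an approximation of what wc -L does: the function name `display_width` is made up for illustration, and the sketch assumes every non-TAB character is exactly one column wide, which wc -L does not assume.

```shell
# Print the display width of each input line, advancing to the next
# multiple of 8 on every TAB and by one column for any other character.
display_width() {
  awk '{
    col = 0
    n = length($0)
    for (i = 1; i <= n; i++) {
      if (substr($0, i, 1) == "\t")
        col += 8 - col % 8   # jump to the next tab stop
      else
        col++
    }
    print col
  }'
}

printf 'abcde\t\n' | display_width
```

For the line above this prints 8, and for three spaces, one tab and three spaces it gives 11, because the tab only fills the gap up to the next tab stop.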


          Here, you could use expand to expand those TABs to spaces:



          git grep -h '' ./**/*.{c,h,pl,y} | expand | tr -d '\r' | grep -cE '.{81}'


          (also not counting CR characters in case some of those files come from the Microsoft world).



          That covers TABs but not zero-width or double-width characters. Note that the GNU implementation of expand currently doesn't expand TABs properly if there are multi-byte characters (let alone zero-width or double-width ones).



          $ printf 'ééééé\t\n' | wc -L
          8
          $ printf 'ééééé\t\n' | expand | wc -L
          11


          Also note that ./**/*.{c,h,pl,y} would by default skip hidden files and files in hidden directories. As the brace expansion expands to several globs, you would also get errors (fatal with zsh or bash -O failglob) if any of those globs doesn't match.



          With zsh, you'd use ./**/*.(c|h|p[ly])(D.) which is one glob, and where D includes hidden files and . restricts to regular files.



          For a solution that takes into account the actual width of characters (assuming all the text files are encoded in the locale's character encoding) you could use:



          git grep -h '' ./**/*.(c|h|p[ly])(.) |
          perl -Mopen=locale -MText::Tabs -MText::CharWidth=mbswidth -lne '
          $_ = expand($_);
          s/[[:cntrl:]]//g;
          $n++ if mbswidth(expand($_)) > 80;
          END{print 0+$n}'


          Here we remove all control characters (except NL, which is the record delimiter, and TAB, which has already been expanded), not just CR, because mbswidth(), at least on GNU systems, considers them as having a width of -1. In any case, it's not really possible to always know what impact a control character will have on the display width of text, as that depends on how the displaying device interprets those control characters. Another commonly found control character in text files is form feed, but it's usually found on its own on a line, so it is unlikely to make any difference here.




























          • “GNU wc -L doesn't treat TABs as 8 characters, it treats TABs as it would be displayed in a terminal with TAB stops every 8 columns.” You’re correct, of course, but when tabs are used only for indenting that boils down to the same thing.
            – phg
            7 hours ago










          • @phg, not if they're mixed with spaces (like 3 spaces, one tab, 3 spaces at the start of a line gives a width of 11 not 16).
            – Stéphane Chazelas
            6 hours ago










          • For the purpose of this question we may assume the contributors were disciplined enough to indent consistently (or that they have git commit hooks in lieu of discipline).
            – phg
            6 hours ago






          • @phg, looking at the Linux kernel source tree (an old checkout from May I had lying about), @roaima's approach finds 32408 too many lines. Not so much about mixed tab+spc indenting, but because tab is also used for column alignments for table-like sequences of #define symbol value or declarations.
            – Stéphane Chazelas
            1 hour ago










          • Interesting result, but we’re not dealing with the kernel tree here.
            – phg
            1 hour ago






























          Preprocess the files by piping them through expand. The expand utility will expand tabs appropriately (using the standard tab stops at every 8th character).



          find . -type f \( -name '*.[ch]' -o -name '*.p[ly]' \) -exec expand {} + |
          awk 'length > 80 { n++ } END { print n }'
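The same pipeline can be wrapped in a small function and tried on a throwaway directory. This is a sketch under assumptions: the function name `count_wide_in_tree` and the demo file names are made up, and `n + 0` is used so that the result prints as 0 rather than an empty line when nothing is too wide.

```shell
# Count lines wider than 80 columns in *.c, *.h, *.pl and *.py files
# below the given directory (default: the current one).
count_wide_in_tree() {
  find "${1:-.}" -type f \( -name '*.[ch]' -o -name '*.p[ly]' \) \
    -exec expand {} + |
    awk 'length($0) > 80 { n++ } END { print n + 0 }'
}

# Demo on a temporary directory (file names are illustrative only).
demo=$(mktemp -d)
printf '\t%075d\n' 0 > "$demo/wide.c"      # 8 + 75 = 83 columns
printf 'int x;\n' > "$demo/narrow.h"       # short line
printf '%090d\n' 0 > "$demo/ignored.txt"   # wide, but not a matched suffix
count_wide_in_tree "$demo"
rm -rf "$demo"
```

The demo prints 1: only the tab-indented line in wide.c is counted, since the .txt file is not matched by the find expression.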

























            Accepted answer (regex alternatives): answered 6 hours ago by roaima











            Second answer (wc -L and expand): answered 7 hours ago by Stéphane Chazelas, edited 21 mins ago











            • “GNU wc -L doesn't treat TABs as 8 characters, it treats TABs as it would be displayed in a terminal with TAB stops every 8 columns.” You’re correct, of course, but when tabs are used only for indenting that boils down to the same thing.
              – phg
              7 hours ago










            • @phg, not if they're mixed with spaces (like 3 spaces, one tab, 3 spaces at the start of a line gives a width of 11 not 16).
              – Stéphane Chazelas
              6 hours ago










            • For the purpose of this question we may assume the contributors were disciplined enough to indent consistently (or that they have git commit hooks in lieu of discipline).
              – phg
              6 hours ago






            • 1




              @phg, looking at the Linux kernel source tree (an old checkout from May I had lying about), @roaima's approach finds 32408 too many lines. Not so much about mixed tab+spc indenting, but because tab is also used for column alignments for table-like sequences of #define symbol value or declarations.
              – Stéphane Chazelas
              1 hour ago










            • Interesting result, but we’re not dealing with the the kernel tree here.
              – phg
              1 hour ago







































            Preprocess the files by piping them through expand. The expand utility replaces each tab with the number of spaces needed to reach the next tab stop (by default, every 8 columns).



            find . -type f \( -name '*.[ch]' -o -name '*.p[ly]' \) -exec expand {} + |
            awk 'length > 80 { n++ } END { print n + 0 }'
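To see what tab-stop expansion means for the width of a line, here is a minimal sketch that computes the display width directly in awk instead of calling expand. The function name display_width is made up for this illustration, and the sketch assumes tab stops every 8 columns and one column per character (no wide or combining characters).

```sh
#!/bin/sh
# display_width (hypothetical helper): print the display width of each
# input line, assuming tab stops every 8 columns and one column per
# character.
display_width() {
    awk '{
        col = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            if (substr($0, i, 1) == "\t")
                col += 8 - col % 8    # jump to the next tab stop
            else
                col++                 # any other character takes one column
        }
        print col
    }'
}

# The case from the comments: 3 spaces, one tab, 3 spaces.
printf '   \t   \n' | display_width
```

The tab here only advances from column 3 to column 8, so the line is 11 columns wide, not 16. Piping the output through something like awk '$1 > 80 { n++ } END { print n + 0 }' would give a count comparable to the expand-based pipeline.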















                answered 7 hours ago









                Kusalananda




























                     
