Count lines wider than 80 columns, taking tabs correctly into account
6 votes
To count lines wider than 80 columns I currently use this:

    $ git grep -h -c -v '^.\{,80\}$' **/*.{c,h,p{l,y}} \
        |awk 'BEGIN { i=0 } { i+=$1 } END { printf ("%d\n", i) }'
    44984

(-h courtesy of @stéphane-chazelas.)

Unfortunately, the repo uses tabs for indenting, so the grep pattern is inaccurate. Is there a way to have the regex treat tabs at the standard width of 8 chars like wc -L does?

For the purpose of this question we may assume the contributors were disciplined enough to indent consistently (or that they have git commit hooks in lieu of discipline).

For performance reasons I'd prefer a solution that works inside git-grep(1) or maybe another grep tool, without preprocessing files.

grep
edited 17 mins ago by ilkkachu
asked 7 hours ago by phg

3 Answers

8 votes, accepted
If we can assume, per your comment, that tab characters will appear only at the beginning of lines, then each leading tab occupies exactly 8 columns and we can enumerate the alternatives that add up to more than 80 columns:

- No tabs, at least 81 characters
- One tab, at least 73 characters
- Two tabs, at least 65 characters
- Etc.

The resulting mess is as follows, with your awk statement summing the individual per-file counts to provide a grand total:

    git grep -hcP '^(.{81,}|\t.{73,}|\t{2}.{65,}|\t{3}.{57,}|\t{4}.{49,}|\t{5}.{41,}|\t{6}.{33,}|\t{7}.{25,}|\t{8}.{17,}|\t{9}.{9,}|\t{10}.)' **/*.{c,h,p{l,y}} |
        awk '{ i+=$1 } END { printf ("%d\n", i) }'
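
The alternation follows a simple rule (n leading tabs leave 81 - 8n characters to find), so it could also be generated rather than typed. The following is only a sketch under the same assumption that tabs occur solely as leading indentation; wide_line_pattern is a hypothetical helper name, and it writes every branch in the uniform \t{n}.{m,} form, which should be equivalent to the hand-written pattern.

```shell
# Hypothetical helper: print a PCRE alternation matching lines wider than
# $1 columns (default 80), assuming tabs appear only as leading indentation
# and tab stops are every $2 columns (default 8).
wide_line_pattern() {
    awk -v limit="${1:-80}" -v ts="${2:-8}" 'BEGIN {
        pat = ""
        for (n = 0; n * ts <= limit; n++) {
            rest = limit + 1 - n * ts   # characters still needed after n tabs
            sep = (n ? "|" : "")
            pat = pat sep sprintf("\\t{%d}.{%d,}", n, rest)
        }
        print "^(" pat ")"
    }'
}

# Print the pattern for the 80-column, 8-column-tab case discussed above.
wide_line_pattern 80 8
```

The output could then be passed to git grep -hcP in place of the literal pattern, for example as "$(wide_line_pattern 80 8)".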
answered 6 hours ago by roaima

Note that git grep -P (at least with my 2.18.0 version on Debian here) doesn't work with multi-byte characters. For instance, it considers that © (common in source files) is 2 characters instead of one when encoded in UTF-8. It's OK with -E. You can work around it in UTF-8 locales by writing git grep -hcP '(*UTF8)...'
– Stéphane Chazelas, 22 mins ago

7 votes
GNU wc -L doesn't treat TABs as 8 characters; it treats them as they would be displayed in a terminal with TAB stops every 8 columns, so a TAB has a "width" ranging from 1 to 8 columns depending on where it's found on the line. wc -L also considers the display width of other characters (whether they're 0, 1 or 2 columns wide).

    $ printf 'abcde\t\n' | wc -L
    8
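
As a rough illustration of that tab-stop rule, here is a minimal awk sketch that computes the same width for plain ASCII input. It is not how wc is implemented; line_width is a hypothetical helper name, and the sketch assumes every non-TAB character is exactly one column wide.

```shell
# Hypothetical helper: print the display width of each input line, moving
# each TAB to the next multiple of 8 columns. Assumes single-width,
# single-byte characters (no zero-width or double-width handling).
line_width() {
    awk '{
        col = 0
        n = split($0, part, "\t")          # pieces of text between TABs
        for (i = 1; i <= n; i++) {
            col += length(part[i])
            if (i < n) col += 8 - col % 8  # a TAB advances to the next stop
        }
        print col
    }'
}

printf 'abcde\t\n' | line_width      # same input as the wc -L example: 8
printf '   \t   \n' | line_width     # 3 spaces, TAB, 3 spaces: 11
```

Counting wide lines would then be a matter of comparing col against 80 instead of printing it.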

Here, you could use expand to expand those TABs to spaces:

    git grep -h '' ./**/*.{c,h,p{l,y}} | expand | tr -d '\r' | grep -cE '.{81}'

(also not counting CR characters in case some of those files come from the Microsoft world).

That covers TABs but not zero-width or double-width characters. Note that the GNU implementation of expand currently doesn't expand TABs properly if there are multi-byte characters (let alone zero-width or double-width ones).

    $ printf 'ééééé\t\n' | wc -L
    8
    $ printf 'ééééé\t\n' | expand | wc -L
    11

Also note that ./**/*.{c,h,p{l,y}} would by default skip hidden files or files in hidden directories. As the brace expansion expands to several globs, you would also get errors (fatal with zsh or bash -O failglob) if any of those globs doesn't match.

With zsh, you'd use ./**/*.(c|h|p[ly])(D.) which is one glob, and where the D qualifier includes hidden files and the . qualifier restricts to regular files.

For a solution that takes into account the actual width of characters (assuming all the text files are encoded in the locale's character encoding) you could use:

    git grep -h '' ./**/*.(c|h|p[ly])(.) |
      perl -Mopen=locale -MText::Tabs -MText::CharWidth=mbswidth -lne '
        $_ = expand($_);
        s/[[:cntrl:]]//g;
        $n++ if mbswidth(expand($_)) > 80;
        END{print 0+$n}'

Here we remove all control characters (except NL, which is the record delimiter, and TAB, which is expanded), not just CR, as mbswidth() at least on GNU systems considers them as having a width of -1. In any case, it's not really possible to always know what impact a control character will have on the display width of text, as that depends on how the displaying device interprets those control characters. Another commonly found control character in text files is form feed, but it's usually found on its own on a line, so is unlikely to make any difference here.
edited 21 mins ago
answered 7 hours ago by Stéphane Chazelas

"GNU wc -L doesn't treat TABs as 8 characters, it treats TABs as it would be displayed in a terminal with TAB stops every 8 columns." You're correct, of course, but when tabs are used only for indenting that boils down to the same thing.
– phg, 7 hours ago
@phg, not if they're mixed with spaces (like 3 spaces, one tab, 3 spaces at the start of a line gives a width of 11 not 16).
– Stéphane Chazelas, 6 hours ago
For the purpose of this question we may assume the contributors were disciplined enough to indent consistently (or that they have git commit hooks in lieu of discipline).
– phg, 6 hours ago
@phg, looking at the Linux kernel source tree (an old checkout from May I had lying about), @roaima's approach finds 32408 too many lines. Not so much about mixed tab+spc indenting, but because tab is also used for column alignments for table-like sequences of #define symbol value or declarations.
– Stéphane Chazelas, 1 hour ago (1 upvote)
Interesting result, but we're not dealing with the kernel tree here.
– phg, 1 hour ago

5 votes
Preprocess the files by piping them through expand. The expand utility will expand tabs appropriately (using the standard tab stops at every 8th character).

    find . -type f \( -name '*.[ch]' -o -name '*.p[ly]' \) -exec expand {} + |
        awk 'length > 80 { n++ } END { print n }'
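
To sanity-check the boundary, the counting stage can be tried on two synthetic lines. This is only a sketch: count_wide is a hypothetical helper name, and it prints n + 0 rather than n so that the result is 0 instead of an empty line when nothing exceeds the limit.

```shell
# Hypothetical helper: expand TABs, then count lines longer than 80 characters.
# "n + 0" forces a numeric 0 when no line matched.
count_wide() {
    expand | awk 'length > 80 { n++ } END { print n + 0 }'
}

# One TAB (8 columns) plus 73 digits is 81 columns; with 72 digits it is 80.
printf '\t%073d\n' 0 | count_wide    # prints 1
printf '\t%072d\n' 0 | count_wide    # prints 0
```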
answered 7 hours ago by Kusalananda