Treat tab as eight characters in grep regex
To count lines wider than 80 columns I currently use this:
    $ git grep -h -c -v '^.\{,80\}$' **/*.{c,h,p{l,y}} \
        | awk 'BEGIN { i=0 } { i+=$1 } END { printf ("%d\n", i) }'
    44984
(-h courtesy of @stéphane-chazelas.)
Unfortunately, the repo uses tabs for indenting, so the grep pattern is inaccurate. Is there a way to have the regex treat tabs at the standard width of 8 chars like wc -L does?
For the purpose of this question we may assume the contributors were disciplined enough to indent consistently (or that they have git commit hooks in lieu of discipline).
For performance reasons I’d prefer a solution that works inside git-grep(1) or maybe another grep tool, without preprocessing files.
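To make the mismatch concrete, here is a small sketch (not part of the original post) that builds one tab-indented sample line and compares what a character-counting pattern sees with the width after tab expansion. The variable names are made up for illustration, and the pattern is written as `\{0,80\}` because an omitted lower bound is a GNU extension.

```shell
#!/bin/sh
# One tab followed by 75 "x" characters: 76 characters, but 8 + 75 = 83 columns.
sample=$(printf '\t%s' "$(printf '%75s' '' | tr ' ' x)")

# A character-based pattern does not flag the line, since 76 <= 80.
# grep exits non-zero when the count is 0, hence the "|| true".
by_chars=$(printf '%s\n' "$sample" | grep -c -v '^.\{0,80\}$' || true)

# Expanding the tab first gives the width a terminal would show.
by_cols=$(printf '%s\n' "$sample" | expand | awk '{ print length($0) }')

echo "flagged by pattern: $by_chars, display width: $by_cols"
```

The line is 83 columns wide on screen, yet the character count stays under the limit, which is exactly the undercount described in the question.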
grep
asked 2 hours ago by phg, edited 1 hour ago by roaima
3 Answers
If we can assume, per your comment, that tab characters appear only at the beginning of lines, then we can enumerate the alternatives that add up to more than 80 columns:
- No tabs, at least 81 characters
- One tab, then at least 73 characters
- Two tabs, then at least 65 characters
- Etc.
The resulting mess is as follows, with your awk statement summing the per-file line counts to provide a grand total:
    git grep -hcP '^(.{81,}|\t.{73,}|\t{2}.{65,}|\t{3}.{57,}|\t{4}.{49,}|\t{5}.{41,}|\t{6}.{33,}|\t{7}.{25,}|\t{8}.{17,}|\t{9}.{9,}|\t{10}.)' **/*.{c,h,p{l,y}} |
        awk '{ i+=$1 } END { printf ("%d\n", i) }'
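The alternation follows a regular rule: after n leading tabs, a line is wider than 80 columns once it has at least 81 - 8n further characters. As a rough sketch, the pattern could be generated instead of typed by hand. `build_pattern` is a hypothetical helper name, and this version emits a POSIX ERE with literal tab characters (for `-E` rather than `-P`), still assuming tabs occur only as leading indentation.

```shell
#!/bin/sh
# build_pattern LIMIT TABW (hypothetical helper): print an ERE matching lines
# wider than LIMIT columns, assuming tabs appear only as leading indentation
# and tab stops sit every TABW columns.
build_pattern() {
    limit=$1
    tabw=$2
    tab=$(printf '\t')               # literal tab: POSIX ERE has no \t escape
    pat=".{$((limit + 1)),}"         # no tabs: LIMIT + 1 characters or more
    rest=$((limit + 1))
    n=1
    while [ $((n * tabw)) -le "$limit" ]; do
        rest=$((limit + 1 - n * tabw))   # characters still needed after n tabs
        pat="$pat|${tab}{$n}.{$rest,}"
        n=$((n + 1))
    done
    # If one more tab alone already overshoots LIMIT, add it as a final branch.
    if [ "$rest" -gt 1 ]; then
        pat="$pat|${tab}{$n}"
    fi
    printf '^(%s)\n' "$pat"
}

# Demo: 10 tabs (80 columns) plus one character is 81 columns wide.
pattern=$(build_pattern 80 8)
printf '\t\t\t\t\t\t\t\t\t\tx\n' | grep -cE "$pattern"
```

The generated string should work anywhere an extended regular expression is accepted, for example as the pattern argument to `git grep -hcE`. The extra final branch only matters when the limit is not a multiple of the tab width.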
answered 1 hour ago by roaima (accepted answer)
Preprocess the files by piping them through expand. The expand utility will expand tabs appropriately (using the standard tab stops at every 8th character).
    find . -type f \( -name '*.[ch]' -o -name '*.p[ly]' \) -exec expand {} + |
        awk 'length > 80 { n++ } END { print n }'
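As a quick sanity check of the expand-then-count idea, the sketch below feeds three synthetic lines through the same kind of pipeline. `count_wide` is a hypothetical wrapper name, and it prints `n + 0` so that an input with no long lines yields 0 rather than an empty line.

```shell
#!/bin/sh
# count_wide [FILE...] (hypothetical wrapper): count lines wider than 80
# columns after expanding tabs to the default stops every 8 columns.
# With no file arguments, expand reads standard input.
count_wide() {
    expand "$@" | awk 'length($0) > 80 { n++ } END { print n + 0 }'
}

{
    printf '\t%s\n' "$(printf '%72s' '' | tr ' ' x)"   # 8 + 72 = 80 columns
    printf '\t%s\n' "$(printf '%73s' '' | tr ' ' x)"   # 8 + 73 = 81 columns
    printf '%s\n'   "$(printf '%81s' '' | tr ' ' x)"   # 81 columns, no tab
} | count_wide
# prints 2
```

Only the second and third lines exceed 80 columns once the tab is expanded, so the count is 2.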
answered 2 hours ago by Kusalananda
GNU wc -L doesn't treat TABs as 8 characters; it treats them as they would be displayed in a terminal with TAB stops every 8 columns. wc -L also considers the display width of other characters (whether they're 0, 1 or 2 columns wide).
    $ printf 'abcde\t\n' | wc -L
    8
Here, you could use expand to expand those TABs to spaces:
    git grep -h '' ./**/*.{c,h,p{l,y}} | expand | grep -cE '.{81}'
That covers TABs but not zero-width or double-width characters. Note that the GNU implementation of expand currently doesn't expand TABs properly if there are multi-byte characters (let alone zero-width or double-width ones).
    $ printf 'ééééé\t\n' | wc -L
    8
    $ printf 'ééééé\t\n' | expand | wc -L
    11
Also note that ./**/*.{c,h,p{l,y}} would by default skip hidden files or files in hidden directories. As the brace expansion expands to several globs, you would also get errors (fatal with zsh or bash -O failglob) if any of those globs doesn't match.
With zsh, you'd use ./**/*.(c|h|p[ly])(D.) which is one glob, where D includes hidden files and . restricts to regular files.
answered 2 hours ago by Stéphane Chazelas, edited 1 hour ago
“GNU wc -L doesn't treat TABs as 8 characters, it treats TABs as it would be displayed in a terminal with TAB stops every 8 columns.” You’re correct, of course, but when tabs are used only for indenting that boils down to the same thing.
– phg
2 hours ago
@phg, not if they're mixed with spaces (like 3 spaces, one tab, 3 spaces at the start of a line gives a width of 11 not 14).
– Stéphane Chazelas
1 hour ago
For the purpose of this question we may assume the contributors were disciplined enough to indent consistently (or that they have git commit hooks in lieu of discipline).
– phg
1 hour ago
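The mixed-indentation case from this thread can be checked with a short sketch. It compares a flat "every tab is 8 characters" count with the width after expanding to tab stops every 8 columns. The variable names are illustrative only.

```shell
#!/bin/sh
# Indentation made of three spaces, one tab, three spaces.
indent=$(printf '   \t   ')

# Flat count: drop the tabs, then add 8 per tab removed (3 + 8 + 3).
flat=$(printf '%s\n' "$indent" |
    awk '{ n = gsub(/\t/, ""); print length($0) + 8 * n }')

# Tab stops: the tab only advances from column 3 to column 8, then 3 spaces.
stops=$(printf '%s\n' "$indent" | expand | awk '{ print length($0) }')

echo "flat: $flat, tab stops: $stops"
# prints "flat: 14, tab stops: 11"
```

With consistently tab-only indentation the two counts agree, which is why the assumption stated in the question matters for the regex approach.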