What prevents stdout/stderr from interleaving
Say I run some processes:
#!/usr/bin/env bash
foo &
bar &
baz &
wait;
I run the above script like so:
foobarbaz | cat
As far as I can tell, when any of the processes write to stdout/stderr, their output never interleaves: each line of output seems to be atomic. How does that work? What utility controls how each line is atomic?
shell osx stdout output stderr
How much data do your commands output? Try making them output a few kilobytes.
– Kusalananda, 1 hour ago
You mean where one of the commands outputs a few kB before a newline?
– Alexander Mills, 44 mins ago
asked 1 hour ago by Alexander Mills, edited 1 hour ago by Jeff Schaller
1 Answer
It depends on how the programs buffer their output. The stdio library that most programs use for writing buffers data to make output more efficient. Instead of outputting data as soon as the program calls a library function to write to a file, the function stores this data in a buffer, and only actually outputs the data once the buffer has filled up. This means that output is done in batches. More precisely, there are three output modes:
- Unbuffered: the data is written immediately, without using a buffer. This can be slow if the program writes its output in small pieces, e.g. character by character. This is the default mode for standard error.
- Fully buffered: the data is only written when the buffer is full. This is the default mode when writing to a pipe or to a regular file, except with stderr.
- Line-buffered: the data is written after each newline, or when the buffer is full. This is the default mode when writing to a terminal, except with stderr.
Programs can reconfigure each file to behave differently, and can explicitly flush the buffer. The buffer is flushed automatically when a program closes the file or exits normally.
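The buffering mode can sometimes also be changed from the outside. Here is a minimal sketch, assuming GNU coreutils is installed so that stdbuf is available; it only affects programs that use stdio and do not set their own buffering. The filter function name and the sample input are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: stdbuf comes from GNU coreutils and may be missing elsewhere
# (for example on a stock macOS install), hence the fallback branch.
if command -v stdbuf >/dev/null 2>&1; then
    # -oL asks the command's stdio to line-buffer stdout,
    # even though stdout is a pipe here.
    filter() { stdbuf -oL grep b; }
else
    # Fallback: plain grep, which normally fully buffers when writing to a pipe.
    filter() { grep b; }
fi

# Hypothetical sample input: two lines contain a "b", one does not.
result=$(printf 'bbbb\nxxxx\nabba\n' | filter)
printf '%s\n' "$result"
```

The selected lines are the same in both branches; what changes is when they are handed to the pipe. With line buffering each matching line is written as soon as it is complete, instead of waiting for the buffer to fill.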
If all the programs that are writing to the same pipe either use line-buffered mode, or use unbuffered mode and write each line with a single call to an output function, and if the lines are short enough to write in a single chunk, then the output will be an interleaving of whole lines. But if one of the programs uses fully-buffered mode, or if the lines are too long, then you will see mixed lines.
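As for what "short enough to write in a single chunk" means on a pipe: POSIX guarantees that a single write of at most PIPE_BUF bytes is not interleaved with data from other writers. PIPE_BUF is at least 512 and is 4096 on Linux. The sketch below queries the limit with getconf and compares it with the length of one sample line; the variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Ask the system for the atomic-write limit of pipes (path "/" is just a
# convenient existing path to query).
pipe_buf=$(getconf PIPE_BUF /)

# Sample line as used in the example further down.
# ${#line} counts characters, which equals bytes for plain ASCII text.
line='aaaa'
line_bytes=$(( ${#line} + 1 ))   # +1 for the trailing newline

echo "PIPE_BUF is $pipe_buf bytes"
if [ "$line_bytes" -le "$pipe_buf" ]; then
    echo "A $line_bytes-byte line fits in one atomic write to a pipe"
else
    echo "A $line_bytes-byte line may be split across writes"
fi
```

This only helps if the program actually emits each line with one write call, which is what line-buffered mode does for short lines.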
Here is an example where I interleave the output from two programs. I used GNU coreutils on Linux; different versions of these utilities may behave differently.
- The command yes aaaa writes aaaa forever in what is essentially equivalent to line-buffered mode. The yes utility actually writes multiple lines at a time, but each time it emits output, the output is a whole number of lines.
- The pipeline while true; do echo bbbb; done | grep b writes bbbb forever in fully buffered mode. It uses a buffer size of 8192, and each line is 5 bytes long. Since 5 does not divide 8192, the boundaries between writes are not at a line boundary in general.
Let's pit them against each other.
$ { yes aaaa & while true; do echo bbbb; done | grep b & } | head -n 999999 | grep -e ab -e ba
bbaaaa
bbbbaaaa
baaaa
bbbaaaa
bbaaaa
bbbaaaa
ab
bbbbaaa
As you can see, yes sometimes interrupted grep and vice versa. Only about 0.001% of the lines got interrupted, but it happened. The output is randomized so the number of interruptions will vary, but I saw at least a few interruptions every time. There would be a higher fraction of interrupted lines if the lines were longer, since the likelihood of an interruption increases as the number of lines per buffer decreases.
Interesting, so what might be a good way to ensure that all lines are written to cat atomically, such that the cat process receives whole lines from either foo/bar/baz but not half a line from one and half a line from another, etc.? Is there something I can do with the bash script?
– Alexander Mills, 55 mins ago
Sounds like this applies to my case also, where I had hundreds of files and awk produced two (or more) lines of output for the same ID with
find -type f -name 'myfiles*' -print0 | xargs -0 awk '{ seen[$1]= seen[$1] $2 } END { for(x in seen) print x, seen[x] }'
but with
find -type f -name 'myfiles*' -print0 | xargs -0 cat | awk '{ seen[$1]= seen[$1] $2 } END { for(x in seen) print x, seen[x] }'
it correctly produced only one line for every ID.
– sddgob, 49 mins ago
To prevent any interleaving, I can do that within a programming environment like Node.js, but with bash/shell, I'm not sure how to do it.
– Alexander Mills, 45 mins ago
@AlexanderMills Write to different files, then process the files separately, or concatenate them.
– Kusalananda, 8 mins ago
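The suggestion to write to different files could be sketched in bash roughly as follows. This is only an illustration: foo, bar and baz are hypothetical stand-in functions for the real commands, and the temporary directory layout is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real foo, bar and baz commands.
foo() { printf 'foo %s\n' 1 2 3; }
bar() { printf 'bar %s\n' 1 2 3; }
baz() { printf 'baz %s\n' 1 2 3; }

# One private output file per process, so no two writers share a pipe.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Run all three concurrently, capturing stdout and stderr of each.
foo > "$tmpdir/foo.out" 2>&1 &
bar > "$tmpdir/bar.out" 2>&1 &
baz > "$tmpdir/baz.out" 2>&1 &
wait

# Concatenate only after every writer has finished.
combined=$(cat "$tmpdir/foo.out" "$tmpdir/bar.out" "$tmpdir/baz.out")
printf '%s\n' "$combined"
```

The trade-off is that the output is grouped per command rather than shown in chronological order, and nothing appears until all three commands have finished.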
answered 1 hour ago by Gilles