What prevents stdout/stderr from interleaving

Say I run some processes:



#!/usr/bin/env bash

foo &
bar &
baz &

wait;


I run the above script like so:



foobarbaz | cat


As far as I can tell, when any of the processes write to stdout/stderr, their output never interleaves: each line of output seems to be atomic. How does that work? What controls whether each line is written atomically?










shell osx stdout output stderr






edited 1 hour ago by Jeff Schaller

asked 1 hour ago by Alexander Mills

  • How much data do your commands output? Try making them output a few kilobytes.
    – Kusalananda
    1 hour ago

  • You mean where one of the commands outputs a few KB before a newline?
    – Alexander Mills
    44 mins ago












1 Answer
It depends on how the programs buffer their output. The stdio library that most programs use for writing buffers its output to make it more efficient. Instead of outputting data as soon as the program calls a library function to write to a file, the function stores the data in a buffer, and only actually outputs it once the buffer has filled up. This means that output is done in batches. More precisely, there are three output modes:



  • Unbuffered: the data is written immediately, without using a buffer. This can be slow if the program writes its output in small pieces, e.g. character by character. This is the default mode for standard error.

  • Fully buffered: the data is only written when the buffer is full. This is the default mode when writing to a pipe or to a regular file, except with stderr.

  • Line-buffered: the data is written after each newline, or when the buffer is full. This is the default mode when writing to a terminal, except with stderr.

Programs can change the buffering mode of each stream (in C, with setvbuf), and can explicitly flush the buffer (with fflush). The buffer is flushed automatically when a program closes the file or exits normally, but not when it is killed by a signal.
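A minimal sketch of how the difference between fully-buffered and line-buffered output can be observed from the shell. It assumes a grep that supports --line-buffered (GNU and BSD grep both do). slow_source and stamp are hypothetical helpers written only for this demo. For programs without such an option, GNU coreutils provides stdbuf -oL, which only affects programs that use stdio for output and do not set their own buffering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix every line read from stdin with the number of
# whole seconds elapsed since $start, i.e. roughly when that line arrived.
stamp() {
  local line
  while IFS= read -r line; do
    printf '%d %s\n' "$((SECONDS - start))" "$line"
  done
}

# Hypothetical source: one line, a 2-second pause, then another line.
slow_source() {
  echo first
  sleep 2
  echo second
}

# grep's stdout is a pipe here, so stdio fully buffers it: "first" stays in
# grep's buffer until grep exits, and both lines arrive after about 2 seconds.
echo 'default (fully buffered):'
start=$SECONDS
slow_source | grep . | stamp

# With --line-buffered, grep flushes after every line, so "first" arrives
# right away and only "second" waits for the pause.
echo '--line-buffered:'
start=$SECONDS
slow_source | grep --line-buffered . | stamp
```

In the first case both lines carry the same stamp (about 2). In the second case "first" is stamped about 0. The data is identical. Only when it reaches the pipe changes.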



If all the programs that are writing to the same pipe either use line-buffered mode, or use unbuffered mode and write each line with a single call to an output function, and if the lines are short enough to be written in a single chunk, then the output will be an interleaving of whole lines. For pipes, POSIX guarantees that a single write of at most PIPE_BUF bytes is not interleaved with data from other writers. PIPE_BUF is at least 512 bytes, and is 4096 bytes on Linux. But if one of the programs uses fully-buffered mode, or if the lines are too long, then you will see mixed lines.
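A minimal sketch of the whole-lines case, assuming a POSIX system and an awk that supports fflush() (gawk, mawk and BusyBox awk do). The hypothetical writer function emits short lines and calls fflush() after each print. Every line therefore reaches the shared pipe as exactly one write() that is far below PIPE_BUF, so the count of mixed lines should be zero.

```shell
#!/usr/bin/env bash
# Hypothetical writer: prints $2 copies of one 8-character line made of the
# letter $1, flushing after each line so that each line is its own write().
writer() {
  awk -v tag="$1" -v n="$2" 'BEGIN {
    line = tag tag tag tag tag tag tag tag
    for (i = 0; i < n; i++) { print line; fflush() }
  }'
}

out=$(mktemp) || exit 1

# Three concurrent writers share the pipe into cat, as in the question.
{
  writer a 2000 &
  writer b 2000 &
  writer c 2000 &
  wait
} | cat > "$out"

echo "lines received: $(wc -l < "$out")"
# -x whole-line match, -v invert, -c count: anything other than a complete
# "aaaaaaaa", "bbbbbbbb" or "cccccccc" line would be a mixed line.
echo "mixed lines: $(grep -cvxE 'a{8}|b{8}|c{8}' "$out")"
rm -f "$out"
```

This should report 6000 lines and 0 mixed lines. The order in which the a, b and c lines appear still varies between runs. Only the line boundaries are preserved.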



Here is an example where I interleave the output from two programs. I used GNU coreutils on Linux; different versions of these utilities may behave differently.




  • yes aaaa writes aaaa forever in what is essentially equivalent to line-buffered mode. The yes utility actually writes multiple lines at a time, but each time it emits output, the output is a whole number of lines.


  • while true; do echo bbbb; done | grep b writes bbbb forever in fully-buffered mode. It uses a buffer size of 8192, and each line is 5 bytes long. Since 5 does not divide 8192, the boundaries between writes are in general not at line boundaries.

Let's run them together into the same pipe.



$ { yes aaaa & while true; do echo bbbb; done | grep b & } | head -n 999999 | grep -e ab -e ba
bbaaaa
bbbbaaaa
baaaa
bbbaaaa
bbaaaa
bbbaaaa
ab
bbbbaaa


As you can see, yes sometimes interrupted grep and vice versa. Only about 0.001% of the lines got interrupted, but it happened. The timing is nondeterministic, so the number of interruptions varies from run to run, but I saw at least a few interruptions every time. There would be a higher fraction of interrupted lines if the lines were longer, since the likelihood of an interruption increases as the number of lines per buffer decreases.
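Applied to the script in the question, one way to meet the whole-lines condition is to pass each job's output through grep --line-buffered ''. The empty pattern matches every line, and the flag flushes after each one, so each line reaches the shared pipe in a single write. This is a sketch under assumptions. It only helps for lines shorter than PIPE_BUF, and only for stdout (write foo 2>&1 | whole_lines to include stderr). foo, bar, baz and whole_lines are hypothetical stand-ins. The three jobs print numbered lines through awk, whose fully-buffered writes to a pipe can end mid-line.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for foo, bar and baz. awk's stdout is a pipe, so it
# is fully buffered and its write() calls may split lines.
foo() { awk 'BEGIN { for (i = 0; i < 3000; i++) print "foo-line-" i }'; }
bar() { awk 'BEGIN { for (i = 0; i < 3000; i++) print "bar-line-" i }'; }
baz() { awk 'BEGIN { for (i = 0; i < 3000; i++) print "baz-line-" i }'; }

# Re-emit stdin one line per write(): the empty pattern matches every line,
# and --line-buffered makes grep flush stdout after each of them.
whole_lines() { grep --line-buffered ''; }

foo | whole_lines &
bar | whole_lines &
baz | whole_lines &
wait
```

Run as foobarbaz | cat. Lines from the three jobs still interleave, but only at line granularity. The cost is one write() per line, which can be noticeably slower for high-volume output.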






answered 1 hour ago by Gilles

  • Interesting. So what might be a good way to ensure that all lines are written to cat atomically, so that the cat process receives whole lines from foo, bar or baz, but never half a line from one and half a line from another? Is there something I can do in the bash script?
    – Alexander Mills
    55 mins ago

  • Sounds like this applies to my case too, where I had hundreds of files and awk produced two (or more) lines of output for the same ID with find -type f -name 'myfiles*' -print0 | xargs -0 awk '{ seen[$1]= seen[$1] $2 } END { for(x in seen) print x, seen[x] }', but with find -type f -name 'myfiles*' -print0 | xargs -0 cat | awk '{ seen[$1]= seen[$1] $2 } END { for(x in seen) print x, seen[x] }' it correctly produced only one line for every ID.
    – sddgob
    49 mins ago

  • I can prevent any interleaving in a programming environment like Node.js, but I'm not sure how to do it with bash/shell.
    – Alexander Mills
    45 mins ago

  • @AlexanderMills Write to different files, then process the files separately, or concatenate them.
    – Kusalananda
    8 mins ago
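A minimal sketch of Kusalananda's suggestion. Each job writes to its own file in a temporary directory created with mktemp -d. The files are concatenated after wait, so each job's output appears as one contiguous block. foo, bar, baz and run_jobs are hypothetical stand-ins. The trade-off is that nothing is printed until every job has finished. This version also merges each job's stdout and stderr into one file with 2>&1.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real jobs.
foo() { printf 'foo %s\n' 1 2 3; }
bar() { printf 'bar %s\n' 1 2 3; }
baz() { printf 'baz %s\n' 1 2 3; }

run_jobs() {
  local dir
  dir=$(mktemp -d) || return 1
  # Each job has a private output file, so no two jobs write to the same file.
  foo > "$dir/foo.out" 2>&1 &
  bar > "$dir/bar.out" 2>&1 &
  baz > "$dir/baz.out" 2>&1 &
  wait
  # Concatenate in a fixed order once every job has exited.
  cat "$dir/foo.out" "$dir/bar.out" "$dir/baz.out"
  rm -rf "$dir"
}

run_jobs
```

With these stand-ins, the output is always the three foo lines, then the three bar lines, then the three baz lines, regardless of how the jobs were scheduled.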















