Are file edits in Linux directly saved into disk?
52 votes
I used to think that file changes are saved directly to disk, that is, as soon as I click/select "save" and close the file. However, in a recent conversation, a friend of mine told me that is not usually true: the OS (specifically, we were talking about Linux systems) keeps the changes in memory, and a daemon actually writes the content from memory to the disk.
He even gave the example of external flash drives: these are mounted into the system (copied into memory), and sometimes data loss happens because the daemon has not yet saved the contents to the flash memory; that is why we unmount flash drives.
I have no knowledge of how operating systems work, so I have no idea whether this is true and under which circumstances. My main question is: does this happen as described in Linux/Unix systems (and maybe other OSes)? For instance, does this mean that if I turn off the computer immediately after I edit and save a file, my changes will be most likely lost? Perhaps it depends on the disk type -- traditional hard drives vs. solid-state disks?
The question refers specifically to filesystems that have a disk to store the information, though any clarification or comparison is welcome.
linux filesystems disk
asked Aug 22 at 16:34 by JuanRocamonde
edited Aug 24 at 11:23 by Jeff Schaller
8
FAO: Close Vote queue reviewers. This is not a request for learning materials. See unix.meta.stackexchange.com/q/3892/22812
– Anthony Geoghegan, Aug 22 at 21:06
2
The cache is opaque to the user; in the best case you must sync, and applications must flush, to guarantee that caches are written back. But even a successful sync does not guarantee write-back to the physical disk, only that kernel caches are flushed to the disk, which may still have latency in the driver or disk hardware (e.g. an on-drive cache that you lose).
– crasic, Aug 23 at 2:41
1
While I don't agree that it's a request for learning materials, I do think the question is a little broad in its current form. Limit the scope to Linux distributions (or whatever specific OS) and possibly limit it to certain storage technologies and filesystems.
– Jeff Schaller, Aug 23 at 12:21
3
As @AnthonyGeoghegan pointed out, I don't consider this question a request for learning materials. I think it's rather specific; I did not ask for a long and deep explanation or a manual about Linux filesystems, only for a brief idea that I wanted to clear up.
– JuanRocamonde, Aug 23 at 13:19
3
It is true that, as it is, it may be a bit broad, @JeffSchaller; I'm going to try to edit it a bit. However, honestly, if the site is not for this type of question, which directly addresses how Linux works, then what is it for?
– JuanRocamonde, Aug 23 at 13:20
5 Answers
66 votes, accepted
answered Aug 22 at 17:32 by ilkkachu, edited Aug 30 at 14:44 by Anthony Geoghegan
if I turn off the computer immediately after I edit and save a file, my changes will be most likely lost?
They might be. I wouldn't say "most likely", but the likelihood depends on a lot of things.
An easy way to increase the performance of file writes is for the OS to just cache the data, tell (lie to) the application that the write went through, and then actually do the write later. This is especially useful if there's other disk activity going on at the same time: the OS can prioritize reads and do the writes later. It can also remove the need for an actual write completely, e.g. in the case where a temporary file is removed quickly afterwards.
The caching issue is more pronounced if the storage is slow. Copying files from a fast SSD to a slow USB stick will probably involve a lot of write caching, since the USB stick just can't keep up. But your cp command returns faster, so you can carry on working, possibly even editing the files that were just copied.
Of course, caching like that has the downside you note: some data might be lost before it's actually saved. The user will be miffed if their editor told them the write was successful, but the file wasn't actually on the disk. That's why there's the fsync() system call, which is supposed to return only after the file has actually hit the disk. Your editor can use that to make sure the data is fine before reporting to the user that the write succeeded.
I said, "is supposed to", since the drive itself might tell the same lies to the OS and say that the write is complete, while the file really only exists in a volatile write cache within the drive. Depending on the drive, there might be no way around that.
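As a rough sketch of the write-then-fsync() pattern an editor might use (not taken from any particular editor; the file name and contents here are made up for illustration):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *data = "new file contents\n";   /* example payload */

        int fd = open("example.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* write() normally just hands the data to the kernel's page cache
           (a real program would also handle short writes) */
        if (write(fd, data, strlen(data)) < 0) { perror("write"); return 1; }

        /* fsync() asks the kernel to push the file's data and metadata
           to the storage device before returning */
        if (fsync(fd) < 0) { perror("fsync"); return 1; }

        if (close(fd) < 0) { perror("close"); return 1; }
        puts("saved");   /* only now report success to the user */
        return 0;
    }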
In addition to fsync(), there are also the sync() and syncfs() system calls, which ask the system to make sure all system-wide writes, or all writes on a particular filesystem, have hit the disk. The sync utility can be used to call those.
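For completeness, a minimal sketch of calling these from C; syncfs() is Linux-specific and needs _GNU_SOURCE, and /mnt/usb is just an assumed mount point for the example:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        sync();   /* flush dirty data for all filesystems */

        /* syncfs() flushes only the filesystem containing this fd */
        int fd = open("/mnt/usb", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        if (syncfs(fd) < 0) perror("syncfs");
        close(fd);
        return 0;
    }

From a shell, plain sync does the system-wide flush; recent GNU coreutils also have sync -f to flush just one filesystem.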
Then there's also the O_DIRECT flag to open(), which is supposed to "try to minimize cache effects of the I/O to and from this file". Removing caching reduces performance, so that's mostly used by applications (databases) that do their own caching and want to be in control of it. (O_DIRECT isn't without its issues; the comments about it in the man page are somewhat amusing.)
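A rough sketch of what O_DIRECT usage looks like on Linux; the flag needs _GNU_SOURCE, and both the buffer address and the transfer size generally have to be aligned to the device's logical block size (the 4096 bytes used here is an assumption about the device):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_DIRECT bypasses the page cache; alignment rules apply */
        int fd = open("direct.bin", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) {
            fputs("posix_memalign failed\n", stderr);
            return 1;
        }
        memset(buf, 'x', 4096);

        /* both the buffer and the length are 4096-byte aligned */
        if (write(fd, buf, 4096) < 0) perror("write");

        free(buf);
        close(fd);
        return 0;
    }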
What happens on a power-out also depends on the filesystem. It's not just the file data that you should be concerned about, but the filesystem metadata. Having the file data on disk isn't much use if you can't find it. Just extending a file to a larger size will require allocating new data blocks, and they need to be marked somewhere.
How a filesystem deals with metadata changes, and with the ordering between metadata and data writes, varies a lot. E.g. with ext4, if you set the mount flag data=journal, then all writes, even data writes, go through the journal and should be rather safe. That also means the data gets written twice, so performance goes down. The default options try to order the writes so that the data is on the disk before the metadata is updated. Other options or other filesystems may be better or worse; I won't even try a comprehensive study.
In practice, on a lightly loaded system, the file should hit the disk within a few seconds. If you're dealing with removable storage, unmount the filesystem before pulling the media to make sure the data is actually sent to the drive, and there's no further activity. (Or have your GUI environment do that for you.)
Your "some cases where" link doesn't appear to say anything about such cases; it's instead saying that there were problems when the apps didn't use fsync. Or should I look into the comments to find these cases you are pointing at?
– Ruslan, Aug 22 at 21:17
1
You can also use sync directly as a shell command to poke the kernel to flush all caches.
– crasic, Aug 23 at 2:43
3
"In practice, on a lightly loaded system, the file will hit the disk within a moment" – only if your editor uses fsync() after writing the file. The Linux default for /proc/sys/vm/dirty_writeback_centisecs is 500 (5 seconds), and PowerTop recommends setting it to 1500 (15 seconds) (kernel.org/doc/Documentation/sysctl/vm.txt). On a lightly loaded system, the kernel will just let it sit dirty in the page cache that long after write() before flushing to disk, to optimize for the case where it's deleted or modified again soon.
– Peter Cordes, Aug 24 at 14:06
2
+1 for "since the drive itself might make the same lies to the OS". My understanding is that drives doing that kind of caching also have enough power capacitance to allow their caches to be saved even on catastrophic power loss. This isn't OS-specific; Windows has the "Safely remove USB" mechanism to perform cache flushing before the user unplugs.
– studog, Aug 24 at 15:30
1
@studog, I wouldn't be so sure, especially on consumer hardware. But it might be just paranoia. It would be interesting to test, though.
– ilkkachu, Aug 24 at 16:58
14 votes
answered Aug 22 at 21:25 by Jörg W Mittag, edited Aug 22 at 22:00 by ilkkachu
There is an extremely simple way to prove that it cannot be true that file edits are always saved directly to disk, namely the fact that there are filesystems that aren't backed by a disk at all. If a filesystem doesn't have a disk in the first place, then it cannot possibly write the changes to a disk, ever.
Some examples are:
- tmpfs, a file system that only exists in RAM (or more precisely, in the buffer cache)
- ramfs, a file system that only exists in RAM
- any network file system (NFS, CIFS/SMB, AFS, AFP, ...)
- any virtual filesystem (sysfs, procfs, devfs, shmfs, ...)
But even for disk-backed file systems, this is usually not true. The page How To Corrupt An SQLite Database has a chapter called Failure to sync which describes many different ways in which writes (in this case, commits to an SQLite database) can fail to arrive on disk. SQLite also has a white paper explaining the many hoops you have to jump through to guarantee Atomic Commit In SQLite. (Note that Atomic Write is a much harder problem than just Write, but of course writing to disk is a sub-problem of atomic writing, and you can learn a lot about that problem, too, from this paper.) This paper has a section on Things That Can Go Wrong which includes a subsection about Incomplete Disk Flushes that gives some examples of subtle intricacies that might prevent a write from reaching the disk (such as the HDD controller reporting that it has written to disk when in fact it hasn't; yes, there are HDD manufacturers that do this, and it might even be legal according to the ATA spec, because that spec is ambiguously worded in this respect).
10
The first part of this answer is just besserwissering about the exact word used. I don't see how it serves any purpose other than to ridicule the user. Obviously a network file system will not write to a local disk but the question still holds there.
– pipe, Aug 23 at 11:27
3
As @pipe pointed out, the fact that there are filesystems that don't save data to a disk because they don't use a disk to store data does not determine whether those that do have one may or may not save it directly. However, the answer looks interesting.
– JuanRocamonde, Aug 23 at 14:32
1
@pipe I'm pretty sure using the term "besserwissering" is besserwissering! Saying that as a German Besserwisser with authority.
– Volker Siegel, Aug 24 at 21:36
11 votes
It is true that most operating systems, including Unix, Linux, and Windows, use a write cache to speed up operations. That means that turning a computer off without shutting it down is a bad idea and may lead to data loss. The same is true if you remove a USB storage device before it is ready to be removed.
Most systems also offer the option to make writes synchronous. That means that the data will be on disk before an application receives a success confirmation, at the cost of being slower.
In short, there is a reason why you should properly shut down your computer and properly prepare USB storage for removal.
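One way to request synchronous writes per-file on Linux-like systems is the O_SYNC flag to open() (mounting the filesystem with the sync option is another); this is a minimal sketch with a made-up file name:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* With O_SYNC, each write() returns only after the data (and the
           metadata needed to read it back) has been sent to the device */
        int fd = open("sync-example.txt", O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char *msg = "written synchronously\n";
        if (write(fd, msg, strlen(msg)) < 0)   /* slower, but durable on return */
            perror("write");

        close(fd);
        return 0;
    }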
Thank you for your reply! Is there a way to force disk writing of a specific file in Linux? Maybe a link to a tutorial or a docs page; even an SE question would be fine :)
– JuanRocamonde, Aug 22 at 16:55
4
You can force the write of the file with the fsync() syscall from a program. From a shell, just use the sync command.
– RalfFriedl, Aug 22 at 17:01
2
There are (or at least were) some filesystems in some versions of Linux where sync was implemented as a no-op. And even for filesystems that do correctly implement sync, there is still the problem that some disk firmwares implement FLUSH CACHE as a no-op, or return from it immediately and perform it in the background.
– Jörg W Mittag, Aug 22 at 21:46
8 votes
1. Flash-based storage
Does it depend upon the disk type (traditional hard drives vs. solid-state disks) or any other variable that I might not be aware of? Does it happen (if it does) only in Linux or is this present in other OSes?
When you have a choice, you should not allow flash-based storage to lose power without a clean shutdown.
On low-cost storage like SD cards, you can expect to lose entire erase-blocks (several times larger than 4KB), losing data which could belong to different files or essential structures of the filesystem.
Some expensive SSDs may claim to offer better guarantees in the face of power failure. However third-party testing suggests that many expensive SSDs fail to do so. The layer that remaps blocks for "wear levelling" is complex and proprietary. Possible failures include loss of all data on the drive.
Applying our testing framework, we test 17 commodity SSDs from six different vendors using more than three thousand fault injection cycles in total. Our experimental results reveal that 14 of the 17 tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure.
2017: https://dl.acm.org/citation.cfm?id=2992782&preflayout=flat
2013: https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf?wptouch_preview_theme=enabled
2. Spinning hard disk drives
Spinning HDDs have different characteristics. For safety and simplicity, I recommend assuming they have the same practical uncertainty as flash-based storage, unless you have specific evidence, which you clearly don't; I don't have comparative figures for spinning HDDs.
An HDD might leave one incompletely written sector with a bad checksum, which will give us a nice read failure later on. Broadly speaking, this failure mode of HDDs is entirely expected; native Linux filesystems are designed with it in mind. They aim to preserve the contract of fsync() in the face of this type of power-loss fault. (We'd really like to see this guaranteed on SSDs.)
However I'm not sure whether Linux filesystems achieve this in all cases, or whether that's even possible.
The next boot after this type of fault may require a filesystem repair. This being Linux, it is possible that the filesystem repair will ask some questions that you do not understand, where you can only press Y and hope that it will sort itself out.
2.1 If you don't know what the fsync() contract is
The fsync() contract is a source of both good news and bad news. You must understand the good news first.
Good news: fsync() is well-documented as the correct way to write file data, e.g. when you hit "save". And it is widely understood that e.g. text editors must replace existing files atomically using rename(). This is meant to make sure that you always either keep the old file or get the new file (which was fsync()ed before the rename). You don't want to be left with a half-written version of the new file.
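A minimal sketch of that write-new-file-then-rename pattern (the file names are made up, and error cleanup and short-write handling are kept minimal for brevity); the final fsync() on the containing directory is what makes the rename itself durable:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Atomically replace notes.txt with new contents. */
    static int save_file(const char *data)
    {
        int fd = open("notes.txt.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return -1;

        if (write(fd, data, strlen(data)) < 0 ||   /* write the new version */
            fsync(fd) < 0 ||                       /* its data is now on disk */
            close(fd) < 0)
            return -1;

        if (rename("notes.txt.tmp", "notes.txt") < 0)   /* atomic replace */
            return -1;

        /* fsync() the containing directory so the rename survives a crash */
        int dirfd = open(".", O_RDONLY);
        if (dirfd < 0) return -1;
        int rc = fsync(dirfd);
        close(dirfd);
        return rc;
    }

    int main(void)
    {
        return save_file("new contents\n") == 0 ? 0 : 1;
    }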
Bad news: for many years, calling fsync() on the most popular Linux filesystem could effectively leave the whole system hanging for tens of seconds. Since applications can do nothing about this, it was very common to optimistically use rename() without fsync(), which appeared to be relatively reliable on this filesystem.
Therefore, applications exist which do not use fsync() correctly.
The next version of this filesystem generally avoided the fsync() hang, at the same time as it started relying on the correct use of fsync().
This is all pretty bad. Understanding this history is probably not helped by the dismissive tone and invective which was used by many of the conflicting kernel developers.
The current resolution is that the current most popular Linux filesystem defaults to supporting the rename() pattern without requiring fsync(); it implements "bug-for-bug compatibility" with the previous version. This can be disabled with the mount option noauto_da_alloc.
This is not a complete protection. Basically it flushes the pending IO at rename() time, but it doesn't wait for the IO to complete before renaming. This is much better than e.g. a 60 second danger window though! See also the answer to Which filesystems require fsync() for crash-safety when replacing an existing file with rename()?
Some less popular filesystems do not provide protection. XFS refuses to do so. And UBIFS has not implemented it either, apparently it could be accepted but needs a lot of work to make it possible. The same page points out that UBIFS has several other "TODO" issues for data integrity, including on power loss. UBIFS is a filesystem used directly on flash storage. I imagine some of the difficulties UBIFS mentions with flash storage could be relevant to the SSD bugs.
5 votes
On a lightly loaded system, the kernel will let newly-written file data sit in the page cache for maybe 30 seconds after a write(), before flushing it to disk, to optimize for the case where it's deleted or modified again soon.
Linux's dirty_expire_centisecs defaults to 3000 (30 seconds), and controls how long before newly-written data "expires". (See https://lwn.net/Articles/322823/.)
See https://www.kernel.org/doc/Documentation/sysctl/vm.txt for more related tunables, and google for lots more (e.g. google dirty_writeback_centisecs).
The Linux default for /proc/sys/vm/dirty_writeback_centisecs is 500 (5 seconds), and PowerTop recommends setting it to 1500 (15 seconds) to reduce power consumption.
Delayed write-back also gives time for the kernel to see how big a file will be, before starting to write it to disk. Filesystems with delayed allocation (like XFS, and probably others these days) don't even choose where on disk to put a newly-written file's data until necessary, separately from allocating space for the inode itself. This reduces fragmentation by letting them avoid putting the start of a large file in a 1 meg gap between other files, for example.
If lots of data is being written, then writeback to disk can be triggered by a threshold for how much dirty (not yet synced to disk) data can be in the pagecache.
If you aren't doing much else, though, your hard-drive activity light won't go on for 5 (or 15) seconds after hitting save on a small file.
If your editor uses fsync() after writing the file, the kernel will write it to disk without delay. (And fsync() won't return until the data has actually been sent to the disk.)
Write caching within the disk can also be a thing, but disks normally try to commit their write cache to permanent storage ASAP, unlike Linux's page-cache algorithms. Disk write caches are more of a store buffer to absorb small bursts of writes, but maybe also to delay writes in favour of reads and give the disk's firmware room to optimize a seek pattern (e.g. do two nearby writes or reads instead of doing one, seeking far away, then seeking back).
On a rotating (magnetic) disk, you might see a few seek delays of 7 to 10 ms each before data from a SATA write command is actually safe from power-off, if there were pending reads/writes ahead of your write. (Some other answers on this question go into more detail about disk write caches and write barriers that journalled FSes can use to avoid corruption.)
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
66
down vote
accepted
if I turn off the computer immediately after I edit and save a file, my changes will be most likely lost?
They might be. I wouldn't say "most likely", but the likelihood depends on a lot of things.
An easy way to increase performance of file writes, is for the OS to just cache the data, tell (lie to) the application the write went through, and then actually do the write later. This is especially useful if there's other disk activity going on at the same time: the OS can prioritize reads and do the writes later. It can also remove the need for an actual write completely, e.g., in the case where a temporary file is removed quickly afterwards.
The caching issue is more pronounced if the storage is slow. Copying files from a fast SSD to a slow USB stick will probably involve a lot of write caching, since the USB stick just can't keep up. But your cp
command returns faster, so you can carry on working, possibly even editing the files that were just copied.
Of course caching like that has the downside you note, some data might be lost before it's actually saved. The user will be miffed if their editor told them the write was successful, but the file wasn't actually on the disk. Which is why there's the fsync()
system call, which is supposed to return only after the file has actually hit the disk. Your editor can use that to make sure the data is fine before reporting to the user that the write succeeded.
I said, "is supposed to", since the drive itself might tell the same lies to the OS and say that the write is complete, while the file really only exists in a volatile write cache within the drive. Depending on the drive, there might be no way around that.
In addition to fsync()
, there are also the sync()
and syncfs()
system calls that ask the system to make sure all system-wide writes or all writes on a particular filesystem have hit the disk. The utility sync
can be used to call those.
Then there's also the O_DIRECT
flag to open()
, which is supposed to "try to minimize cache effects of the I/O to and from this file." Removing caching reduces performance, so that's mostly used by applications (databases) that do their own caching and want to be in control of it.
(O_DIRECT
isn't without its issues, the comments about it in the man page are somewhat amusing.)
What happens on a power-out also depends on the filesystem. It's not just the file data that you should be concerned about, but the filesystem metadata. Having the file data on disk isn't much use if you can't find it. Just extending a file to a larger size will require allocating new data blocks, and they need to be marked somewhere.
How a filesystem deals with metadata changes and the ordering between metadata and data writes varies a lot. E.g., with ext4
, if you set the mount flag data=journal
, then all writes â even data writes â go through the journal and should be rather safe. That also means they get written twice, so performance goes down. The default options try to order the writes so that the data is on the disk before the metadata is updated. Other options or other filesystem may be better or worse; I won't even try a comprehensive study.
In practice, on a lightly loaded system, the file should hit the disk within a few seconds. If you're dealing with removable storage, unmount the filesystem before pulling the media to make sure the data is actually sent to the drive, and there's no further activity. (Or have your GUI environment do that for you.)
Yoursome cases where
link doesn't appear to say about any such cases â it's instead saying that there were problems when the apps didn't usefsync
. Or should I look into the comments to find these cases you are pointing at?
â Ruslan
Aug 22 at 21:17
1
You can also usesync
directly as a system shell command to poke the kernel to flush all caches.
â crasic
Aug 23 at 2:43
3
In practice, on a lightly loaded system, the file will hit the disk within a moment. Only if your editor usesfsync()
after writing the file. The Linux default for/proc/sys/vm/dirty_writeback_centisecs
is 500 (5 seconds), and PowerTop recommends setting it to 1500 (15 seconds). (kernel.org/doc/Documentation/sysctl/vm.txt). On a lightly loaded system, the kernel will just let it sit dirty in the page-cache that long afterwrite()
before flushing to disk, to optimize for the case where it's deleted or modified again soon.
â Peter Cordes
Aug 24 at 14:06
2
+1 for since the drive itself might make the same lies to the OS. My understanding is that drives doing that kind of caching also have enough power capacitance to allow their caches to be saved even on catastrophic power loss. This isn't OS-specific; Windows has the "Safely remove USB" mechanism to perform cache flushing before the user unplugs.
â studog
Aug 24 at 15:30
1
@studog, I wouldn't be so sure, especially on consumer hardware. But it might be just paranoia. It would be interesting to test, though.
â ilkkachu
Aug 24 at 16:58
 |Â
show 2 more comments
up vote
66
down vote
accepted
if I turn off the computer immediately after I edit and save a file, my changes will be most likely lost?
They might be. I wouldn't say "most likely", but the likelihood depends on a lot of things.
An easy way to increase performance of file writes, is for the OS to just cache the data, tell (lie to) the application the write went through, and then actually do the write later. This is especially useful if there's other disk activity going on at the same time: the OS can prioritize reads and do the writes later. It can also remove the need for an actual write completely, e.g., in the case where a temporary file is removed quickly afterwards.
The caching issue is more pronounced if the storage is slow. Copying files from a fast SSD to a slow USB stick will probably involve a lot of write caching, since the USB stick just can't keep up. But your cp
command returns faster, so you can carry on working, possibly even editing the files that were just copied.
Of course caching like that has the downside you note, some data might be lost before it's actually saved. The user will be miffed if their editor told them the write was successful, but the file wasn't actually on the disk. Which is why there's the fsync()
system call, which is supposed to return only after the file has actually hit the disk. Your editor can use that to make sure the data is fine before reporting to the user that the write succeeded.
I said, "is supposed to", since the drive itself might tell the same lies to the OS and say that the write is complete, while the file really only exists in a volatile write cache within the drive. Depending on the drive, there might be no way around that.
In addition to fsync()
, there are also the sync()
and syncfs()
system calls that ask the system to make sure all system-wide writes or all writes on a particular filesystem have hit the disk. The utility sync
can be used to call those.
Then there's also the O_DIRECT
flag to open()
, which is supposed to "try to minimize cache effects of the I/O to and from this file." Removing caching reduces performance, so that's mostly used by applications (databases) that do their own caching and want to be in control of it.
(O_DIRECT
isn't without its issues, the comments about it in the man page are somewhat amusing.)
What happens on a power-out also depends on the filesystem. It's not just the file data that you should be concerned about, but the filesystem metadata. Having the file data on disk isn't much use if you can't find it. Just extending a file to a larger size will require allocating new data blocks, and they need to be marked somewhere.
How a filesystem deals with metadata changes and the ordering between metadata and data writes varies a lot. E.g., with ext4
, if you set the mount flag data=journal
, then all writes â even data writes â go through the journal and should be rather safe. That also means they get written twice, so performance goes down. The default options try to order the writes so that the data is on the disk before the metadata is updated. Other options or other filesystem may be better or worse; I won't even try a comprehensive study.
In practice, on a lightly loaded system, the file should hit the disk within a few seconds. If you're dealing with removable storage, unmount the filesystem before pulling the media to make sure the data is actually sent to the drive, and there's no further activity. (Or have your GUI environment do that for you.)
Yoursome cases where
link doesn't appear to say about any such cases â it's instead saying that there were problems when the apps didn't usefsync
. Or should I look into the comments to find these cases you are pointing at?
â Ruslan
Aug 22 at 21:17
1
You can also usesync
directly as a system shell command to poke the kernel to flush all caches.
â crasic
Aug 23 at 2:43
3
In practice, on a lightly loaded system, the file will hit the disk within a moment. Only if your editor usesfsync()
after writing the file. The Linux default for/proc/sys/vm/dirty_writeback_centisecs
is 500 (5 seconds), and PowerTop recommends setting it to 1500 (15 seconds). (kernel.org/doc/Documentation/sysctl/vm.txt). On a lightly loaded system, the kernel will just let it sit dirty in the page-cache that long afterwrite()
before flushing to disk, to optimize for the case where it's deleted or modified again soon.
â Peter Cordes
Aug 24 at 14:06
2
+1 for since the drive itself might make the same lies to the OS. My understanding is that drives doing that kind of caching also have enough power capacitance to allow their caches to be saved even on catastrophic power loss. This isn't OS-specific; Windows has the "Safely remove USB" mechanism to perform cache flushing before the user unplugs.
â studog
Aug 24 at 15:30
1
@studog, I wouldn't be so sure, especially on consumer hardware. But it might be just paranoia. It would be interesting to test, though.
â ilkkachu
Aug 24 at 16:58
 |Â
show 2 more comments
up vote
66
down vote
accepted
up vote
66
down vote
accepted
if I turn off the computer immediately after I edit and save a file, my changes will be most likely lost?
They might be. I wouldn't say "most likely", but the likelihood depends on a lot of things.
An easy way to increase performance of file writes, is for the OS to just cache the data, tell (lie to) the application the write went through, and then actually do the write later. This is especially useful if there's other disk activity going on at the same time: the OS can prioritize reads and do the writes later. It can also remove the need for an actual write completely, e.g., in the case where a temporary file is removed quickly afterwards.
The caching issue is more pronounced if the storage is slow. Copying files from a fast SSD to a slow USB stick will probably involve a lot of write caching, since the USB stick just can't keep up. But your cp
command returns faster, so you can carry on working, possibly even editing the files that were just copied.
Of course caching like that has the downside you note, some data might be lost before it's actually saved. The user will be miffed if their editor told them the write was successful, but the file wasn't actually on the disk. Which is why there's the fsync()
system call, which is supposed to return only after the file has actually hit the disk. Your editor can use that to make sure the data is fine before reporting to the user that the write succeeded.
I said, "is supposed to", since the drive itself might tell the same lies to the OS and say that the write is complete, while the file really only exists in a volatile write cache within the drive. Depending on the drive, there might be no way around that.
In addition to fsync()
, there are also the sync()
and syncfs()
system calls that ask the system to make sure all system-wide writes or all writes on a particular filesystem have hit the disk. The utility sync
can be used to call those.
Then there's also the O_DIRECT
flag to open()
, which is supposed to "try to minimize cache effects of the I/O to and from this file." Removing caching reduces performance, so that's mostly used by applications (databases) that do their own caching and want to be in control of it.
(O_DIRECT
isn't without its issues, the comments about it in the man page are somewhat amusing.)
What happens on a power-out also depends on the filesystem. It's not just the file data that you should be concerned about, but the filesystem metadata. Having the file data on disk isn't much use if you can't find it. Just extending a file to a larger size will require allocating new data blocks, and they need to be marked somewhere.
How a filesystem deals with metadata changes and the ordering between metadata and data writes varies a lot. E.g., with ext4
, if you set the mount flag data=journal
, then all writes â even data writes â go through the journal and should be rather safe. That also means they get written twice, so performance goes down. The default options try to order the writes so that the data is on the disk before the metadata is updated. Other options or other filesystem may be better or worse; I won't even try a comprehensive study.
In practice, on a lightly loaded system, the file should hit the disk within a few seconds. If you're dealing with removable storage, unmount the filesystem before pulling the media to make sure the data is actually sent to the drive, and there's no further activity. (Or have your GUI environment do that for you.)
if I turn off the computer immediately after I edit and save a file, my changes will be most likely lost?
They might be. I wouldn't say "most likely", but the likelihood depends on a lot of things.
An easy way to increase performance of file writes, is for the OS to just cache the data, tell (lie to) the application the write went through, and then actually do the write later. This is especially useful if there's other disk activity going on at the same time: the OS can prioritize reads and do the writes later. It can also remove the need for an actual write completely, e.g., in the case where a temporary file is removed quickly afterwards.
The caching issue is more pronounced if the storage is slow. Copying files from a fast SSD to a slow USB stick will probably involve a lot of write caching, since the USB stick just can't keep up. But your cp
command returns faster, so you can carry on working, possibly even editing the files that were just copied.
Of course caching like that has the downside you note, some data might be lost before it's actually saved. The user will be miffed if their editor told them the write was successful, but the file wasn't actually on the disk. Which is why there's the fsync()
system call, which is supposed to return only after the file has actually hit the disk. Your editor can use that to make sure the data is fine before reporting to the user that the write succeeded.
I said, "is supposed to", since the drive itself might tell the same lies to the OS and say that the write is complete, while the file really only exists in a volatile write cache within the drive. Depending on the drive, there might be no way around that.
In addition to fsync()
, there are also the sync()
and syncfs()
system calls that ask the system to make sure all system-wide writes or all writes on a particular filesystem have hit the disk. The utility sync
can be used to call those.
Then there's also the O_DIRECT
flag to open()
, which is supposed to "try to minimize cache effects of the I/O to and from this file." Removing caching reduces performance, so that's mostly used by applications (databases) that do their own caching and want to be in control of it.
(O_DIRECT
isn't without its issues, the comments about it in the man page are somewhat amusing.)
What happens on a power-out also depends on the filesystem. It's not just the file data that you should be concerned about, but the filesystem metadata. Having the file data on disk isn't much use if you can't find it. Just extending a file to a larger size will require allocating new data blocks, and they need to be marked somewhere.
How a filesystem deals with metadata changes and the ordering between metadata and data writes varies a lot. E.g., with ext4
, if you set the mount flag data=journal
, then all writes â even data writes â go through the journal and should be rather safe. That also means they get written twice, so performance goes down. The default options try to order the writes so that the data is on the disk before the metadata is updated. Other options or other filesystem may be better or worse; I won't even try a comprehensive study.
In practice, on a lightly loaded system, the file should hit the disk within a few seconds. If you're dealing with removable storage, unmount the filesystem before pulling the media to make sure the data is actually sent to the drive, and there's no further activity. (Or have your GUI environment do that for you.)
edited Aug 30 at 14:44
Anthony Geoghegan
7,27033752
7,27033752
answered Aug 22 at 17:32
ilkkachu
50.4k677138
50.4k677138
Yoursome cases where
link doesn't appear to say about any such cases â it's instead saying that there were problems when the apps didn't usefsync
. Or should I look into the comments to find these cases you are pointing at?
â Ruslan
Aug 22 at 21:17
1
You can also usesync
directly as a system shell command to poke the kernel to flush all caches.
â crasic
Aug 23 at 2:43
3
In practice, on a lightly loaded system, the file will hit the disk within a moment. Only if your editor usesfsync()
after writing the file. The Linux default for/proc/sys/vm/dirty_writeback_centisecs
is 500 (5 seconds), and PowerTop recommends setting it to 1500 (15 seconds). (kernel.org/doc/Documentation/sysctl/vm.txt). On a lightly loaded system, the kernel will just let it sit dirty in the page-cache that long afterwrite()
before flushing to disk, to optimize for the case where it's deleted or modified again soon.
â Peter Cordes
Aug 24 at 14:06
2
+1 for since the drive itself might make the same lies to the OS. My understanding is that drives doing that kind of caching also have enough power capacitance to allow their caches to be saved even on catastrophic power loss. This isn't OS-specific; Windows has the "Safely remove USB" mechanism to perform cache flushing before the user unplugs.
â studog
Aug 24 at 15:30
1
@studog, I wouldn't be so sure, especially on consumer hardware. But it might be just paranoia. It would be interesting to test, though.
â ilkkachu
Aug 24 at 16:58
 |Â
show 2 more comments
Yoursome cases where
link doesn't appear to say about any such cases â it's instead saying that there were problems when the apps didn't usefsync
. Or should I look into the comments to find these cases you are pointing at?
â Ruslan
Aug 22 at 21:17
1
You can also usesync
directly as a system shell command to poke the kernel to flush all caches.
â crasic
Aug 23 at 2:43
3
In practice, on a lightly loaded system, the file will hit the disk within a moment. Only if your editor usesfsync()
after writing the file. The Linux default for/proc/sys/vm/dirty_writeback_centisecs
is 500 (5 seconds), and PowerTop recommends setting it to 1500 (15 seconds). (kernel.org/doc/Documentation/sysctl/vm.txt). On a lightly loaded system, the kernel will just let it sit dirty in the page-cache that long afterwrite()
before flushing to disk, to optimize for the case where it's deleted or modified again soon.
â Peter Cordes
Aug 24 at 14:06
2
+1 for since the drive itself might make the same lies to the OS. My understanding is that drives doing that kind of caching also have enough power capacitance to allow their caches to be saved even on catastrophic power loss. This isn't OS-specific; Windows has the "Safely remove USB" mechanism to perform cache flushing before the user unplugs.
â studog
Aug 24 at 15:30
1
@studog, I wouldn't be so sure, especially on consumer hardware. But it might be just paranoia. It would be interesting to test, though.
â ilkkachu
Aug 24 at 16:58
Your
some cases where
link doesn't appear to say about any such cases â it's instead saying that there were problems when the apps didn't use fsync
. Or should I look into the comments to find these cases you are pointing at?â Ruslan
Aug 22 at 21:17
Your
some cases where
link doesn't appear to say about any such cases â it's instead saying that there were problems when the apps didn't use fsync
. Or should I look into the comments to find these cases you are pointing at?â Ruslan
Aug 22 at 21:17
1
1
You can also use
sync
directly as a system shell command to poke the kernel to flush all caches.â crasic
Aug 23 at 2:43
You can also use
sync
directly as a system shell command to poke the kernel to flush all caches.â crasic
Aug 23 at 2:43
3
3
In practice, on a lightly loaded system, the file will hit the disk within a moment. Only if your editor uses
fsync()
after writing the file. The Linux default for /proc/sys/vm/dirty_writeback_centisecs
is 500 (5 seconds), and PowerTop recommends setting it to 1500 (15 seconds). (kernel.org/doc/Documentation/sysctl/vm.txt). On a lightly loaded system, the kernel will just let it sit dirty in the page-cache that long after write()
before flushing to disk, to optimize for the case where it's deleted or modified again soon.â Peter Cordes
Aug 24 at 14:06
In practice, on a lightly loaded system, the file will hit the disk within a moment. Only if your editor uses
fsync()
after writing the file. The Linux default for /proc/sys/vm/dirty_writeback_centisecs
is 500 (5 seconds), and PowerTop recommends setting it to 1500 (15 seconds). (kernel.org/doc/Documentation/sysctl/vm.txt). On a lightly loaded system, the kernel will just let it sit dirty in the page-cache that long after write()
before flushing to disk, to optimize for the case where it's deleted or modified again soon.â Peter Cordes
Aug 24 at 14:06
2
2
+1 for since the drive itself might make the same lies to the OS. My understanding is that drives doing that kind of caching also have enough power capacitance to allow their caches to be saved even on catastrophic power loss. This isn't OS-specific; Windows has the "Safely remove USB" mechanism to perform cache flushing before the user unplugs.
â studog
Aug 24 at 15:30
+1 for since the drive itself might make the same lies to the OS. My understanding is that drives doing that kind of caching also have enough power capacitance to allow their caches to be saved even on catastrophic power loss. This isn't OS-specific; Windows has the "Safely remove USB" mechanism to perform cache flushing before the user unplugs.
â studog
Aug 24 at 15:30
1
1
@studog, I wouldn't be so sure, especially on consumer hardware. But it might be just paranoia. It would be interesting to test, though.
â ilkkachu
Aug 24 at 16:58
@studog, I wouldn't be so sure, especially on consumer hardware. But it might be just paranoia. It would be interesting to test, though.
â ilkkachu
Aug 24 at 16:58
 |Â
show 2 more comments
up vote
14
down vote
There is an extremely simple way to prove that it cannot be true that file edits are always directly saved to disk, namely the fact that there are filesystems that aren't backed by a disk in the first place. If a filesystem doesn't have a disk in the first place, then it cannot possibly write the changes to disk, ever.
Some examples are:
tmpfs
, a file system that only exists in RAM (or more precisely, in the buffer cache)ramfs
, a file system that only exists in RAM
any network file system (NFS, CIFS/SMB, AFS, AFP, â¦)
any virtual filesystem (sysfs
,procfs
,devfs
,shmfs
, â¦)
But even for disk-backed file systems this is usually not true. The page How To Corrupt An SQLite Database has a chapter called Failure to sync which describes many different ways in which writes (in this cases commits to an SQLite database) can fail to arrive on disk. SQLite also has a white paper explaining the many hoops you have to jump through to guarantee Atomic Commit In SQLite. (Note that Atomic Write is a much harder than problem than just Write, but of course writing to disk is sub-problem of atomic writing, and you can learn a lot about that problem, too, from this paper.) This paper has a section on Things That Can Go Wrong which includes a subsection about Incomplete Disk Flushes that give some examples of subtle intricacies that might prevent a write from reaching the disk (such as the HDD controller reporting that it has written to disk when it fact it hasn'tàâ yes, there are HDD manufacturers that do this, and it might even be legal according to the ATA spec, because it is ambiguously worded in this respect).
10
The first part of this answer is just besserwissering about the exact word used. I don't see how it serves any purpose other than to ridicule the user. Obviously a network file system will not write to a local disk but the question still holds there.
â pipe
Aug 23 at 11:27
3
As @pipe pointed out, the fact that there are filesystems that don't save data into a disk because they don't use a disk to store data, does not decide whether those who do have it may or may not save it directly. However, answer looks interesting
â JuanRocamonde
Aug 23 at 14:32
1
@pipe I'm pretty sure using the term "besserwissering" is besserwissering! Saying that as a German Besserwisser with authority.
â Volker Siegel
Aug 24 at 21:36
add a comment |Â
up vote
14
down vote
There is an extremely simple way to prove that it cannot be true that file edits are always directly saved to disk, namely the fact that there are filesystems that aren't backed by a disk in the first place. If a filesystem doesn't have a disk in the first place, then it cannot possibly write the changes to disk, ever.
Some examples are:
tmpfs
, a file system that only exists in RAM (or more precisely, in the buffer cache)ramfs
, a file system that only exists in RAM
any network file system (NFS, CIFS/SMB, AFS, AFP, â¦)
any virtual filesystem (sysfs
,procfs
,devfs
,shmfs
, â¦)
But even for disk-backed file systems this is usually not true. The page How To Corrupt An SQLite Database has a chapter called Failure to sync which describes many different ways in which writes (in this cases commits to an SQLite database) can fail to arrive on disk. SQLite also has a white paper explaining the many hoops you have to jump through to guarantee Atomic Commit In SQLite. (Note that Atomic Write is a much harder than problem than just Write, but of course writing to disk is sub-problem of atomic writing, and you can learn a lot about that problem, too, from this paper.) This paper has a section on Things That Can Go Wrong which includes a subsection about Incomplete Disk Flushes that give some examples of subtle intricacies that might prevent a write from reaching the disk (such as the HDD controller reporting that it has written to disk when it fact it hasn'tàâ yes, there are HDD manufacturers that do this, and it might even be legal according to the ATA spec, because it is ambiguously worded in this respect).
10
The first part of this answer is just besserwissering about the exact word used. I don't see how it serves any purpose other than to ridicule the user. Obviously a network file system will not write to a local disk but the question still holds there.
â pipe
Aug 23 at 11:27
3
As @pipe pointed out, the fact that there are filesystems that don't save data into a disk because they don't use a disk to store data, does not decide whether those who do have it may or may not save it directly. However, answer looks interesting
â JuanRocamonde
Aug 23 at 14:32
1
@pipe I'm pretty sure using the term "besserwissering" is besserwissering! Saying that as a German Besserwisser with authority.
â Volker Siegel
Aug 24 at 21:36
edited Aug 22 at 22:00 by ilkkachu
answered Aug 22 at 21:25 by Jörg W Mittag
up vote
11
down vote
It is true that most operating systems, including Unix, Linux and Windows, use a write cache to speed up operations. That means that turning a computer off without shutting it down is a bad idea and may lead to data loss. The same is true if you remove a USB storage device before it is ready to be removed.
Most systems also offer the option to make writes synchronous. That means that the data will be on disk before an application receives a success confirmation, at the cost of being slower.
In short, there is a reason why you should properly shut down your computer and properly prepare USB storage for removal.
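As a rough sketch of what forcing data to disk looks like from a program (this example is mine, not part of the answer; the file name is a placeholder):
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* "notes.txt" is just a placeholder name for this example. */
        int fd = open("notes.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char *data = "important edit\n";
        if (write(fd, data, strlen(data)) < 0) { perror("write"); return 1; }

        /* write() only hands the data to the kernel's page cache;
           fsync() blocks until the kernel has pushed it out to the device. */
        if (fsync(fd) != 0) { perror("fsync"); return 1; }

        close(fd);
        return 0;
    }
Opening the file with O_SYNC (or mounting the filesystem with the sync option) instead makes every write() wait for the device, which is the slower but safer behaviour the answer describes.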
Thank you for your reply! Is there a way to force disk writing of a specific file in Linux? Maybe a link to a tutorial or a docs page, even an SE question would just be fine :)
– JuanRocamonde
Aug 22 at 16:55
4
You can force the write of the file with the fsync() syscall from a program. From a shell, just use the sync command.
– RalfFriedl
Aug 22 at 17:01
2
There are (or at least were) some filesystems in some versions of Linux where sync was implemented as a no-op. And even for filesystems that do correctly implement sync, there is still the problem that some disk firmwares implement FLUSH CACHE as a no-op or immediately return from it and perform it in the background.
– Jörg W Mittag
Aug 22 at 21:46
edited Aug 22 at 16:58 by Kusalananda
answered Aug 22 at 16:54 by RalfFriedl
up vote
8
down vote
1. Flash-based storage
Does it depend upon the disk type (traditional hard drives vs. solid-state disks) or any other variable that I might not be aware of? Does it happen (if it does) only in Linux or is this present in other OSes?
When you have a choice, you should not allow flash-based storage to lose power without a clean shutdown.
On low-cost storage like SD cards, you can expect to lose entire erase-blocks (several times larger than 4KB), losing data which could belong to different files or essential structures of the filesystem.
Some expensive SSDs may claim to offer better guarantees in the face of power failure. However third-party testing suggests that many expensive SSDs fail to do so. The layer that remaps blocks for "wear levelling" is complex and proprietary. Possible failures include loss of all data on the drive.
Applying our testing framework, we test 17 commodity SSDs from six different vendors using more than three thousand fault injection cycles in total. Our experimental results reveal that 14 of the 17 tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure.
2017: https://dl.acm.org/citation.cfm?id=2992782&preflayout=flat
2013: https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf?wptouch_preview_theme=enabled
2. Spinning hard disk drives
Spinning HDDs have different characteristics. For safety and simplicity, I recommend assuming they have the same practical uncertainty as flash-based storage, unless you have specific evidence to the contrary (which you probably don't; I don't have comparative figures for spinning HDDs).
An HDD might leave one incompletely written sector with a bad checksum, which will give us a nice read failure later on. Broadly speaking, this failure mode of HDDs is entirely expected; native Linux filesystems are designed with it in mind. They aim to preserve the contract of fsync() in the face of this type of power-loss fault. (We'd really like to see this guaranteed on SSDs.)
However I'm not sure whether Linux filesystems achieve this in all cases, or whether that's even possible.
The next boot after this type of fault may require a filesystem repair. This being Linux, it is possible that the filesystem repair will ask some questions that you do not understand, where you can only press Y and hope that it will sort itself out.
2.1 If you don't know what the fsync() contract is
The fsync() contract is a source of both good news and bad news. You must understand the good news first.
Good news: fsync() is well-documented as the correct way to write file data, e.g. when you hit "save". And it is widely understood that e.g. text editors must replace existing files atomically using rename(). This is meant to make sure that you always either keep the old file, or get the new file (which was fsync()ed before the rename). You don't want to be left with a half-written version of the new file.
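As a rough illustration of that save pattern (my sketch, not code from the answer; the path names are placeholders):
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Replace "config.txt" atomically: write a temporary file, fsync() it,
       then rename() it over the original. */
    static int save_file(const char *path, const char *tmp_path,
                         const char *data, size_t len)
    {
        int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        close(fd);

        /* rename() is atomic: a reader sees either the old file or the new one. */
        if (rename(tmp_path, path) != 0) {
            unlink(tmp_path);
            return -1;
        }

        /* fsync() the containing directory so the rename itself is durable too. */
        int dirfd = open(".", O_RDONLY);
        if (dirfd >= 0) {
            fsync(dirfd);
            close(dirfd);
        }
        return 0;
    }

    int main(void)
    {
        const char *data = "new contents\n";
        if (save_file("config.txt", "config.txt.tmp", data, strlen(data)) != 0) {
            perror("save_file");
            return 1;
        }
        return 0;
    }
(For brevity the sketch treats a short write() as an error rather than retrying.)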
Bad news: for many years, calling fsync() on the most popular Linux filesystem could effectively leave the whole system hanging for tens of seconds. Since applications can do nothing about this, it was very common to optimistically use rename() without fsync(), which appeared to be relatively reliable on this filesystem.
Therefore, applications exist which do not use fsync() correctly.
The next version of this filesystem generally avoided the fsync() hang - at the same time as it started relying on the correct use of fsync().
This is all pretty bad. Understanding this history is probably not helped by the dismissive tone and invective which was used by many of the conflicting kernel developers.
The current resolution is that the current most popular Linux filesystem defaults to supporting the rename() pattern without requiring fsync(): it implements "bug-for-bug compatibility" with the previous version. This can be disabled with the mount option noauto_da_alloc.
This is not a complete protection. Basically it flushes the pending IO at rename() time, but it doesn't wait for the IO to complete before renaming. This is much better than e.g. a 60 second danger window though! See also the answer to Which filesystems require fsync() for crash-safety when replacing an existing file with rename()?
Some less popular filesystems do not provide protection. XFS refuses to do so. And UBIFS has not implemented it either; apparently it could be accepted, but it needs a lot of work to make it possible. The same page points out that UBIFS has several other "TODO" issues for data integrity, including on power loss. UBIFS is a filesystem used directly on flash storage. I imagine some of the difficulties UBIFS mentions with flash storage could be relevant to the SSD bugs.
edited Aug 23 at 13:36
answered Aug 23 at 10:12 by sourcejedi
up vote
5
down vote
On a lightly loaded system, the kernel will let newly-written file data sit in the page-cache for maybe 30 seconds after a write(), before flushing it to disk, to optimize for the case where it's deleted or modified again soon.
Linux's dirty_expire_centisecs defaults to 3000 (30 seconds), and controls how long before newly-written data "expires". (See https://lwn.net/Articles/322823/.)
See https://www.kernel.org/doc/Documentation/sysctl/vm.txt for more related tunables, and google for lots more (e.g. google on dirty_writeback_centisecs).
The Linux default for /proc/sys/vm/dirty_writeback_centisecs is 500 (5 seconds), and PowerTop recommends setting it to 1500 (15 seconds) to reduce power consumption.
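For illustration (my sketch, not part of the answer), a small C program that reads those two tunables and prints them in seconds:
    #include <stdio.h>

    /* Print a couple of the writeback tunables mentioned above, in seconds.
       The paths are the standard Linux sysctl files under /proc/sys/vm. */
    static void show(const char *path)
    {
        FILE *f = fopen(path, "r");
        long centisecs;

        if (f && fscanf(f, "%ld", &centisecs) == 1)
            printf("%s = %ld (%.1f seconds)\n", path, centisecs, centisecs / 100.0);
        else
            printf("%s: could not read\n", path);
        if (f)
            fclose(f);
    }

    int main(void)
    {
        show("/proc/sys/vm/dirty_expire_centisecs");
        show("/proc/sys/vm/dirty_writeback_centisecs");
        return 0;
    }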
Delayed write-back also gives time for the kernel to see how big a file will be, before starting to write it to disk. Filesystems with delayed allocation (like XFS, and probably others these days) don't even choose where on disk to put a newly-written file's data until necessary, separately from allocating space for the inode itself. This reduces fragmentation by letting them avoid putting the start of a large file in a 1 meg gap between other files, for example.
If lots of data is being written, then writeback to disk can be triggered by a threshold for how much dirty (not yet synced to disk) data can be in the pagecache.
If you aren't doing much else, though, your hard-drive activity light won't go on for 5 (or 15) seconds after hitting save on a small file.
If your editor used fsync() after writing the file, the kernel will write it to disk without delay. (And fsync won't return until the data has actually been sent to disk.)
Write caching within the disk can also be a thing, but disks normally try to commit their write cache to permanent storage ASAP, unlike Linux's page-cache algorithms. Disk write caches are more of a store buffer to absorb small bursts of writes, but maybe also to delay writes in favour of reads, and to give the disk's firmware room to optimize a seek pattern (e.g. do two nearby writes or reads instead of doing one, then seeking far away, then seeking back).
On a rotating (magnetic) disk, you might see a few seek delays of 7 to 10 ms each before data from a SATA write command is actually safe from power-off, if there were pending reads/writes ahead of your write. (Some other answers on this question go into more detail about disk write caches and write barriers that journalled FSes can use to avoid corruption.)
edited Aug 24 at 16:25
answered Aug 24 at 14:21 by Peter Cordes
3
It is true that as it is it may be a bit broad, @JeffSchaller; I'm going to try to edit it a bit. However, honestly, if the site is not for this type of question, one that directly addresses how Linux works, then what is it for?
– JuanRocamonde
Aug 23 at 13:20