Message ID | 5587EF5F.90207@nod.at (mailing list archive) |
---|---|
State | New, archived |
On Mon, Jun 22, 2015 at 01:19:59PM +0200, Richard Weinberger wrote:
> >
> > The bottom line is that if you care about files being written, you
> > need to use fsync(). Should git use fsync() by default? Well, if you
> > are willing to accept that if your system crashes within a second or
> > so of your last git operation, you might need to run "git fsck" and
> > potentially recover from a busted repo, maybe speed is more important
> > for you (and git is known for its speed/performance, after all. :-)

I made a typo in the above. s/second/minute/. (Linux's writeback
timer is 30 seconds, but if the disk is busy it might take a bit
longer to get all of the data blocks written out to disk and
committed.)

> I think core.fsyncObjectFiles documentation really needs an update.
> What about this one?
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 43bb53c..b08fa11 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -693,10 +693,16 @@ core.whitespace::
>  core.fsyncObjectFiles::
>  	This boolean will enable 'fsync()' when writing object files.
>  +
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +For performance reasons git does not call 'fsync()' after writing object
> +files. This means that after a power cut your git repository can get
> +corrupted as not all data hit the storage media. Especially on modern
> +filesystems like ext4, xfs or btrfs this can happen very easily.
> +If you have to face power cuts and care about your data it is strongly
> +recommended to enable this setting.
> +Please note that git's behavior used to be safe on ext3 with data=ordered,
> +for any other filesystems or mount settings this is not the case as
> +POSIX clearly states that you have to call 'fsync()' to make sure that
> +all data is written.

My main complaint about this is that it's a bit Linux-centric. For
example, the fact that fsync(2) is needed to push data out of the
cache is also true for MacOS (and indeed all other Unix systems going
back three decades) as well as Windows. In fact, it's not a matter of
"POSIX says", but "POSIX documented", but since standards are held in
high esteem, it's sometimes a bit more convenient to use them as an
appeal to authority. :-)

(Ext3's data=ordered behaviour is an outlier, and in fact, the reason
why it is mostly safe to skip fsync(2) calls when using ext3
data=ordered was an accidental side effect of another problem which it
was trying to solve based on the relatively primitive way it handled
block allocation.)

Cheers,

					- Ted
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 43bb53c..b08fa11 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -693,10 +693,16 @@ core.whitespace::
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+For performance reasons git does not call 'fsync()' after writing object
+files. This means that after a power cut your git repository can get
+corrupted as not all data hit the storage media. Especially on modern
+filesystems like ext4, xfs or btrfs this can happen very easily.
+If you have to face power cuts and care about your data it is strongly
+recommended to enable this setting.
+Please note that git's behavior used to be safe on ext3 with data=ordered,
+for any other filesystems or mount settings this is not the case as
+POSIX clearly states that you have to call 'fsync()' to make sure that
+all data is written.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
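
For readers unfamiliar with what "call fsync() when writing a file" involves in practice, here is a minimal sketch of one common pattern: write to a temporary file, fsync it, rename it into place, then fsync the containing directory. This is not git's code and makes no claim about how git writes object files. The function name write_durably is made up for illustration, and the directory fsync step assumes a POSIX system.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Hypothetical helper: write data to path and fsync before returning."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so the rename
    # below stays within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's userspace buffer into the kernel
            os.fsync(f.fileno())  # ask the kernel to write the cached data to storage
        os.replace(tmp_path, path)  # move the complete file into place
    except BaseException:
        # Remove the temporary file if it was never renamed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the new directory entry is also written out
    # (assumption: POSIX; opening a directory this way fails on Windows).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the os.fsync() calls, the data may sit in the page cache until the kernel's periodic writeback gets to it, which is the window discussed in the reply above. The cost is the added latency of waiting for the storage device on every write.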