Message ID | BANLkTikoMZCFzA6jUw+ddHZ3x2cvO_NzhA@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Jun 16, 2011 at 12:57:36PM +0300, Pekka Enberg wrote:
> Uh-oh. Someone needs to apply this patch to sync_file_range():
There actually are a few cases where using it makes sense. It's just
the minority.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Jun 16, 2011 at 12:57:36PM +0300, Pekka Enberg wrote:
> > Uh-oh. Someone needs to apply this patch to sync_file_range():
>
> There actually are a few cases where using it makes sense. [...]

Such as? I don't think apps can actually know whether disk blocks
have been 'instantiated' by a particular filesystem or not, so the
manpage:

   Some details
       None of these operations write out the file’s metadata. Therefore,
       unless the application is strictly performing overwrites of
       already-instantiated disk blocks, there are no guarantees that the
       data will be available after a crash.

is rather misleading. This is a dangerous (and rather pointless)
syscall and this should be made much clearer in the manpage.

Thanks,

	Ingo
On Thu, Jun 16, 2011 at 01:22:30PM +0200, Ingo Molnar wrote:
> Such as? I don't think apps can actually know whether disk blocks
> have been 'instantiated' by a particular filesystem or not, so the
> manpage:

In general they can't. The only good use case for sync_file_range
is to paper over^H^H^H^H^H^H^H^H^Hcontrol write back behaviour.
* Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Jun 16, 2011 at 01:22:30PM +0200, Ingo Molnar wrote:
> > Such as? I don't think apps can actually know whether disk blocks
> > have been 'instantiated' by a particular filesystem or not, so
> > the manpage:
>
> In general they can't. The only good use case for sync_file_range
> is to paper over^H^H^H^H^H^H^H^H^Hcontrol write back behaviour.

Well, if overwrite is fundamentally safe on a filesystem (which is
most of them) then sync_file_range() would work - and it has the big
advantage that it's a pretty simple facility.

Filesystems that cannot guarantee that should map their
sync_file_range() implementation to fdatasync() or so, right?

Thanks,

	Ingo
On Thu, Jun 16, 2011 at 01:40:45PM +0200, Ingo Molnar wrote:
> Filesystems that cannot guarantee that should map their
> sync_file_range() implementation to fdatasync() or so, right?

Filesystems aren't even told about sync_file_range, it's purely a VM
thing, which is the root of the problem.

In-kernel we have all the infrastructure for a real ranged
fsync/fdatasync, and once we get a killer user for that we can
trivially export it at the syscall level. I don't think mapping
sync_file_range with its weird set of flags and confusing behaviour
to it is a good idea, though.
On 06/16/2011 07:22 AM, Ingo Molnar wrote:
>
> * Christoph Hellwig <hch@infradead.org> wrote:
>
>> On Thu, Jun 16, 2011 at 12:57:36PM +0300, Pekka Enberg wrote:
>>> Uh-oh. Someone needs to apply this patch to sync_file_range():
>>
>> There actually are a few cases where using it makes sense. [...]
>
> Such as? I don't think apps can actually know whether disk blocks
> have been 'instantiated' by a particular filesystem or not, so the
> manpage:
>
>     Some details
>         None of these operations write out the file’s metadata. Therefore,
>         unless the application is strictly performing overwrites of
>         already-instantiated disk blocks, there are no guarantees that the
>         data will be available after a crash.
>
> is rather misleading. This is a dangerous (and rather pointless)
> syscall and this should be made much clearer in the manpage.

Not pointless at all -- see Linus's sync_file_range() examples in "Re:
Unexpected splice "always copy" behavior observed" thread from May 2010.

Apps like MythTV may use it for streaming data to disk, basically
shoving the VM out of the way to give the app more fine-grained writeout
control.

Just don't mistake sync_file_range() for a data integrity syscall.

	Jeff
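For readers who want to see what the streaming use case could look like, here is a minimal sketch. It is not taken from the thread or from the examples Jeff refers to. The helper names write_all(), stream_write() and stream_demo(), the chunk size and the temporary file path are all made up for illustration. It treats sync_file_range() purely as a writeback hint, ignores its return value, and relies on a final fdatasync() for durability, in line with the warning not to mistake it for a data integrity syscall.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf at offset off, retrying short writes. */
static int write_all(int fd, const char *buf, size_t len, off_t off)
{
	while (len > 0) {
		ssize_t n = pwrite(fd, buf, len, off);
		if (n < 0)
			return -1;
		buf += n;
		len -= (size_t)n;
		off += n;
	}
	return 0;
}

/*
 * Hypothetical helper: stream `total` bytes to fd in `chunk`-sized pieces.
 * sync_file_range() is only a writeback hint here, so its errors are
 * ignored; durability comes from the final fdatasync().
 */
int stream_write(int fd, const char *data, size_t total, size_t chunk)
{
	size_t off = 0, prev_off = 0, prev_len = 0;

	assert(chunk > 0);
	while (off < total) {
		size_t len = total - off < chunk ? total - off : chunk;

		if (write_all(fd, data + off, len, (off_t)off) < 0)
			return -1;

		/* Start asynchronous writeout of the chunk just written. */
		(void)sync_file_range(fd, (off_t)off, (off_t)len,
				      SYNC_FILE_RANGE_WRITE);

		if (prev_len > 0) {
			/* Wait for the previous chunk's writeout to finish... */
			(void)sync_file_range(fd, (off_t)prev_off, (off_t)prev_len,
					      SYNC_FILE_RANGE_WAIT_BEFORE |
					      SYNC_FILE_RANGE_WRITE |
					      SYNC_FILE_RANGE_WAIT_AFTER);
			/* ...then ask the kernel to drop it from the page cache. */
			(void)posix_fadvise(fd, (off_t)prev_off, (off_t)prev_len,
					    POSIX_FADV_DONTNEED);
		}
		prev_off = off;
		prev_len = len;
		off += len;
	}
	/* The integrity step: flushes data plus the metadata needed to read it. */
	return fdatasync(fd);
}

/* Demo: stream a buffer to a temporary file and read it back. 0 on match. */
int stream_demo(void)
{
	enum { TOTAL = 256 * 1024, CHUNK = 64 * 1024 };
	char path[] = "/tmp/sfr_demo_XXXXXX";
	char *out = malloc(TOTAL), *in = malloc(TOTAL);
	int rc = -1;

	if (out && in) {
		int fd = mkstemp(path);

		memset(out, 0x5a, TOTAL);
		if (fd >= 0) {
			if (stream_write(fd, out, TOTAL, CHUNK) == 0 &&
			    pread(fd, in, TOTAL, 0) == TOTAL &&
			    memcmp(in, out, TOTAL) == 0)
				rc = 0;
			close(fd);
			unlink(path);
		}
	}
	free(out);
	free(in);
	return rc;
}
```

Ignoring the sync_file_range() return values is a deliberate choice in this sketch: if the hint fails, the data is still written back by the normal VM path and by the closing fdatasync(), only with less control over when.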
diff --git a/fs/sync.c b/fs/sync.c
index ba76b96..32078aa 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -277,6 +277,8 @@ SYSCALL_DEFINE(sync_file_range)(int fd, loff_t offset, loff_t nbytes,
 	int fput_needed;
 	umode_t i_mode;
 
+	WARN_ONCE(1, "when this breaks, you get to keep both pieces");
+
 	ret = -EINVAL;
 	if (flags & ~VALID_FLAGS)