Message ID | 20220305124636.2002383-2-chenxiaosong2@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | nfs: check writeback errors correctly | expand |
On Sat, 2022-03-05 at 20:46 +0800, ChenXiaoSong wrote: > filemap_sample_wb_err() will return 0 if nobody has seen the error > yet, > then filemap_check_wb_err() will return the unchanged writeback > error. > > Reproducer: > nfs server | nfs client > --------------------------------|----------------------------------- > ------------ > # No space left on server | > fallocate -l 100G /server/file1 | > | mount -t nfs $nfs_server_ip:/ /mnt > | # Expected error: No space left on > device > | dd if=/dev/zero of=/mnt/file2 > count=1 ibs=100K > # Release space on server | > rm /server/file1 | > | # Unexpected error: No space left > on device > | dd if=/dev/zero of=/mnt/file2 > count=1 ibs=100K > 'rm' doesn't open any files or do any I/O, so it shouldn't be returning any errors from the page cache. IOW: The problem here is not that we're failing to clear an error from the page cache. It is that something in 'rm' is checking the page cache and returning any errors that it finds there. Is 'rm' perhaps doing a stat() on the file it is deleting? If so, does this patch fix the bug? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/comm it/?id=d19e0183a883
It would be more clear if I update the reproducer like this: nfs server | nfs client --------------------------------- |--------------------------------- # No space left on server | fallocate -l 100G /server/nospace | | mount -t nfs $nfs_server_ip:/ /mnt | | # Expected error | dd if=/dev/zero of=/mnt/file | | # Release space on mountpoint | rm /mnt/nospace | | # Unexpected error | dd if=/dev/zero of=/mnt/file The Unexpected error (No space left on device) when doing second `dd`, is from unconsumed writeback error after close() the file when doing first `dd`. There is enough space when doing second `dd`, we should not report the nospace error. We should report and consume the writeback error when userspace call close()->flush(), the writeback error should not be left for next open(). Currently, fsync() will consume the writeback error while calling file_check_and_advance_wb_err(), close()->flush() should also consume the writeback error. 在 2022/3/6 0:53, Trond Myklebust 写道: > 'rm' doesn't open any files or do any I/O, so it shouldn't be returning > any errors from the page cache. > > IOW: The problem here is not that we're failing to clear an error from > the page cache. It is that something in 'rm' is checking the page cache > and returning any errors that it finds there. > > Is 'rm' perhaps doing a stat() on the file it is deleting? If so, does > this patch fix the bug? > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/comm > it/?id=d19e0183a883 >
On Sun, 2022-03-06 at 11:54 +0800, chenxiaosong (A) wrote: > It would be more clear if I update the reproducer like this: > > nfs server | nfs client > --------------------------------- |-------------------------------- > - > # No space left on server | > fallocate -l 100G /server/nospace | > | mount -t nfs $nfs_server_ip:/ > /mnt > | > | # Expected error > | dd if=/dev/zero of=/mnt/file > | > | # Release space on mountpoint > | rm /mnt/nospace > | > | # Unexpected error > | dd if=/dev/zero of=/mnt/file > > The Unexpected error (No space left on device) when doing second > `dd`, > is from unconsumed writeback error after close() the file when doing > first `dd`. There is enough space when doing second `dd`, we should > not > report the nospace error. > > We should report and consume the writeback error when userspace call > close()->flush(), the writeback error should not be left for next > open(). > > Currently, fsync() will consume the writeback error while calling > file_check_and_advance_wb_err(), close()->flush() should also consume > the writeback error. No. That's not part of the API of any other filesystem. Why should we make an exception for NFS? The problem here seems to be the way that filemap_sample_wb_err() is defined, and how it initialises f->f_wb_err in the function do_dentry_open(). It is designed to do exactly what you see above.
On Sun, 2022-03-06 at 09:04 -0500, Trond Myklebust wrote: > On Sun, 2022-03-06 at 11:54 +0800, chenxiaosong (A) wrote: > > It would be more clear if I update the reproducer like this: > > > > nfs server | nfs client > > --------------------------------- |------------------------------ > > -- > > - > > # No space left on server | > > fallocate -l 100G /server/nospace | > > | mount -t nfs $nfs_server_ip:/ > > /mnt > > | > > | # Expected error > > | dd if=/dev/zero of=/mnt/file > > | > > | # Release space on mountpoint > > | rm /mnt/nospace > > | > > | # Unexpected error > > | dd if=/dev/zero of=/mnt/file > > > > The Unexpected error (No space left on device) when doing second > > `dd`, > > is from unconsumed writeback error after close() the file when > > doing > > first `dd`. There is enough space when doing second `dd`, we should > > not > > report the nospace error. > > > > We should report and consume the writeback error when userspace > > call > > close()->flush(), the writeback error should not be left for next > > open(). > > > > Currently, fsync() will consume the writeback error while calling > > file_check_and_advance_wb_err(), close()->flush() should also > > consume > > the writeback error. > > No. That's not part of the API of any other filesystem. Why should we > make an exception for NFS? > > The problem here seems to be the way that filemap_sample_wb_err() is > defined, and how it initialises f->f_wb_err in the function > do_dentry_open(). It is designed to do exactly what you see above. > Just to clarify a little. I don't see a need to consume the writeback errors on close(), unless other filesystems do the same. If the intention is that fsync() should see _all_ errors that haven't already been seen, then NFS should follow the same semantics as all the other filesystems. However, if that is the case (i.e. if the semantics for filemap_sample_wb_err() are deliberate, and the intention is that open() should behave as it does today) then we should fix the various instances of filemap_sample_wb_err() / filemap_check_wb_err() in the NFS and nfsd code to ignore the old errors. Their intention is definitely to only report errors that occur in the timeframe between the two calls.
在 2022/3/6 23:08, Trond Myklebust 写道: > > Just to clarify a little. > > I don't see a need to consume the writeback errors on close(), unless > other filesystems do the same. If the intention is that fsync() should > see _all_ errors that haven't already been seen, then NFS should follow > the same semantics as all the other filesystems. > Other filesystem will _not_ clear writeback error on close(). And other filesystem will _not_ clear writeback error on async write() too. Other filesystem _only_ clear writeback error on fsync() or sync write(). Should NFS follow the same semantics as all the other filesystems?
On Tue, 2022-04-12 at 21:46 +0800, chenxiaosong (A) wrote: > 在 2022/3/6 23:08, Trond Myklebust 写道: > > > > Just to clarify a little. > > > > I don't see a need to consume the writeback errors on close(), > > unless > > other filesystems do the same. If the intention is that fsync() > > should > > see _all_ errors that haven't already been seen, then NFS should > > follow > > the same semantics as all the other filesystems. > > > > Other filesystem will _not_ clear writeback error on close(). > And other filesystem will _not_ clear writeback error on async > write() too. > > Other filesystem _only_ clear writeback error on fsync() or sync > write(). > Yes. We might even consider not reporting writeback errors at all in close(), since most developers don't check it. We certainly don't want to clear those errors there because the manpages don't document that as being the case. > Should NFS follow the same semantics as all the other filesystems? It needs to follow the semantics described in the manpage for write(2) and fsync(2) as closely as possible, yes. That documentation is supposed to be normative for application developers. We won't guarantee to immediately report ENOSPC like other filesystems do (because that would require us to only support synchronous writes), however that behaviour is already documented in the manpage. We may also report some errors that are not documented in the manpage (e.g. EACCES or EROFS) simply because those errors cannot always be reported at open() time, as would be the case for a local filesystem. That's just how the NFS protocol works (particularly for the case of the stateless NFSv3 protocol).
在 2022/4/12 21:56, Trond Myklebust 写道: > On Tue, 2022-04-12 at 21:46 +0800, chenxiaosong (A) wrote: >> >> Other filesystem will _not_ clear writeback error on close(). >> And other filesystem will _not_ clear writeback error on async >> write() too. >> >> Other filesystem _only_ clear writeback error on fsync() or sync >> write(). >> > > Yes. We might even consider not reporting writeback errors at all in > close(), since most developers don't check it. We certainly don't want > to clear those errors there because the manpages don't document that as > being the case. > >> Should NFS follow the same semantics as all the other filesystems? > > It needs to follow the semantics described in the manpage for write(2) > and fsync(2) as closely as possible, yes. That documentation is > supposed to be normative for application developers. > > We won't guarantee to immediately report ENOSPC like other filesystems > do (because that would require us to only support synchronous writes), > however that behaviour is already documented in the manpage. > > We may also report some errors that are not documented in the manpage > (e.g. EACCES or EROFS) simply because those errors cannot always be > reported at open() time, as would be the case for a local filesystem. > That's just how the NFS protocol works (particularly for the case of > the stateless NFSv3 protocol). > After merging your patchset, NFS will clear wb error on async write(), is this reasonable? And more importantly, we can not detect new error by using filemap_sample_wb_err()/filemap_sample_wb_err() while nfs_wb_all(),just as I described: ```c since = filemap_sample_wb_err() = 0 errseq_sample if (!(old & ERRSEQ_SEEN)) // nobody see the error return 0; nfs_wb_all // no new error error = filemap_check_wb_err(..., since) != 0 // unexpected error ```
On Tue, 2022-04-12 at 22:12 +0800, chenxiaosong (A) wrote: > 在 2022/4/12 21:56, Trond Myklebust 写道: > > On Tue, 2022-04-12 at 21:46 +0800, chenxiaosong (A) wrote: > > > > > > Other filesystem will _not_ clear writeback error on close(). > > > And other filesystem will _not_ clear writeback error on async > > > write() too. > > > > > > Other filesystem _only_ clear writeback error on fsync() or sync > > > write(). > > > > > > > Yes. We might even consider not reporting writeback errors at all > > in > > close(), since most developers don't check it. We certainly don't > > want > > to clear those errors there because the manpages don't document > > that as > > being the case. > > > > > Should NFS follow the same semantics as all the other > > > filesystems? > > > > It needs to follow the semantics described in the manpage for > > write(2) > > and fsync(2) as closely as possible, yes. That documentation is > > supposed to be normative for application developers. > > > > We won't guarantee to immediately report ENOSPC like other > > filesystems > > do (because that would require us to only support synchronous > > writes), > > however that behaviour is already documented in the manpage. > > > > We may also report some errors that are not documented in the > > manpage > > (e.g. EACCES or EROFS) simply because those errors cannot always be > > reported at open() time, as would be the case for a local > > filesystem. > > That's just how the NFS protocol works (particularly for the case > > of > > the stateless NFSv3 protocol). > > > > After merging your patchset, NFS will clear wb error on async > write(), > is this reasonable? > It will clear ENOSPC, EDQUOT and EFBIG. It should not clear other errors that are not supposed to be reported by write(). > And more importantly, we can not detect new error by using > filemap_sample_wb_err()/filemap_sample_wb_err() while > nfs_wb_all(),just > as I described: > > ```c > since = filemap_sample_wb_err() = 0 > errseq_sample > if (!(old & ERRSEQ_SEEN)) // nobody see the error > return 0; > nfs_wb_all // no new error > error = filemap_check_wb_err(..., since) != 0 // unexpected error > ``` As I keep repeating, that is _documented behaviour_!
在 2022/4/12 22:27, Trond Myklebust 写道: > > It will clear ENOSPC, EDQUOT and EFBIG. It should not clear other > errors that are not supposed to be reported by write(). > > As I keep repeating, that is _documented behaviour_! > Hi Trond: You may mean that write(2) manpage described: > Since Linux 4.13, errors from write-back come with a promise that > they may be reported by subsequent. write(2) requests, and will be > reported by a subsequent fsync(2) (whether or not they were also > reported by write(2)). The manpage mentioned that "reported by a subsequent fsync(2)", your patch[1] clear the wb err on _async_ write(), and wb err will _not_ be reported by subsequent fsync(2), is it documented behaviour? All other filesystems will _not_ clear any wb err on _async_ write(). [1] https://patchwork.kernel.org/project/linux-nfs/patch/20220411213346.762302-4-trondmy@kernel.org/
在 2022/4/20 16:50, chenxiaosong (A) 写道: > 在 2022/4/12 22:27, Trond Myklebust 写道: > >> >> It will clear ENOSPC, EDQUOT and EFBIG. It should not clear other >> errors that are not supposed to be reported by write(). >> >> As I keep repeating, that is _documented behaviour_! >> > > Hi Trond: > > You may mean that write(2) manpage described: > >> Since Linux 4.13, errors from write-back come with a promise that >> they may be reported by subsequent. write(2) requests, and will be >> reported by a subsequent fsync(2) (whether or not they were also >> reported by write(2)). > > The manpage mentioned that "reported by a subsequent fsync(2)", your > patch[1] clear the wb err on _async_ write(), and wb err will _not_ be > reported by subsequent fsync(2), is it documented behaviour? > > All other filesystems will _not_ clear any wb err on _async_ write(). > > [1] > https://patchwork.kernel.org/project/linux-nfs/patch/20220411213346.762302-4-trondmy@kernel.org/ > Hi Trond: write(2) manpage described: > On some filesystems, including NFS, it does not even guarantee that > space has successfully been reserved for the data. In this case, some > errors might be delayed until a future write(2), fsync(2), or even > close(2). The only way to be sure is to call fsync(2) after you are > done writing all your data. Maybe it mean that: writeback errors of NFS is delayed until future sync write() and fsync(), because sync write() will call fsync(). We all agreed that close()->flush() should not clear writeback errors, should async write() do the same thing like other filesystems?
diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 76d76acbc594..83d63bce9596 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -141,7 +141,6 @@ static int nfs_file_flush(struct file *file, fl_owner_t id) { struct inode *inode = file_inode(file); - errseq_t since; dprintk("NFS: flush(%pD2)\n", file); @@ -150,9 +149,8 @@ nfs_file_flush(struct file *file, fl_owner_t id) return 0; /* Flush writes to the server and return any errors */ - since = filemap_sample_wb_err(file->f_mapping); nfs_wb_all(inode); - return filemap_check_wb_err(file->f_mapping, since); + return file_check_and_advance_wb_err(file); } ssize_t diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index e79ae4cbc395..63a57e5b6db7 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -111,7 +111,6 @@ static int nfs4_file_flush(struct file *file, fl_owner_t id) { struct inode *inode = file_inode(file); - errseq_t since; dprintk("NFS: flush(%pD2)\n", file); @@ -127,9 +126,8 @@ nfs4_file_flush(struct file *file, fl_owner_t id) return filemap_fdatawrite(file->f_mapping); /* Flush writes to the server and return any errors */ - since = filemap_sample_wb_err(file->f_mapping); nfs_wb_all(inode); - return filemap_check_wb_err(file->f_mapping, since); + return file_check_and_advance_wb_err(file); } #ifdef CONFIG_NFS_V4_2
filemap_sample_wb_err() will return 0 if nobody has seen the error yet, then filemap_check_wb_err() will return the unchanged writeback error. Reproducer: nfs server | nfs client --------------------------------|----------------------------------------------- # No space left on server | fallocate -l 100G /server/file1 | | mount -t nfs $nfs_server_ip:/ /mnt | # Expected error: No space left on device | dd if=/dev/zero of=/mnt/file2 count=1 ibs=100K # Release space on server | rm /server/file1 | | # Unexpected error: No space left on device | dd if=/dev/zero of=/mnt/file2 count=1 ibs=100K Fix this by using file_check_and_advance_wb_err(). If there is an error during the writeback process, it should be returned when user space calls close() or fsync(), flush() is called when user space calls close(). Note that fsync() will not be called after close(). Fixes: 67dd23f9e6fb ("nfs: ensure correct writeback errors are returned on close()") Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> --- fs/nfs/file.c | 4 +--- fs/nfs/nfs4file.c | 4 +--- 2 files changed, 2 insertions(+), 6 deletions(-)