[v2,0/4] Handling NFSv3 I/O errors in knfsd


Message ID

20190902170258.92522-1-trond.myklebust@hammerspace.com (mailing list archive)

Headers

From: Trond Myklebust <trondmy@gmail.com>
To: "J.Bruce Fields" <bfields@redhat.com>
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v2 0/4] Handling NFSv3 I/O errors in knfsd
Date: Mon,  2 Sep 2019 13:02:54 -0400
Message-Id: <20190902170258.92522-1-trond.myklebust@hammerspace.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk

Series

Handling NFSv3 I/O errors in knfsd

Message

Trond Myklebust Sept. 2, 2019, 5:02 p.m. UTC

Recently, a number of changes went into the kernel to try to ensure
that I/O errors (specifically write errors) are reported to the
application once and only once. The vehicle for ensuring the errors
are reported is the struct file, which uses the 'f_wb_err' field to
track which errors have been reported.
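
As a rough illustration of the "once and only once" idea, here is a toy
userspace model in Python. It is not the kernel's errseq_t code, which
handles more cases than this sketch. Only the 'f_wb_err' name comes from
the text above; Mapping, OpenFile, record_error and check_and_advance
are hypothetical names invented for the example.

```python
import errno

EIO = errno.EIO


class Mapping:
    """Per-inode writeback state (hypothetical): a counter bumped on each error."""

    def __init__(self):
        self.wb_err_seq = 0   # number of writeback errors recorded so far
        self.last_errno = 0   # most recent error code

    def record_error(self, errno_value):
        self.wb_err_seq += 1
        self.last_errno = errno_value


class OpenFile:
    """Stand-in for struct file; f_wb_err is this file's private cursor."""

    def __init__(self, mapping):
        self.mapping = mapping
        # Assumption of this sketch: the cursor starts at the current state.
        self.f_wb_err = mapping.wb_err_seq

    def check_and_advance(self):
        """Return an errno if there is an error this file has not reported yet."""
        if self.f_wb_err == self.mapping.wb_err_seq:
            return 0
        self.f_wb_err = self.mapping.wb_err_seq   # remember that we reported it
        return self.mapping.last_errno


mapping = Mapping()
a = OpenFile(mapping)
b = OpenFile(mapping)
mapping.record_error(EIO)

first_a = a.check_and_advance()    # a reports the error
second_a = a.check_and_advance()   # a has nothing new to report
first_b = b.check_and_advance()    # b has its own cursor, so it reports once too
```

Each open file reports the error exactly once because the cursor lives
in the file object rather than in the inode.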

The problem is that errors are mainly intended to be reported through
fsync(). If the client is doing synchronous writes, then all is well,
but if it is doing unstable writes, then the errors may not be
reported until the client calls COMMIT. If the file cache has
thrown out the struct file, due to memory pressure, or just because
the client took a long while between the last WRITE and the COMMIT,
then the error report may be lost, and the client may just think
its data is safely stored.
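
The failure mode can be sketched with a small Python simulation. This is
a simplified model under one stated assumption: a freshly opened file
starts with its error cursor at the inode's current state. The names
Inode, CachedFile, FileCache, commit and run are hypothetical and do not
correspond to knfsd functions.

```python
class Inode:
    def __init__(self):
        self.wb_err_seq = 0   # bumped whenever writeback fails


class CachedFile:
    def __init__(self, inode):
        # Assumption: the cursor is sampled when the file is opened.
        self.f_wb_err = inode.wb_err_seq


class FileCache:
    def __init__(self):
        self.entries = {}

    def get(self, name, inode):
        """Return the cached file, opening a new one on a cache miss."""
        if name not in self.entries:
            self.entries[name] = CachedFile(inode)
        return self.entries[name]

    def evict(self, name):
        self.entries.pop(name, None)


def commit(cache, name, inode):
    """COMMIT: report an error only if this file has not reported it yet."""
    f = cache.get(name, inode)
    if f.f_wb_err != inode.wb_err_seq:
        f.f_wb_err = inode.wb_err_seq
        return "EIO"
    return "OK"


def run(evict_before_commit):
    inode = Inode()
    cache = FileCache()
    cache.get("data", inode)    # the unstable WRITE opens and caches the file
    inode.wb_err_seq += 1       # background writeback fails some time later
    if evict_before_commit:
        cache.evict("data")     # memory pressure, or a long gap before COMMIT
    return commit(cache, "data", inode)


kept = run(evict_before_commit=False)   # the cached file still holds the old cursor
lost = run(evict_before_commit=True)    # the new file starts past the error
```

In this model the COMMIT after an eviction returns success even though
the data never reached stable storage.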

Note that the problem is compounded by the fact that NFSv3 is stateless:
the server never knows whether the client has rebooted, so there
can be no guarantee that a COMMIT will ever be sent.

The following patch set attempts to remedy the situation using two
strategies:

1) If the inode is dirty, then avoid garbage collecting the file
   from the file cache.
2) If the file is closed, and we see that it would have reported
   an error to COMMIT, then we bump the boot verifier in order to
   ensure the client retransmits all its writes.
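
The two strategies above can be sketched as follows. This is a Python
toy model, not the patch itself. It assumes the boot verifier can be
modelled as a plain counter and tracks dirtiness on the cache entry
rather than on the inode. Server, CachedFile, garbage_collect and
close_file are hypothetical names.

```python
class Server:
    def __init__(self):
        self.boot_verifier = 1   # assumption: a counter stands in for the verifier

    def reset_boot_verifier(self):
        self.boot_verifier += 1


class CachedFile:
    def __init__(self):
        self.dirty = False              # data still waiting for writeback
        self.unreported_error = False   # an error that COMMIT would have returned


def garbage_collect(cache):
    """Strategy 1: only drop entries that have no dirty data."""
    for name in [n for n, f in cache.items() if not f.dirty]:
        del cache[name]


def close_file(server, cache, name):
    """Strategy 2: if an error would be lost on close, bump the verifier."""
    f = cache.pop(name)
    if f.unreported_error:
        server.reset_boot_verifier()


server = Server()
cache = {"clean": CachedFile(), "dirty": CachedFile()}
cache["dirty"].dirty = True

garbage_collect(cache)
survivors = sorted(cache)        # only the dirty entry is still cached

before = server.boot_verifier
cache["dirty"].unreported_error = True
close_file(server, cache, "dirty")
verifier_changed = server.boot_verifier != before
```

The first strategy keeps the error state alive for as long as possible;
the second is the fallback for when the file has to go away regardless.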

Note that if multiple clients were writing to the same file, then
we probably want to bump the boot verifier anyway, since only one
COMMIT will see the error report (because the cached file is also
shared).
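
A small Python simulation of the multi-client case, assuming the usual
NFSv3 client behaviour of saving the verifier returned by WRITE and
resending unstable data when a later COMMIT returns a different one.
The Server and Client classes and the bump_on_error switch are
hypothetical, and the verifier is modelled as a counter.

```python
class Server:
    def __init__(self, bump_on_error):
        self.verifier = 1
        self.pending_error = False      # one shared cached file, one error report
        self.bump_on_error = bump_on_error

    def write(self):
        return self.verifier

    def commit(self):
        status = "OK"
        if self.pending_error:
            self.pending_error = False  # reported once, to whoever commits first
            status = "EIO"
            if self.bump_on_error:
                self.verifier += 1      # make every other client notice as well
        return status, self.verifier


class Client:
    def __init__(self, server):
        self.server = server
        self.saved_verifier = None

    def unstable_write(self):
        self.saved_verifier = self.server.write()

    def commit_needs_resend(self):
        status, verifier = self.server.commit()
        return status != "OK" or verifier != self.saved_verifier


def scenario(bump_on_error):
    server = Server(bump_on_error)
    a, b = Client(server), Client(server)
    a.unstable_write()
    b.unstable_write()
    server.pending_error = True         # writeback fails after both WRITEs
    return a.commit_needs_resend(), b.commit_needs_resend()


without_bump = scenario(bump_on_error=False)
with_bump = scenario(bump_on_error=True)
```

Without the bump only the first client to COMMIT learns that it must
resend; with the bump the second client sees a changed verifier and
resends too.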

So in order to implement the above strategies, we first have to
split up the file cache to act per net namespace, since the boot
verifier is per net namespace. We then add a helper to update the
boot verifier.
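
Why the per-namespace split matters can be shown with a minimal Python
sketch. It assumes each net namespace owns both a verifier and a file
cache; NetNamespaceState, state_for and reset_boot_verifier are
hypothetical names, and the namespace labels are placeholders.

```python
class NetNamespaceState:
    def __init__(self):
        self.boot_verifier = 1   # assumption: a counter stands in for the verifier
        self.file_cache = {}     # cache entries belonging to this namespace only


states = {}


def state_for(net):
    """Look up (or lazily create) the state for one net namespace."""
    if net not in states:
        states[net] = NetNamespaceState()
    return states[net]


def reset_boot_verifier(net):
    """Helper: bump the verifier of the namespace that owns the failing file."""
    state_for(net).boot_verifier += 1


state_for("netns-a").file_cache["data"] = "cached file"
state_for("netns-b")

reset_boot_verifier("netns-a")   # a write error was seen in netns-a

a_verifier = state_for("netns-a").boot_verifier
b_verifier = state_for("netns-b").boot_verifier
```

Because every cache entry belongs to exactly one namespace, the server
knows which verifier to bump, and clients of other namespaces are not
forced to resend their writes.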

---
v2:
- Add patch to bump the boot verifier on all write/commit errors
- Fix initialisation of 'seq' in nfsd_copy_boot_verifier()

Trond Myklebust (4):
  nfsd: nfsd_file cache entries should be per net namespace
  nfsd: Support the server resetting the boot verifier
  nfsd: Don't garbage collect files that might contain write errors
  nfsd: Reset the boot verifier on all write I/O errors

 fs/nfsd/export.c    |  2 +-
 fs/nfsd/filecache.c | 76 +++++++++++++++++++++++++++++++++++++--------
 fs/nfsd/filecache.h |  3 +-
 fs/nfsd/netns.h     |  4 +++
 fs/nfsd/nfs3xdr.c   | 13 +++++---
 fs/nfsd/nfs4proc.c  | 14 +++------
 fs/nfsd/nfsctl.c    |  1 +
 fs/nfsd/nfssvc.c    | 32 ++++++++++++++++++-
 fs/nfsd/vfs.c       | 19 +++++++++---
 9 files changed, 130 insertions(+), 34 deletions(-)

Comments

J. Bruce Fields Sept. 10, 2019, 1:11 p.m. UTC | #1

Looks OK to me; applying for 5.4.

Any ideas for easy ways to test this?  Maybe I should look at
Documentation/fault-injection/fault-injection.txt.

--b.
