[0/3] nfs: fix -ENOSPC DIO write regression

Message ID 20220722181220.81636-1-jlayton@kernel.org (mailing list archive)

Message

Jeff Layton July 22, 2022, 6:12 p.m. UTC
Boyang reported that xfstest generic/476 would never complete when run
against a filesystem that was "too small".

What I found was that we would end up trying to issue a large DIO write
that would come back short. The kernel would then follow up and try to
write out the rest and get back -ENOSPC. It would then try to issue a
commit, which would then try to reissue the writes, and around it would
go.

This patchset seems to fix it. Unfortunately, I'm not positive which
patch _broke_ this as it seems to have happened quite some time ago.

Jeff Layton (3):
  nfs: add new nfs_direct_req tracepoint events
  nfs: always check dreq->error after a commit
  nfs: only issue commit in DIO codepath if we have uncommitted data

 fs/nfs/direct.c         | 50 +++++++++--------------------
 fs/nfs/internal.h       | 33 ++++++++++++++++++++
 fs/nfs/nfstrace.h       | 69 +++++++++++++++++++++++++++++++++++++++++
 fs/nfs/write.c          | 48 +++++++++++++++++-----------
 include/linux/nfs_xdr.h |  1 +
 5 files changed, 148 insertions(+), 53 deletions(-)
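The loop described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code in fs/nfs/direct.c: `toy_dreq`, `toy_server`, `toy_write` and `toy_dio_write` are hypothetical names, the toy COMMIT always succeeds, and the `fixed` flag only approximates what the titles of patches 2 and 3 suggest (check the request's error after a commit, and commit only when there is uncommitted data).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a direct I/O request; not the kernel struct. */
struct toy_dreq {
	size_t requested;   /* bytes the caller asked to write */
	size_t written;     /* bytes the server has accepted so far */
	size_t uncommitted; /* bytes written but not yet committed */
	int commits;        /* number of COMMITs issued */
	int error;          /* first error recorded, 0 if none */
};

/* Hypothetical server with a fixed amount of free space. */
struct toy_server {
	size_t free_bytes;
};

/* One WRITE: may come back short, or fail with -ENOSPC when full. */
static void toy_write(struct toy_server *srv, struct toy_dreq *dreq)
{
	size_t want = dreq->requested - dreq->written;
	size_t n;

	if (srv->free_bytes == 0) {
		dreq->error = -ENOSPC;
		return;
	}
	n = want < srv->free_bytes ? want : srv->free_bytes;
	srv->free_bytes -= n;
	dreq->written += n;
	dreq->uncommitted += n;
}

/*
 * Write/commit loop. Returns the number of passes taken; max_passes caps
 * the loop so that the unfixed behaviour terminates in this model.
 */
static int toy_dio_write(struct toy_server *srv, struct toy_dreq *dreq,
			 bool fixed, int max_passes)
{
	int passes = 0;

	while (dreq->written < dreq->requested && passes < max_passes) {
		passes++;
		toy_write(srv, dreq);

		/* Assumed patch 3 behaviour: commit only uncommitted data. */
		if (!fixed || dreq->uncommitted > 0) {
			dreq->commits++;
			dreq->uncommitted = 0;
		}

		/* Assumed patch 2 behaviour: stop once an error is recorded. */
		if (fixed && dreq->error)
			break;
	}
	return passes;
}
```

With 4096 bytes free and an 8192-byte request, the unfixed loop keeps reissuing writes until it hits the pass cap, while the fixed loop stops on the second pass with -ENOSPC recorded and only one commit issued.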

Comments

Trond Myklebust July 24, 2022, 7:10 p.m. UTC | #1
With this series applied, I'm seeing things like xfstests generic/013
looping forever.
Trond Myklebust July 24, 2022, 8:18 p.m. UTC | #2
On Sun, 2022-07-24 at 19:10 +0000, Trond Myklebust wrote:
> With this series applied, I'm seeing things like xfstests generic/013
> looping forever.
> 

Sorry, false alarm... That turned out to be due to an interesting
readahead config issue.
Boyang Xue Aug. 4, 2022, 8:54 a.m. UTC | #3
Hi Jeff,

Thanks for fixing this! I have been running tests against this patchset
for several days, and the results are all good. generic/476 now typically
completes within 30 minutes. The tests are:

For verifying generic/476:
xfstests-multihost-nfsv3-over-ext4
xfstests-multihost-nfsv3-over-feature-ext4
xfstests-multihost-nfsv3-over-feature-xfs
xfstests-multihost-nfsv3-over-xfs
xfstests-multihost-nfsv4.0-over-ext4
xfstests-multihost-nfsv4.0-over-feature-ext4
xfstests-multihost-nfsv4.0-over-feature-xfs
xfstests-multihost-nfsv4.0-over-xfs
xfstests-multihost-nfsv4.1-over-ext4
xfstests-multihost-nfsv4.1-over-feature-ext4
xfstests-multihost-nfsv4.1-over-feature-xfs
xfstests-multihost-nfsv4.1-over-xfs
xfstests-multihost-nfsv4.2-over-ext4
xfstests-multihost-nfsv4.2-over-feature-ext4
xfstests-multihost-nfsv4.2-over-feature-xfs
xfstests-multihost-nfsv4.2-over-xfs
xfstests-localhost-nfsv3
xfstests-localhost-nfsv4.0
xfstests-localhost-nfsv4.1
xfstests-localhost-nfsv4.2

Regression tests:
ltp-nfsv{3,4.0,4.1,4.2}
pjd-test: nfs
nfs-connectathon
nfs-sanity-check

All tests were run on x86_64, aarch64, ppc64le, and s390x (only partially
on s390x, due to some config issues).

Hope this helps.

Thanks,
Boyang
