Message ID | 20220722181220.81636-1-jlayton@kernel.org (mailing list archive) |
---|---|
Series | nfs: fix -ENOSPC DIO write regression |
On Fri, 2022-07-22 at 14:12 -0400, Jeff Layton wrote:
> Boyang reported that xfstest generic/476 would never complete when run
> against a filesystem that was "too small".
>
> What I found was that we would end up trying to issue a large DIO write
> that would come back short. The kernel would then follow up and try to
> write out the rest and get back -ENOSPC. It would then try to issue a
> commit, which would then try to reissue the writes, and around it would
> go.
>
> This patchset seems to fix it. Unfortunately, I'm not positive which
> patch _broke_ this as it seems to have happened quite some time ago.
>
> Jeff Layton (3):
>   nfs: add new nfs_direct_req tracepoint events
>   nfs: always check dreq->error after a commit
>   nfs: only issue commit in DIO codepath if we have uncommitted data
>
>  fs/nfs/direct.c         | 50 +++++++++-------------------
>  fs/nfs/internal.h       | 33 ++++++++++++++++++++
>  fs/nfs/nfstrace.h       | 69 +++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/write.c          | 48 +++++++++++++++++-----------
>  include/linux/nfs_xdr.h |  1 +
>  5 files changed, 148 insertions(+), 53 deletions(-)
>

With this series applied, I'm seeing things like xfstests generic/013
looping forever.

On Sun, 2022-07-24 at 19:10 +0000, Trond Myklebust wrote:
> On Fri, 2022-07-22 at 14:12 -0400, Jeff Layton wrote:
> > Boyang reported that xfstest generic/476 would never complete when run
> > against a filesystem that was "too small".
> >
> > What I found was that we would end up trying to issue a large DIO write
> > that would come back short. The kernel would then follow up and try to
> > write out the rest and get back -ENOSPC. It would then try to issue a
> > commit, which would then try to reissue the writes, and around it would
> > go.
> >
> > This patchset seems to fix it. Unfortunately, I'm not positive which
> > patch _broke_ this as it seems to have happened quite some time ago.
> >
> > Jeff Layton (3):
> >   nfs: add new nfs_direct_req tracepoint events
> >   nfs: always check dreq->error after a commit
> >   nfs: only issue commit in DIO codepath if we have uncommitted data
> >
> >  fs/nfs/direct.c         | 50 +++++++++-------------------
> >  fs/nfs/internal.h       | 33 ++++++++++++++++++++
> >  fs/nfs/nfstrace.h       | 69 +++++++++++++++++++++++++++++++++++++++++
> >  fs/nfs/write.c          | 48 +++++++++++++++++-----------
> >  include/linux/nfs_xdr.h |  1 +
> >  5 files changed, 148 insertions(+), 53 deletions(-)
> >
>
> With this series applied, I'm seeing things like xfstests generic/013
> looping forever.
>

Sorry, false alarm... That turned out to be due to an interesting
readahead config issue.

Hi Jeff,

Thanks for fixing this! I have run some tests against this patchset for
days, and the results are all good. generic/476 would complete within 30
mins typically.

These tests are:

For verifying generic/476:
xfstests-multihost-nfsv3-over-ext4
xfstests-multihost-nfsv3-over-feature-ext4
xfstests-multihost-nfsv3-over-feature-xfs
xfstests-multihost-nfsv3-over-xfs
xfstests-multihost-nfsv4.0-over-ext4
xfstests-multihost-nfsv4.0-over-feature-ext4
xfstests-multihost-nfsv4.0-over-feature-xfs
xfstests-multihost-nfsv4.0-over-xfs
xfstests-multihost-nfsv4.1-over-ext4
xfstests-multihost-nfsv4.1-over-feature-ext4
xfstests-multihost-nfsv4.1-over-feature-xfs
xfstests-multihost-nfsv4.1-over-xfs
xfstests-multihost-nfsv4.2-over-ext4
xfstests-multihost-nfsv4.2-over-feature-ext4
xfstests-multihost-nfsv4.2-over-feature-xfs
xfstests-multihost-nfsv4.2-over-xfs
xfstests-localhost-nfsv3
xfstests-localhost-nfsv4.0
xfstests-localhost-nfsv4.1
xfstests-localhost-nfsv4.2

Regression tests:
ltp-nfsv{3,4.0,4.1,4.2}
pjd-test: nfs
nfs-connectathon
nfs-sanity-check

All tests were run on x86_64, aarch64, ppc64le, and s390x (part, due to
some config issues).

Hope this helps.

Thanks,
Boyang

On Sat, Jul 23, 2022 at 2:12 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> Boyang reported that xfstest generic/476 would never complete when run
> against a filesystem that was "too small".
>
> What I found was that we would end up trying to issue a large DIO write
> that would come back short. The kernel would then follow up and try to
> write out the rest and get back -ENOSPC. It would then try to issue a
> commit, which would then try to reissue the writes, and around it would
> go.
>
> This patchset seems to fix it. Unfortunately, I'm not positive which
> patch _broke_ this as it seems to have happened quite some time ago.
>
> Jeff Layton (3):
>   nfs: add new nfs_direct_req tracepoint events
>   nfs: always check dreq->error after a commit
>   nfs: only issue commit in DIO codepath if we have uncommitted data
>
>  fs/nfs/direct.c         | 50 +++++++++-------------------
>  fs/nfs/internal.h       | 33 ++++++++++++++++++++
>  fs/nfs/nfstrace.h       | 69 +++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/write.c          | 48 +++++++++++++++++-----------
>  include/linux/nfs_xdr.h |  1 +
>  5 files changed, 148 insertions(+), 53 deletions(-)
>
> --
> 2.36.1
>
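
For readers following the thread, the loop described in the cover letter (short DIO write, retry returning -ENOSPC, commit, reissue) can be illustrated with a toy model. The Python sketch below is not the kernel code and does not reproduce the patches: `ToyServer`, `direct_write`, the `honor_error` flag, and the `max_rounds` cap are all hypothetical names invented here, and the control flow is only an assumption based on the patch titles ("always check dreq->error after a commit", "only issue commit in DIO codepath if we have uncommitted data").

```python
import errno


class ToyServer:
    """Hypothetical stand-in for a server exporting a nearly full filesystem."""

    def __init__(self, free_bytes):
        self.free_bytes = free_bytes
        self.commits = 0

    def write(self, nbytes):
        """Accept as many bytes as fit; return -ENOSPC once nothing fits."""
        if self.free_bytes == 0:
            return -errno.ENOSPC
        done = min(nbytes, self.free_bytes)
        self.free_bytes -= done
        return done

    def commit(self):
        """Count commits so the two behaviours can be compared."""
        self.commits += 1


def direct_write(server, nbytes, honor_error, max_rounds=5):
    """Toy direct write. Returns (bytes_written, error, rounds).

    error is None when the model gave up after max_rounds, which stands in
    for the request that never completes.
    """
    written = 0
    for round_no in range(1, max_rounds + 1):
        error = 0
        uncommitted = 0
        # Send the remainder until it is all written or the server errors out.
        while written < nbytes:
            ret = server.write(nbytes - written)
            if ret < 0:
                error = ret
                break
            written += ret
            uncommitted += ret
        if written == nbytes:
            server.commit()
            return written, 0, round_no
        if honor_error:
            # Modelled fix: commit only what is uncommitted, then report the error.
            if uncommitted:
                server.commit()
            return written, error, round_no
        # Modelled regression: commit regardless, drop the error, reissue writes.
        server.commit()
    return written, None, max_rounds
```

With `ToyServer(4096)` and an 8192-byte request, `honor_error=True` returns after one round with 4096 bytes written and `-errno.ENOSPC`, having issued a single commit. With `honor_error=False` the model keeps committing and reissuing until the `max_rounds` cap stops it, which is the toy equivalent of generic/476 never completing.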