Message ID | 20220729164715.75702-3-jlayton@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] nfsd: add new tracepoint in nfsd_open_break_lease | expand |
> On Jul 29, 2022, at 12:47 PM, Jeff Layton <jlayton@kernel.org> wrote: > > We had a report from the spring Bake-a-thon of data corruption in some > nfstest_interop tests. Looking at the traces showed the NFS server > allowing a v3 WRITE to proceed while a read delegation was still > outstanding. > > Currently, we only set NFSD_FILE_BREAK_* flags if > NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. > NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for > COMMIT ops, where we need a writeable filehandle but don't need to > break read leases. > > It doesn't make any sense to consult that flag when allocating a file > since the file may be used on subsequent calls where we do want to break > the lease (and the usage of it here seems to be reverse from what it > should be anyway). > > Also, after calling nfsd_open_break_lease, we don't want to clear the > BREAK_* bits. A lease could end up being set on it later (more than > once) and we need to be able to break those leases as well. > > This means that the NFSD_FILE_BREAK_* flags now just mirror > NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just > drop those flags and unconditionally call nfsd_open_break_lease every > time. > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 > Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) > Reported-by: Olga Kornieskaia <kolga@netapp.com> > Signed-off-by: Jeff Layton <jlayton@kernel.org> I'm going to go out on a limb and predict this will conflict heavily with the filecache overhaul patches I have queued for next. :-) Do you believe this is something that urgently needs to be backported to stable kernels, or can it be rebased on top of the filecache overhaul work? > --- > fs/nfsd/filecache.c | 26 +++----------------------- > fs/nfsd/filecache.h | 4 +--- > fs/nfsd/trace.h | 2 -- > 3 files changed, 4 insertions(+), 28 deletions(-) > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c > index 4758c2a3fcf8..7e566ddca388 100644 > --- a/fs/nfsd/filecache.c > +++ b/fs/nfsd/filecache.c > @@ -283,7 +283,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) > } > > static struct nfsd_file * > -nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) > +nfsd_file_alloc(struct nfsd_file_lookup_key *key) > { > struct nfsd_file *nf; > > @@ -301,12 +301,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) > /* nf_ref is pre-incremented for hash table */ > refcount_set(&nf->nf_ref, 2); > nf->nf_may = key->need; > - if (may & NFSD_MAY_NOT_BREAK_LEASE) { > - if (may & NFSD_MAY_WRITE) > - __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags); > - if (may & NFSD_MAY_READ) > - __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); > - } > nf->nf_mark = NULL; > } > return nf; > @@ -1090,7 +1084,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, > if (nf) > goto wait_for_construction; > > - new = nfsd_file_alloc(&key, may_flags); > + new = nfsd_file_alloc(&key); > if (!new) { > status = nfserr_jukebox; > goto out_status; > @@ -1130,21 +1124,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, > nfsd_file_lru_remove(nf); > this_cpu_inc(nfsd_file_cache_hits); > > - if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) { > - bool write = (may_flags & NFSD_MAY_WRITE); > - > - if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) || > - (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) { > - status = nfserrno(nfsd_open_break_lease( > - file_inode(nf->nf_file), may_flags)); > - if (status == nfs_ok) { > - clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); > - if (write) > - clear_bit(NFSD_FILE_BREAK_WRITE, > - &nf->nf_flags); > - } > - } > - } > + status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); > out: > if (status == nfs_ok) { > if (open) > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h > index d534b76cb65b..8e8c0c47d67d 100644 > --- a/fs/nfsd/filecache.h > +++ b/fs/nfsd/filecache.h > @@ -37,9 +37,7 @@ struct nfsd_file { > struct net *nf_net; > #define NFSD_FILE_HASHED (0) > #define NFSD_FILE_PENDING (1) > -#define NFSD_FILE_BREAK_READ (2) > -#define NFSD_FILE_BREAK_WRITE (3) > -#define NFSD_FILE_REFERENCED (4) > +#define NFSD_FILE_REFERENCED (2) > unsigned long nf_flags; > struct inode *nf_inode; /* don't deref */ > refcount_t nf_ref; > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h > index e9c5d0f56977..2bd867a96eba 100644 > --- a/fs/nfsd/trace.h > +++ b/fs/nfsd/trace.h > @@ -758,8 +758,6 @@ DEFINE_CLID_EVENT(confirmed_r); > __print_flags(val, "|", \ > { 1 << NFSD_FILE_HASHED, "HASHED" }, \ > { 1 << NFSD_FILE_PENDING, "PENDING" }, \ > - { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \ > - { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ > { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) > > DECLARE_EVENT_CLASS(nfsd_file_class, > -- > 2.37.1 > -- Chuck Lever
On Fri, 2022-07-29 at 17:21 +0000, Chuck Lever III wrote: > > > On Jul 29, 2022, at 12:47 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > We had a report from the spring Bake-a-thon of data corruption in some > > nfstest_interop tests. Looking at the traces showed the NFS server > > allowing a v3 WRITE to proceed while a read delegation was still > > outstanding. > > > > Currently, we only set NFSD_FILE_BREAK_* flags if > > NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. > > NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for > > COMMIT ops, where we need a writeable filehandle but don't need to > > break read leases. > > > > It doesn't make any sense to consult that flag when allocating a file > > since the file may be used on subsequent calls where we do want to break > > the lease (and the usage of it here seems to be reverse from what it > > should be anyway). > > > > Also, after calling nfsd_open_break_lease, we don't want to clear the > > BREAK_* bits. A lease could end up being set on it later (more than > > once) and we need to be able to break those leases as well. > > > > This means that the NFSD_FILE_BREAK_* flags now just mirror > > NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just > > drop those flags and unconditionally call nfsd_open_break_lease every > > time. > > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 > > Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) > > Reported-by: Olga Kornieskaia <kolga@netapp.com> > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > I'm going to go out on a limb and predict this will conflict > heavily with the filecache overhaul patches I have queued for > next. :-) > > Do you believe this is something that urgently needs to be > backported to stable kernels, or can it be rebased on top of > the filecache overhaul work? > > I based this on top of your for-next branch and I think the filecache is already in there. It's a pretty nasty bug that we probably will want backported, so it might make sense to respin this on top of mainline and put it in ahead of the filecache overhaul. Thoughts? > > --- > > fs/nfsd/filecache.c | 26 +++----------------------- > > fs/nfsd/filecache.h | 4 +--- > > fs/nfsd/trace.h | 2 -- > > 3 files changed, 4 insertions(+), 28 deletions(-) > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c > > index 4758c2a3fcf8..7e566ddca388 100644 > > --- a/fs/nfsd/filecache.c > > +++ b/fs/nfsd/filecache.c > > @@ -283,7 +283,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) > > } > > > > static struct nfsd_file * > > -nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) > > +nfsd_file_alloc(struct nfsd_file_lookup_key *key) > > { > > struct nfsd_file *nf; > > > > @@ -301,12 +301,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) > > /* nf_ref is pre-incremented for hash table */ > > refcount_set(&nf->nf_ref, 2); > > nf->nf_may = key->need; > > - if (may & NFSD_MAY_NOT_BREAK_LEASE) { > > - if (may & NFSD_MAY_WRITE) > > - __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags); > > - if (may & NFSD_MAY_READ) > > - __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); > > - } > > nf->nf_mark = NULL; > > } > > return nf; > > @@ -1090,7 +1084,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, > > if (nf) > > goto wait_for_construction; > > > > - new = nfsd_file_alloc(&key, may_flags); > > + new = nfsd_file_alloc(&key); > > if (!new) { > > status = nfserr_jukebox; > > goto out_status; > > @@ -1130,21 +1124,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, > > nfsd_file_lru_remove(nf); > > this_cpu_inc(nfsd_file_cache_hits); > > > > - if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) { > > - bool write = (may_flags & NFSD_MAY_WRITE); > > - > > - if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) || > > - (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) { > > - status = nfserrno(nfsd_open_break_lease( > > - file_inode(nf->nf_file), may_flags)); > > - if (status == nfs_ok) { > > - clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); > > - if (write) > > - clear_bit(NFSD_FILE_BREAK_WRITE, > > - &nf->nf_flags); > > - } > > - } > > - } > > + status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); > > out: > > if (status == nfs_ok) { > > if (open) > > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h > > index d534b76cb65b..8e8c0c47d67d 100644 > > --- a/fs/nfsd/filecache.h > > +++ b/fs/nfsd/filecache.h > > @@ -37,9 +37,7 @@ struct nfsd_file { > > struct net *nf_net; > > #define NFSD_FILE_HASHED (0) > > #define NFSD_FILE_PENDING (1) > > -#define NFSD_FILE_BREAK_READ (2) > > -#define NFSD_FILE_BREAK_WRITE (3) > > -#define NFSD_FILE_REFERENCED (4) > > +#define NFSD_FILE_REFERENCED (2) > > unsigned long nf_flags; > > struct inode *nf_inode; /* don't deref */ > > refcount_t nf_ref; > > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h > > index e9c5d0f56977..2bd867a96eba 100644 > > --- a/fs/nfsd/trace.h > > +++ b/fs/nfsd/trace.h > > @@ -758,8 +758,6 @@ DEFINE_CLID_EVENT(confirmed_r); > > __print_flags(val, "|", \ > > { 1 << NFSD_FILE_HASHED, "HASHED" }, \ > > { 1 << NFSD_FILE_PENDING, "PENDING" }, \ > > - { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \ > > - { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ > > { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) > > > > DECLARE_EVENT_CLASS(nfsd_file_class, > > -- > > 2.37.1 > > > > -- > Chuck Lever > > >
> On Jul 29, 2022, at 1:34 PM, Jeff Layton <jlayton@kernel.org> wrote: > > On Fri, 2022-07-29 at 17:21 +0000, Chuck Lever III wrote: >> >>> On Jul 29, 2022, at 12:47 PM, Jeff Layton <jlayton@kernel.org> wrote: >>> >>> We had a report from the spring Bake-a-thon of data corruption in some >>> nfstest_interop tests. Looking at the traces showed the NFS server >>> allowing a v3 WRITE to proceed while a read delegation was still >>> outstanding. >>> >>> Currently, we only set NFSD_FILE_BREAK_* flags if >>> NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. >>> NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for >>> COMMIT ops, where we need a writeable filehandle but don't need to >>> break read leases. >>> >>> It doesn't make any sense to consult that flag when allocating a file >>> since the file may be used on subsequent calls where we do want to break >>> the lease (and the usage of it here seems to be reverse from what it >>> should be anyway). >>> >>> Also, after calling nfsd_open_break_lease, we don't want to clear the >>> BREAK_* bits. A lease could end up being set on it later (more than >>> once) and we need to be able to break those leases as well. >>> >>> This means that the NFSD_FILE_BREAK_* flags now just mirror >>> NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just >>> drop those flags and unconditionally call nfsd_open_break_lease every >>> time. >>> >>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 >>> Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) >>> Reported-by: Olga Kornieskaia <kolga@netapp.com> >>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >> >> I'm going to go out on a limb and predict this will conflict >> heavily with the filecache overhaul patches I have queued for >> next. :-) >> >> Do you believe this is something that urgently needs to be >> backported to stable kernels, or can it be rebased on top of >> the filecache overhaul work? >> >> > > I based this on top of your for-next branch and I think the filecache is > already in there. > > It's a pretty nasty bug that we probably will want backported, so it > might make sense to respin this on top of mainline and put it in ahead > of the filecache overhaul. I am a generally a proponent of enabling fix backports. I encourage you to test the respin on 5.19 and 5.18 at least because the moment that patch hits upstream, Sasha and Greg will pull it into stable. I don't relish the idea of having to fix the fix, if you catch my drift. And perhaps when you repost, the fix should be reordered before the patches that add the tracepoints. > Thoughts? Rebasing all that is mechanically straightforward to do. The only issue is that the filecache work and the first PR tag is already in my for-next branch. If you don't think it will trigger massive heartburn for Linus and Stephen, I can pull that stuff out now, and postpone my first PR until the second week of the merge window. >>> --- >>> fs/nfsd/filecache.c | 26 +++----------------------- >>> fs/nfsd/filecache.h | 4 +--- >>> fs/nfsd/trace.h | 2 -- >>> 3 files changed, 4 insertions(+), 28 deletions(-) >>> >>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c >>> index 4758c2a3fcf8..7e566ddca388 100644 >>> --- a/fs/nfsd/filecache.c >>> +++ b/fs/nfsd/filecache.c >>> @@ -283,7 +283,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) >>> } >>> >>> static struct nfsd_file * >>> -nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) >>> +nfsd_file_alloc(struct nfsd_file_lookup_key *key) >>> { >>> struct nfsd_file *nf; >>> >>> @@ -301,12 +301,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) >>> /* nf_ref is pre-incremented for hash table */ >>> refcount_set(&nf->nf_ref, 2); >>> nf->nf_may = key->need; >>> - if (may & NFSD_MAY_NOT_BREAK_LEASE) { >>> - if (may & NFSD_MAY_WRITE) >>> - __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags); >>> - if (may & NFSD_MAY_READ) >>> - __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); >>> - } >>> nf->nf_mark = NULL; >>> } >>> return nf; >>> @@ -1090,7 +1084,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, >>> if (nf) >>> goto wait_for_construction; >>> >>> - new = nfsd_file_alloc(&key, may_flags); >>> + new = nfsd_file_alloc(&key); >>> if (!new) { >>> status = nfserr_jukebox; >>> goto out_status; >>> @@ -1130,21 +1124,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, >>> nfsd_file_lru_remove(nf); >>> this_cpu_inc(nfsd_file_cache_hits); >>> >>> - if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) { >>> - bool write = (may_flags & NFSD_MAY_WRITE); >>> - >>> - if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) || >>> - (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) { >>> - status = nfserrno(nfsd_open_break_lease( >>> - file_inode(nf->nf_file), may_flags)); >>> - if (status == nfs_ok) { >>> - clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); >>> - if (write) >>> - clear_bit(NFSD_FILE_BREAK_WRITE, >>> - &nf->nf_flags); >>> - } >>> - } >>> - } >>> + status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); >>> out: >>> if (status == nfs_ok) { >>> if (open) >>> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h >>> index d534b76cb65b..8e8c0c47d67d 100644 >>> --- a/fs/nfsd/filecache.h >>> +++ b/fs/nfsd/filecache.h >>> @@ -37,9 +37,7 @@ struct nfsd_file { >>> struct net *nf_net; >>> #define NFSD_FILE_HASHED (0) >>> #define NFSD_FILE_PENDING (1) >>> -#define NFSD_FILE_BREAK_READ (2) >>> -#define NFSD_FILE_BREAK_WRITE (3) >>> -#define NFSD_FILE_REFERENCED (4) >>> +#define NFSD_FILE_REFERENCED (2) >>> unsigned long nf_flags; >>> struct inode *nf_inode; /* don't deref */ >>> refcount_t nf_ref; >>> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h >>> index e9c5d0f56977..2bd867a96eba 100644 >>> --- a/fs/nfsd/trace.h >>> +++ b/fs/nfsd/trace.h >>> @@ -758,8 +758,6 @@ DEFINE_CLID_EVENT(confirmed_r); >>> __print_flags(val, "|", \ >>> { 1 << NFSD_FILE_HASHED, "HASHED" }, \ >>> { 1 << NFSD_FILE_PENDING, "PENDING" }, \ >>> - { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \ >>> - { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ >>> { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) >>> >>> DECLARE_EVENT_CLASS(nfsd_file_class, >>> -- >>> 2.37.1 >>> >> >> -- >> Chuck Lever >> >> >> > > -- > Jeff Layton <jlayton@kernel.org> -- Chuck Lever
On Fri, 2022-07-29 at 17:46 +0000, Chuck Lever III wrote: > > > On Jul 29, 2022, at 1:34 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > On Fri, 2022-07-29 at 17:21 +0000, Chuck Lever III wrote: > > > > > > > On Jul 29, 2022, at 12:47 PM, Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > We had a report from the spring Bake-a-thon of data corruption in some > > > > nfstest_interop tests. Looking at the traces showed the NFS server > > > > allowing a v3 WRITE to proceed while a read delegation was still > > > > outstanding. > > > > > > > > Currently, we only set NFSD_FILE_BREAK_* flags if > > > > NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. > > > > NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for > > > > COMMIT ops, where we need a writeable filehandle but don't need to > > > > break read leases. > > > > > > > > It doesn't make any sense to consult that flag when allocating a file > > > > since the file may be used on subsequent calls where we do want to break > > > > the lease (and the usage of it here seems to be reverse from what it > > > > should be anyway). > > > > > > > > Also, after calling nfsd_open_break_lease, we don't want to clear the > > > > BREAK_* bits. A lease could end up being set on it later (more than > > > > once) and we need to be able to break those leases as well. > > > > > > > > This means that the NFSD_FILE_BREAK_* flags now just mirror > > > > NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just > > > > drop those flags and unconditionally call nfsd_open_break_lease every > > > > time. > > > > > > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 > > > > Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) > > > > Reported-by: Olga Kornieskaia <kolga@netapp.com> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > I'm going to go out on a limb and predict this will conflict > > > heavily with the filecache overhaul patches I have queued for > > > next. :-) > > > > > > Do you believe this is something that urgently needs to be > > > backported to stable kernels, or can it be rebased on top of > > > the filecache overhaul work? > > > > > > > > > > I based this on top of your for-next branch and I think the filecache is > > already in there. > > > > It's a pretty nasty bug that we probably will want backported, so it > > might make sense to respin this on top of mainline and put it in ahead > > of the filecache overhaul. > > I am a generally a proponent of enabling fix backports. > > I encourage you to test the respin on 5.19 and 5.18 at least > because the moment that patch hits upstream, Sasha and Greg > will pull it into stable. I don't relish the idea of having > to fix the fix, if you catch my drift. > > And perhaps when you repost, the fix should be reordered > before the patches that add the tracepoints. > That all sounds good. I'll do that in a bit, and will send a v2. > > > Thoughts? > > Rebasing all that is mechanically straightforward to do. > > The only issue is that the filecache work and the first PR > tag is already in my for-next branch. If you don't think it > will trigger massive heartburn for Linus and Stephen, I can > pull that stuff out now, and postpone my first PR until the > second week of the merge window. > That may be best. I think we want to see this fix go in almost everywhere. > > > > > --- > > > > fs/nfsd/filecache.c | 26 +++----------------------- > > > > fs/nfsd/filecache.h | 4 +--- > > > > fs/nfsd/trace.h | 2 -- > > > > 3 files changed, 4 insertions(+), 28 deletions(-) > > > > > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c > > > > index 4758c2a3fcf8..7e566ddca388 100644 > > > > --- a/fs/nfsd/filecache.c > > > > +++ b/fs/nfsd/filecache.c > > > > @@ -283,7 +283,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) > > > > } > > > > > > > > static struct nfsd_file * > > > > -nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) > > > > +nfsd_file_alloc(struct nfsd_file_lookup_key *key) > > > > { > > > > struct nfsd_file *nf; > > > > > > > > @@ -301,12 +301,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) > > > > /* nf_ref is pre-incremented for hash table */ > > > > refcount_set(&nf->nf_ref, 2); > > > > nf->nf_may = key->need; > > > > - if (may & NFSD_MAY_NOT_BREAK_LEASE) { > > > > - if (may & NFSD_MAY_WRITE) > > > > - __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags); > > > > - if (may & NFSD_MAY_READ) > > > > - __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); > > > > - } > > > > nf->nf_mark = NULL; > > > > } > > > > return nf; > > > > @@ -1090,7 +1084,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, > > > > if (nf) > > > > goto wait_for_construction; > > > > > > > > - new = nfsd_file_alloc(&key, may_flags); > > > > + new = nfsd_file_alloc(&key); > > > > if (!new) { > > > > status = nfserr_jukebox; > > > > goto out_status; > > > > @@ -1130,21 +1124,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, > > > > nfsd_file_lru_remove(nf); > > > > this_cpu_inc(nfsd_file_cache_hits); > > > > > > > > - if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) { > > > > - bool write = (may_flags & NFSD_MAY_WRITE); > > > > - > > > > - if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) || > > > > - (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) { > > > > - status = nfserrno(nfsd_open_break_lease( > > > > - file_inode(nf->nf_file), may_flags)); > > > > - if (status == nfs_ok) { > > > > - clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); > > > > - if (write) > > > > - clear_bit(NFSD_FILE_BREAK_WRITE, > > > > - &nf->nf_flags); > > > > - } > > > > - } > > > > - } > > > > + status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); > > > > out: > > > > if (status == nfs_ok) { > > > > if (open) > > > > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h > > > > index d534b76cb65b..8e8c0c47d67d 100644 > > > > --- a/fs/nfsd/filecache.h > > > > +++ b/fs/nfsd/filecache.h > > > > @@ -37,9 +37,7 @@ struct nfsd_file { > > > > struct net *nf_net; > > > > #define NFSD_FILE_HASHED (0) > > > > #define NFSD_FILE_PENDING (1) > > > > -#define NFSD_FILE_BREAK_READ (2) > > > > -#define NFSD_FILE_BREAK_WRITE (3) > > > > -#define NFSD_FILE_REFERENCED (4) > > > > +#define NFSD_FILE_REFERENCED (2) > > > > unsigned long nf_flags; > > > > struct inode *nf_inode; /* don't deref */ > > > > refcount_t nf_ref; > > > > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h > > > > index e9c5d0f56977..2bd867a96eba 100644 > > > > --- a/fs/nfsd/trace.h > > > > +++ b/fs/nfsd/trace.h > > > > @@ -758,8 +758,6 @@ DEFINE_CLID_EVENT(confirmed_r); > > > > __print_flags(val, "|", \ > > > > { 1 << NFSD_FILE_HASHED, "HASHED" }, \ > > > > { 1 << NFSD_FILE_PENDING, "PENDING" }, \ > > > > - { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \ > > > > - { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ > > > > { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) > > > > > > > > DECLARE_EVENT_CLASS(nfsd_file_class, > > > > -- > > > > 2.37.1 > > > > > > > > > > -- > > > Chuck Lever > > > > > > > > > > > > > -- > > Jeff Layton <jlayton@kernel.org> > > -- > Chuck Lever > > >
> On Jul 29, 2022, at 1:55 PM, Jeff Layton <jlayton@kernel.org> wrote: > > On Fri, 2022-07-29 at 17:46 +0000, Chuck Lever III wrote: >> >>> On Jul 29, 2022, at 1:34 PM, Jeff Layton <jlayton@kernel.org> wrote: >>> >>> On Fri, 2022-07-29 at 17:21 +0000, Chuck Lever III wrote: >>>> >>>>> On Jul 29, 2022, at 12:47 PM, Jeff Layton <jlayton@kernel.org> wrote: >>>>> >>>>> We had a report from the spring Bake-a-thon of data corruption in some >>>>> nfstest_interop tests. Looking at the traces showed the NFS server >>>>> allowing a v3 WRITE to proceed while a read delegation was still >>>>> outstanding. >>>>> >>>>> Currently, we only set NFSD_FILE_BREAK_* flags if >>>>> NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. >>>>> NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for >>>>> COMMIT ops, where we need a writeable filehandle but don't need to >>>>> break read leases. >>>>> >>>>> It doesn't make any sense to consult that flag when allocating a file >>>>> since the file may be used on subsequent calls where we do want to break >>>>> the lease (and the usage of it here seems to be reverse from what it >>>>> should be anyway). >>>>> >>>>> Also, after calling nfsd_open_break_lease, we don't want to clear the >>>>> BREAK_* bits. A lease could end up being set on it later (more than >>>>> once) and we need to be able to break those leases as well. >>>>> >>>>> This means that the NFSD_FILE_BREAK_* flags now just mirror >>>>> NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just >>>>> drop those flags and unconditionally call nfsd_open_break_lease every >>>>> time. >>>>> >>>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 >>>>> Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) >>>>> Reported-by: Olga Kornieskaia <kolga@netapp.com> >>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>>> >>>> I'm going to go out on a limb and predict this will conflict >>>> heavily with the filecache overhaul patches I have queued for >>>> next. :-) >>>> >>>> Do you believe this is something that urgently needs to be >>>> backported to stable kernels, or can it be rebased on top of >>>> the filecache overhaul work? >>>> >>>> >>> >>> I based this on top of your for-next branch and I think the filecache is >>> already in there. >>> >>> It's a pretty nasty bug that we probably will want backported, so it >>> might make sense to respin this on top of mainline and put it in ahead >>> of the filecache overhaul. >> >> I am a generally a proponent of enabling fix backports. >> >> I encourage you to test the respin on 5.19 and 5.18 at least >> because the moment that patch hits upstream, Sasha and Greg >> will pull it into stable. I don't relish the idea of having >> to fix the fix, if you catch my drift. >> >> And perhaps when you repost, the fix should be reordered >> before the patches that add the tracepoints. >> > > That all sounds good. I'll do that in a bit, and will send a v2. > >> >>> Thoughts? >> >> Rebasing all that is mechanically straightforward to do. >> >> The only issue is that the filecache work and the first PR >> tag is already in my for-next branch. If you don't think it >> will trigger massive heartburn for Linus and Stephen, I can >> pull that stuff out now, and postpone my first PR until the >> second week of the merge window. >> > > That may be best. I think we want to see this fix go in almost > everywhere. I will push the removal update to my public tree now. You can send your v2 over the weekend or on Monday. >>>>> --- >>>>> fs/nfsd/filecache.c | 26 +++----------------------- >>>>> fs/nfsd/filecache.h | 4 +--- >>>>> fs/nfsd/trace.h | 2 -- >>>>> 3 files changed, 4 insertions(+), 28 deletions(-) >>>>> >>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c >>>>> index 4758c2a3fcf8..7e566ddca388 100644 >>>>> --- a/fs/nfsd/filecache.c >>>>> +++ b/fs/nfsd/filecache.c >>>>> @@ -283,7 +283,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) >>>>> } >>>>> >>>>> static struct nfsd_file * >>>>> -nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) >>>>> +nfsd_file_alloc(struct nfsd_file_lookup_key *key) >>>>> { >>>>> struct nfsd_file *nf; >>>>> >>>>> @@ -301,12 +301,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) >>>>> /* nf_ref is pre-incremented for hash table */ >>>>> refcount_set(&nf->nf_ref, 2); >>>>> nf->nf_may = key->need; >>>>> - if (may & NFSD_MAY_NOT_BREAK_LEASE) { >>>>> - if (may & NFSD_MAY_WRITE) >>>>> - __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags); >>>>> - if (may & NFSD_MAY_READ) >>>>> - __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); >>>>> - } >>>>> nf->nf_mark = NULL; >>>>> } >>>>> return nf; >>>>> @@ -1090,7 +1084,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, >>>>> if (nf) >>>>> goto wait_for_construction; >>>>> >>>>> - new = nfsd_file_alloc(&key, may_flags); >>>>> + new = nfsd_file_alloc(&key); >>>>> if (!new) { >>>>> status = nfserr_jukebox; >>>>> goto out_status; >>>>> @@ -1130,21 +1124,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, >>>>> nfsd_file_lru_remove(nf); >>>>> this_cpu_inc(nfsd_file_cache_hits); >>>>> >>>>> - if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) { >>>>> - bool write = (may_flags & NFSD_MAY_WRITE); >>>>> - >>>>> - if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) || >>>>> - (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) { >>>>> - status = nfserrno(nfsd_open_break_lease( >>>>> - file_inode(nf->nf_file), may_flags)); >>>>> - if (status == nfs_ok) { >>>>> - clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); >>>>> - if (write) >>>>> - clear_bit(NFSD_FILE_BREAK_WRITE, >>>>> - &nf->nf_flags); >>>>> - } >>>>> - } >>>>> - } >>>>> + status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); >>>>> out: >>>>> if (status == nfs_ok) { >>>>> if (open) >>>>> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h >>>>> index d534b76cb65b..8e8c0c47d67d 100644 >>>>> --- a/fs/nfsd/filecache.h >>>>> +++ b/fs/nfsd/filecache.h >>>>> @@ -37,9 +37,7 @@ struct nfsd_file { >>>>> struct net *nf_net; >>>>> #define NFSD_FILE_HASHED (0) >>>>> #define NFSD_FILE_PENDING (1) >>>>> -#define NFSD_FILE_BREAK_READ (2) >>>>> -#define NFSD_FILE_BREAK_WRITE (3) >>>>> -#define NFSD_FILE_REFERENCED (4) >>>>> +#define NFSD_FILE_REFERENCED (2) >>>>> unsigned long nf_flags; >>>>> struct inode *nf_inode; /* don't deref */ >>>>> refcount_t nf_ref; >>>>> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h >>>>> index e9c5d0f56977..2bd867a96eba 100644 >>>>> --- a/fs/nfsd/trace.h >>>>> +++ b/fs/nfsd/trace.h >>>>> @@ -758,8 +758,6 @@ DEFINE_CLID_EVENT(confirmed_r); >>>>> __print_flags(val, "|", \ >>>>> { 1 << NFSD_FILE_HASHED, "HASHED" }, \ >>>>> { 1 << NFSD_FILE_PENDING, "PENDING" }, \ >>>>> - { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \ >>>>> - { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ >>>>> { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) >>>>> >>>>> DECLARE_EVENT_CLASS(nfsd_file_class, >>>>> -- >>>>> 2.37.1 >>>>> >>>> >>>> -- >>>> Chuck Lever >>>> >>>> >>>> >>> >>> -- >>> Jeff Layton <jlayton@kernel.org> >> >> -- >> Chuck Lever >> >> >> > > -- > Jeff Layton <jlayton@kernel.org> -- Chuck Lever
> On Jul 29, 2022, at 1:46 PM, Chuck Lever III <chuck.lever@oracle.com> wrote: > > > >> On Jul 29, 2022, at 1:34 PM, Jeff Layton <jlayton@kernel.org> wrote: >> >> On Fri, 2022-07-29 at 17:21 +0000, Chuck Lever III wrote: >>> >>>> On Jul 29, 2022, at 12:47 PM, Jeff Layton <jlayton@kernel.org> wrote: >>>> >>>> We had a report from the spring Bake-a-thon of data corruption in some >>>> nfstest_interop tests. Looking at the traces showed the NFS server >>>> allowing a v3 WRITE to proceed while a read delegation was still >>>> outstanding. >>>> >>>> Currently, we only set NFSD_FILE_BREAK_* flags if >>>> NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. >>>> NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for >>>> COMMIT ops, where we need a writeable filehandle but don't need to >>>> break read leases. >>>> >>>> It doesn't make any sense to consult that flag when allocating a file >>>> since the file may be used on subsequent calls where we do want to break >>>> the lease (and the usage of it here seems to be reverse from what it >>>> should be anyway). >>>> >>>> Also, after calling nfsd_open_break_lease, we don't want to clear the >>>> BREAK_* bits. A lease could end up being set on it later (more than >>>> once) and we need to be able to break those leases as well. >>>> >>>> This means that the NFSD_FILE_BREAK_* flags now just mirror >>>> NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just >>>> drop those flags and unconditionally call nfsd_open_break_lease every >>>> time. >>>> >>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 >>>> Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) >>>> Reported-by: Olga Kornieskaia <kolga@netapp.com> >>>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>> >>> I'm going to go out on a limb and predict this will conflict >>> heavily with the filecache overhaul patches I have queued for >>> next. :-) >>> >>> Do you believe this is something that urgently needs to be >>> backported to stable kernels, or can it be rebased on top of >>> the filecache overhaul work? >>> >>> >> >> I based this on top of your for-next branch and I think the filecache is >> already in there. >> >> It's a pretty nasty bug that we probably will want backported, so it >> might make sense to respin this on top of mainline and put it in ahead >> of the filecache overhaul. > > I am a generally a proponent of enabling fix backports. > > I encourage you to test the respin on 5.19 and 5.18 at least > because the moment that patch hits upstream, Sasha and Greg > will pull it into stable. I don't relish the idea of having > to fix the fix, if you catch my drift. > > And perhaps when you repost, the fix should be reordered > before the patches that add the tracepoints. Well... let's rebase the fix, but the tracepoint patches can go in after the filecache overhaul, I think. That should make it a little easier. -- Chuck Lever
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 4758c2a3fcf8..7e566ddca388 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -283,7 +283,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) } static struct nfsd_file * -nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) +nfsd_file_alloc(struct nfsd_file_lookup_key *key) { struct nfsd_file *nf; @@ -301,12 +301,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) /* nf_ref is pre-incremented for hash table */ refcount_set(&nf->nf_ref, 2); nf->nf_may = key->need; - if (may & NFSD_MAY_NOT_BREAK_LEASE) { - if (may & NFSD_MAY_WRITE) - __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags); - if (may & NFSD_MAY_READ) - __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); - } nf->nf_mark = NULL; } return nf; @@ -1090,7 +1084,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, if (nf) goto wait_for_construction; - new = nfsd_file_alloc(&key, may_flags); + new = nfsd_file_alloc(&key); if (!new) { status = nfserr_jukebox; goto out_status; @@ -1130,21 +1124,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nfsd_file_lru_remove(nf); this_cpu_inc(nfsd_file_cache_hits); - if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) { - bool write = (may_flags & NFSD_MAY_WRITE); - - if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) || - (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) { - status = nfserrno(nfsd_open_break_lease( - file_inode(nf->nf_file), may_flags)); - if (status == nfs_ok) { - clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); - if (write) - clear_bit(NFSD_FILE_BREAK_WRITE, - &nf->nf_flags); - } - } - } + status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); out: if (status == nfs_ok) { if (open) diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index d534b76cb65b..8e8c0c47d67d 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -37,9 +37,7 @@ struct nfsd_file { struct net *nf_net; #define NFSD_FILE_HASHED (0) #define NFSD_FILE_PENDING (1) -#define NFSD_FILE_BREAK_READ (2) -#define NFSD_FILE_BREAK_WRITE (3) -#define NFSD_FILE_REFERENCED (4) +#define NFSD_FILE_REFERENCED (2) unsigned long nf_flags; struct inode *nf_inode; /* don't deref */ refcount_t nf_ref; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index e9c5d0f56977..2bd867a96eba 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -758,8 +758,6 @@ DEFINE_CLID_EVENT(confirmed_r); __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \ { 1 << NFSD_FILE_PENDING, "PENDING" }, \ - { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \ - { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) DECLARE_EVENT_CLASS(nfsd_file_class,
We had a report from the spring Bake-a-thon of data corruption in some nfstest_interop tests. Looking at the traces showed the NFS server allowing a v3 WRITE to proceed while a read delegation was still outstanding. Currently, we only set NFSD_FILE_BREAK_* flags if NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for COMMIT ops, where we need a writeable filehandle but don't need to break read leases. It doesn't make any sense to consult that flag when allocating a file since the file may be used on subsequent calls where we do want to break the lease (and the usage of it here seems to be reverse from what it should be anyway). Also, after calling nfsd_open_break_lease, we don't want to clear the BREAK_* bits. A lease could end up being set on it later (more than once) and we need to be able to break those leases as well. This means that the NFSD_FILE_BREAK_* flags now just mirror NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just drop those flags and unconditionally call nfsd_open_break_lease every time. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) Reported-by: Olga Kornieskaia <kolga@netapp.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/nfsd/filecache.c | 26 +++----------------------- fs/nfsd/filecache.h | 4 +--- fs/nfsd/trace.h | 2 -- 3 files changed, 4 insertions(+), 28 deletions(-)