Message ID | 20220118220016.GB16108@fieldses.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] lockd: fix server crash on reboot of client holding lock | expand |
On Tue, Jan 18, 2022 at 05:00:16PM -0500, Bruce Fields wrote: > From: "J. Bruce Fields" <bfields@redhat.com> > > I thought I was iterating over the array when actually the iteration is > over the values contained in the array? > : Would you like me to apply this against a 5.15.x kernel and test locally? Or should I just wait for a 5.15.x stable series update which includes it? Regards jonathan
On Wed, Jan 19, 2022 at 08:50:50AM +1030, Jonathan Woithe wrote: > On Tue, Jan 18, 2022 at 05:00:16PM -0500, Bruce Fields wrote: > > From: "J. Bruce Fields" <bfields@redhat.com> > > > > I thought I was iterating over the array when actually the iteration is > > over the values contained in the array? > > : > > Would you like me to apply this against a 5.15.x kernel and test locally? > Or should I just wait for a 5.15.x stable series update which includes it? I'm pretty confident I'm reproducing the same problem you saw, so it'd be fine to just wait for an update. (But if you do test these patches, let us know, one more confirmation never hurts.) --b.
> On Jan 18, 2022, at 5:00 PM, Bruce Fields <bfields@fieldses.org> wrote: > > From: "J. Bruce Fields" <bfields@redhat.com> > > I thought I was iterating over the array when actually the iteration is > over the values contained in the array? > > Ugh, keep it simple. > > Symptoms were a null deference in vfs_lock_file() when an NFSv3 client > that previously held a lock came back up and sent a notify. > > Reported-by: Jonathan Woithe <jwoithe@just42.net> > Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > --- > fs/lockd/svcsubs.c | 17 +++++++++-------- > 1 file changed, 9 insertions(+), 8 deletions(-) > > diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c > index cb3a7512c33e..54c2e42130ca 100644 > --- a/fs/lockd/svcsubs.c > +++ b/fs/lockd/svcsubs.c > @@ -179,19 +179,20 @@ nlm_delete_file(struct nlm_file *file) > static int nlm_unlock_files(struct nlm_file *file) > { > struct file_lock lock; > - struct file *f; > > lock.fl_type = F_UNLCK; > lock.fl_start = 0; > lock.fl_end = OFFSET_MAX; > - for (f = file->f_file[0]; f <= file->f_file[1]; f++) { > - if (f && vfs_lock_file(f, F_SETLK, &lock, NULL) < 0) { > - pr_warn("lockd: unlock failure in %s:%d\n", > - __FILE__, __LINE__); > - return 1; > - } > - } > + if (file->f_file[O_RDONLY] && > + vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) > + goto out_err; > + if (file->f_file[O_WRONLY] && > + vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL)) > + goto out_err; > return 0; > +out_err: > + pr_warn("lockd: unlock failure in %s:%d\n", __FILE__, __LINE__); > + return 1; > } > > /* > -- > 2.34.1 > Hi Bruce, thanks for the fixes. They've passed my basic smoke tests. I've applied both patches for the very next nfsd PR. See: git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git for-next -- Chuck Lever
On Wed, Jan 19, 2022 at 04:18:10PM +0000, Chuck Lever III wrote: > > On Jan 18, 2022, at 5:00 PM, Bruce Fields <bfields@fieldses.org> wrote: > > > > From: "J. Bruce Fields" <bfields@redhat.com> > > > > I thought I was iterating over the array when actually the iteration is > > over the values contained in the array? > > > > Ugh, keep it simple. > > > > Symptoms were a null deference in vfs_lock_file() when an NFSv3 client > > that previously held a lock came back up and sent a notify. > > > > Reported-by: Jonathan Woithe <jwoithe@just42.net> > > Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") > > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > > --- > > : > Hi Bruce, thanks for the fixes. They've passed my basic smoke tests. > I've applied both patches for the very next nfsd PR. See: > > git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git for-next OOI, is it expected that these fixes will appear in a 5.15.x stable branch patch at some point? I've looked at the 5.15.17 and 5.15.18 changelogs and they don't appear to be mentioned yet. Regards jonathan
> On Jan 31, 2022, at 5:20 PM, Jonathan Woithe <jwoithe@just42.net> wrote: > > On Wed, Jan 19, 2022 at 04:18:10PM +0000, Chuck Lever III wrote: >>> On Jan 18, 2022, at 5:00 PM, Bruce Fields <bfields@fieldses.org> wrote: >>> >>> From: "J. Bruce Fields" <bfields@redhat.com> >>> >>> I thought I was iterating over the array when actually the iteration is >>> over the values contained in the array? >>> >>> Ugh, keep it simple. >>> >>> Symptoms were a null deference in vfs_lock_file() when an NFSv3 client >>> that previously held a lock came back up and sent a notify. >>> >>> Reported-by: Jonathan Woithe <jwoithe@just42.net> >>> Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") >>> Signed-off-by: J. Bruce Fields <bfields@redhat.com> >>> --- >>> : >> Hi Bruce, thanks for the fixes. They've passed my basic smoke tests. >> I've applied both patches for the very next nfsd PR. See: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git for-next > > OOI, is it expected that these fixes will appear in a 5.15.x stable branch > patch at some point? I've looked at the 5.15.17 and 5.15.18 changelogs and > they don't appear to be mentioned yet. These are in linux-next right now. I intend to send a pull request this week. I was waiting for some other fixes, but those are not going as quickly as I hoped, so I'm going to send these two along with another fix that are all ready now. Once they are in v5.17-rc, they will be picked up automatically and applied to open stable branches. If they do not apply cleanly, then someone will have to construct and test a version of the fixes specifically for each of the stable branches. -- Chuck Lever
On Tue, Jan 18, 2022 at 05:27:03PM -0500, Bruce Fields wrote: > On Wed, Jan 19, 2022 at 08:50:50AM +1030, Jonathan Woithe wrote: > > On Tue, Jan 18, 2022 at 05:00:16PM -0500, Bruce Fields wrote: > > > From: "J. Bruce Fields" <bfields@redhat.com> > > > > > > I thought I was iterating over the array when actually the iteration is > > > over the values contained in the array? > > > : > > > > Would you like me to apply this against a 5.15.x kernel and test locally? > > Or should I just wait for a 5.15.x stable series update which includes it? > > I'm pretty confident I'm reproducing the same problem you saw, so it'd > be fine to just wait for an update. > > (But if you do test these patches, let us know, one more confirmation > never hurts.) The shift back to a 5.15.x kernel ended up being delayed for a while for various reasons. The server concerned was eventually upgraded to 5.15.27 on 9 March 2022. In that time, client machines have been turned on and off and inevitably the conditions which caused the crash have been exercised many times (libreoffice, firefox and thunderbird are used daily on almost all of the clients). The server has not experienced the crash since the upgrade. On the basis of this I think it's fair to consider our problem solved. Thanks for your quick response to the report and the rapid provision of the fix. Regards jonathan
On Thu, Mar 24, 2022 at 10:03:23AM +1030, Jonathan Woithe wrote: > On Tue, Jan 18, 2022 at 05:27:03PM -0500, Bruce Fields wrote: > > On Wed, Jan 19, 2022 at 08:50:50AM +1030, Jonathan Woithe wrote: > > > On Tue, Jan 18, 2022 at 05:00:16PM -0500, Bruce Fields wrote: > > > > From: "J. Bruce Fields" <bfields@redhat.com> > > > > > > > > I thought I was iterating over the array when actually the iteration is > > > > over the values contained in the array? > > > > : > > > > > > Would you like me to apply this against a 5.15.x kernel and test locally? > > > Or should I just wait for a 5.15.x stable series update which includes it? > > > > I'm pretty confident I'm reproducing the same problem you saw, so it'd > > be fine to just wait for an update. > > > > (But if you do test these patches, let us know, one more confirmation > > never hurts.) > > The shift back to a 5.15.x kernel ended up being delayed for a while for > various reasons. The server concerned was eventually upgraded to 5.15.27 on > 9 March 2022. In that time, client machines have been turned on and off and > inevitably the conditions which caused the crash have been exercised many > times (libreoffice, firefox and thunderbird are used daily on almost all of > the clients). The server has not experienced the crash since the upgrade. > On the basis of this I think it's fair to consider our problem solved. Thanks for the confirmation.--b.
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index cb3a7512c33e..54c2e42130ca 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -179,19 +179,20 @@ nlm_delete_file(struct nlm_file *file) static int nlm_unlock_files(struct nlm_file *file) { struct file_lock lock; - struct file *f; lock.fl_type = F_UNLCK; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; - for (f = file->f_file[0]; f <= file->f_file[1]; f++) { - if (f && vfs_lock_file(f, F_SETLK, &lock, NULL) < 0) { - pr_warn("lockd: unlock failure in %s:%d\n", - __FILE__, __LINE__); - return 1; - } - } + if (file->f_file[O_RDONLY] && + vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) + goto out_err; + if (file->f_file[O_WRONLY] && + vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL)) + goto out_err; return 0; +out_err: + pr_warn("lockd: unlock failure in %s:%d\n", __FILE__, __LINE__); + return 1; } /*