mbox series

[0/3] Fix RELEASE_LOCKOWNER

Message ID 170681433624.15250.5881267576986350500.stgit@klimt.1015granger.net (mailing list archive)
Headers show
Series Fix RELEASE_LOCKOWNER | expand

Message

Chuck Lever Feb. 1, 2024, 7:06 p.m. UTC
Passes pynfs, fstests, and the git regression suite. Please apply
these to origin/linux-5.4.y.

---

Chuck Lever (2):
      NFSD: Modernize nfsd4_release_lockowner()
      NFSD: Add documenting comment for nfsd4_release_lockowner()

NeilBrown (1):
      nfsd: fix RELEASE_LOCKOWNER


 fs/nfsd/nfs4state.c | 65 +++++++++++++++++++++++++--------------------
 1 file changed, 36 insertions(+), 29 deletions(-)

--
Chuck Lever

Comments

NeilBrown Feb. 1, 2024, 10:24 p.m. UTC | #1
On Fri, 02 Feb 2024, Chuck Lever wrote:
> Passes pynfs, fstests, and the git regression suite. Please apply
> these to origin/linux-5.4.y.

I should have mentioned this a day or two ago but I hadn't quite made
all the connection yet...

The RELEASE_LOCKOWNER bug was masking a double-free bug that was fixed
by
Commit 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock")
which landed in v5.17 and wasn't marked as a bugfix, and so has not gone to
stable kernels.

Any kernel earlier than v5.17 that receives the RELEASE_LOCKOWNER fix
also needs the nfsd4_blocked_lock fix.  There is a minor follow-up fix
for that nfsd4_blocked_lock fix which Chuck queued yesterday.

The problem scenario is that an nfsd4_lock() call finds a conflicting
lock and so has a reference to a particular nfsd4_blocked_lock.  A concurrent
nfsd4_read_lockowner call frees all the nfsd4_blocked_locks including
the one held in nfsd4_lock().  nfsd4_lock then tries to free the
blocked_lock it has, and results in a double-free or a use-after-free.

Before either patch is applied, the extra reference on the lock-owner
than nfsd4_lock holds causes nfsd4_realease_lockowner() to incorrectly
return an error and NOT free the blocks_lock.
With only the RELEASE_LOCKOWNER fix applied, the double-free happens.
With both patches applied the refcount on the nfsd4_blocked_lock prevents
the double-free.

Kernels before 4.9 are (probably) not affected as they didn't have
find_or_allocate_block() which takes the second reference to a shared
object.  But that is ancient history - those kernels are well past EOL.

Thanks,
NeilBrown


> 
> ---
> 
> Chuck Lever (2):
>       NFSD: Modernize nfsd4_release_lockowner()
>       NFSD: Add documenting comment for nfsd4_release_lockowner()
> 
> NeilBrown (1):
>       nfsd: fix RELEASE_LOCKOWNER
> 
> 
>  fs/nfsd/nfs4state.c | 65 +++++++++++++++++++++++++--------------------
>  1 file changed, 36 insertions(+), 29 deletions(-)
> 
> --
> Chuck Lever
> 
> 
>
Chuck Lever III Feb. 2, 2024, 2:12 p.m. UTC | #2
> On Feb 1, 2024, at 5:24 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Fri, 02 Feb 2024, Chuck Lever wrote:
>> Passes pynfs, fstests, and the git regression suite. Please apply
>> these to origin/linux-5.4.y.
> 
> I should have mentioned this a day or two ago but I hadn't quite made
> all the connection yet...
> 
> The RELEASE_LOCKOWNER bug was masking a double-free bug that was fixed
> by
> Commit 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock")
> which landed in v5.17 and wasn't marked as a bugfix, and so has not gone to
> stable kernels.

Then, instructions to stable@vger.kernel.org:

Do not apply the patches I just sent for 5.15, 5.10, and 5.4. I will
apply 47446d74f170, run the tests again, and resend.


> Any kernel earlier than v5.17 that receives the RELEASE_LOCKOWNER fix
> also needs the nfsd4_blocked_lock fix.  There is a minor follow-up fix
> for that nfsd4_blocked_lock fix which Chuck queued yesterday.
> 
> The problem scenario is that an nfsd4_lock() call finds a conflicting
> lock and so has a reference to a particular nfsd4_blocked_lock.  A concurrent
> nfsd4_read_lockowner call frees all the nfsd4_blocked_locks including
> the one held in nfsd4_lock().  nfsd4_lock then tries to free the
> blocked_lock it has, and results in a double-free or a use-after-free.
> 
> Before either patch is applied, the extra reference on the lock-owner
> than nfsd4_lock holds causes nfsd4_realease_lockowner() to incorrectly
> return an error and NOT free the blocks_lock.
> With only the RELEASE_LOCKOWNER fix applied, the double-free happens.

Our test suite currently does not exercise this use case, apparently.
I didn't see a problem like this during testing.


> With both patches applied the refcount on the nfsd4_blocked_lock prevents
> the double-free.
> 
> Kernels before 4.9 are (probably) not affected as they didn't have
> find_or_allocate_block() which takes the second reference to a shared
> object.  But that is ancient history - those kernels are well past EOL.
> 
> Thanks,
> NeilBrown
> 
> 
>> 
>> ---
>> 
>> Chuck Lever (2):
>>      NFSD: Modernize nfsd4_release_lockowner()
>>      NFSD: Add documenting comment for nfsd4_release_lockowner()
>> 
>> NeilBrown (1):
>>      nfsd: fix RELEASE_LOCKOWNER
>> 
>> 
>> fs/nfsd/nfs4state.c | 65 +++++++++++++++++++++++++--------------------
>> 1 file changed, 36 insertions(+), 29 deletions(-)
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> 

--
Chuck Lever
NeilBrown Feb. 3, 2024, 12:26 a.m. UTC | #3
On Sat, 03 Feb 2024, Chuck Lever III wrote:
> 
> 
> > On Feb 1, 2024, at 5:24 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Fri, 02 Feb 2024, Chuck Lever wrote:
> >> Passes pynfs, fstests, and the git regression suite. Please apply
> >> these to origin/linux-5.4.y.
> > 
> > I should have mentioned this a day or two ago but I hadn't quite made
> > all the connection yet...
> > 
> > The RELEASE_LOCKOWNER bug was masking a double-free bug that was fixed
> > by
> > Commit 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock")
> > which landed in v5.17 and wasn't marked as a bugfix, and so has not gone to
> > stable kernels.
> 
> Then, instructions to stable@vger.kernel.org:
> 
> Do not apply the patches I just sent for 5.15, 5.10, and 5.4. I will
> apply 47446d74f170, run the tests again, and resend.
> 
> 
> > Any kernel earlier than v5.17 that receives the RELEASE_LOCKOWNER fix
> > also needs the nfsd4_blocked_lock fix.  There is a minor follow-up fix
> > for that nfsd4_blocked_lock fix which Chuck queued yesterday.
> > 
> > The problem scenario is that an nfsd4_lock() call finds a conflicting
> > lock and so has a reference to a particular nfsd4_blocked_lock.  A concurrent
> > nfsd4_read_lockowner call frees all the nfsd4_blocked_locks including
> > the one held in nfsd4_lock().  nfsd4_lock then tries to free the
> > blocked_lock it has, and results in a double-free or a use-after-free.
> > 
> > Before either patch is applied, the extra reference on the lock-owner
> > than nfsd4_lock holds causes nfsd4_realease_lockowner() to incorrectly
> > return an error and NOT free the blocks_lock.
> > With only the RELEASE_LOCKOWNER fix applied, the double-free happens.
> 
> Our test suite currently does not exercise this use case, apparently.
> I didn't see a problem like this during testing.
> 

Our OpenQA testing found it (as did our customer :-).
Quoting from a bugzilla that unfortunately is not public (though might
not be accessible to anyone with an account)

https://bugzilla.suse.com/show_bug.cgi?id=1219349

     LTP test nfslock01.sh randomly fails on the latest SLE-15SP4 and
     SLE-15SP5 KOTD.  The failures appear only when testing NFS protocol
     v4.0, other versions do not seem to be affected.  The test either
     gets stuck or sometimes triggers kernel oops.  The contents of the
     kernel backtrace vary.  All archs appear to be affected. 

Does your test suite cover v4.0?  Does it include LTP ?

Thanks,
NeilBrown


> 
> > With both patches applied the refcount on the nfsd4_blocked_lock prevents
> > the double-free.
> > 
> > Kernels before 4.9 are (probably) not affected as they didn't have
> > find_or_allocate_block() which takes the second reference to a shared
> > object.  But that is ancient history - those kernels are well past EOL.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> >> 
> >> ---
> >> 
> >> Chuck Lever (2):
> >>      NFSD: Modernize nfsd4_release_lockowner()
> >>      NFSD: Add documenting comment for nfsd4_release_lockowner()
> >> 
> >> NeilBrown (1):
> >>      nfsd: fix RELEASE_LOCKOWNER
> >> 
> >> 
> >> fs/nfsd/nfs4state.c | 65 +++++++++++++++++++++++++--------------------
> >> 1 file changed, 36 insertions(+), 29 deletions(-)
> >> 
> >> --
> >> Chuck Lever
> >> 
> >> 
> >> 
> > 
> 
> --
> Chuck Lever
> 
> 
>
Greg KH Feb. 3, 2024, 1:04 a.m. UTC | #4
On Fri, Feb 02, 2024 at 02:12:11PM +0000, Chuck Lever III wrote:
> 
> 
> > On Feb 1, 2024, at 5:24 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Fri, 02 Feb 2024, Chuck Lever wrote:
> >> Passes pynfs, fstests, and the git regression suite. Please apply
> >> these to origin/linux-5.4.y.
> > 
> > I should have mentioned this a day or two ago but I hadn't quite made
> > all the connection yet...
> > 
> > The RELEASE_LOCKOWNER bug was masking a double-free bug that was fixed
> > by
> > Commit 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock")
> > which landed in v5.17 and wasn't marked as a bugfix, and so has not gone to
> > stable kernels.
> 
> Then, instructions to stable@vger.kernel.org:
> 
> Do not apply the patches I just sent for 5.15, 5.10, and 5.4. I will
> apply 47446d74f170, run the tests again, and resend.

Thanks for letting us know, I've dropped them from my review queue.

greg k-h
Chuck Lever III Feb. 3, 2024, 3:43 a.m. UTC | #5
> On Feb 2, 2024, at 7:26 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Sat, 03 Feb 2024, Chuck Lever III wrote:
>> 
>> 
>>> On Feb 1, 2024, at 5:24 PM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> On Fri, 02 Feb 2024, Chuck Lever wrote:
>>>> Passes pynfs, fstests, and the git regression suite. Please apply
>>>> these to origin/linux-5.4.y.
>>> 
>>> I should have mentioned this a day or two ago but I hadn't quite made
>>> all the connection yet...
>>> 
>>> The RELEASE_LOCKOWNER bug was masking a double-free bug that was fixed
>>> by
>>> Commit 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock")
>>> which landed in v5.17 and wasn't marked as a bugfix, and so has not gone to
>>> stable kernels.
>> 
>> Then, instructions to stable@vger.kernel.org:
>> 
>> Do not apply the patches I just sent for 5.15, 5.10, and 5.4. I will
>> apply 47446d74f170, run the tests again, and resend.
>> 
>> 
>>> Any kernel earlier than v5.17 that receives the RELEASE_LOCKOWNER fix
>>> also needs the nfsd4_blocked_lock fix.  There is a minor follow-up fix
>>> for that nfsd4_blocked_lock fix which Chuck queued yesterday.
>>> 
>>> The problem scenario is that an nfsd4_lock() call finds a conflicting
>>> lock and so has a reference to a particular nfsd4_blocked_lock.  A concurrent
>>> nfsd4_read_lockowner call frees all the nfsd4_blocked_locks including
>>> the one held in nfsd4_lock().  nfsd4_lock then tries to free the
>>> blocked_lock it has, and results in a double-free or a use-after-free.
>>> 
>>> Before either patch is applied, the extra reference on the lock-owner
>>> than nfsd4_lock holds causes nfsd4_realease_lockowner() to incorrectly
>>> return an error and NOT free the blocks_lock.
>>> With only the RELEASE_LOCKOWNER fix applied, the double-free happens.
>> 
>> Our test suite currently does not exercise this use case, apparently.
>> I didn't see a problem like this during testing.
>> 
> 
> Our OpenQA testing found it (as did our customer :-).
> Quoting from a bugzilla that unfortunately is not public (though might
> not be accessible to anyone with an account)
> 
> https://bugzilla.suse.com/show_bug.cgi?id=1219349
> 
>     LTP test nfslock01.sh randomly fails on the latest SLE-15SP4 and
>     SLE-15SP5 KOTD.  The failures appear only when testing NFS protocol
>     v4.0, other versions do not seem to be affected.  The test either
>     gets stuck or sometimes triggers kernel oops.  The contents of the
>     kernel backtrace vary.  All archs appear to be affected. 
> 
> Does your test suite cover v4.0?

pynfs covers v4.0 and v4.1.
I ran fstests and the git regression suite with only NFSv4.0.


> Does it include LTP ?

kdevops doesn't include an LTP workflow at this time.


> Thanks,
> NeilBrown
> 
> 
>> 
>>> With both patches applied the refcount on the nfsd4_blocked_lock prevents
>>> the double-free.
>>> 
>>> Kernels before 4.9 are (probably) not affected as they didn't have
>>> find_or_allocate_block() which takes the second reference to a shared
>>> object.  But that is ancient history - those kernels are well past EOL.
>>> 
>>> Thanks,
>>> NeilBrown
>>> 
>>> 
>>>> 
>>>> ---
>>>> 
>>>> Chuck Lever (2):
>>>>     NFSD: Modernize nfsd4_release_lockowner()
>>>>     NFSD: Add documenting comment for nfsd4_release_lockowner()
>>>> 
>>>> NeilBrown (1):
>>>>     nfsd: fix RELEASE_LOCKOWNER
>>>> 
>>>> 
>>>> fs/nfsd/nfs4state.c | 65 +++++++++++++++++++++++++--------------------
>>>> 1 file changed, 36 insertions(+), 29 deletions(-)
>>>> 
>>>> --
>>>> Chuck Lever
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> --
>> Chuck Lever


--
Chuck Lever