diff mbox series

[1/1] nfsd: fix race between laundromat and free_stateid

Message ID 20241010221801.35462-1-okorniev@redhat.com (mailing list archive)
State New
Headers show
Series [1/1] nfsd: fix race between laundromat and free_stateid | expand

Commit Message

Olga Kornievskaia Oct. 10, 2024, 10:18 p.m. UTC
There is a race between laundromat handling of revoked delegations
and a client sending free_stateid operation. Laundromat thread
finds that delegation has expired and needs to be revoked so it
marks the delegation stid revoked and it puts it on a reaper list
but then it unlock the state lock and the actual delegation revocation
happens without the lock. Once the stid is marked revoked a racing
free_stateid processing thread does the following (1) it calls
list_del_init() which removes it from the reaper list and (2) frees
the delegation stid structure. The laundromat thread ends up not
calling the revoke_delegation() function for this particular delegation
but that means it will no release the lock lease that exists on
the file.

Now, a new open for this file comes in and ends up finding that
lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
up trying to derefence a freed delegation stateid. Leading to the
followint use-after-free KASAN warning:

kernel: ==================================================================
kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
kernel:
kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
kernel: Call trace:
kernel: dump_backtrace+0x98/0x120
kernel: show_stack+0x1c/0x30
kernel: dump_stack_lvl+0x80/0xe8
kernel: print_address_description.constprop.0+0x84/0x390
kernel: print_report+0xa4/0x268
kernel: kasan_report+0xb4/0xf8
kernel: __asan_report_load8_noabort+0x1c/0x28
kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: leases_conflict+0x68/0x370
kernel: __break_lease+0x204/0xc38
kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
kernel: nfsd4_open+0xa08/0xe80 [nfsd]
kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
kernel: svc_process+0x3d4/0x7e0 [sunrpc]
kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
kernel: nfsd+0x270/0x400 [nfsd]
kernel: kthread+0x288/0x310
kernel: ret_from_fork+0x10/0x20

Proposing to have laundromat thread hold the state_lock over both
marking thru revoking the delegation as well as making free_stateid
acquire state_lock before accessing the list. Making sure that
revoke_delegation() (ie kernel_setlease(unlock)) is called for
every delegation that was revoked and added to the reaper list.

CC: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>

--- I can't figure out the Fixes: tag. Laundromat's behaviour has
been like that forever. But the free_stateid bits wont apply before
the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
But we used that fixes tag already with a previous fix for a different
problem.
---
 fs/nfsd/nfs4state.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jeff Layton Oct. 11, 2024, 1:04 p.m. UTC | #1
On Thu, 2024-10-10 at 18:18 -0400, Olga Kornievskaia wrote:
> There is a race between laundromat handling of revoked delegations
> and a client sending free_stateid operation. Laundromat thread
> finds that delegation has expired and needs to be revoked so it
> marks the delegation stid revoked and it puts it on a reaper list
> but then it unlock the state lock and the actual delegation revocation
> happens without the lock. Once the stid is marked revoked a racing
> free_stateid processing thread does the following (1) it calls
> list_del_init() which removes it from the reaper list and (2) frees
> the delegation stid structure. The laundromat thread ends up not
> calling the revoke_delegation() function for this particular delegation
> but that means it will no release the lock lease that exists on
> the file.
> 
> Now, a new open for this file comes in and ends up finding that
> lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
> up trying to derefence a freed delegation stateid. Leading to the
> followint use-after-free KASAN warning:
> 
> kernel: ==================================================================
> kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> kernel:
> kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
> kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
> kernel: Call trace:
> kernel: dump_backtrace+0x98/0x120
> kernel: show_stack+0x1c/0x30
> kernel: dump_stack_lvl+0x80/0xe8
> kernel: print_address_description.constprop.0+0x84/0x390
> kernel: print_report+0xa4/0x268
> kernel: kasan_report+0xb4/0xf8
> kernel: __asan_report_load8_noabort+0x1c/0x28
> kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> kernel: leases_conflict+0x68/0x370
> kernel: __break_lease+0x204/0xc38
> kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> kernel: nfsd+0x270/0x400 [nfsd]
> kernel: kthread+0x288/0x310
> kernel: ret_from_fork+0x10/0x20
> 
> Proposing to have laundromat thread hold the state_lock over both
> marking thru revoking the delegation as well as making free_stateid
> acquire state_lock before accessing the list. Making sure that
> revoke_delegation() (ie kernel_setlease(unlock)) is called for
> every delegation that was revoked and added to the reaper list.
> 

Nice detective work!

> CC: stable@vger.kernel.org
> Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> 
> --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> been like that forever. But the free_stateid bits wont apply before
> the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> But we used that fixes tag already with a previous fix for a different
> problem.
> ---
>  fs/nfsd/nfs4state.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 9c2b1d251ab3..c97907d7fb38 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
>  		unhash_delegation_locked(dp, SC_STATUS_REVOKED);
>  		list_add(&dp->dl_recall_lru, &reaplist);
>  	}
> -	spin_unlock(&state_lock);
>  	while (!list_empty(&reaplist)) {
>  		dp = list_first_entry(&reaplist, struct nfs4_delegation,
>  					dl_recall_lru);
>  		list_del_init(&dp->dl_recall_lru);
>  		revoke_delegation(dp);
>  	}
> +	spin_unlock(&state_lock);
>  
>  	spin_lock(&nn->client_lock);
>  	while (!list_empty(&nn->close_lru)) {
> @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		if (s->sc_status & SC_STATUS_REVOKED) {
>  			spin_unlock(&s->sc_lock);
>  			dp = delegstateid(s);
> +			spin_lock(&state_lock);
>  			list_del_init(&dp->dl_recall_lru);
> +			spin_unlock(&state_lock);
>  			spin_unlock(&cl->cl_lock);
>  			nfs4_put_stid(s);
>  			ret = nfs_ok;


I'm not thrilled with this patch, but it does seem like it would fix
the problem. Long term, I think we need to get rid of the state_lock,
but I don't have a real plan for that just yet as it's not 100% clear
what it still protects.

As far as a Fixes tag, this bug is likely very old. I'd say it probably
got introduced here:

    2d4a532d385f nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock

In any case, let's go with this patch until we can come up with a
better plan for delegation handling.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Trond Myklebust Oct. 11, 2024, 1:14 p.m. UTC | #2
On Fri, 2024-10-11 at 09:04 -0400, Jeff Layton wrote:
> On Thu, 2024-10-10 at 18:18 -0400, Olga Kornievskaia wrote:
> > There is a race between laundromat handling of revoked delegations
> > and a client sending free_stateid operation. Laundromat thread
> > finds that delegation has expired and needs to be revoked so it
> > marks the delegation stid revoked and it puts it on a reaper list
> > but then it unlock the state lock and the actual delegation
> > revocation
> > happens without the lock. Once the stid is marked revoked a racing
> > free_stateid processing thread does the following (1) it calls
> > list_del_init() which removes it from the reaper list and (2) frees
> > the delegation stid structure. The laundromat thread ends up not
> > calling the revoke_delegation() function for this particular
> > delegation
> > but that means it will no release the lock lease that exists on
> > the file.
> > 
> > Now, a new open for this file comes in and ends up finding that
> > lease list isn't empty and calls nfsd_breaker_owns_lease() which
> > ends
> > up trying to derefence a freed delegation stateid. Leading to the
> > followint use-after-free KASAN warning:
> > 
> > kernel:
> > ==================================================================
> > kernel: BUG: KASAN: slab-use-after-free in
> > nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> > kernel:
> > kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not
> > tainted 6.11.0-rc7+ #9
> > kernel: Hardware name: Apple Inc. Apple Virtualization Generic
> > Platform, BIOS 2069.0.0.0.0 08/03/2024
> > kernel: Call trace:
> > kernel: dump_backtrace+0x98/0x120
> > kernel: show_stack+0x1c/0x30
> > kernel: dump_stack_lvl+0x80/0xe8
> > kernel: print_address_description.constprop.0+0x84/0x390
> > kernel: print_report+0xa4/0x268
> > kernel: kasan_report+0xb4/0xf8
> > kernel: __asan_report_load8_noabort+0x1c/0x28
> > kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > kernel: leases_conflict+0x68/0x370
> > kernel: __break_lease+0x204/0xc38
> > kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> > kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> > kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> > kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> > kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> > kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> > kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> > kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> > kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> > kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> > kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> > kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> > kernel: nfsd+0x270/0x400 [nfsd]
> > kernel: kthread+0x288/0x310
> > kernel: ret_from_fork+0x10/0x20
> > 
> > Proposing to have laundromat thread hold the state_lock over both
> > marking thru revoking the delegation as well as making free_stateid
> > acquire state_lock before accessing the list. Making sure that
> > revoke_delegation() (ie kernel_setlease(unlock)) is called for
> > every delegation that was revoked and added to the reaper list.
> > 
> 
> Nice detective work!
> 
> > CC: stable@vger.kernel.org
> > Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> > 
> > --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> > been like that forever. But the free_stateid bits wont apply before
> > the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and
> > svc_get/svc_put").
> > But we used that fixes tag already with a previous fix for a
> > different
> > problem.
> > ---
> >  fs/nfsd/nfs4state.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 9c2b1d251ab3..c97907d7fb38 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
> >  		unhash_delegation_locked(dp, SC_STATUS_REVOKED);
> >  		list_add(&dp->dl_recall_lru, &reaplist);
> >  	}
> > -	spin_unlock(&state_lock);
> >  	while (!list_empty(&reaplist)) {
> >  		dp = list_first_entry(&reaplist, struct
> > nfs4_delegation,
> >  					dl_recall_lru);
> >  		list_del_init(&dp->dl_recall_lru);
> >  		revoke_delegation(dp);
> >  	}
> > +	spin_unlock(&state_lock);
> >  
> >  	spin_lock(&nn->client_lock);
> >  	while (!list_empty(&nn->close_lru)) {
> > @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp,
> > struct nfsd4_compound_state *cstate,
> >  		if (s->sc_status & SC_STATUS_REVOKED) {
> >  			spin_unlock(&s->sc_lock);
> >  			dp = delegstateid(s);
> > +			spin_lock(&state_lock);
> >  			list_del_init(&dp->dl_recall_lru);
> > +			spin_unlock(&state_lock);
> >  			spin_unlock(&cl->cl_lock);

nfs4_set_delegation() takes these locks in the opposite order, so as it
stands, this patch introduces a potential ABBA deadlock.
I suggest moving the state_lock so that it is taken before and released
after the cl->cl_lock.


> >  			nfs4_put_stid(s);
> >  			ret = nfs_ok;
> 
> 
> I'm not thrilled with this patch, but it does seem like it would fix
> the problem. Long term, I think we need to get rid of the state_lock,
> but I don't have a real plan for that just yet as it's not 100% clear
> what it still protects.
> 
> As far as a Fixes tag, this bug is likely very old. I'd say it
> probably
> got introduced here:
> 
>     2d4a532d385f nfsd: ensure that clp->cl_revoked list is protected
> by clp->cl_lock
> 
> In any case, let's go with this patch until we can come up with a
> better plan for delegation handling.
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>
Olga Kornievskaia Oct. 11, 2024, 1:43 p.m. UTC | #3
On Fri, Oct 11, 2024 at 9:14 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Fri, 2024-10-11 at 09:04 -0400, Jeff Layton wrote:
> > On Thu, 2024-10-10 at 18:18 -0400, Olga Kornievskaia wrote:
> > > There is a race between laundromat handling of revoked delegations
> > > and a client sending free_stateid operation. Laundromat thread
> > > finds that delegation has expired and needs to be revoked so it
> > > marks the delegation stid revoked and it puts it on a reaper list
> > > but then it unlock the state lock and the actual delegation
> > > revocation
> > > happens without the lock. Once the stid is marked revoked a racing
> > > free_stateid processing thread does the following (1) it calls
> > > list_del_init() which removes it from the reaper list and (2) frees
> > > the delegation stid structure. The laundromat thread ends up not
> > > calling the revoke_delegation() function for this particular
> > > delegation
> > > but that means it will no release the lock lease that exists on
> > > the file.
> > >
> > > Now, a new open for this file comes in and ends up finding that
> > > lease list isn't empty and calls nfsd_breaker_owns_lease() which
> > > ends
> > > up trying to derefence a freed delegation stateid. Leading to the
> > > followint use-after-free KASAN warning:
> > >
> > > kernel:
> > > ==================================================================
> > > kernel: BUG: KASAN: slab-use-after-free in
> > > nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > > kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> > > kernel:
> > > kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not
> > > tainted 6.11.0-rc7+ #9
> > > kernel: Hardware name: Apple Inc. Apple Virtualization Generic
> > > Platform, BIOS 2069.0.0.0.0 08/03/2024
> > > kernel: Call trace:
> > > kernel: dump_backtrace+0x98/0x120
> > > kernel: show_stack+0x1c/0x30
> > > kernel: dump_stack_lvl+0x80/0xe8
> > > kernel: print_address_description.constprop.0+0x84/0x390
> > > kernel: print_report+0xa4/0x268
> > > kernel: kasan_report+0xb4/0xf8
> > > kernel: __asan_report_load8_noabort+0x1c/0x28
> > > kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > > kernel: leases_conflict+0x68/0x370
> > > kernel: __break_lease+0x204/0xc38
> > > kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> > > kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> > > kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> > > kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> > > kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> > > kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> > > kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> > > kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> > > kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> > > kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> > > kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> > > kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> > > kernel: nfsd+0x270/0x400 [nfsd]
> > > kernel: kthread+0x288/0x310
> > > kernel: ret_from_fork+0x10/0x20
> > >
> > > Proposing to have laundromat thread hold the state_lock over both
> > > marking thru revoking the delegation as well as making free_stateid
> > > acquire state_lock before accessing the list. Making sure that
> > > revoke_delegation() (ie kernel_setlease(unlock)) is called for
> > > every delegation that was revoked and added to the reaper list.
> > >
> >
> > Nice detective work!
> >
> > > CC: stable@vger.kernel.org
> > > Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> > >
> > > --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> > > been like that forever. But the free_stateid bits wont apply before
> > > the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and
> > > svc_get/svc_put").
> > > But we used that fixes tag already with a previous fix for a
> > > different
> > > problem.
> > > ---
> > >  fs/nfsd/nfs4state.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 9c2b1d251ab3..c97907d7fb38 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
> > >             unhash_delegation_locked(dp, SC_STATUS_REVOKED);
> > >             list_add(&dp->dl_recall_lru, &reaplist);
> > >     }
> > > -   spin_unlock(&state_lock);
> > >     while (!list_empty(&reaplist)) {
> > >             dp = list_first_entry(&reaplist, struct
> > > nfs4_delegation,
> > >                                     dl_recall_lru);
> > >             list_del_init(&dp->dl_recall_lru);
> > >             revoke_delegation(dp);
> > >     }
> > > +   spin_unlock(&state_lock);
> > >
> > >     spin_lock(&nn->client_lock);
> > >     while (!list_empty(&nn->close_lru)) {
> > > @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp,
> > > struct nfsd4_compound_state *cstate,
> > >             if (s->sc_status & SC_STATUS_REVOKED) {
> > >                     spin_unlock(&s->sc_lock);
> > >                     dp = delegstateid(s);
> > > +                   spin_lock(&state_lock);
> > >                     list_del_init(&dp->dl_recall_lru);
> > > +                   spin_unlock(&state_lock);
> > >                     spin_unlock(&cl->cl_lock);
>
> nfs4_set_delegation() takes these locks in the opposite order, so as it
> stands, this patch introduces a potential ABBA deadlock.
> I suggest moving the state_lock so that it is taken before and released
> after the cl->cl_lock.

I understand the comment about the deadlock and the need but I would
like to make sure if I understand the direction. Are you saying that
the state_lock should be taken in the beginning of the
nfsd4_free_stated() before we take the client lock?

>
>
> > >                     nfs4_put_stid(s);
> > >                     ret = nfs_ok;
> >
> >
> > I'm not thrilled with this patch, but it does seem like it would fix
> > the problem. Long term, I think we need to get rid of the state_lock,
> > but I don't have a real plan for that just yet as it's not 100% clear
> > what it still protects.
> >
> > As far as a Fixes tag, this bug is likely very old. I'd say it
> > probably
> > got introduced here:
> >
> >     2d4a532d385f nfsd: ensure that clp->cl_revoked list is protected
> > by clp->cl_lock
> >
> > In any case, let's go with this patch until we can come up with a
> > better plan for delegation handling.
> >
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> >
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>
Olga Kornievskaia Oct. 11, 2024, 1:59 p.m. UTC | #4
On Fri, Oct 11, 2024 at 9:06 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2024-10-10 at 18:18 -0400, Olga Kornievskaia wrote:
> > There is a race between laundromat handling of revoked delegations
> > and a client sending free_stateid operation. Laundromat thread
> > finds that delegation has expired and needs to be revoked so it
> > marks the delegation stid revoked and it puts it on a reaper list
> > but then it unlock the state lock and the actual delegation revocation
> > happens without the lock. Once the stid is marked revoked a racing
> > free_stateid processing thread does the following (1) it calls
> > list_del_init() which removes it from the reaper list and (2) frees
> > the delegation stid structure. The laundromat thread ends up not
> > calling the revoke_delegation() function for this particular delegation
> > but that means it will no release the lock lease that exists on
> > the file.
> >
> > Now, a new open for this file comes in and ends up finding that
> > lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
> > up trying to derefence a freed delegation stateid. Leading to the
> > followint use-after-free KASAN warning:
> >
> > kernel: ==================================================================
> > kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> > kernel:
> > kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
> > kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
> > kernel: Call trace:
> > kernel: dump_backtrace+0x98/0x120
> > kernel: show_stack+0x1c/0x30
> > kernel: dump_stack_lvl+0x80/0xe8
> > kernel: print_address_description.constprop.0+0x84/0x390
> > kernel: print_report+0xa4/0x268
> > kernel: kasan_report+0xb4/0xf8
> > kernel: __asan_report_load8_noabort+0x1c/0x28
> > kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > kernel: leases_conflict+0x68/0x370
> > kernel: __break_lease+0x204/0xc38
> > kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> > kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> > kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> > kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> > kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> > kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> > kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> > kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> > kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> > kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> > kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> > kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> > kernel: nfsd+0x270/0x400 [nfsd]
> > kernel: kthread+0x288/0x310
> > kernel: ret_from_fork+0x10/0x20
> >
> > Proposing to have laundromat thread hold the state_lock over both
> > marking thru revoking the delegation as well as making free_stateid
> > acquire state_lock before accessing the list. Making sure that
> > revoke_delegation() (ie kernel_setlease(unlock)) is called for
> > every delegation that was revoked and added to the reaper list.
> >
>
> Nice detective work!
>
> > CC: stable@vger.kernel.org
> > Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> >
> > --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> > been like that forever. But the free_stateid bits wont apply before
> > the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> > But we used that fixes tag already with a previous fix for a different
> > problem.
> > ---
> >  fs/nfsd/nfs4state.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 9c2b1d251ab3..c97907d7fb38 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
> >               unhash_delegation_locked(dp, SC_STATUS_REVOKED);
> >               list_add(&dp->dl_recall_lru, &reaplist);
> >       }
> > -     spin_unlock(&state_lock);
> >       while (!list_empty(&reaplist)) {
> >               dp = list_first_entry(&reaplist, struct nfs4_delegation,
> >                                       dl_recall_lru);
> >               list_del_init(&dp->dl_recall_lru);
> >               revoke_delegation(dp);
> >       }
> > +     spin_unlock(&state_lock);
> >
> >       spin_lock(&nn->client_lock);
> >       while (!list_empty(&nn->close_lru)) {
> > @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >               if (s->sc_status & SC_STATUS_REVOKED) {
> >                       spin_unlock(&s->sc_lock);
> >                       dp = delegstateid(s);
> > +                     spin_lock(&state_lock);
> >                       list_del_init(&dp->dl_recall_lru);
> > +                     spin_unlock(&state_lock);
> >                       spin_unlock(&cl->cl_lock);
> >                       nfs4_put_stid(s);
> >                       ret = nfs_ok;
>
>
> I'm not thrilled with this patch, but it does seem like it would fix
> the problem. Long term, I think we need to get rid of the state_lock,
> but I don't have a real plan for that just yet as it's not 100% clear
> what it still protects.

I wasn't thrilled with the patch either but I couldn't think of
something else so I thought to see what others think.

My apologies, I also realized that when I tried to make sure I
submitted the patch against the latest code I did this against
nfsd-next not nfsd-fixes. There should have been that line with
changed sc_status to LOCKED. Again, doesn't change the fix, just what
this patch was applied to before sending it...

I'll send out a v2 but also try to test what Trond suggest by doing
locking differently.

>
> As far as a Fixes tag, this bug is likely very old. I'd say it probably
> got introduced here:
>
>     2d4a532d385f nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock
>
> In any case, let's go with this patch until we can come up with a
> better plan for delegation handling.
>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>
Chuck Lever III Oct. 11, 2024, 2:39 p.m. UTC | #5
On Thu, Oct 10, 2024 at 06:18:01PM -0400, Olga Kornievskaia wrote:
> There is a race between laundromat handling of revoked delegations
> and a client sending free_stateid operation. Laundromat thread
> finds that delegation has expired and needs to be revoked so it
> marks the delegation stid revoked and it puts it on a reaper list
> but then it unlock the state lock and the actual delegation revocation
> happens without the lock. Once the stid is marked revoked a racing
> free_stateid processing thread does the following (1) it calls
> list_del_init() which removes it from the reaper list and (2) frees
> the delegation stid structure. The laundromat thread ends up not
> calling the revoke_delegation() function for this particular delegation
> but that means it will no release the lock lease that exists on
> the file.
> 
> Now, a new open for this file comes in and ends up finding that
> lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
> up trying to derefence a freed delegation stateid. Leading to the
> followint use-after-free KASAN warning:
> 
> kernel: ==================================================================
> kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> kernel:
> kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
> kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
> kernel: Call trace:
> kernel: dump_backtrace+0x98/0x120
> kernel: show_stack+0x1c/0x30
> kernel: dump_stack_lvl+0x80/0xe8
> kernel: print_address_description.constprop.0+0x84/0x390
> kernel: print_report+0xa4/0x268
> kernel: kasan_report+0xb4/0xf8
> kernel: __asan_report_load8_noabort+0x1c/0x28
> kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> kernel: leases_conflict+0x68/0x370
> kernel: __break_lease+0x204/0xc38
> kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> kernel: nfsd+0x270/0x400 [nfsd]
> kernel: kthread+0x288/0x310
> kernel: ret_from_fork+0x10/0x20
> 
> Proposing to have laundromat thread hold the state_lock over both
> marking thru revoking the delegation as well as making free_stateid
> acquire state_lock before accessing the list. Making sure that
> revoke_delegation() (ie kernel_setlease(unlock)) is called for
> every delegation that was revoked and added to the reaper list.
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> 
> --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> been like that forever. But the free_stateid bits wont apply before
> the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> But we used that fixes tag already with a previous fix for a different
> problem.
> ---
>  fs/nfsd/nfs4state.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 9c2b1d251ab3..c97907d7fb38 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
>  		unhash_delegation_locked(dp, SC_STATUS_REVOKED);
>  		list_add(&dp->dl_recall_lru, &reaplist);
>  	}
> -	spin_unlock(&state_lock);
>  	while (!list_empty(&reaplist)) {
>  		dp = list_first_entry(&reaplist, struct nfs4_delegation,
>  					dl_recall_lru);
>  		list_del_init(&dp->dl_recall_lru);
>  		revoke_delegation(dp);
>  	}
> +	spin_unlock(&state_lock);

Code review suggests revoke_delegation() (and in particular,
destroy_unhashed_deleg(), must not be called while holding
state_lock().


>  	spin_lock(&nn->client_lock);
>  	while (!list_empty(&nn->close_lru)) {
> @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		if (s->sc_status & SC_STATUS_REVOKED) {
>  			spin_unlock(&s->sc_lock);
>  			dp = delegstateid(s);
> +			spin_lock(&state_lock);
>  			list_del_init(&dp->dl_recall_lru);
> +			spin_unlock(&state_lock);

Existing code is inconsistent about how manipulation of
dl_recall_lru is protected. Most instances do use state_lock for
this purpose, but a few, including this one, use cl->cl_lock. Does
the other instance using cl_lock need review and correction as well?

I'd prefer to see this fix make the protection of dl_recall_lru
consistent everywhere.


>  			spin_unlock(&cl->cl_lock);
>  			nfs4_put_stid(s);
>  			ret = nfs_ok;
> -- 
> 2.43.5
>
Jeff Layton Oct. 11, 2024, 4:17 p.m. UTC | #6
On Fri, 2024-10-11 at 10:39 -0400, Chuck Lever wrote:
> On Thu, Oct 10, 2024 at 06:18:01PM -0400, Olga Kornievskaia wrote:
> > There is a race between laundromat handling of revoked delegations
> > and a client sending free_stateid operation. Laundromat thread
> > finds that delegation has expired and needs to be revoked so it
> > marks the delegation stid revoked and it puts it on a reaper list
> > but then it unlock the state lock and the actual delegation revocation
> > happens without the lock. Once the stid is marked revoked a racing
> > free_stateid processing thread does the following (1) it calls
> > list_del_init() which removes it from the reaper list and (2) frees
> > the delegation stid structure. The laundromat thread ends up not
> > calling the revoke_delegation() function for this particular delegation
> > but that means it will no release the lock lease that exists on
> > the file.
> > 
> > Now, a new open for this file comes in and ends up finding that
> > lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
> > up trying to derefence a freed delegation stateid. Leading to the
> > followint use-after-free KASAN warning:
> > 
> > kernel: ==================================================================
> > kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> > kernel:
> > kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
> > kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
> > kernel: Call trace:
> > kernel: dump_backtrace+0x98/0x120
> > kernel: show_stack+0x1c/0x30
> > kernel: dump_stack_lvl+0x80/0xe8
> > kernel: print_address_description.constprop.0+0x84/0x390
> > kernel: print_report+0xa4/0x268
> > kernel: kasan_report+0xb4/0xf8
> > kernel: __asan_report_load8_noabort+0x1c/0x28
> > kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > kernel: leases_conflict+0x68/0x370
> > kernel: __break_lease+0x204/0xc38
> > kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> > kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> > kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> > kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> > kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> > kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> > kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> > kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> > kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> > kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> > kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> > kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> > kernel: nfsd+0x270/0x400 [nfsd]
> > kernel: kthread+0x288/0x310
> > kernel: ret_from_fork+0x10/0x20
> > 
> > Proposing to have laundromat thread hold the state_lock over both
> > marking thru revoking the delegation as well as making free_stateid
> > acquire state_lock before accessing the list. Making sure that
> > revoke_delegation() (ie kernel_setlease(unlock)) is called for
> > every delegation that was revoked and added to the reaper list.
> > 
> > CC: stable@vger.kernel.org
> > Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> > 
> > --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> > been like that forever. But the free_stateid bits wont apply before
> > the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> > But we used that fixes tag already with a previous fix for a different
> > problem.
> > ---
> >  fs/nfsd/nfs4state.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 9c2b1d251ab3..c97907d7fb38 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
> >  		unhash_delegation_locked(dp, SC_STATUS_REVOKED);
> >  		list_add(&dp->dl_recall_lru, &reaplist);
> >  	}
> > -	spin_unlock(&state_lock);
> >  	while (!list_empty(&reaplist)) {
> >  		dp = list_first_entry(&reaplist, struct nfs4_delegation,
> >  					dl_recall_lru);
> >  		list_del_init(&dp->dl_recall_lru);
> >  		revoke_delegation(dp);
> >  	}
> > +	spin_unlock(&state_lock);
> 
> Code review suggests revoke_delegation() (and in particular,
> destroy_unhashed_deleg(), must not be called while holding
> state_lock().
>

We'd be calling nfs4_unlock_deleg_lease with a spinlock held, which is
sort of gross.

That said, I don't love this fix either.

> 
> >  	spin_lock(&nn->client_lock);
> >  	while (!list_empty(&nn->close_lru)) {
> > @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  		if (s->sc_status & SC_STATUS_REVOKED) {
> >  			spin_unlock(&s->sc_lock);
> >  			dp = delegstateid(s);
> > +			spin_lock(&state_lock);
> >  			list_del_init(&dp->dl_recall_lru);
> > +			spin_unlock(&state_lock);
> 
> Existing code is inconsistent about how manipulation of
> dl_recall_lru is protected. Most instances do use state_lock for
> this purpose, but a few, including this one, use cl->cl_lock. Does
> the other instance using cl_lock need review and correction as well?
> 
> I'd prefer to see this fix make the protection of dl_recall_lru
> consistent everywhere.
> 

Agreed. The locking around the delegation handling has been a mess for
a long time. I'd really like to have a way to fix this that didn't
require having to rework all of this code however.

How about something like this patch instead? Olga, thoughts?


[PATCH] nfsd: add new SC_STATUS_FREEABLE to prevent race with  FREE_STATEID

Olga identified a race between the laundromat and FREE_STATEID handling.
The crux of the problem is that free_stateid can proceed while the
laundromat is still processing the revocation.

Add a new SC_STATUS_FREEABLE flag that is set once the revocation is
complete. Have nfsd4_free_stateid only consider delegations that have
this flag set.

Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4state.c | 3 ++-
 fs/nfsd/state.h     | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 73c4b983c048..b71a2cc7f2dd 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1371,6 +1371,7 @@ static void revoke_delegation(struct nfs4_delegation *dp)
 		spin_lock(&clp->cl_lock);
 		refcount_inc(&dp->dl_stid.sc_count);
 		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
+		dp->dl_stid.sc_status |= SC_STATUS_FREEABLE;
 		spin_unlock(&clp->cl_lock);
 	}
 	destroy_unhashed_deleg(dp);
@@ -7207,7 +7208,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	spin_lock(&s->sc_lock);
 	switch (s->sc_type) {
 	case SC_TYPE_DELEG:
-		if (s->sc_status & SC_STATUS_REVOKED) {
+		if (s->sc_status & SC_STATUS_FREEABLE) {
 			spin_unlock(&s->sc_lock);
 			dp = delegstateid(s);
 			list_del_init(&dp->dl_recall_lru);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 874fcab2b183..4f3b941b09d3 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -114,6 +114,7 @@ struct nfs4_stid {
 /* For a deleg stateid kept around only to process free_stateid's: */
 #define SC_STATUS_REVOKED	BIT(1)
 #define SC_STATUS_ADMIN_REVOKED	BIT(2)
+#define SC_STATUS_FREEABLE	BIT(3)
 	unsigned short		sc_status;
 
 	struct list_head	sc_cp_list;
Olga Kornievskaia Oct. 11, 2024, 8:01 p.m. UTC | #7
On Fri, Oct 11, 2024 at 12:17 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Fri, 2024-10-11 at 10:39 -0400, Chuck Lever wrote:
> > On Thu, Oct 10, 2024 at 06:18:01PM -0400, Olga Kornievskaia wrote:
> > > There is a race between laundromat handling of revoked delegations
> > > and a client sending free_stateid operation. Laundromat thread
> > > finds that delegation has expired and needs to be revoked so it
> > > marks the delegation stid revoked and it puts it on a reaper list
> > > but then it unlock the state lock and the actual delegation revocation
> > > happens without the lock. Once the stid is marked revoked a racing
> > > free_stateid processing thread does the following (1) it calls
> > > list_del_init() which removes it from the reaper list and (2) frees
> > > the delegation stid structure. The laundromat thread ends up not
> > > calling the revoke_delegation() function for this particular delegation
> > > but that means it will no release the lock lease that exists on
> > > the file.
> > >
> > > Now, a new open for this file comes in and ends up finding that
> > > lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
> > > up trying to derefence a freed delegation stateid. Leading to the
> > > followint use-after-free KASAN warning:
> > >
> > > kernel: ==================================================================
> > > kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > > kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> > > kernel:
> > > kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
> > > kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
> > > kernel: Call trace:
> > > kernel: dump_backtrace+0x98/0x120
> > > kernel: show_stack+0x1c/0x30
> > > kernel: dump_stack_lvl+0x80/0xe8
> > > kernel: print_address_description.constprop.0+0x84/0x390
> > > kernel: print_report+0xa4/0x268
> > > kernel: kasan_report+0xb4/0xf8
> > > kernel: __asan_report_load8_noabort+0x1c/0x28
> > > kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > > kernel: leases_conflict+0x68/0x370
> > > kernel: __break_lease+0x204/0xc38
> > > kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> > > kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> > > kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> > > kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> > > kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> > > kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> > > kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> > > kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> > > kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> > > kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> > > kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> > > kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> > > kernel: nfsd+0x270/0x400 [nfsd]
> > > kernel: kthread+0x288/0x310
> > > kernel: ret_from_fork+0x10/0x20
> > >
> > > Proposing to have laundromat thread hold the state_lock over both
> > > marking thru revoking the delegation as well as making free_stateid
> > > acquire state_lock before accessing the list. Making sure that
> > > revoke_delegation() (ie kernel_setlease(unlock)) is called for
> > > every delegation that was revoked and added to the reaper list.
> > >
> > > CC: stable@vger.kernel.org
> > > Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> > >
> > > --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> > > been like that forever. But the free_stateid bits wont apply before
> > > the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> > > But we used that fixes tag already with a previous fix for a different
> > > problem.
> > > ---
> > >  fs/nfsd/nfs4state.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 9c2b1d251ab3..c97907d7fb38 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
> > >             unhash_delegation_locked(dp, SC_STATUS_REVOKED);
> > >             list_add(&dp->dl_recall_lru, &reaplist);
> > >     }
> > > -   spin_unlock(&state_lock);
> > >     while (!list_empty(&reaplist)) {
> > >             dp = list_first_entry(&reaplist, struct nfs4_delegation,
> > >                                     dl_recall_lru);
> > >             list_del_init(&dp->dl_recall_lru);
> > >             revoke_delegation(dp);
> > >     }
> > > +   spin_unlock(&state_lock);
> >
> > Code review suggests revoke_delegation() (and in particular,
> > destroy_unhashed_deleg(), must not be called while holding
> > state_lock().
> >
>
> We'd be calling nfs4_unlock_deleg_lease with a spinlock held, which is
> sort of gross.
>
> That said, I don't love this fix either.
>
> >
> > >     spin_lock(&nn->client_lock);
> > >     while (!list_empty(&nn->close_lru)) {
> > > @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >             if (s->sc_status & SC_STATUS_REVOKED) {
> > >                     spin_unlock(&s->sc_lock);
> > >                     dp = delegstateid(s);
> > > +                   spin_lock(&state_lock);
> > >                     list_del_init(&dp->dl_recall_lru);
> > > +                   spin_unlock(&state_lock);
> >
> > Existing code is inconsistent about how manipulation of
> > dl_recall_lru is protected. Most instances do use state_lock for
> > this purpose, but a few, including this one, use cl->cl_lock. Does
> > the other instance using cl_lock need review and correction as well?
> >
> > I'd prefer to see this fix make the protection of dl_recall_lru
> > consistent everywhere.
> >
>
> Agreed. The locking around the delegation handling has been a mess for
> a long time. I'd really like to have a way to fix this that didn't
> require having to rework all of this code however.
>
> How about something like this patch instead? Olga, thoughts?

I think this patch will prevent the UAF but it doesn't work for
another reason (tested it too). As the free_stateid operation can come
in before the freeable flag is set (and so the nfsd4_free_stateid
function would not do anything). But it needs to remove this
delegation from cl_revoked which the laundromat puts it on as a part
of revoked_delegation() otherwise the server never clears
recallable_state_revoked. And I think this put_stid() that
free_stateid does is also needed. So what happens is free_stateid
comes and goes and the sequence flag is set and is never cleared.

Laundromat threat when it starts revocation process it either needs to
'finish it' but it needs to make sure that if free_stateid arrives in
the meanwhile it has to wait but still run. Or I was thinking that
perhaps, we can make free_stateid act like delegreturn. but I wasn't
sure if free_stateid is allowed to act like delegreturn. but this also
fixes the problem if the free_stateid arrives and takes the work away
from the laundromat thread then free_stateid finishes the return just
like a delegreturn (which unlocks the lease). But I'm unclear if there
isn't any races between revoke_delegation and destroy_delegation.

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 56b261608af4..1ef6933b1ccb 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7159,6 +7159,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate,
                        dp = delegstateid(s);
                        list_del_init(&dp->dl_recall_lru);
                        spin_unlock(&cl->cl_lock);
+                       destroy_delegation(dp);
                        nfs4_put_stid(s);
                        ret = nfs_ok;
                        goto out;



> [PATCH] nfsd: add new SC_STATUS_FREEABLE to prevent race with  FREE_STATEID
>
> Olga identified a race between the laundromat and FREE_STATEID handling.
> The crux of the problem is that free_stateid can proceed while the
> laundromat is still processing the revocation.
>
> Add a new SC_STATUS_FREEABLE flag that is set once the revocation is
> complete. Have nfsd4_free_stateid only consider delegations that have
> this flag set.
>
> Reported-by: Olga Kornievskaia <okorniev@redhat.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/nfs4state.c | 3 ++-
>  fs/nfsd/state.h     | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 73c4b983c048..b71a2cc7f2dd 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1371,6 +1371,7 @@ static void revoke_delegation(struct nfs4_delegation *dp)
>                 spin_lock(&clp->cl_lock);
>                 refcount_inc(&dp->dl_stid.sc_count);
>                 list_add(&dp->dl_recall_lru, &clp->cl_revoked);
> +               dp->dl_stid.sc_status |= SC_STATUS_FREEABLE;
>                 spin_unlock(&clp->cl_lock);
>         }
>         destroy_unhashed_deleg(dp);
> @@ -7207,7 +7208,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>         spin_lock(&s->sc_lock);
>         switch (s->sc_type) {
>         case SC_TYPE_DELEG:
> -               if (s->sc_status & SC_STATUS_REVOKED) {
> +               if (s->sc_status & SC_STATUS_FREEABLE) {
>                         spin_unlock(&s->sc_lock);
>                         dp = delegstateid(s);
>                         list_del_init(&dp->dl_recall_lru);
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 874fcab2b183..4f3b941b09d3 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -114,6 +114,7 @@ struct nfs4_stid {
>  /* For a deleg stateid kept around only to process free_stateid's: */
>  #define SC_STATUS_REVOKED      BIT(1)
>  #define SC_STATUS_ADMIN_REVOKED        BIT(2)
> +#define SC_STATUS_FREEABLE     BIT(3)
>         unsigned short          sc_status;
>
>         struct list_head        sc_cp_list;
> --
> 2.47.0
>
Jeff Layton Oct. 11, 2024, 8:16 p.m. UTC | #8
On Fri, 2024-10-11 at 16:01 -0400, Olga Kornievskaia wrote:
> On Fri, Oct 11, 2024 at 12:17 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Fri, 2024-10-11 at 10:39 -0400, Chuck Lever wrote:
> > > On Thu, Oct 10, 2024 at 06:18:01PM -0400, Olga Kornievskaia wrote:
> > > > There is a race between laundromat handling of revoked delegations
> > > > and a client sending free_stateid operation. Laundromat thread
> > > > finds that delegation has expired and needs to be revoked so it
> > > > marks the delegation stid revoked and it puts it on a reaper list
> > > > but then it unlock the state lock and the actual delegation revocation
> > > > happens without the lock. Once the stid is marked revoked a racing
> > > > free_stateid processing thread does the following (1) it calls
> > > > list_del_init() which removes it from the reaper list and (2) frees
> > > > the delegation stid structure. The laundromat thread ends up not
> > > > calling the revoke_delegation() function for this particular delegation
> > > > but that means it will no release the lock lease that exists on
> > > > the file.
> > > > 
> > > > Now, a new open for this file comes in and ends up finding that
> > > > lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
> > > > up trying to derefence a freed delegation stateid. Leading to the
> > > > followint use-after-free KASAN warning:
> > > > 
> > > > kernel: ==================================================================
> > > > kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > > > kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> > > > kernel:
> > > > kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
> > > > kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
> > > > kernel: Call trace:
> > > > kernel: dump_backtrace+0x98/0x120
> > > > kernel: show_stack+0x1c/0x30
> > > > kernel: dump_stack_lvl+0x80/0xe8
> > > > kernel: print_address_description.constprop.0+0x84/0x390
> > > > kernel: print_report+0xa4/0x268
> > > > kernel: kasan_report+0xb4/0xf8
> > > > kernel: __asan_report_load8_noabort+0x1c/0x28
> > > > kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > > > kernel: leases_conflict+0x68/0x370
> > > > kernel: __break_lease+0x204/0xc38
> > > > kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> > > > kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> > > > kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> > > > kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> > > > kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> > > > kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> > > > kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> > > > kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> > > > kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> > > > kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> > > > kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> > > > kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> > > > kernel: nfsd+0x270/0x400 [nfsd]
> > > > kernel: kthread+0x288/0x310
> > > > kernel: ret_from_fork+0x10/0x20
> > > > 
> > > > Proposing to have laundromat thread hold the state_lock over both
> > > > marking thru revoking the delegation as well as making free_stateid
> > > > acquire state_lock before accessing the list. Making sure that
> > > > revoke_delegation() (ie kernel_setlease(unlock)) is called for
> > > > every delegation that was revoked and added to the reaper list.
> > > > 
> > > > CC: stable@vger.kernel.org
> > > > Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> > > > 
> > > > --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> > > > been like that forever. But the free_stateid bits wont apply before
> > > > the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> > > > But we used that fixes tag already with a previous fix for a different
> > > > problem.
> > > > ---
> > > >  fs/nfsd/nfs4state.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > index 9c2b1d251ab3..c97907d7fb38 100644
> > > > --- a/fs/nfsd/nfs4state.c
> > > > +++ b/fs/nfsd/nfs4state.c
> > > > @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
> > > >             unhash_delegation_locked(dp, SC_STATUS_REVOKED);
> > > >             list_add(&dp->dl_recall_lru, &reaplist);
> > > >     }
> > > > -   spin_unlock(&state_lock);
> > > >     while (!list_empty(&reaplist)) {
> > > >             dp = list_first_entry(&reaplist, struct nfs4_delegation,
> > > >                                     dl_recall_lru);
> > > >             list_del_init(&dp->dl_recall_lru);
> > > >             revoke_delegation(dp);
> > > >     }
> > > > +   spin_unlock(&state_lock);
> > > 
> > > Code review suggests revoke_delegation() (and in particular,
> > > destroy_unhashed_deleg(), must not be called while holding
> > > state_lock().
> > > 
> > 
> > We'd be calling nfs4_unlock_deleg_lease with a spinlock held, which is
> > sort of gross.
> > 
> > That said, I don't love this fix either.
> > 
> > > 
> > > >     spin_lock(&nn->client_lock);
> > > >     while (!list_empty(&nn->close_lru)) {
> > > > @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > >             if (s->sc_status & SC_STATUS_REVOKED) {
> > > >                     spin_unlock(&s->sc_lock);
> > > >                     dp = delegstateid(s);
> > > > +                   spin_lock(&state_lock);
> > > >                     list_del_init(&dp->dl_recall_lru);
> > > > +                   spin_unlock(&state_lock);
> > > 
> > > Existing code is inconsistent about how manipulation of
> > > dl_recall_lru is protected. Most instances do use state_lock for
> > > this purpose, but a few, including this one, use cl->cl_lock. Does
> > > the other instance using cl_lock need review and correction as well?
> > > 
> > > I'd prefer to see this fix make the protection of dl_recall_lru
> > > consistent everywhere.
> > > 
> > 
> > Agreed. The locking around the delegation handling has been a mess for
> > a long time. I'd really like to have a way to fix this that didn't
> > require having to rework all of this code however.
> > 
> > How about something like this patch instead? Olga, thoughts?
> 
> I think this patch will prevent the UAF but it doesn't work for
> another reason (tested it too). As the free_stateid operation can come
> in before the freeable flag is set (and so the nfsd4_free_stateid
> function would not do anything). 
>
> But it needs to remove this
> delegation from cl_revoked which the laundromat puts it on as a part
> of revoked_delegation() otherwise the server never clears
> recallable_state_revoked. And I think this put_stid() that
> free_stateid does is also needed. So what happens is free_stateid
> comes and goes and the sequence flag is set and is never cleared.
> 

Right. It hasn't been "officially" revoked yet, so if a FREE_STATEID
races in while it's REVOKED but not yet FREEABLE, it should just send
back NFS4ERR_LOCKS_HELD. Shouldn't the client assume something like
this race has occurred and retry it in that case?

I'm also curious as to why the client is sending a FREE_STATEID so
quickly in this case. Hasn't the laundromat just marked this thing
REVOKED and now we're already trying to process a FREE_STATEID for it.

> Laundromat threat when it starts revocation process it either needs to
> 'finish it' but it needs to make sure that if free_stateid arrives in
> the meanwhile it has to wait but still run. Or I was thinking that
> perhaps, we can make free_stateid act like delegreturn. but I wasn't
> sure if free_stateid is allowed to act like delegreturn. but this also
> fixes the problem if the free_stateid arrives and takes the work away
> from the laundromat thread then free_stateid finishes the return just
> like a delegreturn (which unlocks the lease). But I'm unclear if there
> isn't any races between revoke_delegation and destroy_delegation.
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 56b261608af4..1ef6933b1ccb 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -7159,6 +7159,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp,
> struct nfsd4_compound_state *cstate,
>                         dp = delegstateid(s);
>                         list_del_init(&dp->dl_recall_lru);
>                         spin_unlock(&cl->cl_lock);
> +                       destroy_delegation(dp);
>                         nfs4_put_stid(s);
>                         ret = nfs_ok;
>                         goto out;
> 
> 
> 
> > [PATCH] nfsd: add new SC_STATUS_FREEABLE to prevent race with  FREE_STATEID
> > 
> > Olga identified a race between the laundromat and FREE_STATEID handling.
> > The crux of the problem is that free_stateid can proceed while the
> > laundromat is still processing the revocation.
> > 
> > Add a new SC_STATUS_FREEABLE flag that is set once the revocation is
> > complete. Have nfsd4_free_stateid only consider delegations that have
> > this flag set.
> > 
> > Reported-by: Olga Kornievskaia <okorniev@redhat.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/nfsd/nfs4state.c | 3 ++-
> >  fs/nfsd/state.h     | 1 +
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 73c4b983c048..b71a2cc7f2dd 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -1371,6 +1371,7 @@ static void revoke_delegation(struct nfs4_delegation *dp)
> >                 spin_lock(&clp->cl_lock);
> >                 refcount_inc(&dp->dl_stid.sc_count);
> >                 list_add(&dp->dl_recall_lru, &clp->cl_revoked);
> > +               dp->dl_stid.sc_status |= SC_STATUS_FREEABLE;
> >                 spin_unlock(&clp->cl_lock);
> >         }
> >         destroy_unhashed_deleg(dp);
> > @@ -7207,7 +7208,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >         spin_lock(&s->sc_lock);
> >         switch (s->sc_type) {
> >         case SC_TYPE_DELEG:
> > -               if (s->sc_status & SC_STATUS_REVOKED) {
> > +               if (s->sc_status & SC_STATUS_FREEABLE) {
> >                         spin_unlock(&s->sc_lock);
> >                         dp = delegstateid(s);
> >                         list_del_init(&dp->dl_recall_lru);
> > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > index 874fcab2b183..4f3b941b09d3 100644
> > --- a/fs/nfsd/state.h
> > +++ b/fs/nfsd/state.h
> > @@ -114,6 +114,7 @@ struct nfs4_stid {
> >  /* For a deleg stateid kept around only to process free_stateid's: */
> >  #define SC_STATUS_REVOKED      BIT(1)
> >  #define SC_STATUS_ADMIN_REVOKED        BIT(2)
> > +#define SC_STATUS_FREEABLE     BIT(3)
> >         unsigned short          sc_status;
> > 
> >         struct list_head        sc_cp_list;
> > --
> > 2.47.0
> > 
>
Trond Myklebust Oct. 11, 2024, 8:57 p.m. UTC | #9
On Fri, 2024-10-11 at 10:39 -0400, Chuck Lever wrote:
> On Thu, Oct 10, 2024 at 06:18:01PM -0400, Olga Kornievskaia wrote:
> > There is a race between laundromat handling of revoked delegations
> > and a client sending free_stateid operation. Laundromat thread
> > finds that delegation has expired and needs to be revoked so it
> > marks the delegation stid revoked and it puts it on a reaper list
> > but then it unlock the state lock and the actual delegation
> > revocation
> > happens without the lock. Once the stid is marked revoked a racing
> > free_stateid processing thread does the following (1) it calls
> > list_del_init() which removes it from the reaper list and (2) frees
> > the delegation stid structure. The laundromat thread ends up not
> > calling the revoke_delegation() function for this particular
> > delegation
> > but that means it will no release the lock lease that exists on
> > the file.
> > 
> > Now, a new open for this file comes in and ends up finding that
> > lease list isn't empty and calls nfsd_breaker_owns_lease() which
> > ends
> > up trying to derefence a freed delegation stateid. Leading to the
> > followint use-after-free KASAN warning:
> > 
> > kernel:
> > ==================================================================
> > kernel: BUG: KASAN: slab-use-after-free in
> > nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> > kernel:
> > kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not
> > tainted 6.11.0-rc7+ #9
> > kernel: Hardware name: Apple Inc. Apple Virtualization Generic
> > Platform, BIOS 2069.0.0.0.0 08/03/2024
> > kernel: Call trace:
> > kernel: dump_backtrace+0x98/0x120
> > kernel: show_stack+0x1c/0x30
> > kernel: dump_stack_lvl+0x80/0xe8
> > kernel: print_address_description.constprop.0+0x84/0x390
> > kernel: print_report+0xa4/0x268
> > kernel: kasan_report+0xb4/0xf8
> > kernel: __asan_report_load8_noabort+0x1c/0x28
> > kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > kernel: leases_conflict+0x68/0x370
> > kernel: __break_lease+0x204/0xc38
> > kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> > kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> > kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> > kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> > kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> > kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> > kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> > kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> > kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> > kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> > kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> > kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> > kernel: nfsd+0x270/0x400 [nfsd]
> > kernel: kthread+0x288/0x310
> > kernel: ret_from_fork+0x10/0x20
> > 
> > Proposing to have laundromat thread hold the state_lock over both
> > marking thru revoking the delegation as well as making free_stateid
> > acquire state_lock before accessing the list. Making sure that
> > revoke_delegation() (ie kernel_setlease(unlock)) is called for
> > every delegation that was revoked and added to the reaper list.
> > 
> > CC: stable@vger.kernel.org
> > Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> > 
> > --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> > been like that forever. But the free_stateid bits wont apply before
> > the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and
> > svc_get/svc_put").
> > But we used that fixes tag already with a previous fix for a
> > different
> > problem.
> > ---
> >  fs/nfsd/nfs4state.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 9c2b1d251ab3..c97907d7fb38 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
> >   unhash_delegation_locked(dp, SC_STATUS_REVOKED);
> >   list_add(&dp->dl_recall_lru, &reaplist);
> >   }
> > - spin_unlock(&state_lock);
> >   while (!list_empty(&reaplist)) {
> >   dp = list_first_entry(&reaplist, struct nfs4_delegation,
> >   dl_recall_lru);
> >   list_del_init(&dp->dl_recall_lru);
> >   revoke_delegation(dp);
> >   }
> > + spin_unlock(&state_lock);
> 
> Code review suggests revoke_delegation() (and in particular,
> destroy_unhashed_deleg(), must not be called while holding
> state_lock().
> 
> 
> >   spin_lock(&nn->client_lock);
> >   while (!list_empty(&nn->close_lru)) {
> > @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp,
> > struct nfsd4_compound_state *cstate,
> >   if (s->sc_status & SC_STATUS_REVOKED) {
> >   spin_unlock(&s->sc_lock);
> >   dp = delegstateid(s);
> > + spin_lock(&state_lock);
> >   list_del_init(&dp->dl_recall_lru);
> > + spin_unlock(&state_lock);
> 
> Existing code is inconsistent about how manipulation of
> dl_recall_lru is protected. Most instances do use state_lock for
> this purpose, but a few, including this one, use cl->cl_lock. Does
> the other instance using cl_lock need review and correction as well?
> 
> I'd prefer to see this fix make the protection of dl_recall_lru
> consistent everywhere.

The problem appears to be that the same list entry field dp-
>dl_recall_lru is being reused for several completely different lists:
 * clp->cl_revoked
 * nn->del_recall_lru
and the occasional private list.

It looks as if the intention is to protect clp->cl_revoked with a
spinlock that is local to the client, whereas nn->del_recall_lru is
protected by a global lock (i.e. state_lock).

In most cases where unhash_delegation_locked() is being called, then it
is obvious that the dl_recall_lru field is assigned to the nn-
>del_recall_lru list, and so state_lock needs to be held.

However the two calls to destroy_delegation() don't appear to guarantee
anything w.r.t. what dl_recall_lru is being used for. So perhaps those
calls ought to grab dp->dl_stid.sc_client->cl_lock before calling
unhash_delegation_locked()?

The other thing is the issue of dl_recall_lru being put on private
lists (including in nfs4_laundromat(), nfs4_state_shutdown_net() and
__destroy_client()). I'd suggest that practice is only safe if the call
to unhash_delegation_locked() is actually successful. Otherwise, there
is a danger of a race where the delegation can be freed while it is on
the private list.

> 
> 
> >   spin_unlock(&cl->cl_lock);
> >   nfs4_put_stid(s);
> >   ret = nfs_ok;
> > -- 
> > 2.43.5
> > 
>
Olga Kornievskaia Oct. 11, 2024, 8:57 p.m. UTC | #10
On Fri, Oct 11, 2024 at 4:16 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Fri, 2024-10-11 at 16:01 -0400, Olga Kornievskaia wrote:
> > On Fri, Oct 11, 2024 at 12:17 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > On Fri, 2024-10-11 at 10:39 -0400, Chuck Lever wrote:
> > > > On Thu, Oct 10, 2024 at 06:18:01PM -0400, Olga Kornievskaia wrote:
> > > > > There is a race between laundromat handling of revoked delegations
> > > > > and a client sending free_stateid operation. Laundromat thread
> > > > > finds that delegation has expired and needs to be revoked so it
> > > > > marks the delegation stid revoked and it puts it on a reaper list
> > > > > but then it unlock the state lock and the actual delegation revocation
> > > > > happens without the lock. Once the stid is marked revoked a racing
> > > > > free_stateid processing thread does the following (1) it calls
> > > > > list_del_init() which removes it from the reaper list and (2) frees
> > > > > the delegation stid structure. The laundromat thread ends up not
> > > > > calling the revoke_delegation() function for this particular delegation
> > > > > but that means it will no release the lock lease that exists on
> > > > > the file.
> > > > >
> > > > > Now, a new open for this file comes in and ends up finding that
> > > > > lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
> > > > > up trying to derefence a freed delegation stateid. Leading to the
> > > > > followint use-after-free KASAN warning:
> > > > >
> > > > > kernel: ==================================================================
> > > > > kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > > > > kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
> > > > > kernel:
> > > > > kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
> > > > > kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
> > > > > kernel: Call trace:
> > > > > kernel: dump_backtrace+0x98/0x120
> > > > > kernel: show_stack+0x1c/0x30
> > > > > kernel: dump_stack_lvl+0x80/0xe8
> > > > > kernel: print_address_description.constprop.0+0x84/0x390
> > > > > kernel: print_report+0xa4/0x268
> > > > > kernel: kasan_report+0xb4/0xf8
> > > > > kernel: __asan_report_load8_noabort+0x1c/0x28
> > > > > kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
> > > > > kernel: leases_conflict+0x68/0x370
> > > > > kernel: __break_lease+0x204/0xc38
> > > > > kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
> > > > > kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
> > > > > kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
> > > > > kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
> > > > > kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
> > > > > kernel: nfsd4_open+0xa08/0xe80 [nfsd]
> > > > > kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
> > > > > kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
> > > > > kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
> > > > > kernel: svc_process+0x3d4/0x7e0 [sunrpc]
> > > > > kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
> > > > > kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
> > > > > kernel: nfsd+0x270/0x400 [nfsd]
> > > > > kernel: kthread+0x288/0x310
> > > > > kernel: ret_from_fork+0x10/0x20
> > > > >
> > > > > Proposing to have laundromat thread hold the state_lock over both
> > > > > marking thru revoking the delegation as well as making free_stateid
> > > > > acquire state_lock before accessing the list. Making sure that
> > > > > revoke_delegation() (ie kernel_setlease(unlock)) is called for
> > > > > every delegation that was revoked and added to the reaper list.
> > > > >
> > > > > CC: stable@vger.kernel.org
> > > > > Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> > > > >
> > > > > --- I can't figure out the Fixes: tag. Laundromat's behaviour has
> > > > > been like that forever. But the free_stateid bits wont apply before
> > > > > the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> > > > > But we used that fixes tag already with a previous fix for a different
> > > > > problem.
> > > > > ---
> > > > >  fs/nfsd/nfs4state.c | 4 +++-
> > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > index 9c2b1d251ab3..c97907d7fb38 100644
> > > > > --- a/fs/nfsd/nfs4state.c
> > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > @@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
> > > > >             unhash_delegation_locked(dp, SC_STATUS_REVOKED);
> > > > >             list_add(&dp->dl_recall_lru, &reaplist);
> > > > >     }
> > > > > -   spin_unlock(&state_lock);
> > > > >     while (!list_empty(&reaplist)) {
> > > > >             dp = list_first_entry(&reaplist, struct nfs4_delegation,
> > > > >                                     dl_recall_lru);
> > > > >             list_del_init(&dp->dl_recall_lru);
> > > > >             revoke_delegation(dp);
> > > > >     }
> > > > > +   spin_unlock(&state_lock);
> > > >
> > > > Code review suggests revoke_delegation() (and in particular,
> > > > destroy_unhashed_deleg(), must not be called while holding
> > > > state_lock().
> > > >
> > >
> > > We'd be calling nfs4_unlock_deleg_lease with a spinlock held, which is
> > > sort of gross.
> > >
> > > That said, I don't love this fix either.
> > >
> > > >
> > > > >     spin_lock(&nn->client_lock);
> > > > >     while (!list_empty(&nn->close_lru)) {
> > > > > @@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > >             if (s->sc_status & SC_STATUS_REVOKED) {
> > > > >                     spin_unlock(&s->sc_lock);
> > > > >                     dp = delegstateid(s);
> > > > > +                   spin_lock(&state_lock);
> > > > >                     list_del_init(&dp->dl_recall_lru);
> > > > > +                   spin_unlock(&state_lock);
> > > >
> > > > Existing code is inconsistent about how manipulation of
> > > > dl_recall_lru is protected. Most instances do use state_lock for
> > > > this purpose, but a few, including this one, use cl->cl_lock. Does
> > > > the other instance using cl_lock need review and correction as well?
> > > >
> > > > I'd prefer to see this fix make the protection of dl_recall_lru
> > > > consistent everywhere.
> > > >
> > >
> > > Agreed. The locking around the delegation handling has been a mess for
> > > a long time. I'd really like to have a way to fix this that didn't
> > > require having to rework all of this code however.
> > >
> > > How about something like this patch instead? Olga, thoughts?
> >
> > I think this patch will prevent the UAF but it doesn't work for
> > another reason (tested it too). As the free_stateid operation can come
> > in before the freeable flag is set (and so the nfsd4_free_stateid
> > function would not do anything).
> >
> > But it needs to remove this
> > delegation from cl_revoked which the laundromat puts it on as a part
> > of revoked_delegation() otherwise the server never clears
> > recallable_state_revoked. And I think this put_stid() that
> > free_stateid does is also needed. So what happens is free_stateid
> > comes and goes and the sequence flag is set and is never cleared.
> >
>
> Right. It hasn't been "officially" revoked yet, so if a FREE_STATEID
> races in while it's REVOKED but not yet FREEABLE, it should just send
> back NFS4ERR_LOCKS_HELD. Shouldn't the client assume something like
> this race has occurred and retry it in that case?

Actually this would not be an appropriate error to return I think. I
think the client sends TEST_STATEID gets revoked (or it got
err_revoked on the delegreturn). Then it sends FREE_STATEID so it's
not possible that a valid locking state exists (it would be a broken
server to return such an error). No reason for the client to retry.

> I'm also curious as to why the client is sending a FREE_STATEID so
> quickly in this case. Hasn't the laundromat just marked this thing
> REVOKED and now we're already trying to process a FREE_STATEID for it.

The test is as follows. (read) open 10k file (get delegations) and
hold it open for some time. Locally do write into the filesystem into
those 10k files. The server starts working on it's 10k cb_recalls.
Client is working on doing delegreturn. But the lease (is artificially
set to be short(er) then the server is able to go thru all its
callbacks and delegreturn). Thus laundromat kicks in declaring
delegation state expired for all those callbacks. The server then sets
a flag in the sequence (fails on the delegreturns too with revoked
error) and client switches to instead send test_stateid/free_stateid.
I guess client is fast enough in sending the free_stateid before the
laundromat thread gets to revoking that particular delegation in the
reaper list.

to restate: a bunch of opens, cb_recalls with delegreturns which are
switched to test_stateids/free_stateids, then eventually followed by
closes. Then opens again --> leading to UAF

> > Laundromat threat when it starts revocation process it either needs to
> > 'finish it' but it needs to make sure that if free_stateid arrives in
> > the meanwhile it has to wait but still run. Or I was thinking that
> > perhaps, we can make free_stateid act like delegreturn. but I wasn't
> > sure if free_stateid is allowed to act like delegreturn. but this also
> > fixes the problem if the free_stateid arrives and takes the work away
> > from the laundromat thread then free_stateid finishes the return just
> > like a delegreturn (which unlocks the lease). But I'm unclear if there
> > isn't any races between revoke_delegation and destroy_delegation.
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 56b261608af4..1ef6933b1ccb 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -7159,6 +7159,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp,
> > struct nfsd4_compound_state *cstate,
> >                         dp = delegstateid(s);
> >                         list_del_init(&dp->dl_recall_lru);
> >                         spin_unlock(&cl->cl_lock);
> > +                       destroy_delegation(dp);
> >                         nfs4_put_stid(s);
> >                         ret = nfs_ok;
> >                         goto out;
> >
> >
> >
> > > [PATCH] nfsd: add new SC_STATUS_FREEABLE to prevent race with  FREE_STATEID
> > >
> > > Olga identified a race between the laundromat and FREE_STATEID handling.
> > > The crux of the problem is that free_stateid can proceed while the
> > > laundromat is still processing the revocation.
> > >
> > > Add a new SC_STATUS_FREEABLE flag that is set once the revocation is
> > > complete. Have nfsd4_free_stateid only consider delegations that have
> > > this flag set.
> > >
> > > Reported-by: Olga Kornievskaia <okorniev@redhat.com>
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/nfsd/nfs4state.c | 3 ++-
> > >  fs/nfsd/state.h     | 1 +
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 73c4b983c048..b71a2cc7f2dd 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -1371,6 +1371,7 @@ static void revoke_delegation(struct nfs4_delegation *dp)
> > >                 spin_lock(&clp->cl_lock);
> > >                 refcount_inc(&dp->dl_stid.sc_count);
> > >                 list_add(&dp->dl_recall_lru, &clp->cl_revoked);
> > > +               dp->dl_stid.sc_status |= SC_STATUS_FREEABLE;
> > >                 spin_unlock(&clp->cl_lock);
> > >         }
> > >         destroy_unhashed_deleg(dp);
> > > @@ -7207,7 +7208,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >         spin_lock(&s->sc_lock);
> > >         switch (s->sc_type) {
> > >         case SC_TYPE_DELEG:
> > > -               if (s->sc_status & SC_STATUS_REVOKED) {
> > > +               if (s->sc_status & SC_STATUS_FREEABLE) {
> > >                         spin_unlock(&s->sc_lock);
> > >                         dp = delegstateid(s);
> > >                         list_del_init(&dp->dl_recall_lru);
> > > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > > index 874fcab2b183..4f3b941b09d3 100644
> > > --- a/fs/nfsd/state.h
> > > +++ b/fs/nfsd/state.h
> > > @@ -114,6 +114,7 @@ struct nfs4_stid {
> > >  /* For a deleg stateid kept around only to process free_stateid's: */
> > >  #define SC_STATUS_REVOKED      BIT(1)
> > >  #define SC_STATUS_ADMIN_REVOKED        BIT(2)
> > > +#define SC_STATUS_FREEABLE     BIT(3)
> > >         unsigned short          sc_status;
> > >
> > >         struct list_head        sc_cp_list;
> > > --
> > > 2.47.0
> > >
> >
>
> --
> Jeff Layton <jlayton@kernel.org>
>
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9c2b1d251ab3..c97907d7fb38 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6605,13 +6605,13 @@  nfs4_laundromat(struct nfsd_net *nn)
 		unhash_delegation_locked(dp, SC_STATUS_REVOKED);
 		list_add(&dp->dl_recall_lru, &reaplist);
 	}
-	spin_unlock(&state_lock);
 	while (!list_empty(&reaplist)) {
 		dp = list_first_entry(&reaplist, struct nfs4_delegation,
 					dl_recall_lru);
 		list_del_init(&dp->dl_recall_lru);
 		revoke_delegation(dp);
 	}
+	spin_unlock(&state_lock);
 
 	spin_lock(&nn->client_lock);
 	while (!list_empty(&nn->close_lru)) {
@@ -7213,7 +7213,9 @@  nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		if (s->sc_status & SC_STATUS_REVOKED) {
 			spin_unlock(&s->sc_lock);
 			dp = delegstateid(s);
+			spin_lock(&state_lock);
 			list_del_init(&dp->dl_recall_lru);
+			spin_unlock(&state_lock);
 			spin_unlock(&cl->cl_lock);
 			nfs4_put_stid(s);
 			ret = nfs_ok;