diff mbox series

sunrpc: fix refcount leak for rpc auth modules

Message ID 20210226230437.jfgagcq5magzlrtv@tuedko18.puzzle-itc.de (mailing list archive)
State New, archived
Headers show
Series sunrpc: fix refcount leak for rpc auth modules | expand

Commit Message

Daniel Kobras Feb. 26, 2021, 11:04 p.m. UTC
If an auth module's accept op returns SVC_CLOSE, svc_process_common()
enters a call path that does not call svc_authorise() before leaving the
function, and thus leaks a reference on the auth module's refcount. Hence,
make sure calls to svc_authenticate() and svc_authorise() are paired for
all call paths, to make sure rpc auth modules can be unloaded.

Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de>
---
Hi!

While debugging NFS on a system with misconfigured krb5 settings, we noticed
a suspiciously high refcount on the auth_rpcgss module, despite all of its
consumers already unloaded. I wasn't able to analyze any further on the live
system, but had a look at the code afterwards, and found a path that seems
to leak references if the mechanism's accept() op shuts down a connection
early. Although I couldn't verify, this seem to be a plausible fix.

Kind regards,

Daniel

 net/sunrpc/svc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Chuck Lever III March 1, 2021, 3:20 p.m. UTC | #1
> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote:
> 
> If an auth module's accept op returns SVC_CLOSE, svc_process_common()
> enters a call path that does not call svc_authorise() before leaving the
> function, and thus leaks a reference on the auth module's refcount. Hence,
> make sure calls to svc_authenticate() and svc_authorise() are paired for
> all call paths, to make sure rpc auth modules can be unloaded.
> 
> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
> Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de>
> ---
> Hi!
> 
> While debugging NFS on a system with misconfigured krb5 settings, we noticed
> a suspiciously high refcount on the auth_rpcgss module, despite all of its
> consumers already unloaded. I wasn't able to analyze any further on the live
> system, but had a look at the code afterwards, and found a path that seems
> to leak references if the mechanism's accept() op shuts down a connection
> early. Although I couldn't verify, this seem to be a plausible fix.
> 
> Kind regards,
> 
> Daniel

Hi Daniel-

I've provisionally included your patch in my NFSD for-rc topic branch
here:

git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

Your bug report seems plausible, but I need to take a closer look at that
code and your proposed change. Would very much like to hear from others,
too.


> net/sunrpc/svc.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 61fb8a18552c..d76dc9d95d16 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> 
>  sendit:
> 	if (svc_authorise(rqstp))
> -		goto close;
> +		goto close_xprt;
> 	return 1;		/* Caller can now send it */
> 
> release_dropit:
> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> 	return 0;
> 
>  close:
> +	svc_authorise(rqstp);
> +close_xprt:
> 	if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
> 		svc_close_xprt(rqstp->rq_xprt);
> 	dprintk("svc: svc_process close\n");
> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> err_short_len:
> 	svc_printk(rqstp, "short len %zd, dropping request\n",
> 			argv->iov_len);
> -	goto close;
> +	goto close_xprt;
> 
> err_bad_rpc:
> 	serv->sv_stats->rpcbadfmt++;
> -- 
> 2.25.1
> 
> 
> -- 
> Puzzle ITC Deutschland GmbH
> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 
> Tübingen
> 
> Eingetragen am Amtsgericht Stuttgart HRB 765802
> Geschäftsführer: 
> Lukas Kallies, Daniel Kobras, Mark Pröhl
> 

--
Chuck Lever
J. Bruce Fields March 1, 2021, 4:28 p.m. UTC | #2
On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote:
> 
> > On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote:
> > 
> > If an auth module's accept op returns SVC_CLOSE, svc_process_common()
> > enters a call path that does not call svc_authorise() before leaving the
> > function, and thus leaks a reference on the auth module's refcount. Hence,
> > make sure calls to svc_authenticate() and svc_authorise() are paired for
> > all call paths, to make sure rpc auth modules can be unloaded.
> > 
> > Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
> > Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de>
> > ---
> > Hi!
> > 
> > While debugging NFS on a system with misconfigured krb5 settings, we noticed
> > a suspiciously high refcount on the auth_rpcgss module, despite all of its
> > consumers already unloaded. I wasn't able to analyze any further on the live
> > system, but had a look at the code afterwards, and found a path that seems
> > to leak references if the mechanism's accept() op shuts down a connection
> > early. Although I couldn't verify, this seem to be a plausible fix.
> > 
> > Kind regards,
> > 
> > Daniel
> 
> Hi Daniel-
> 
> I've provisionally included your patch in my NFSD for-rc topic branch
> here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
> 
> Your bug report seems plausible, but I need to take a closer look at that
> code and your proposed change. Would very much like to hear from others,
> too.

So, the effect of this is to call svc_authorise more often.  I think
that's always safe, because svc_authorise is a no-op unless rq_authops
is set, it clears rq_authops itself, and rq_authops being set is a
guarantee that ->accept() already ran.

It's harder to know if this solves the problem, as I see a lot of other
mentions of THIS_MODULE in svcauth_gss.c.

Possibly orthogonal to this problem, but: svcauth_gss_release
unconditionally dereferences rqstp->rq_auth_data.  Isn't that a NULL
dereference if the kmalloc at the start of svcauth_gss_accept() fails?

Finally, should we care about module reference leaks?  Does anyone
really *need* to unload modules?  And will bad stuff happen when the
count overflows, or does the module code fail safely somehow in the
overflow case?  I know, bugs are bugs, I should care about fixing all of
them, shame on me....

--b.

> 
> 
> > net/sunrpc/svc.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 61fb8a18552c..d76dc9d95d16 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> > 
> >  sendit:
> > 	if (svc_authorise(rqstp))
> > -		goto close;
> > +		goto close_xprt;
> > 	return 1;		/* Caller can now send it */
> > 
> > release_dropit:
> > @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> > 	return 0;
> > 
> >  close:
> > +	svc_authorise(rqstp);
> > +close_xprt:
> > 	if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
> > 		svc_close_xprt(rqstp->rq_xprt);
> > 	dprintk("svc: svc_process close\n");
> > @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> > err_short_len:
> > 	svc_printk(rqstp, "short len %zd, dropping request\n",
> > 			argv->iov_len);
> > -	goto close;
> > +	goto close_xprt;
> > 
> > err_bad_rpc:
> > 	serv->sv_stats->rpcbadfmt++;
> > -- 
> > 2.25.1
> > 
> > 
> > -- 
> > Puzzle ITC Deutschland GmbH
> > Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 
> > Tübingen
> > 
> > Eingetragen am Amtsgericht Stuttgart HRB 765802
> > Geschäftsführer: 
> > Lukas Kallies, Daniel Kobras, Mark Pröhl
> > 
> 
> --
> Chuck Lever
> 
> 
>
Chuck Lever III March 1, 2021, 5:44 p.m. UTC | #3
> On Mar 1, 2021, at 11:28 AM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote:
>> 
>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote:
>>> 
>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common()
>>> enters a call path that does not call svc_authorise() before leaving the
>>> function, and thus leaks a reference on the auth module's refcount. Hence,
>>> make sure calls to svc_authenticate() and svc_authorise() are paired for
>>> all call paths, to make sure rpc auth modules can be unloaded.
>>> 
>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
>>> Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de>
>>> ---
>>> Hi!
>>> 
>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed
>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its
>>> consumers already unloaded. I wasn't able to analyze any further on the live
>>> system, but had a look at the code afterwards, and found a path that seems
>>> to leak references if the mechanism's accept() op shuts down a connection
>>> early. Although I couldn't verify, this seem to be a plausible fix.
>>> 
>>> Kind regards,
>>> 
>>> Daniel
>> 
>> Hi Daniel-
>> 
>> I've provisionally included your patch in my NFSD for-rc topic branch
>> here:
>> 
>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>> 
>> Your bug report seems plausible, but I need to take a closer look at that
>> code and your proposed change. Would very much like to hear from others,
>> too.
> 
> So, the effect of this is to call svc_authorise more often.  I think
> that's always safe, because svc_authorise is a no-op unless rq_authops
> is set, it clears rq_authops itself, and rq_authops being set is a
> guarantee that ->accept() already ran.
> 
> It's harder to know if this solves the problem, as I see a lot of other
> mentions of THIS_MODULE in svcauth_gss.c.

Perhaps a deeper audit is necessary.

A small code change to inject SVC_CLOSE returns at random would enable
a more dynamic analysis.


> Possibly orthogonal to this problem, but: svcauth_gss_release
> unconditionally dereferences rqstp->rq_auth_data.  Isn't that a NULL
> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
> 
> Finally, should we care about module reference leaks?

I would prefer that module reference counting work as expected. When it
doesn't that tends to lead to people (say, me) hunting for bugs that
might actually be serious.


> Does anyone really *need* to unload modules?

Anyone who wants to replace the module with a newer build that fixes a
bug. It avoids a full reboot, and for some that's important.


> And will bad stuff happen when the
> count overflows, or does the module code fail safely somehow in the
> overflow case?  I know, bugs are bugs, I should care about fixing all of
> them, shame on me....
> 
> --b.
> 
>> 
>> 
>>> net/sunrpc/svc.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>>> index 61fb8a18552c..d76dc9d95d16 100644
>>> --- a/net/sunrpc/svc.c
>>> +++ b/net/sunrpc/svc.c
>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>> 
>>> sendit:
>>> 	if (svc_authorise(rqstp))
>>> -		goto close;
>>> +		goto close_xprt;
>>> 	return 1;		/* Caller can now send it */
>>> 
>>> release_dropit:
>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>> 	return 0;
>>> 
>>> close:
>>> +	svc_authorise(rqstp);
>>> +close_xprt:
>>> 	if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
>>> 		svc_close_xprt(rqstp->rq_xprt);
>>> 	dprintk("svc: svc_process close\n");
>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>> err_short_len:
>>> 	svc_printk(rqstp, "short len %zd, dropping request\n",
>>> 			argv->iov_len);
>>> -	goto close;
>>> +	goto close_xprt;
>>> 
>>> err_bad_rpc:
>>> 	serv->sv_stats->rpcbadfmt++;
>>> -- 
>>> 2.25.1
>>> 
>>> 
>>> -- 
>>> Puzzle ITC Deutschland GmbH
>>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 
>>> Tübingen
>>> 
>>> Eingetragen am Amtsgericht Stuttgart HRB 765802
>>> Geschäftsführer: 
>>> Lukas Kallies, Daniel Kobras, Mark Pröhl
>>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever
J. Bruce Fields March 1, 2021, 6:15 p.m. UTC | #4
On Mon, Mar 01, 2021 at 05:44:02PM +0000, Chuck Lever wrote:
> > So, the effect of this is to call svc_authorise more often.  I think
> > that's always safe, because svc_authorise is a no-op unless rq_authops
> > is set, it clears rq_authops itself, and rq_authops being set is a
> > guarantee that ->accept() already ran.
> > 
> > It's harder to know if this solves the problem, as I see a lot of other
> > mentions of THIS_MODULE in svcauth_gss.c.
> 
> Perhaps a deeper audit is necessary.
> 
> A small code change to inject SVC_CLOSE returns at random would enable
> a more dynamic analysis.

That might be interesting.

I don't think tihs patch necessarily has to wait for that.

> > Possibly orthogonal to this problem, but: svcauth_gss_release
> > unconditionally dereferences rqstp->rq_auth_data.  Isn't that a NULL
> > dereference if the kmalloc at the start of svcauth_gss_accept() fails?
> > 
> > Finally, should we care about module reference leaks?
> 
> I would prefer that module reference counting work as expected. When it
> doesn't that tends to lead to people (say, me) hunting for bugs that
> might actually be serious.
> 
> 
> > Does anyone really *need* to unload modules?
> 
> Anyone who wants to replace the module with a newer build that fixes a
> bug. It avoids a full reboot, and for some that's important.

Fair enough.

--b.
Chuck Lever III March 1, 2021, 6:21 p.m. UTC | #5
> On Mar 1, 2021, at 1:15 PM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, Mar 01, 2021 at 05:44:02PM +0000, Chuck Lever wrote:
>>> So, the effect of this is to call svc_authorise more often.  I think
>>> that's always safe, because svc_authorise is a no-op unless rq_authops
>>> is set, it clears rq_authops itself, and rq_authops being set is a
>>> guarantee that ->accept() already ran.
>>> 
>>> It's harder to know if this solves the problem, as I see a lot of other
>>> mentions of THIS_MODULE in svcauth_gss.c.
>> 
>> Perhaps a deeper audit is necessary.
>> 
>> A small code change to inject SVC_CLOSE returns at random would enable
>> a more dynamic analysis.
> 
> That might be interesting.
> 
> I don't think this patch necessarily has to wait for that.

OK. It's in for-rc now, and sounds like that doesn't need to change.

Poking around a little, I see a try_module_get() and module_put() done
for every RPC. Considering that both have a preempt_disable/enable pair,
that seems a little expensive for the value it provides. One might like
to see the module reference count handled a little less frequently, but
I don't see an obvious way to address that.


>>> Possibly orthogonal to this problem, but: svcauth_gss_release
>>> unconditionally dereferences rqstp->rq_auth_data.  Isn't that a NULL
>>> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
>>> 
>>> Finally, should we care about module reference leaks?
>> 
>> I would prefer that module reference counting work as expected. When it
>> doesn't that tends to lead to people (say, me) hunting for bugs that
>> might actually be serious.
>> 
>> 
>>> Does anyone really *need* to unload modules?
>> 
>> Anyone who wants to replace the module with a newer build that fixes a
>> bug. It avoids a full reboot, and for some that's important.
> 
> Fair enough.
> 
> --b.

--
Chuck Lever
Daniel Kobras March 2, 2021, 11:50 a.m. UTC | #6
Hi all!

Am 01.03.21 um 18:44 schrieb Chuck Lever:
>> On Mar 1, 2021, at 11:28 AM, Bruce Fields <bfields@fieldses.org> wrote:
>>
>> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote:
>>>
>>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote:
>>>>
>>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common()
>>>> enters a call path that does not call svc_authorise() before leaving the
>>>> function, and thus leaks a reference on the auth module's refcount. Hence,
>>>> make sure calls to svc_authenticate() and svc_authorise() are paired for
>>>> all call paths, to make sure rpc auth modules can be unloaded.
>>>>
>>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
>>>> Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de>
>>>> ---
>>>> Hi!
>>>>
>>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed
>>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its
>>>> consumers already unloaded. I wasn't able to analyze any further on the live
>>>> system, but had a look at the code afterwards, and found a path that seems
>>>> to leak references if the mechanism's accept() op shuts down a connection
>>>> early. Although I couldn't verify, this seem to be a plausible fix.
>>>>
>>>> Kind regards,
>>>>
>>>> Daniel
>>>
>>> Hi Daniel-
>>>
>>> I've provisionally included your patch in my NFSD for-rc topic branch
>>> here:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>>>
>>> Your bug report seems plausible, but I need to take a closer look at that
>>> code and your proposed change. Would very much like to hear from others,
>>> too.
>>
>> So, the effect of this is to call svc_authorise more often.  I think
>> that's always safe, because svc_authorise is a no-op unless rq_authops
>> is set, it clears rq_authops itself, and rq_authops being set is a
>> guarantee that ->accept() already ran.
>>
>> It's harder to know if this solves the problem, as I see a lot of other
>> mentions of THIS_MODULE in svcauth_gss.c.
> 
> Perhaps a deeper audit is necessary.
> 
> A small code change to inject SVC_CLOSE returns at random would enable
> a more dynamic analysis.

I've managed to come up with simple reproducer for this bug:

On a working krb5 NFS mount from a test client, check which enctype is
used in the ticket for the NFS service. Then unmount, and exclude this
enctype from permitted_enctypes in the server's /etc/krb5.conf.[*]
Trying to mount again from the test client should now fail (EPERM), and
each mount attempt increases the refcount of the server's auth_rpcgss
module (by 22 in my test).

Exchanging sunrpc.ko for a version with the patch applied, and
re-running the same test, the refcount remains constant instead. This
confirms the initial analysis, and indicates the fix is actually correct.

[*] For a quick test in a standard setup, I've used
      [libdefaults]
      permitted_enctypes = aes128-cts-hmac-sha1-96
      (...)
    to make the normal AES256 tickets fail. A more realistic scenario
    would be a client that's restricted to RC4, and a server with
    RC4 keys on the KDC, but only AES allowed in krb5.conf. Default
    behaviour of typical AD join tools makes it easy to end up in a
    situation like this.

>> Possibly orthogonal to this problem, but: svcauth_gss_release
>> unconditionally dereferences rqstp->rq_auth_data.  Isn't that a NULL
>> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
>>
>> Finally, should we care about module reference leaks?
> 
> I would prefer that module reference counting work as expected. When it
> doesn't that tends to lead to people (say, me) hunting for bugs that
> might actually be serious.

The refcount leak is the easily visible consequence, but the skipped
call to svcauth_gss_release() probably also leaks a small amount of
memory for each request. Repeating the test case above for a longer
period of time (eg. by throwing an automounter into the mix), this might
eventually become noticeable.

>> Does anyone really *need* to unload modules?
> 
> Anyone who wants to replace the module with a newer build that fixes a
> bug. It avoids a full reboot, and for some that's important.

Switching from rpc.svcgssd to gssproxy without taking down the machine
as a whole was the situation that originally prompted me to look into
this, but I admit that's a rather rare use case.

>> And will bad stuff happen when the
>> count overflows, or does the module code fail safely somehow in the
>> overflow case?  I know, bugs are bugs, I should care about fixing all of
>> them, shame on me....
>>
>> --b.
>>
>>>
>>>
>>>> net/sunrpc/svc.c | 6 ++++--
>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>>>> index 61fb8a18552c..d76dc9d95d16 100644
>>>> --- a/net/sunrpc/svc.c
>>>> +++ b/net/sunrpc/svc.c
>>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>
>>>> sendit:
>>>> 	if (svc_authorise(rqstp))
>>>> -		goto close;
>>>> +		goto close_xprt;
>>>> 	return 1;		/* Caller can now send it */
>>>>
>>>> release_dropit:
>>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>> 	return 0;
>>>>
>>>> close:
>>>> +	svc_authorise(rqstp);
>>>> +close_xprt:
>>>> 	if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
>>>> 		svc_close_xprt(rqstp->rq_xprt);
>>>> 	dprintk("svc: svc_process close\n");
>>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>> err_short_len:
>>>> 	svc_printk(rqstp, "short len %zd, dropping request\n",
>>>> 			argv->iov_len);
>>>> -	goto close;
>>>> +	goto close_xprt;
>>>>
>>>> err_bad_rpc:
>>>> 	serv->sv_stats->rpcbadfmt++;
>>>> -- 
>>>> 2.25.1
>>>>
>>>>
>>>> -- 
>>>> Puzzle ITC Deutschland GmbH
>>>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 
>>>> Tübingen
>>>>
>>>> Eingetragen am Amtsgericht Stuttgart HRB 765802
>>>> Geschäftsführer: 
>>>> Lukas Kallies, Daniel Kobras, Mark Pröhl
>>>>
>>>
>>> --
>>> Chuck Lever
> 
> --
> Chuck Lever

Kind regards,

Daniel
Chuck Lever III March 2, 2021, 3:48 p.m. UTC | #7
Excellent, Daniel. Thanks for following up! I will add a Link: tag
to this thread in the patch description.


> On Mar 2, 2021, at 6:50 AM, Daniel Kobras <kobras@puzzle-itc.de> wrote:
> 
> Hi all!
> 
> Am 01.03.21 um 18:44 schrieb Chuck Lever:
>>> On Mar 1, 2021, at 11:28 AM, Bruce Fields <bfields@fieldses.org> wrote:
>>> 
>>> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote:
>>>> 
>>>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote:
>>>>> 
>>>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common()
>>>>> enters a call path that does not call svc_authorise() before leaving the
>>>>> function, and thus leaks a reference on the auth module's refcount. Hence,
>>>>> make sure calls to svc_authenticate() and svc_authorise() are paired for
>>>>> all call paths, to make sure rpc auth modules can be unloaded.
>>>>> 
>>>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
>>>>> Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de>
>>>>> ---
>>>>> Hi!
>>>>> 
>>>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed
>>>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its
>>>>> consumers already unloaded. I wasn't able to analyze any further on the live
>>>>> system, but had a look at the code afterwards, and found a path that seems
>>>>> to leak references if the mechanism's accept() op shuts down a connection
>>>>> early. Although I couldn't verify, this seem to be a plausible fix.
>>>>> 
>>>>> Kind regards,
>>>>> 
>>>>> Daniel
>>>> 
>>>> Hi Daniel-
>>>> 
>>>> I've provisionally included your patch in my NFSD for-rc topic branch
>>>> here:
>>>> 
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>>>> 
>>>> Your bug report seems plausible, but I need to take a closer look at that
>>>> code and your proposed change. Would very much like to hear from others,
>>>> too.
>>> 
>>> So, the effect of this is to call svc_authorise more often.  I think
>>> that's always safe, because svc_authorise is a no-op unless rq_authops
>>> is set, it clears rq_authops itself, and rq_authops being set is a
>>> guarantee that ->accept() already ran.
>>> 
>>> It's harder to know if this solves the problem, as I see a lot of other
>>> mentions of THIS_MODULE in svcauth_gss.c.
>> 
>> Perhaps a deeper audit is necessary.
>> 
>> A small code change to inject SVC_CLOSE returns at random would enable
>> a more dynamic analysis.
> 
> I've managed to come up with simple reproducer for this bug:
> 
> On a working krb5 NFS mount from a test client, check which enctype is
> used in the ticket for the NFS service. Then unmount, and exclude this
> enctype from permitted_enctypes in the server's /etc/krb5.conf.[*]
> Trying to mount again from the test client should now fail (EPERM), and
> each mount attempt increases the refcount of the server's auth_rpcgss
> module (by 22 in my test).
> 
> Exchanging sunrpc.ko for a version with the patch applied, and
> re-running the same test, the refcount remains constant instead. This
> confirms the initial analysis, and indicates the fix is actually correct.
> 
> [*] For a quick test in a standard setup, I've used
>      [libdefaults]
>      permitted_enctypes = aes128-cts-hmac-sha1-96
>      (...)
>    to make the normal AES256 tickets fail. A more realistic scenario
>    would be a client that's restricted to RC4, and a server with
>    RC4 keys on the KDC, but only AES allowed in krb5.conf. Default
>    behaviour of typical AD join tools makes it easy to end up in a
>    situation like this.
> 
>>> Possibly orthogonal to this problem, but: svcauth_gss_release
>>> unconditionally dereferences rqstp->rq_auth_data.  Isn't that a NULL
>>> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
>>> 
>>> Finally, should we care about module reference leaks?
>> 
>> I would prefer that module reference counting work as expected. When it
>> doesn't that tends to lead to people (say, me) hunting for bugs that
>> might actually be serious.
> 
> The refcount leak is the easily visible consequence, but the skipped
> call to svcauth_gss_release() probably also leaks a small amount of
> memory for each request. Repeating the test case above for a longer
> period of time (eg. by throwing an automounter into the mix), this might
> eventually become noticeable.
> 
>>> Does anyone really *need* to unload modules?
>> 
>> Anyone who wants to replace the module with a newer build that fixes a
>> bug. It avoids a full reboot, and for some that's important.
> 
> Switching from rpc.svcgssd to gssproxy without taking down the machine
> as a whole was the situation that originally prompted me to look into
> this, but I admit that's a rather rare use case.
> 
>>> And will bad stuff happen when the
>>> count overflows, or does the module code fail safely somehow in the
>>> overflow case?  I know, bugs are bugs, I should care about fixing all of
>>> them, shame on me....
>>> 
>>> --b.
>>> 
>>>> 
>>>> 
>>>>> net/sunrpc/svc.c | 6 ++++--
>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>>>>> index 61fb8a18552c..d76dc9d95d16 100644
>>>>> --- a/net/sunrpc/svc.c
>>>>> +++ b/net/sunrpc/svc.c
>>>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>> 
>>>>> sendit:
>>>>> 	if (svc_authorise(rqstp))
>>>>> -		goto close;
>>>>> +		goto close_xprt;
>>>>> 	return 1;		/* Caller can now send it */
>>>>> 
>>>>> release_dropit:
>>>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>> 	return 0;
>>>>> 
>>>>> close:
>>>>> +	svc_authorise(rqstp);
>>>>> +close_xprt:
>>>>> 	if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
>>>>> 		svc_close_xprt(rqstp->rq_xprt);
>>>>> 	dprintk("svc: svc_process close\n");
>>>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>> err_short_len:
>>>>> 	svc_printk(rqstp, "short len %zd, dropping request\n",
>>>>> 			argv->iov_len);
>>>>> -	goto close;
>>>>> +	goto close_xprt;
>>>>> 
>>>>> err_bad_rpc:
>>>>> 	serv->sv_stats->rpcbadfmt++;
>>>>> -- 
>>>>> 2.25.1
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Puzzle ITC Deutschland GmbH
>>>>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 
>>>>> Tübingen
>>>>> 
>>>>> Eingetragen am Amtsgericht Stuttgart HRB 765802
>>>>> Geschäftsführer: 
>>>>> Lukas Kallies, Daniel Kobras, Mark Pröhl
>>>>> 
>>>> 
>>>> --
>>>> Chuck Lever
>> 
>> --
>> Chuck Lever
> 
> Kind regards,
> 
> Daniel
> -- 
> Daniel Kobras
> Principal Architect
> Puzzle ITC Deutschland
> +49 7071 14316 0
> www.puzzle-itc.de
> 
> -- 
> Puzzle ITC Deutschland GmbH
> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 
> Tübingen
> 
> Eingetragen am Amtsgericht Stuttgart HRB 765802
> Geschäftsführer: 
> Lukas Kallies, Daniel Kobras, Mark Pröhl

--
Chuck Lever
diff mbox series

Patch

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 61fb8a18552c..d76dc9d95d16 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1413,7 +1413,7 @@  svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
 
  sendit:
 	if (svc_authorise(rqstp))
-		goto close;
+		goto close_xprt;
 	return 1;		/* Caller can now send it */
 
 release_dropit:
@@ -1425,6 +1425,8 @@  svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
 	return 0;
 
  close:
+	svc_authorise(rqstp);
+close_xprt:
 	if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
 		svc_close_xprt(rqstp->rq_xprt);
 	dprintk("svc: svc_process close\n");
@@ -1433,7 +1435,7 @@  svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
 err_short_len:
 	svc_printk(rqstp, "short len %zd, dropping request\n",
 			argv->iov_len);
-	goto close;
+	goto close_xprt;
 
 err_bad_rpc:
 	serv->sv_stats->rpcbadfmt++;