Message ID | 20210226230437.jfgagcq5magzlrtv@tuedko18.puzzle-itc.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | sunrpc: fix refcount leak for rpc auth modules | expand |
> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote: > > If an auth module's accept op returns SVC_CLOSE, svc_process_common() > enters a call path that does not call svc_authorise() before leaving the > function, and thus leaks a reference on the auth module's refcount. Hence, > make sure calls to svc_authenticate() and svc_authorise() are paired for > all call paths, to make sure rpc auth modules can be unloaded. > > Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") > Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de> > --- > Hi! > > While debugging NFS on a system with misconfigured krb5 settings, we noticed > a suspiciously high refcount on the auth_rpcgss module, despite all of its > consumers already unloaded. I wasn't able to analyze any further on the live > system, but had a look at the code afterwards, and found a path that seems > to leak references if the mechanism's accept() op shuts down a connection > early. Although I couldn't verify, this seem to be a plausible fix. > > Kind regards, > > Daniel Hi Daniel- I've provisionally included your patch in my NFSD for-rc topic branch here: git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git Your bug report seems plausible, but I need to take a closer look at that code and your proposed change. Would very much like to hear from others, too. > net/sunrpc/svc.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > index 61fb8a18552c..d76dc9d95d16 100644 > --- a/net/sunrpc/svc.c > +++ b/net/sunrpc/svc.c > @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) > > sendit: > if (svc_authorise(rqstp)) > - goto close; > + goto close_xprt; > return 1; /* Caller can now send it */ > > release_dropit: > @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) > return 0; > > close: > + svc_authorise(rqstp); > +close_xprt: > if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) > svc_close_xprt(rqstp->rq_xprt); > dprintk("svc: svc_process close\n"); > @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) > err_short_len: > svc_printk(rqstp, "short len %zd, dropping request\n", > argv->iov_len); > - goto close; > + goto close_xprt; > > err_bad_rpc: > serv->sv_stats->rpcbadfmt++; > -- > 2.25.1 > > > -- > Puzzle ITC Deutschland GmbH > Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 > Tübingen > > Eingetragen am Amtsgericht Stuttgart HRB 765802 > Geschäftsführer: > Lukas Kallies, Daniel Kobras, Mark Pröhl > -- Chuck Lever
On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote: > > > On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote: > > > > If an auth module's accept op returns SVC_CLOSE, svc_process_common() > > enters a call path that does not call svc_authorise() before leaving the > > function, and thus leaks a reference on the auth module's refcount. Hence, > > make sure calls to svc_authenticate() and svc_authorise() are paired for > > all call paths, to make sure rpc auth modules can be unloaded. > > > > Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") > > Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de> > > --- > > Hi! > > > > While debugging NFS on a system with misconfigured krb5 settings, we noticed > > a suspiciously high refcount on the auth_rpcgss module, despite all of its > > consumers already unloaded. I wasn't able to analyze any further on the live > > system, but had a look at the code afterwards, and found a path that seems > > to leak references if the mechanism's accept() op shuts down a connection > > early. Although I couldn't verify, this seem to be a plausible fix. > > > > Kind regards, > > > > Daniel > > Hi Daniel- > > I've provisionally included your patch in my NFSD for-rc topic branch > here: > > git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git > > Your bug report seems plausible, but I need to take a closer look at that > code and your proposed change. Would very much like to hear from others, > too. So, the effect of this is to call svc_authorise more often. I think that's always safe, because svc_authorise is a no-op unless rq_authops is set, it clears rq_authops itself, and rq_authops being set is a guarantee that ->accept() already ran. It's harder to know if this solves the problem, as I see a lot of other mentions of THIS_MODULE in svcauth_gss.c. Possibly orthogonal to this problem, but: svcauth_gss_release unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL dereference if the kmalloc at the start of svcauth_gss_accept() fails? Finally, should we care about module reference leaks? Does anyone really *need* to unload modules? And will bad stuff happen when the count overflows, or does the module code fail safely somehow in the overflow case? I know, bugs are bugs, I should care about fixing all of them, shame on me.... --b. > > > > net/sunrpc/svc.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > > index 61fb8a18552c..d76dc9d95d16 100644 > > --- a/net/sunrpc/svc.c > > +++ b/net/sunrpc/svc.c > > @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) > > > > sendit: > > if (svc_authorise(rqstp)) > > - goto close; > > + goto close_xprt; > > return 1; /* Caller can now send it */ > > > > release_dropit: > > @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) > > return 0; > > > > close: > > + svc_authorise(rqstp); > > +close_xprt: > > if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) > > svc_close_xprt(rqstp->rq_xprt); > > dprintk("svc: svc_process close\n"); > > @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) > > err_short_len: > > svc_printk(rqstp, "short len %zd, dropping request\n", > > argv->iov_len); > > - goto close; > > + goto close_xprt; > > > > err_bad_rpc: > > serv->sv_stats->rpcbadfmt++; > > -- > > 2.25.1 > > > > > > -- > > Puzzle ITC Deutschland GmbH > > Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 > > Tübingen > > > > Eingetragen am Amtsgericht Stuttgart HRB 765802 > > Geschäftsführer: > > Lukas Kallies, Daniel Kobras, Mark Pröhl > > > > -- > Chuck Lever > > >
> On Mar 1, 2021, at 11:28 AM, Bruce Fields <bfields@fieldses.org> wrote: > > On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote: >> >>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote: >>> >>> If an auth module's accept op returns SVC_CLOSE, svc_process_common() >>> enters a call path that does not call svc_authorise() before leaving the >>> function, and thus leaks a reference on the auth module's refcount. Hence, >>> make sure calls to svc_authenticate() and svc_authorise() are paired for >>> all call paths, to make sure rpc auth modules can be unloaded. >>> >>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") >>> Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de> >>> --- >>> Hi! >>> >>> While debugging NFS on a system with misconfigured krb5 settings, we noticed >>> a suspiciously high refcount on the auth_rpcgss module, despite all of its >>> consumers already unloaded. I wasn't able to analyze any further on the live >>> system, but had a look at the code afterwards, and found a path that seems >>> to leak references if the mechanism's accept() op shuts down a connection >>> early. Although I couldn't verify, this seem to be a plausible fix. >>> >>> Kind regards, >>> >>> Daniel >> >> Hi Daniel- >> >> I've provisionally included your patch in my NFSD for-rc topic branch >> here: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git >> >> Your bug report seems plausible, but I need to take a closer look at that >> code and your proposed change. Would very much like to hear from others, >> too. > > So, the effect of this is to call svc_authorise more often. I think > that's always safe, because svc_authorise is a no-op unless rq_authops > is set, it clears rq_authops itself, and rq_authops being set is a > guarantee that ->accept() already ran. > > It's harder to know if this solves the problem, as I see a lot of other > mentions of THIS_MODULE in svcauth_gss.c. Perhaps a deeper audit is necessary. A small code change to inject SVC_CLOSE returns at random would enable a more dynamic analysis. > Possibly orthogonal to this problem, but: svcauth_gss_release > unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL > dereference if the kmalloc at the start of svcauth_gss_accept() fails? > > Finally, should we care about module reference leaks? I would prefer that module reference counting work as expected. When it doesn't that tends to lead to people (say, me) hunting for bugs that might actually be serious. > Does anyone really *need* to unload modules? Anyone who wants to replace the module with a newer build that fixes a bug. It avoids a full reboot, and for some that's important. > And will bad stuff happen when the > count overflows, or does the module code fail safely somehow in the > overflow case? I know, bugs are bugs, I should care about fixing all of > them, shame on me.... > > --b. > >> >> >>> net/sunrpc/svc.c | 6 ++++-- >>> 1 file changed, 4 insertions(+), 2 deletions(-) >>> >>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c >>> index 61fb8a18552c..d76dc9d95d16 100644 >>> --- a/net/sunrpc/svc.c >>> +++ b/net/sunrpc/svc.c >>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>> >>> sendit: >>> if (svc_authorise(rqstp)) >>> - goto close; >>> + goto close_xprt; >>> return 1; /* Caller can now send it */ >>> >>> release_dropit: >>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>> return 0; >>> >>> close: >>> + svc_authorise(rqstp); >>> +close_xprt: >>> if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) >>> svc_close_xprt(rqstp->rq_xprt); >>> dprintk("svc: svc_process close\n"); >>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>> err_short_len: >>> svc_printk(rqstp, "short len %zd, dropping request\n", >>> argv->iov_len); >>> - goto close; >>> + goto close_xprt; >>> >>> err_bad_rpc: >>> serv->sv_stats->rpcbadfmt++; >>> -- >>> 2.25.1 >>> >>> >>> -- >>> Puzzle ITC Deutschland GmbH >>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 >>> Tübingen >>> >>> Eingetragen am Amtsgericht Stuttgart HRB 765802 >>> Geschäftsführer: >>> Lukas Kallies, Daniel Kobras, Mark Pröhl >>> >> >> -- >> Chuck Lever -- Chuck Lever
On Mon, Mar 01, 2021 at 05:44:02PM +0000, Chuck Lever wrote: > > So, the effect of this is to call svc_authorise more often. I think > > that's always safe, because svc_authorise is a no-op unless rq_authops > > is set, it clears rq_authops itself, and rq_authops being set is a > > guarantee that ->accept() already ran. > > > > It's harder to know if this solves the problem, as I see a lot of other > > mentions of THIS_MODULE in svcauth_gss.c. > > Perhaps a deeper audit is necessary. > > A small code change to inject SVC_CLOSE returns at random would enable > a more dynamic analysis. That might be interesting. I don't think tihs patch necessarily has to wait for that. > > Possibly orthogonal to this problem, but: svcauth_gss_release > > unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL > > dereference if the kmalloc at the start of svcauth_gss_accept() fails? > > > > Finally, should we care about module reference leaks? > > I would prefer that module reference counting work as expected. When it > doesn't that tends to lead to people (say, me) hunting for bugs that > might actually be serious. > > > > Does anyone really *need* to unload modules? > > Anyone who wants to replace the module with a newer build that fixes a > bug. It avoids a full reboot, and for some that's important. Fair enough. --b.
> On Mar 1, 2021, at 1:15 PM, Bruce Fields <bfields@fieldses.org> wrote: > > On Mon, Mar 01, 2021 at 05:44:02PM +0000, Chuck Lever wrote: >>> So, the effect of this is to call svc_authorise more often. I think >>> that's always safe, because svc_authorise is a no-op unless rq_authops >>> is set, it clears rq_authops itself, and rq_authops being set is a >>> guarantee that ->accept() already ran. >>> >>> It's harder to know if this solves the problem, as I see a lot of other >>> mentions of THIS_MODULE in svcauth_gss.c. >> >> Perhaps a deeper audit is necessary. >> >> A small code change to inject SVC_CLOSE returns at random would enable >> a more dynamic analysis. > > That might be interesting. > > I don't think this patch necessarily has to wait for that. OK. It's in for-rc now, and sounds like that doesn't need to change. Poking around a little, I see a try_module_get() and module_put() done for every RPC. Considering that both have a preempt_disable/enable pair, that seems a little expensive for the value it provides. One might like to see the module reference count handled a little less frequently, but I don't see an obvious way to address that. >>> Possibly orthogonal to this problem, but: svcauth_gss_release >>> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL >>> dereference if the kmalloc at the start of svcauth_gss_accept() fails? >>> >>> Finally, should we care about module reference leaks? >> >> I would prefer that module reference counting work as expected. When it >> doesn't that tends to lead to people (say, me) hunting for bugs that >> might actually be serious. >> >> >>> Does anyone really *need* to unload modules? >> >> Anyone who wants to replace the module with a newer build that fixes a >> bug. It avoids a full reboot, and for some that's important. > > Fair enough. > > --b. -- Chuck Lever
Hi all! Am 01.03.21 um 18:44 schrieb Chuck Lever: >> On Mar 1, 2021, at 11:28 AM, Bruce Fields <bfields@fieldses.org> wrote: >> >> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote: >>> >>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote: >>>> >>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common() >>>> enters a call path that does not call svc_authorise() before leaving the >>>> function, and thus leaks a reference on the auth module's refcount. Hence, >>>> make sure calls to svc_authenticate() and svc_authorise() are paired for >>>> all call paths, to make sure rpc auth modules can be unloaded. >>>> >>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") >>>> Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de> >>>> --- >>>> Hi! >>>> >>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed >>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its >>>> consumers already unloaded. I wasn't able to analyze any further on the live >>>> system, but had a look at the code afterwards, and found a path that seems >>>> to leak references if the mechanism's accept() op shuts down a connection >>>> early. Although I couldn't verify, this seem to be a plausible fix. >>>> >>>> Kind regards, >>>> >>>> Daniel >>> >>> Hi Daniel- >>> >>> I've provisionally included your patch in my NFSD for-rc topic branch >>> here: >>> >>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git >>> >>> Your bug report seems plausible, but I need to take a closer look at that >>> code and your proposed change. Would very much like to hear from others, >>> too. >> >> So, the effect of this is to call svc_authorise more often. I think >> that's always safe, because svc_authorise is a no-op unless rq_authops >> is set, it clears rq_authops itself, and rq_authops being set is a >> guarantee that ->accept() already ran. >> >> It's harder to know if this solves the problem, as I see a lot of other >> mentions of THIS_MODULE in svcauth_gss.c. > > Perhaps a deeper audit is necessary. > > A small code change to inject SVC_CLOSE returns at random would enable > a more dynamic analysis. I've managed to come up with simple reproducer for this bug: On a working krb5 NFS mount from a test client, check which enctype is used in the ticket for the NFS service. Then unmount, and exclude this enctype from permitted_enctypes in the server's /etc/krb5.conf.[*] Trying to mount again from the test client should now fail (EPERM), and each mount attempt increases the refcount of the server's auth_rpcgss module (by 22 in my test). Exchanging sunrpc.ko for a version with the patch applied, and re-running the same test, the refcount remains constant instead. This confirms the initial analysis, and indicates the fix is actually correct. [*] For a quick test in a standard setup, I've used [libdefaults] permitted_enctypes = aes128-cts-hmac-sha1-96 (...) to make the normal AES256 tickets fail. A more realistic scenario would be a client that's restricted to RC4, and a server with RC4 keys on the KDC, but only AES allowed in krb5.conf. Default behaviour of typical AD join tools makes it easy to end up in a situation like this. >> Possibly orthogonal to this problem, but: svcauth_gss_release >> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL >> dereference if the kmalloc at the start of svcauth_gss_accept() fails? >> >> Finally, should we care about module reference leaks? > > I would prefer that module reference counting work as expected. When it > doesn't that tends to lead to people (say, me) hunting for bugs that > might actually be serious. The refcount leak is the easily visible consequence, but the skipped call to svcauth_gss_release() probably also leaks a small amount of memory for each request. Repeating the test case above for a longer period of time (eg. by throwing an automounter into the mix), this might eventually become noticeable. >> Does anyone really *need* to unload modules? > > Anyone who wants to replace the module with a newer build that fixes a > bug. It avoids a full reboot, and for some that's important. Switching from rpc.svcgssd to gssproxy without taking down the machine as a whole was the situation that originally prompted me to look into this, but I admit that's a rather rare use case. >> And will bad stuff happen when the >> count overflows, or does the module code fail safely somehow in the >> overflow case? I know, bugs are bugs, I should care about fixing all of >> them, shame on me.... >> >> --b. >> >>> >>> >>>> net/sunrpc/svc.c | 6 ++++-- >>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c >>>> index 61fb8a18552c..d76dc9d95d16 100644 >>>> --- a/net/sunrpc/svc.c >>>> +++ b/net/sunrpc/svc.c >>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>>> >>>> sendit: >>>> if (svc_authorise(rqstp)) >>>> - goto close; >>>> + goto close_xprt; >>>> return 1; /* Caller can now send it */ >>>> >>>> release_dropit: >>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>>> return 0; >>>> >>>> close: >>>> + svc_authorise(rqstp); >>>> +close_xprt: >>>> if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) >>>> svc_close_xprt(rqstp->rq_xprt); >>>> dprintk("svc: svc_process close\n"); >>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>>> err_short_len: >>>> svc_printk(rqstp, "short len %zd, dropping request\n", >>>> argv->iov_len); >>>> - goto close; >>>> + goto close_xprt; >>>> >>>> err_bad_rpc: >>>> serv->sv_stats->rpcbadfmt++; >>>> -- >>>> 2.25.1 >>>> >>>> >>>> -- >>>> Puzzle ITC Deutschland GmbH >>>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 >>>> Tübingen >>>> >>>> Eingetragen am Amtsgericht Stuttgart HRB 765802 >>>> Geschäftsführer: >>>> Lukas Kallies, Daniel Kobras, Mark Pröhl >>>> >>> >>> -- >>> Chuck Lever > > -- > Chuck Lever Kind regards, Daniel
Excellent, Daniel. Thanks for following up! I will add a Link: tag to this thread in the patch description. > On Mar 2, 2021, at 6:50 AM, Daniel Kobras <kobras@puzzle-itc.de> wrote: > > Hi all! > > Am 01.03.21 um 18:44 schrieb Chuck Lever: >>> On Mar 1, 2021, at 11:28 AM, Bruce Fields <bfields@fieldses.org> wrote: >>> >>> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote: >>>> >>>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@puzzle-itc.de> wrote: >>>>> >>>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common() >>>>> enters a call path that does not call svc_authorise() before leaving the >>>>> function, and thus leaks a reference on the auth module's refcount. Hence, >>>>> make sure calls to svc_authenticate() and svc_authorise() are paired for >>>>> all call paths, to make sure rpc auth modules can be unloaded. >>>>> >>>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") >>>>> Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de> >>>>> --- >>>>> Hi! >>>>> >>>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed >>>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its >>>>> consumers already unloaded. I wasn't able to analyze any further on the live >>>>> system, but had a look at the code afterwards, and found a path that seems >>>>> to leak references if the mechanism's accept() op shuts down a connection >>>>> early. Although I couldn't verify, this seem to be a plausible fix. >>>>> >>>>> Kind regards, >>>>> >>>>> Daniel >>>> >>>> Hi Daniel- >>>> >>>> I've provisionally included your patch in my NFSD for-rc topic branch >>>> here: >>>> >>>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git >>>> >>>> Your bug report seems plausible, but I need to take a closer look at that >>>> code and your proposed change. Would very much like to hear from others, >>>> too. >>> >>> So, the effect of this is to call svc_authorise more often. I think >>> that's always safe, because svc_authorise is a no-op unless rq_authops >>> is set, it clears rq_authops itself, and rq_authops being set is a >>> guarantee that ->accept() already ran. >>> >>> It's harder to know if this solves the problem, as I see a lot of other >>> mentions of THIS_MODULE in svcauth_gss.c. >> >> Perhaps a deeper audit is necessary. >> >> A small code change to inject SVC_CLOSE returns at random would enable >> a more dynamic analysis. > > I've managed to come up with simple reproducer for this bug: > > On a working krb5 NFS mount from a test client, check which enctype is > used in the ticket for the NFS service. Then unmount, and exclude this > enctype from permitted_enctypes in the server's /etc/krb5.conf.[*] > Trying to mount again from the test client should now fail (EPERM), and > each mount attempt increases the refcount of the server's auth_rpcgss > module (by 22 in my test). > > Exchanging sunrpc.ko for a version with the patch applied, and > re-running the same test, the refcount remains constant instead. This > confirms the initial analysis, and indicates the fix is actually correct. > > [*] For a quick test in a standard setup, I've used > [libdefaults] > permitted_enctypes = aes128-cts-hmac-sha1-96 > (...) > to make the normal AES256 tickets fail. A more realistic scenario > would be a client that's restricted to RC4, and a server with > RC4 keys on the KDC, but only AES allowed in krb5.conf. Default > behaviour of typical AD join tools makes it easy to end up in a > situation like this. > >>> Possibly orthogonal to this problem, but: svcauth_gss_release >>> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL >>> dereference if the kmalloc at the start of svcauth_gss_accept() fails? >>> >>> Finally, should we care about module reference leaks? >> >> I would prefer that module reference counting work as expected. When it >> doesn't that tends to lead to people (say, me) hunting for bugs that >> might actually be serious. > > The refcount leak is the easily visible consequence, but the skipped > call to svcauth_gss_release() probably also leaks a small amount of > memory for each request. Repeating the test case above for a longer > period of time (eg. by throwing an automounter into the mix), this might > eventually become noticeable. > >>> Does anyone really *need* to unload modules? >> >> Anyone who wants to replace the module with a newer build that fixes a >> bug. It avoids a full reboot, and for some that's important. > > Switching from rpc.svcgssd to gssproxy without taking down the machine > as a whole was the situation that originally prompted me to look into > this, but I admit that's a rather rare use case. > >>> And will bad stuff happen when the >>> count overflows, or does the module code fail safely somehow in the >>> overflow case? I know, bugs are bugs, I should care about fixing all of >>> them, shame on me.... >>> >>> --b. >>> >>>> >>>> >>>>> net/sunrpc/svc.c | 6 ++++-- >>>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c >>>>> index 61fb8a18552c..d76dc9d95d16 100644 >>>>> --- a/net/sunrpc/svc.c >>>>> +++ b/net/sunrpc/svc.c >>>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>>>> >>>>> sendit: >>>>> if (svc_authorise(rqstp)) >>>>> - goto close; >>>>> + goto close_xprt; >>>>> return 1; /* Caller can now send it */ >>>>> >>>>> release_dropit: >>>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>>>> return 0; >>>>> >>>>> close: >>>>> + svc_authorise(rqstp); >>>>> +close_xprt: >>>>> if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) >>>>> svc_close_xprt(rqstp->rq_xprt); >>>>> dprintk("svc: svc_process close\n"); >>>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>>>> err_short_len: >>>>> svc_printk(rqstp, "short len %zd, dropping request\n", >>>>> argv->iov_len); >>>>> - goto close; >>>>> + goto close_xprt; >>>>> >>>>> err_bad_rpc: >>>>> serv->sv_stats->rpcbadfmt++; >>>>> -- >>>>> 2.25.1 >>>>> >>>>> >>>>> -- >>>>> Puzzle ITC Deutschland GmbH >>>>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 >>>>> Tübingen >>>>> >>>>> Eingetragen am Amtsgericht Stuttgart HRB 765802 >>>>> Geschäftsführer: >>>>> Lukas Kallies, Daniel Kobras, Mark Pröhl >>>>> >>>> >>>> -- >>>> Chuck Lever >> >> -- >> Chuck Lever > > Kind regards, > > Daniel > -- > Daniel Kobras > Principal Architect > Puzzle ITC Deutschland > +49 7071 14316 0 > www.puzzle-itc.de > > -- > Puzzle ITC Deutschland GmbH > Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 > Tübingen > > Eingetragen am Amtsgericht Stuttgart HRB 765802 > Geschäftsführer: > Lukas Kallies, Daniel Kobras, Mark Pröhl -- Chuck Lever
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 61fb8a18552c..d76dc9d95d16 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) sendit: if (svc_authorise(rqstp)) - goto close; + goto close_xprt; return 1; /* Caller can now send it */ release_dropit: @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) return 0; close: + svc_authorise(rqstp); +close_xprt: if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) svc_close_xprt(rqstp->rq_xprt); dprintk("svc: svc_process close\n"); @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) err_short_len: svc_printk(rqstp, "short len %zd, dropping request\n", argv->iov_len); - goto close; + goto close_xprt; err_bad_rpc: serv->sv_stats->rpcbadfmt++;
If an auth module's accept op returns SVC_CLOSE, svc_process_common() enters a call path that does not call svc_authorise() before leaving the function, and thus leaks a reference on the auth module's refcount. Hence, make sure calls to svc_authenticate() and svc_authorise() are paired for all call paths, to make sure rpc auth modules can be unloaded. Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de> --- Hi! While debugging NFS on a system with misconfigured krb5 settings, we noticed a suspiciously high refcount on the auth_rpcgss module, despite all of its consumers already unloaded. I wasn't able to analyze any further on the live system, but had a look at the code afterwards, and found a path that seems to leak references if the mechanism's accept() op shuts down a connection early. Although I couldn't verify, this seem to be a plausible fix. Kind regards, Daniel net/sunrpc/svc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)