
[1/1] rpcbind: Fix DoS vulnerability in rpcbind.

Message ID 20210807170206.68768-1-dai.ngo@oracle.com (mailing list archive)
State New, archived
Series [1/1] rpcbind: Fix DoS vulnerability in rpcbind.

Commit Message

Dai Ngo Aug. 7, 2021, 5:02 p.m. UTC
Currently my_svc_run does not handle poll time, allowing idle TCP
connections to remain ESTABLISHED indefinitely. When the number
of connections reaches the limit on open file descriptors
(ulimit -n), accept(2) fails with EMFILE. Since libtirpc does
not handle EMFILE returned from accept(2), this gets my_svc_run into
a tight loop calling accept(2), leaving the RPC service
down and no longer able to service any requests.

Fix by adding a call to __svc_destroy_idle in my_svc_run to remove
inactive connections when poll(2) times out.

Fixes: b2c9430f46c4 ("Use poll() instead of select() in svc_run()")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 src/rpcb_svc_com.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Chuck Lever III Aug. 8, 2021, 7:36 p.m. UTC | #1
Hi Dai-

Thanks for posting. I agree this is a real bug,
but I'm going to quibble a little with the
details of your proposed solution.


> On Aug 7, 2021, at 1:02 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> Currently my_svc_run does not handle poll time, allowing idle TCP

s/poll time/poll timeout/

> connections to remain ESTABLISHED indefinitely. When the number
> of connections reaches the limit on open file descriptors
> (ulimit -n), accept(2) fails with EMFILE. Since libtirpc does
> not handle EMFILE returned from accept(2),

Just curious, should libtirpc be handling EMFILE?


> this gets my_svc_run into
> a tight loop calling accept(2), leaving the RPC service
> down and no longer able to service any requests.
> 
> Fix by adding a call to __svc_destroy_idle in my_svc_run to remove
> inactive connections when poll(2) times out.
> 
> Fixes: b2c9430f46c4 ("Use poll() instead of select() in svc_run()")

I don't have this commit in my rpcbind repo.

I do have 44bf15b86861 ("rpcbind: don't use obsolete
svc_fdset interface of libtirpc"). IMO that's the one
you should list here.

Also, please Cc: Thorsten Kukuk <kukuk@thkukuk.de> on
future versions of this patch series in case he has
any comments.


> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
> src/rpcb_svc_com.c | 3 +++
> 1 file changed, 3 insertions(+)
> 
> diff --git a/src/rpcb_svc_com.c b/src/rpcb_svc_com.c
> index 1743dadf5db7..cb33519010d3 100644
> --- a/src/rpcb_svc_com.c
> +++ b/src/rpcb_svc_com.c
> @@ -1048,6 +1048,8 @@ netbuffree(struct netbuf *ap)
> }
> 
> 
> +extern void __svc_destroy_idle(int, bool_t);

Your libtirpc patch does not add __svc_destroy_idle()
as part of the official libtirpc API. We really must
avoid adding "secret" library entry points (not to
mention, as SteveD pointed out, a SONAME change
would be required).

rpcbind used to call __svc_clean_idle(), which is
undocumented (as far as I can tell). 44bf15b86861
removed that call.

I think a better approach would be to duplicate your
proposed __svc_destroy_idle() code in rpcbind, since
rpcbind is not actually using the library's RPC
dispatcher.

That would get rid of the technical debt of calling
an undocumented library API.

The helper would need to call svc_vc_destroy()
rather than __xprt_unregister_unlocked() and
__svc_vc_dodestroy(). Also not clear to me that
rpcbind's my_svc_run() needs the protection of
holding svc_fd_lock, if rpcbind itself is not
multi-threaded?
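
Roughly what I have in mind is below. This is just a sketch,
not a drop-in patch: the idle_tab[] bookkeeping is an assumption
(rpcbind would have to record each connected transport and its
last-activity time in its poll loop), and since svc_vc_destroy()
may not be visible outside libtirpc, the sketch goes through the
public SVC_DESTROY() macro, which invokes the transport's
xp_destroy method (svc_vc_destroy() for TCP transports).

#include <sys/time.h>
#include <rpc/rpc.h>

/* Hypothetical bookkeeping: my_svc_run() would fill a slot when a
 * connection is accepted and touch last_active on each POLLIN. */
struct idle_entry {
	SVCXPRT		*xprt;		/* NULL when the slot is unused */
	struct timeval	 last_active;
};
static struct idle_entry idle_tab[FD_SETSIZE];

/* Destroy every connection that has been idle for longer than
 * timeout seconds. No svc_fd_lock is taken here because rpcbind's
 * poll loop is single-threaded. */
static void
destroy_idle(int timeout)
{
	struct timeval now;
	int fd;

	gettimeofday(&now, NULL);
	for (fd = 0; fd < FD_SETSIZE; fd++) {
		if (idle_tab[fd].xprt == NULL)
			continue;
		if (now.tv_sec - idle_tab[fd].last_active.tv_sec > timeout) {
			SVC_DESTROY(idle_tab[fd].xprt);
			idle_tab[fd].xprt = NULL;
		}
	}
}

my_svc_run() would then call destroy_idle(30) from its poll()
timeout case, just as your patch calls __svc_destroy_idle().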


The alternative would be to construct a fully
documented public API with man pages and header
declarations. I'm not terribly comfortable with
diverging from the historical Sun TI-RPC programming
interface, however.


> +
> void
> my_svc_run()
> {
> @@ -1076,6 +1078,7 @@ my_svc_run()
> 			 * other outside event) and not caused by poll().
> 			 */
> 		case 0:
> +			__svc_destroy_idle(30, FALSE);
> 			continue;
> 		default:
> 			/*
> -- 
> 2.26.2
> 

--
Chuck Lever
Dai Ngo Aug. 8, 2021, 9:18 p.m. UTC | #2
Hi Chuck,

On 8/8/21 12:36 PM, Chuck Lever III wrote:
> Hi Dai-
>
> Thanks for posting. I agree this is a real bug,
> but I'm going to quibble a little with the
> details of your proposed solution.
>
>
>> On Aug 7, 2021, at 1:02 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>
>> Currently my_svc_run does not handle poll time, allowing idle TCP
> s/poll time/poll timeout/

thanks, will fix.

>
>> connections to remain ESTABLISHED indefinitely. When the number
>> of connections reaches the limit on open file descriptors
>> (ulimit -n), accept(2) fails with EMFILE. Since libtirpc does
>> not handle EMFILE returned from accept(2),
> Just curious, should libtirpc be handling EMFILE?

Yes, if we don't handle EMFILE returned from accept in libtirpc,
then svc_run gets into a tight loop trying to accept the same connection:

poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=5, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND},
       ...
      {fd=36, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, ...], 1019, 30000) = 1 ([{fd=4, revents=POLLIN|POLLRDNORM}])

accept(4, 0x7ffd4ac368f0, [128])        = -1 EMFILE (Too many open files)

This is what causes the RPC service to be down.
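
A rough sketch of the kind of EMFILE handling I mean is below.
This is not libtirpc's actual rendezvous code, and the 100 ms
pause is an arbitrary choice: on EMFILE/ENFILE, back off briefly
instead of returning straight to poll(), which would report the
same POLLIN again immediately.

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

static int
accept_with_backoff(int listener)
{
	int sock = accept(listener, NULL, NULL);

	if (sock < 0 && (errno == EMFILE || errno == ENFILE)) {
		/* The fd table is full. poll() on the listener would
		 * return POLLIN again at once, so pause briefly to let
		 * an idle-connection reaper free some descriptors. */
		(void)poll(NULL, 0, 100);
	}
	return sock;
}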

>
>
>> this gets my_svc_run into
>> a tight loop calling accept(2), leaving the RPC service
>> down and no longer able to service any requests.
>>
>> Fix by adding a call to __svc_destroy_idle in my_svc_run to remove
>> inactive connections when poll(2) times out.
>>
>> Fixes: b2c9430f46c4 ("Use poll() instead of select() in svc_run()")
> I don't have this commit in my rpcbind repo.
>
> I do have 44bf15b86861 ("rpcbind: don't use obsolete
> svc_fdset interface of libtirpc"). IMO that's the one
> you should list here.

Sorry, my mistake, that is the commit for libtirpc.
The commit for rpcbind is as you listed:

44bf15b8     rpcbind: don't use obsolete svc_fdset interface of libtirpc

>
> Also, please Cc: Thorsten Kukuk <kukuk@thkukuk.de> on
> future versions of this patch series in case he has
> any comments.

will do.

>
>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>> src/rpcb_svc_com.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/src/rpcb_svc_com.c b/src/rpcb_svc_com.c
>> index 1743dadf5db7..cb33519010d3 100644
>> --- a/src/rpcb_svc_com.c
>> +++ b/src/rpcb_svc_com.c
>> @@ -1048,6 +1048,8 @@ netbuffree(struct netbuf *ap)
>> }
>>
>>
>> +extern void __svc_destroy_idle(int, bool_t);
> Your libtirpc patch does not add __svc_destroy_idle()
> as part of the official libtirpc API. We really must
> avoid adding "secret" library entry points (not to
> mention, as SteveD pointed out, a SONAME change
> would be required).
>
> rpcbind used to call __svc_clean_idle(), which is
> undocumented (as far as I can tell).

I tried to use the same mechanism as __svc_clean_idle, assuming
it was an acceptable approach.

> 44bf15b86861
> removed that call.

which is one of the bugs that causes this problem.

>
> I think a better approach would be to duplicate your
> proposed __svc_destroy_idle() code in rpcbind, since
> rpcbind is not actually using the library's RPC
> dispatcher.

then we have to do the same for nfs-utils. That means the
same code exists in 3 places: libtirpc, rpcbind and nfs-utils.
And any new consumer that uses its own my_svc_run will
need its own __svc_destroy_idle. It does not sound very
efficient.

>
> That would get rid of the technical debt of calling
> an undocumented library API.
>
> The helper would need to call svc_vc_destroy()
> rather than __xprt_unregister_unlocked() and
> __svc_vc_dodestroy(). Also not clear to me that
> rpcbind's my_svc_run() needs the protection of
> holding svc_fd_lock, if rpcbind itself is not
> multi-threaded?

The lock is held by __svc_destroy_idle, or __svc_clean_idle
before it was removed, in libtirpc. I think libtirpc is
multi-threaded.

>
>
> The alternative would be to construct a fully
> documented public API with man pages and header
> declarations. I'm not terribly comfortable with
> diverging from the historical Sun TI-RPC programming
> interface, however.

What I'm trying to do is to follow what __svc_clean_idle
used to do before it was removed. I can't totally re-use
__svc_clean_idle because it was written to work with
svc_fdset and select.  The current svc_run uses poll, so
the handling of idle connections is a little different.

-Dai

>
>
>> +
>> void
>> my_svc_run()
>> {
>> @@ -1076,6 +1078,7 @@ my_svc_run()
>> 			 * other outside event) and not caused by poll().
>> 			 */
>> 		case 0:
>> +			__svc_destroy_idle(30, FALSE);
>> 			continue;
>> 		default:
>> 			/*
>> -- 
>> 2.26.2
>>
> --
> Chuck Lever
>
>
>
Chuck Lever III Aug. 8, 2021, 10:38 p.m. UTC | #3
> On Aug 8, 2021, at 5:18 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> On 8/8/21 12:36 PM, Chuck Lever III wrote:
>> 
>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>> ---
>>> src/rpcb_svc_com.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>> 
>>> diff --git a/src/rpcb_svc_com.c b/src/rpcb_svc_com.c
>>> index 1743dadf5db7..cb33519010d3 100644
>>> --- a/src/rpcb_svc_com.c
>>> +++ b/src/rpcb_svc_com.c
>>> @@ -1048,6 +1048,8 @@ netbuffree(struct netbuf *ap)
>>> }
>>> 
>>> 
>>> +extern void __svc_destroy_idle(int, bool_t);
>> Your libtirpc patch does not add __svc_destroy_idle()
>> as part of the official libtirpc API. We really must
>> avoid adding "secret" library entry points (not to
>> mention, as SteveD pointed out, a SONAME change
>> would be required).
>> 
>> rpcbind used to call __svc_clean_idle(), which is
>> undocumented (as far as I can tell).
> 
> I tried to use the same mechanism as __svc_clean_idle, assuming
> it was an acceptable approach.

Understood. I'm not quibbling with that mechanism, but
with the addition of another private API. Private APIs
are very difficult to maintain in the long run.


>> 44bf15b86861 removed that call.
> 
> which is one of the bugs that causes this problem.
> 
>> I think a better approach would be to duplicate your
>> proposed __svc_destroy_idle() code in rpcbind, since
>> rpcbind is not actually using the library's RPC
>> dispatcher.
> 
> then we have to do the same for nfs-utils.

Exactly.


> That means the
> same code exists in 3 places: libtirpc, rpcbind and nfs-utils.
> And any new consumer that uses its own my_svc_run will
> need its own __svc_destroy_idle. It does not sound very
> efficient.

You mean you don't particularly care for the duplication
of the code. Fair enough.

I don't have any problem copying this small piece of
code into callers that long ago went to the effort
of implementing their own polling loops. The code
duplication here isn't bothering me much.

If you see a way to convert these external callers to
use enough of the library svc APIs that they get to
use __svc_destroy_idle() automatically, that would be
an acceptable alternative to me.


>> That would get rid of the technical debt of calling
>> an undocumented library API.
>> 
>> The helper would need to call svc_vc_destroy()
>> rather than __xprt_unregister_unlocked() and
>> __svc_vc_dodestroy(). Also not clear to me that
>> rpcbind's my_svc_run() needs the protection of
>> holding svc_fd_lock, if rpcbind itself is not
>> multi-threaded?
> 
> The lock is held by __svc_destroy_idle, or __svc_clean_idle
> before it was removed, in libtirpc. I think libtirpc is
> multi-threaded.

libtirpc is multi-threaded, so __svc_destroy_idle()
is correct to take that lock. I'm suggesting that
the copies of destroy_idle() in the external callers
would likely not need to take this lock. I don't see
any references to it in the rpcbind source code, so
it's probably not necessary for the rpcbind version
of the destroy_idle helper to take the lock.

Note that svc_fd_lock is not part of the public
library API either.


>> The alternative would be to construct a fully
>> documented public API with man pages and header
>> declarations. I'm not terribly comfortable with
>> diverging from the historical Sun TI-RPC programming
>> interface, however.
> 
> What I'm trying to do is to follow what __svc_clean_idle
> used to do before it was removed.

Sure, I get that. That seems like a prudent approach.


> I can't totally re-use
> __svc_clean_idle because it was written to work with
> svc_fdset and select.  The current svc_run uses poll, so
> the handling of idle connections is a little different.

Introducing yet another private API should be off the
table. An alternative fix would be to add a public API
that is documented and appears in the headers. Again,
I would rather not diverge from the historical Sun
TI-RPC library API, and SteveD would rather not have
to change the library's SONAME, so I'm hesitant to
add a public API too.

The bottom line is that adding a library API to handle
idle connections is the right architectural approach,
all things being equal. But the goal for this library
is compatibility with a historical API that is also
available in other operating systems. We have to stick
with that API if we can.


--
Chuck Lever

Patch

diff --git a/src/rpcb_svc_com.c b/src/rpcb_svc_com.c
index 1743dadf5db7..cb33519010d3 100644
--- a/src/rpcb_svc_com.c
+++ b/src/rpcb_svc_com.c
@@ -1048,6 +1048,8 @@  netbuffree(struct netbuf *ap)
 }
 
 
+extern void __svc_destroy_idle(int, bool_t);
+
 void
 my_svc_run()
 {
@@ -1076,6 +1078,7 @@  my_svc_run()
 			 * other outside event) and not caused by poll().
 			 */
 		case 0:
+			__svc_destroy_idle(30, FALSE);
 			continue;
 		default:
 			/*