mbox series

[mptcp-next,0/6] mptcp: sockopt: uniform code to get/set values

Message ID 20230707-mptcp-unify-sockopt-issue-353-v1-0-693e15c06646@tessares.net (mailing list archive)
Headers show
Series mptcp: sockopt: uniform code to get/set values | expand

Message

Matthieu Baerts July 7, 2023, 4 p.m. UTC
Here is the implementation of an idea I suggested a few months ago: it
is a factoring of the sockopt code trying to uniform it, mainly to rely
on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there
to get and set values.

That's a first step. More might come if we are OK with the approach,
e.g.

- For SOL_TCP socket options we cannot store info in the MPTCP socket so
  we still need to "manually" store it somewhere in the msk. Maybe we
  could store info in the different subflows (and create a first one if
  there is no subflow) and retrieve info from there?

- For SOL_SOCKET socket options, it looks like it should be feasible to
  do something similar to what I did here but I'm probably missing
  something. This could clean quite a lot of code, e.g. the last commit
  from:

    https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353

- To sync the socket options when a new subflow has been created, for
  the moment, we copy info from the msk to the subflow but maybe it
   would be fine to call some getsockopt() / setsockopt() to do the
  sync? If we take this approach, we might want to save a list of
  setsockopt() that have been done in the past on this socket and only
  re-do these ones? That should help with the maintenance of these
  socket options and ease the support of new ones.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
---
Matthieu Baerts (6):
      mptcp: sockopt: move tcp_inq code to a dedicated function
      mptcp: sockopt: update supported list
      mptcp: sockopt: get val in a generic way
      mptcp: sockopt: add missing getsockopt() options
      mptcp: sockopt: set val in a generic way
      mptcp: sockopt: support IP_TTL & IPV6_UNICAST_HOPS

 net/mptcp/sockopt.c | 241 ++++++++++++++++++++++++++--------------------------
 1 file changed, 121 insertions(+), 120 deletions(-)
---
base-commit: cde4fa6a0a5ed3effb75008ce99eb842bc448e5d
change-id: 20230221-mptcp-unify-sockopt-issue-353-1ad56ba6b04e

Best regards,

Comments

Paolo Abeni July 10, 2023, 4:51 p.m. UTC | #1
On Fri, 2023-07-07 at 18:00 +0200, Matthieu Baerts wrote:
> Here is the implementation of an idea I suggested a few months ago: it
> is a factoring of the sockopt code trying to uniform it, mainly to rely
> on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there
> to get and set values.
> 
> That's a first step. More might come if we are OK with the approach,
> e.g.
> 
> - For SOL_TCP socket options we cannot store info in the MPTCP socket so
>   we still need to "manually" store it somewhere in the msk. Maybe we
>   could store info in the different subflows (and create a first one if
>   there is no subflow) and retrieve info from there?
> 
> - For SOL_SOCKET socket options, it looks like it should be feasible to
>   do something similar to what I did here but I'm probably missing
>   something. This could clean quite a lot of code, e.g. the last commit
>   from:
> 
>     https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353
> 
> - To sync the socket options when a new subflow has been created, for
>   the moment, we copy info from the msk to the subflow but maybe it
>    would be fine to call some getsockopt() / setsockopt() to do the
>   sync? If we take this approach, we might want to save a list of
>   setsockopt() that have been done in the past on this socket and only
>   re-do these ones? That should help with the maintenance of these
>   socket options and ease the support of new ones.

I would highly discourage this latest approach. We will have to limit
the list length, breaking the sync at some point. And there are few
quite gray area, e.g. how errors should be handled? I also fear the
"final" state of the subflow could be different from the
expected/correct one for non trivial socktopt. 


Cheers,

Paolo
Matthieu Baerts July 11, 2023, 10:22 a.m. UTC | #2
Hi Paolo,

Thank you for your feedback!

On 10/07/2023 18:51, Paolo Abeni wrote:
> On Fri, 2023-07-07 at 18:00 +0200, Matthieu Baerts wrote:
>> Here is the implementation of an idea I suggested a few months ago: it
>> is a factoring of the sockopt code trying to uniform it, mainly to rely
>> on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there
>> to get and set values.
>>
>> That's a first step. More might come if we are OK with the approach,
>> e.g.
>>
>> - For SOL_TCP socket options we cannot store info in the MPTCP socket so
>>   we still need to "manually" store it somewhere in the msk. Maybe we
>>   could store info in the different subflows (and create a first one if
>>   there is no subflow) and retrieve info from there?
>>
>> - For SOL_SOCKET socket options, it looks like it should be feasible to
>>   do something similar to what I did here but I'm probably missing
>>   something. This could clean quite a lot of code, e.g. the last commit
>>   from:
>>
>>     https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353
>>
>> - To sync the socket options when a new subflow has been created, for
>>   the moment, we copy info from the msk to the subflow but maybe it
>>    would be fine to call some getsockopt() / setsockopt() to do the
>>   sync? If we take this approach, we might want to save a list of
>>   setsockopt() that have been done in the past on this socket and only
>>   re-do these ones? That should help with the maintenance of these
>>   socket options and ease the support of new ones.
> 
> I would highly discourage this latest approach.

Before going further, I want to insist on the fact that it is just an
idea, I didn't look at the implementation (and to be honest, I was not
looking at doing that now :) )

> We will have to limit
> the list length, breaking the sync at some point.

Why would we have to limit the list length and break the sync?

I understand having a list might not feel good but I'm not sure what
else we can have.

> And there are few
> quite gray area, e.g. how errors should be handled?

Same as what we have now I guess: errors are ignored :)
But I don't see what else we can and it doesn't feel worst than before
(regarding how we handle errors during the sync)

> I also fear the
> "final" state of the subflow could be different from the
> expected/correct one for non trivial socktopt.

I would expect it to be better because we would follow the "normal" flow
to set values. What bother me is the fact we bypass checks and directly
set values. If the code change on the different IP/TCP/SOCKET socket
options side, we might (very likely) miss the fact we also need to
update what we are doing on the MPTCP side, e.g. if new checks or extra
actions (BPF/Netfilter hooks) are added later, etc. So the idea of this
approach is to ease the maintenance (but maybe that's not needed and/or
not worth it) :)

Of course, this would work only if we can do something like *only* for
each setsockopt() that have been done before:

  void sync_sockopt(sock, level, optname)
  {
    socklen_t buflen;
    char val[1024]; // we can also record the max given before

    buflen = sizeof(val);

    if ((err = getsockopt(sock, level, optname, val, &buflen) < 0))
        return;

    setsockopt(sock, level, optname, val, buflen);
  }


Cheers,
Matt
Paolo Abeni July 11, 2023, 12:50 p.m. UTC | #3
On Tue, 2023-07-11 at 12:22 +0200, Matthieu Baerts wrote:
> On 10/07/2023 18:51, Paolo Abeni wrote:
> > On Fri, 2023-07-07 at 18:00 +0200, Matthieu Baerts wrote:
> > > Here is the implementation of an idea I suggested a few months ago: it
> > > is a factoring of the sockopt code trying to uniform it, mainly to rely
> > > on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there
> > > to get and set values.
> > > 
> > > That's a first step. More might come if we are OK with the approach,
> > > e.g.
> > > 
> > > - For SOL_TCP socket options we cannot store info in the MPTCP socket so
> > >   we still need to "manually" store it somewhere in the msk. Maybe we
> > >   could store info in the different subflows (and create a first one if
> > >   there is no subflow) and retrieve info from there?
> > > 
> > > - For SOL_SOCKET socket options, it looks like it should be feasible to
> > >   do something similar to what I did here but I'm probably missing
> > >   something. This could clean quite a lot of code, e.g. the last commit
> > >   from:
> > > 
> > >     https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353
> > > 
> > > - To sync the socket options when a new subflow has been created, for
> > >   the moment, we copy info from the msk to the subflow but maybe it
> > >    would be fine to call some getsockopt() / setsockopt() to do the
> > >   sync? If we take this approach, we might want to save a list of
> > >   setsockopt() that have been done in the past on this socket and only
> > >   re-do these ones? That should help with the maintenance of these
> > >   socket options and ease the support of new ones.
> > 
> > I would highly discourage this latest approach.
> 
> Before going further, I want to insist on the fact that it is just an
> idea, I didn't look at the implementation (and to be honest, I was not
> looking at doing that now :) )
> 
> > We will have to limit
> > the list length, breaking the sync at some point.
> 
> Why would we have to limit the list length and break the sync?
> 
> I understand having a list might not feel good but I'm not sure what
> else we can have.

Because otherwise syzkaller (or evil unprivileged user-space program)
can easily exhaust the kernel memory doing a lot of setsockopt: we need
to allocate the list entry, and we can't never flush the list.

> > And there are few
> > quite gray area, e.g. how errors should be handled?
> 
> Same as what we have now I guess: errors are ignored :)
> But I don't see what else we can and it doesn't feel worst than before
> (regarding how we handle errors during the sync)

Note the with the current schema sync can fail only while setting the
CA, and even that requires a quite non trivial setup. If we start
replying arbitrary complex sockopt sequence, I fear the we can hit
failures much more frequently.

> 
> > I also fear the
> > "final" state of the subflow could be different from the
> > expected/correct one for non trivial socktopt.
> 
> I would expect it to be better because we would follow the "normal" flow
> to set values. What bother me is the fact we bypass checks and directly
> set values. 

Actually we don't! Such checks are performed once for the sockopt on
the main/initial subflow (or by the mptcp impl).

IMHO it's safer then the list alternative. For passive socket the
latter will do the full sockopt checking using a different initial sock
internal state: the sockopt will be applied after cloning and the
related initialization.

For plain TCP sockopt value are inherited from the parent via the clone
itself, which is quite near to what the current sync implementation is
doing.

> If the code change on the different IP/TCP/SOCKET socket
> options side, we might (very likely) miss the fact we also need to
> update what we are doing on the MPTCP side, e.g. if new checks or extra
> actions (BPF/Netfilter hooks) are added later, etc. So the idea of this
> approach is to ease the maintenance (but maybe that's not needed and/or
> not worth it) :)

The eBPF maintainer mentioned that new hooks are discouraged, IIRC ;)


Cheers,

Paolo
Matthieu Baerts July 11, 2023, 1:55 p.m. UTC | #4
On 11/07/2023 14:50, Paolo Abeni wrote:
> On Tue, 2023-07-11 at 12:22 +0200, Matthieu Baerts wrote:
>> On 10/07/2023 18:51, Paolo Abeni wrote:
>>> On Fri, 2023-07-07 at 18:00 +0200, Matthieu Baerts wrote:
>>>> Here is the implementation of an idea I suggested a few months ago: it
>>>> is a factoring of the sockopt code trying to uniform it, mainly to rely
>>>> on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there
>>>> to get and set values.
>>>>
>>>> That's a first step. More might come if we are OK with the approach,
>>>> e.g.
>>>>
>>>> - For SOL_TCP socket options we cannot store info in the MPTCP socket so
>>>>   we still need to "manually" store it somewhere in the msk. Maybe we
>>>>   could store info in the different subflows (and create a first one if
>>>>   there is no subflow) and retrieve info from there?
>>>>
>>>> - For SOL_SOCKET socket options, it looks like it should be feasible to
>>>>   do something similar to what I did here but I'm probably missing
>>>>   something. This could clean quite a lot of code, e.g. the last commit
>>>>   from:
>>>>
>>>>     https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353
>>>>
>>>> - To sync the socket options when a new subflow has been created, for
>>>>   the moment, we copy info from the msk to the subflow but maybe it
>>>>    would be fine to call some getsockopt() / setsockopt() to do the
>>>>   sync? If we take this approach, we might want to save a list of
>>>>   setsockopt() that have been done in the past on this socket and only
>>>>   re-do these ones? That should help with the maintenance of these
>>>>   socket options and ease the support of new ones.
>>>
>>> I would highly discourage this latest approach.
>>
>> Before going further, I want to insist on the fact that it is just an
>> idea, I didn't look at the implementation (and to be honest, I was not
>> looking at doing that now :) )
>>
>>> We will have to limit
>>> the list length, breaking the sync at some point.
>>
>> Why would we have to limit the list length and break the sync?
>>
>> I understand having a list might not feel good but I'm not sure what
>> else we can have.
> 
> Because otherwise syzkaller (or evil unprivileged user-space program)
> can easily exhaust the kernel memory doing a lot of setsockopt: we need
> to allocate the list entry, and we can't never flush the list.

I see, thank you. We could then add items (with just "level" and
"optname") in the list only if the initial setsockopt() operation was a
success and if there was no previous entry with the same info.

But I see the risk.

>>> And there are few
>>> quite gray area, e.g. how errors should be handled?
>>
>> Same as what we have now I guess: errors are ignored :)
>> But I don't see what else we can and it doesn't feel worst than before
>> (regarding how we handle errors during the sync)
> 
> Note the with the current schema sync can fail only while setting the
> CA, and even that requires a quite non trivial setup. If we start
> replying arbitrary complex sockopt sequence, I fear the we can hit
> failures much more frequently.

Maybe. We can also start with an "allow list".

>>> I also fear the
>>> "final" state of the subflow could be different from the
>>> expected/correct one for non trivial socktopt.
>>
>> I would expect it to be better because we would follow the "normal" flow
>> to set values. What bother me is the fact we bypass checks and directly
>> set values. 
> 
> Actually we don't! Such checks are performed once for the sockopt on
> the main/initial subflow (or by the mptcp impl).
> 
> IMHO it's safer then the list alternative. For passive socket the
> latter will do the full sockopt checking using a different initial sock
> internal state: the sockopt will be applied after cloning and the
> related initialization.
> 
> For plain TCP sockopt value are inherited from the parent via the clone
> itself, which is quite near to what the current sync implementation is
> doing.

Yes but that's only good for passive sockets. It would be nice if we
could clone all the attributes of the first subflow when new ones are
created :)

It might be safer now but if we don't follow the code we duplicated from
the other layers (i.e. what is exactly done with a TCP socket when a
setsockopt() is done), it might be less safe later than changing the
different values using the sockopt API.

>> If the code change on the different IP/TCP/SOCKET socket
>> options side, we might (very likely) miss the fact we also need to
>> update what we are doing on the MPTCP side, e.g. if new checks or extra
>> actions (BPF/Netfilter hooks) are added later, etc. So the idea of this
>> approach is to ease the maintenance (but maybe that's not needed and/or
>> not worth it) :)
> 
> The eBPF maintainer mentioned that new hooks are discouraged, IIRC ;)

Until someone comes with the new extended eBPF technology :-P

No but what is done by the kernel when a "setsockopt()" is emitted could
be modified, e.g. if the value is no longer a bit but an enum or if a
security check needs to be done before, etc. (Or maybe I'm thinking
about unrealistic/very unlikely/forbidden modifications)?

Cheers,
Matt