Message ID | 20230707-mptcp-unify-sockopt-issue-353-v1-0-693e15c06646@tessares.net (mailing list archive) |
---|---|
Headers | show |
Series | mptcp: sockopt: uniform code to get/set values | expand |
On Fri, 2023-07-07 at 18:00 +0200, Matthieu Baerts wrote: > Here is the implementation of an idea I suggested a few months ago: it > is a factoring of the sockopt code trying to uniform it, mainly to rely > on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there > to get and set values. > > That's a first step. More might come if we are OK with the approach, > e.g. > > - For SOL_TCP socket options we cannot store info in the MPTCP socket so > we still need to "manually" store it somewhere in the msk. Maybe we > could store info in the different subflows (and create a first one if > there is no subflow) and retrieve info from there? > > - For SOL_SOCKET socket options, it looks like it should be feasible to > do something similar to what I did here but I'm probably missing > something. This could clean quite a lot of code, e.g. the last commit > from: > > https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353 > > - To sync the socket options when a new subflow has been created, for > the moment, we copy info from the msk to the subflow but maybe it > would be fine to call some getsockopt() / setsockopt() to do the > sync? If we take this approach, we might want to save a list of > setsockopt() that have been done in the past on this socket and only > re-do these ones? That should help with the maintenance of these > socket options and ease the support of new ones. I would highly discourage this latest approach. We will have to limit the list length, breaking the sync at some point. And there are few quite gray area, e.g. how errors should be handled? I also fear the "final" state of the subflow could be different from the expected/correct one for non trivial socktopt. Cheers, Paolo
Hi Paolo, Thank you for your feedback! On 10/07/2023 18:51, Paolo Abeni wrote: > On Fri, 2023-07-07 at 18:00 +0200, Matthieu Baerts wrote: >> Here is the implementation of an idea I suggested a few months ago: it >> is a factoring of the sockopt code trying to uniform it, mainly to rely >> on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there >> to get and set values. >> >> That's a first step. More might come if we are OK with the approach, >> e.g. >> >> - For SOL_TCP socket options we cannot store info in the MPTCP socket so >> we still need to "manually" store it somewhere in the msk. Maybe we >> could store info in the different subflows (and create a first one if >> there is no subflow) and retrieve info from there? >> >> - For SOL_SOCKET socket options, it looks like it should be feasible to >> do something similar to what I did here but I'm probably missing >> something. This could clean quite a lot of code, e.g. the last commit >> from: >> >> https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353 >> >> - To sync the socket options when a new subflow has been created, for >> the moment, we copy info from the msk to the subflow but maybe it >> would be fine to call some getsockopt() / setsockopt() to do the >> sync? If we take this approach, we might want to save a list of >> setsockopt() that have been done in the past on this socket and only >> re-do these ones? That should help with the maintenance of these >> socket options and ease the support of new ones. > > I would highly discourage this latest approach. Before going further, I want to insist on the fact that it is just an idea, I didn't look at the implementation (and to be honest, I was not looking at doing that now :) ) > We will have to limit > the list length, breaking the sync at some point. Why would we have to limit the list length and break the sync? I understand having a list might not feel good but I'm not sure what else we can have. > And there are few > quite gray area, e.g. how errors should be handled? Same as what we have now I guess: errors are ignored :) But I don't see what else we can and it doesn't feel worst than before (regarding how we handle errors during the sync) > I also fear the > "final" state of the subflow could be different from the > expected/correct one for non trivial socktopt. I would expect it to be better because we would follow the "normal" flow to set values. What bother me is the fact we bypass checks and directly set values. If the code change on the different IP/TCP/SOCKET socket options side, we might (very likely) miss the fact we also need to update what we are doing on the MPTCP side, e.g. if new checks or extra actions (BPF/Netfilter hooks) are added later, etc. So the idea of this approach is to ease the maintenance (but maybe that's not needed and/or not worth it) :) Of course, this would work only if we can do something like *only* for each setsockopt() that have been done before: void sync_sockopt(sock, level, optname) { socklen_t buflen; char val[1024]; // we can also record the max given before buflen = sizeof(val); if ((err = getsockopt(sock, level, optname, val, &buflen) < 0)) return; setsockopt(sock, level, optname, val, buflen); } Cheers, Matt
On Tue, 2023-07-11 at 12:22 +0200, Matthieu Baerts wrote: > On 10/07/2023 18:51, Paolo Abeni wrote: > > On Fri, 2023-07-07 at 18:00 +0200, Matthieu Baerts wrote: > > > Here is the implementation of an idea I suggested a few months ago: it > > > is a factoring of the sockopt code trying to uniform it, mainly to rely > > > on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there > > > to get and set values. > > > > > > That's a first step. More might come if we are OK with the approach, > > > e.g. > > > > > > - For SOL_TCP socket options we cannot store info in the MPTCP socket so > > > we still need to "manually" store it somewhere in the msk. Maybe we > > > could store info in the different subflows (and create a first one if > > > there is no subflow) and retrieve info from there? > > > > > > - For SOL_SOCKET socket options, it looks like it should be feasible to > > > do something similar to what I did here but I'm probably missing > > > something. This could clean quite a lot of code, e.g. the last commit > > > from: > > > > > > https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353 > > > > > > - To sync the socket options when a new subflow has been created, for > > > the moment, we copy info from the msk to the subflow but maybe it > > > would be fine to call some getsockopt() / setsockopt() to do the > > > sync? If we take this approach, we might want to save a list of > > > setsockopt() that have been done in the past on this socket and only > > > re-do these ones? That should help with the maintenance of these > > > socket options and ease the support of new ones. > > > > I would highly discourage this latest approach. > > Before going further, I want to insist on the fact that it is just an > idea, I didn't look at the implementation (and to be honest, I was not > looking at doing that now :) ) > > > We will have to limit > > the list length, breaking the sync at some point. > > Why would we have to limit the list length and break the sync? > > I understand having a list might not feel good but I'm not sure what > else we can have. Because otherwise syzkaller (or evil unprivileged user-space program) can easily exhaust the kernel memory doing a lot of setsockopt: we need to allocate the list entry, and we can't never flush the list. > > And there are few > > quite gray area, e.g. how errors should be handled? > > Same as what we have now I guess: errors are ignored :) > But I don't see what else we can and it doesn't feel worst than before > (regarding how we handle errors during the sync) Note the with the current schema sync can fail only while setting the CA, and even that requires a quite non trivial setup. If we start replying arbitrary complex sockopt sequence, I fear the we can hit failures much more frequently. > > > I also fear the > > "final" state of the subflow could be different from the > > expected/correct one for non trivial socktopt. > > I would expect it to be better because we would follow the "normal" flow > to set values. What bother me is the fact we bypass checks and directly > set values. Actually we don't! Such checks are performed once for the sockopt on the main/initial subflow (or by the mptcp impl). IMHO it's safer then the list alternative. For passive socket the latter will do the full sockopt checking using a different initial sock internal state: the sockopt will be applied after cloning and the related initialization. For plain TCP sockopt value are inherited from the parent via the clone itself, which is quite near to what the current sync implementation is doing. > If the code change on the different IP/TCP/SOCKET socket > options side, we might (very likely) miss the fact we also need to > update what we are doing on the MPTCP side, e.g. if new checks or extra > actions (BPF/Netfilter hooks) are added later, etc. So the idea of this > approach is to ease the maintenance (but maybe that's not needed and/or > not worth it) :) The eBPF maintainer mentioned that new hooks are discouraged, IIRC ;) Cheers, Paolo
On 11/07/2023 14:50, Paolo Abeni wrote: > On Tue, 2023-07-11 at 12:22 +0200, Matthieu Baerts wrote: >> On 10/07/2023 18:51, Paolo Abeni wrote: >>> On Fri, 2023-07-07 at 18:00 +0200, Matthieu Baerts wrote: >>>> Here is the implementation of an idea I suggested a few months ago: it >>>> is a factoring of the sockopt code trying to uniform it, mainly to rely >>>> on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there >>>> to get and set values. >>>> >>>> That's a first step. More might come if we are OK with the approach, >>>> e.g. >>>> >>>> - For SOL_TCP socket options we cannot store info in the MPTCP socket so >>>> we still need to "manually" store it somewhere in the msk. Maybe we >>>> could store info in the different subflows (and create a first one if >>>> there is no subflow) and retrieve info from there? >>>> >>>> - For SOL_SOCKET socket options, it looks like it should be feasible to >>>> do something similar to what I did here but I'm probably missing >>>> something. This could clean quite a lot of code, e.g. the last commit >>>> from: >>>> >>>> https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353 >>>> >>>> - To sync the socket options when a new subflow has been created, for >>>> the moment, we copy info from the msk to the subflow but maybe it >>>> would be fine to call some getsockopt() / setsockopt() to do the >>>> sync? If we take this approach, we might want to save a list of >>>> setsockopt() that have been done in the past on this socket and only >>>> re-do these ones? That should help with the maintenance of these >>>> socket options and ease the support of new ones. >>> >>> I would highly discourage this latest approach. >> >> Before going further, I want to insist on the fact that it is just an >> idea, I didn't look at the implementation (and to be honest, I was not >> looking at doing that now :) ) >> >>> We will have to limit >>> the list length, breaking the sync at some point. >> >> Why would we have to limit the list length and break the sync? >> >> I understand having a list might not feel good but I'm not sure what >> else we can have. > > Because otherwise syzkaller (or evil unprivileged user-space program) > can easily exhaust the kernel memory doing a lot of setsockopt: we need > to allocate the list entry, and we can't never flush the list. I see, thank you. We could then add items (with just "level" and "optname") in the list only if the initial setsockopt() operation was a success and if there was no previous entry with the same info. But I see the risk. >>> And there are few >>> quite gray area, e.g. how errors should be handled? >> >> Same as what we have now I guess: errors are ignored :) >> But I don't see what else we can and it doesn't feel worst than before >> (regarding how we handle errors during the sync) > > Note the with the current schema sync can fail only while setting the > CA, and even that requires a quite non trivial setup. If we start > replying arbitrary complex sockopt sequence, I fear the we can hit > failures much more frequently. Maybe. We can also start with an "allow list". >>> I also fear the >>> "final" state of the subflow could be different from the >>> expected/correct one for non trivial socktopt. >> >> I would expect it to be better because we would follow the "normal" flow >> to set values. What bother me is the fact we bypass checks and directly >> set values. > > Actually we don't! Such checks are performed once for the sockopt on > the main/initial subflow (or by the mptcp impl). > > IMHO it's safer then the list alternative. For passive socket the > latter will do the full sockopt checking using a different initial sock > internal state: the sockopt will be applied after cloning and the > related initialization. > > For plain TCP sockopt value are inherited from the parent via the clone > itself, which is quite near to what the current sync implementation is > doing. Yes but that's only good for passive sockets. It would be nice if we could clone all the attributes of the first subflow when new ones are created :) It might be safer now but if we don't follow the code we duplicated from the other layers (i.e. what is exactly done with a TCP socket when a setsockopt() is done), it might be less safe later than changing the different values using the sockopt API. >> If the code change on the different IP/TCP/SOCKET socket >> options side, we might (very likely) miss the fact we also need to >> update what we are doing on the MPTCP side, e.g. if new checks or extra >> actions (BPF/Netfilter hooks) are added later, etc. So the idea of this >> approach is to ease the maintenance (but maybe that's not needed and/or >> not worth it) :) > > The eBPF maintainer mentioned that new hooks are discouraged, IIRC ;) Until someone comes with the new extended eBPF technology :-P No but what is done by the kernel when a "setsockopt()" is emitted could be modified, e.g. if the value is no longer a bit but an enum or if a security check needs to be done before, etc. (Or maybe I'm thinking about unrealistic/very unlikely/forbidden modifications)? Cheers, Matt
Here is the implementation of an idea I suggested a few months ago: it is a factoring of the sockopt code trying to uniform it, mainly to rely on {ip,ipv6,tcp}_[gs]etsockopt() instead of copying what is done there to get and set values. That's a first step. More might come if we are OK with the approach, e.g. - For SOL_TCP socket options we cannot store info in the MPTCP socket so we still need to "manually" store it somewhere in the msk. Maybe we could store info in the different subflows (and create a first one if there is no subflow) and retrieve info from there? - For SOL_SOCKET socket options, it looks like it should be feasible to do something similar to what I did here but I'm probably missing something. This could clean quite a lot of code, e.g. the last commit from: https://github.com/matttbe/mptcp_net-next/commits/b4/mptcp-unify-sockopt-issue-353 - To sync the socket options when a new subflow has been created, for the moment, we copy info from the msk to the subflow but maybe it would be fine to call some getsockopt() / setsockopt() to do the sync? If we take this approach, we might want to save a list of setsockopt() that have been done in the past on this socket and only re-do these ones? That should help with the maintenance of these socket options and ease the support of new ones. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> --- Matthieu Baerts (6): mptcp: sockopt: move tcp_inq code to a dedicated function mptcp: sockopt: update supported list mptcp: sockopt: get val in a generic way mptcp: sockopt: add missing getsockopt() options mptcp: sockopt: set val in a generic way mptcp: sockopt: support IP_TTL & IPV6_UNICAST_HOPS net/mptcp/sockopt.c | 241 ++++++++++++++++++++++++++-------------------------- 1 file changed, 121 insertions(+), 120 deletions(-) --- base-commit: cde4fa6a0a5ed3effb75008ce99eb842bc448e5d change-id: 20230221-mptcp-unify-sockopt-issue-353-1ad56ba6b04e Best regards,