Message ID | d19e0bd55ea5477d94567c00735b78d8da6a38cb.1673666803.git.lucien.xin@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: support ipv4 big tcp | expand |
On Fri, Jan 13, 2023 at 10:31 PM Xin Long <lucien.xin@gmail.com> wrote: > > It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so > the iph->tot_len update should use iph_set_totlen(). > > Note that for these non GSO packets, the new iph tot_len with extra > iph option len added may become greater than 65535, the old process > will cast it and set iph->tot_len to it, which is a bug. In theory, > iph options shouldn't be added for these big packets in here, a fix > may be needed here in the future. For now this patch is only to set > iph->tot_len to 0 when it happens. I'm not entirely clear on the paragraph above, but we do need to be able to set/modify the IP options in cipso_v4_skbuff_setattr() in order to support CIPSO labeling. I'm open to better and/or alternative solutions compared to what we are doing now, but I can't support a change that is a bug waiting to bite us. My apologies if I'm interpreting your comments incorrectly and that isn't the case here. > Signed-off-by: Xin Long <lucien.xin@gmail.com> > --- > net/ipv4/cipso_ipv4.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c > index 6cd3b6c559f0..79ae7204e8ed 100644 > --- a/net/ipv4/cipso_ipv4.c > +++ b/net/ipv4/cipso_ipv4.c > @@ -2222,7 +2222,7 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, > memset((char *)(iph + 1) + buf_len, 0, opt_len - buf_len); > if (len_delta != 0) { > iph->ihl = 5 + (opt_len >> 2); > - iph->tot_len = htons(skb->len); > + iph_set_totlen(iph, skb->len); > } > ip_send_check(iph); > > -- > 2.31.1 -- paul-moore.com
On Sat, Jan 14, 2023 at 10:39 AM Paul Moore <paul@paul-moore.com> wrote: > > On Fri, Jan 13, 2023 at 10:31 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so > > the iph->tot_len update should use iph_set_totlen(). > > > > Note that for these non GSO packets, the new iph tot_len with extra > > iph option len added may become greater than 65535, the old process > > will cast it and set iph->tot_len to it, which is a bug. In theory, > > iph options shouldn't be added for these big packets in here, a fix > > may be needed here in the future. For now this patch is only to set > > iph->tot_len to 0 when it happens. > > I'm not entirely clear on the paragraph above, but we do need to be > able to set/modify the IP options in cipso_v4_skbuff_setattr() in > order to support CIPSO labeling. I'm open to better and/or > alternative solutions compared to what we are doing now, but I can't > support a change that is a bug waiting to bite us. My apologies if > I'm interpreting your comments incorrectly and that isn't the case > here. setting the IP options may cause the packet size to grow (both iph->tot_len and skb->len), for example: before setting it, iph->tot_len=65535, after setting it, iph->tot_len=65535 + 14 (assume the IP option len is 14) however, tot_len is 16 bit, and can't be set to "65535 + 14". Hope the above makes it clearer. This problem exists with or without this patch. Not sure how it should be fixed in cipso_v4. I knew tcpmss_tg4() setting TCP options which also causes the packet size to grow, but it requires no data payload: /* There is data after the header so the option can't be added * without moving it, and doing so may make the SYN packet * itself too large. Accept the packet unmodified instead. */ if (len > tcp_hdrlen) return 0; and then the new iph->tot_len won't exceed 65535. Not sure if we can skip the big packet, or segment/fragment the big packet in cipso_v4_skbuff_setattr(). Again, this patch fixes the issue when it's an IPv4 BIG TCP packets, and it doesn't introduce any new issues or fix any old issues, but only fix what the IPv4 BIG TCP may introduce. Thanks. > > > Signed-off-by: Xin Long <lucien.xin@gmail.com> > > --- > > net/ipv4/cipso_ipv4.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c > > index 6cd3b6c559f0..79ae7204e8ed 100644 > > --- a/net/ipv4/cipso_ipv4.c > > +++ b/net/ipv4/cipso_ipv4.c > > @@ -2222,7 +2222,7 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, > > memset((char *)(iph + 1) + buf_len, 0, opt_len - buf_len); > > if (len_delta != 0) { > > iph->ihl = 5 + (opt_len >> 2); > > - iph->tot_len = htons(skb->len); > > + iph_set_totlen(iph, skb->len); > > } > > ip_send_check(iph); > > > > -- > > 2.31.1 > > -- > paul-moore.com
On Sat, Jan 14, 2023 at 12:54 PM Xin Long <lucien.xin@gmail.com> wrote: > On Sat, Jan 14, 2023 at 10:39 AM Paul Moore <paul@paul-moore.com> wrote: > > On Fri, Jan 13, 2023 at 10:31 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > > > It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so > > > the iph->tot_len update should use iph_set_totlen(). > > > > > > Note that for these non GSO packets, the new iph tot_len with extra > > > iph option len added may become greater than 65535, the old process > > > will cast it and set iph->tot_len to it, which is a bug. In theory, > > > iph options shouldn't be added for these big packets in here, a fix > > > may be needed here in the future. For now this patch is only to set > > > iph->tot_len to 0 when it happens. > > > > I'm not entirely clear on the paragraph above, but we do need to be > > able to set/modify the IP options in cipso_v4_skbuff_setattr() in > > order to support CIPSO labeling. I'm open to better and/or > > alternative solutions compared to what we are doing now, but I can't > > support a change that is a bug waiting to bite us. My apologies if > > I'm interpreting your comments incorrectly and that isn't the case > > here. > setting the IP options may cause the packet size to grow (both iph->tot_len > and skb->len), for example: > > before setting it, iph->tot_len=65535, > after setting it, iph->tot_len=65535 + 14 (assume the IP option len is 14) > however, tot_len is 16 bit, and can't be set to "65535 + 14". > > Hope the above makes it clearer. Thanks, it does. > This problem exists with or without this patch. Not sure how it should > be fixed in cipso_v4 ... > > Not sure if we can skip the big packet, or segment/fragment the big packet > in cipso_v4_skbuff_setattr(). We can't skip the CIPSO labeling as that would be the network packet equivalent of not assigning a owner/group/mode to a file on the filesystem, which is a Very Bad Thing :) I spent a little bit of time this morning looking at the problem and I think the right approach is two-fold: first introduce a simple check in cipso_v4_skbuff_setattr() which returns -E2BIG if the packet length grows beyond 65535. It's rather crude, but it's a tiny patch and should at least ensure that the upper layers (NetLabel and SELinux) don't send the packet with a bogus length field; it will result in packet drops, but honestly that seems preferable to a mangled packet which will likely be dropped at some point in the network anyway. diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c index 6cd3b6c559f0..f19c9beda745 100644 --- a/net/ipv4/cipso_ipv4.c +++ b/net/ipv4/cipso_ipv4.c @@ -2183,8 +2183,10 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, * that the security label is applied to the packet - we do the same * thing when using the socket options and it hasn't caused a problem, * if we need to we can always revisit this choice later */ - len_delta = opt_len - opt->optlen; + if ((skb->len + len_delta) > 65535) + return -E2BIG; + /* if we don't ensure enough headroom we could panic on the skb_push() * call below so make sure we have enough, we are also "mangling" the * packet so we should probably do a copy-on-write call anyway */ The second step will be to add a max-length IPv4 option filled with IPOPT_NOOP to the local sockets in the case of netlbl_sock_setattr()/NETLBL_NLTYPE_ADDRSELECT. In this case we would either end up replacing the padding with a proper CIPSO option or removing it completely in netlbl_skbuff_setattr(); in either case the total packet length remains the same or decreases so we should be "safe". The forwarded packet case is still a bit of an issue, but I think the likelihood of someone using 64k max-size IPv4 packets on the wire *and* passing this traffic through a Linux cross-domain router with CIPSO (re)labeling is about as close to zero as one can possibly get. At least given the size check present in step one (patchlet above) the packet will be safely (?) dropped on the Linux system so an admin will have some idea where to start looking. I've got some basic patches written for both, but I need to at least give them a quick sanity test before posting. -- paul-moore.com
On Mon, Jan 16, 2023 at 11:46 AM Paul Moore <paul@paul-moore.com> wrote: > > On Sat, Jan 14, 2023 at 12:54 PM Xin Long <lucien.xin@gmail.com> wrote: > > On Sat, Jan 14, 2023 at 10:39 AM Paul Moore <paul@paul-moore.com> wrote: > > > On Fri, Jan 13, 2023 at 10:31 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > > > > > It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so > > > > the iph->tot_len update should use iph_set_totlen(). > > > > > > > > Note that for these non GSO packets, the new iph tot_len with extra > > > > iph option len added may become greater than 65535, the old process > > > > will cast it and set iph->tot_len to it, which is a bug. In theory, > > > > iph options shouldn't be added for these big packets in here, a fix > > > > may be needed here in the future. For now this patch is only to set > > > > iph->tot_len to 0 when it happens. > > > > > > I'm not entirely clear on the paragraph above, but we do need to be > > > able to set/modify the IP options in cipso_v4_skbuff_setattr() in > > > order to support CIPSO labeling. I'm open to better and/or > > > alternative solutions compared to what we are doing now, but I can't > > > support a change that is a bug waiting to bite us. My apologies if > > > I'm interpreting your comments incorrectly and that isn't the case > > > here. > > setting the IP options may cause the packet size to grow (both iph->tot_len > > and skb->len), for example: > > > > before setting it, iph->tot_len=65535, > > after setting it, iph->tot_len=65535 + 14 (assume the IP option len is 14) > > however, tot_len is 16 bit, and can't be set to "65535 + 14". > > > > Hope the above makes it clearer. > > Thanks, it does. > > > This problem exists with or without this patch. Not sure how it should > > be fixed in cipso_v4 ... > > > > Not sure if we can skip the big packet, or segment/fragment the big packet > > in cipso_v4_skbuff_setattr(). > > We can't skip the CIPSO labeling as that would be the network packet > equivalent of not assigning a owner/group/mode to a file on the > filesystem, which is a Very Bad Thing :) > > I spent a little bit of time this morning looking at the problem and I > think the right approach is two-fold: first introduce a simple check > in cipso_v4_skbuff_setattr() which returns -E2BIG if the packet length > grows beyond 65535. It's rather crude, but it's a tiny patch and > should at least ensure that the upper layers (NetLabel and SELinux) > don't send the packet with a bogus length field; it will result in > packet drops, but honestly that seems preferable to a mangled packet > which will likely be dropped at some point in the network anyway. > > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c > index 6cd3b6c559f0..f19c9beda745 100644 > --- a/net/ipv4/cipso_ipv4.c > +++ b/net/ipv4/cipso_ipv4.c > @@ -2183,8 +2183,10 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, > * that the security label is applied to the packet - we do the same > * thing when using the socket options and it hasn't caused a problem, > * if we need to we can always revisit this choice later */ > - > len_delta = opt_len - opt->optlen; > + if ((skb->len + len_delta) > 65535) > + return -E2BIG; > + Right, looks crude. :-) Note that for BIG TCP packets skb->len is bigger than 65535. (I assume CIPSO labeling will drop BIG TCP packets.) > /* if we don't ensure enough headroom we could panic on the skb_push() > * call below so make sure we have enough, we are also "mangling" the > * packet so we should probably do a copy-on-write call anyway */ > > The second step will be to add a max-length IPv4 option filled with > IPOPT_NOOP to the local sockets in the case of > netlbl_sock_setattr()/NETLBL_NLTYPE_ADDRSELECT. In this case we would > either end up replacing the padding with a proper CIPSO option or > removing it completely in netlbl_skbuff_setattr(); in either case the > total packet length remains the same or decreases so we should be > "safe". sounds better. > > The forwarded packet case is still a bit of an issue, but I think the > likelihood of someone using 64k max-size IPv4 packets on the wire > *and* passing this traffic through a Linux cross-domain router with > CIPSO (re)labeling is about as close to zero as one can possibly get. > At least given the size check present in step one (patchlet above) the > packet will be safely (?) dropped on the Linux system so an admin will > have some idea where to start looking. don't know the likelihood of CIPSO (re)labeling on a Linux cross-domain router. But 64K packets could be GRO packets merged on the interface (GRO enabled) of the router, not directly from the wire. > > I've got some basic patches written for both, but I need to at least > give them a quick sanity test before posting. > Good that you can take care of it from the security side. I think a similar problem exists in calipso_skbuff_setattr() in net/ipv6/calipso.c. Thanks.
On Mon, Jan 16, 2023 at 12:37 PM Xin Long <lucien.xin@gmail.com> wrote: > On Mon, Jan 16, 2023 at 11:46 AM Paul Moore <paul@paul-moore.com> wrote: > > On Sat, Jan 14, 2023 at 12:54 PM Xin Long <lucien.xin@gmail.com> wrote: > > > On Sat, Jan 14, 2023 at 10:39 AM Paul Moore <paul@paul-moore.com> wrote: > > > > On Fri, Jan 13, 2023 at 10:31 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > > > > > > > It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so > > > > > the iph->tot_len update should use iph_set_totlen(). > > > > > > > > > > Note that for these non GSO packets, the new iph tot_len with extra > > > > > iph option len added may become greater than 65535, the old process > > > > > will cast it and set iph->tot_len to it, which is a bug. In theory, > > > > > iph options shouldn't be added for these big packets in here, a fix > > > > > may be needed here in the future. For now this patch is only to set > > > > > iph->tot_len to 0 when it happens. > > > > > > > > I'm not entirely clear on the paragraph above, but we do need to be > > > > able to set/modify the IP options in cipso_v4_skbuff_setattr() in > > > > order to support CIPSO labeling. I'm open to better and/or > > > > alternative solutions compared to what we are doing now, but I can't > > > > support a change that is a bug waiting to bite us. My apologies if > > > > I'm interpreting your comments incorrectly and that isn't the case > > > > here. > > > setting the IP options may cause the packet size to grow (both iph->tot_len > > > and skb->len), for example: > > > > > > before setting it, iph->tot_len=65535, > > > after setting it, iph->tot_len=65535 + 14 (assume the IP option len is 14) > > > however, tot_len is 16 bit, and can't be set to "65535 + 14". > > > > > > Hope the above makes it clearer. > > > > Thanks, it does. > > > > > This problem exists with or without this patch. Not sure how it should > > > be fixed in cipso_v4 ... > > > > > > Not sure if we can skip the big packet, or segment/fragment the big packet > > > in cipso_v4_skbuff_setattr(). > > > > We can't skip the CIPSO labeling as that would be the network packet > > equivalent of not assigning a owner/group/mode to a file on the > > filesystem, which is a Very Bad Thing :) > > > > I spent a little bit of time this morning looking at the problem and I > > think the right approach is two-fold: first introduce a simple check > > in cipso_v4_skbuff_setattr() which returns -E2BIG if the packet length > > grows beyond 65535. It's rather crude, but it's a tiny patch and > > should at least ensure that the upper layers (NetLabel and SELinux) > > don't send the packet with a bogus length field; it will result in > > packet drops, but honestly that seems preferable to a mangled packet > > which will likely be dropped at some point in the network anyway. > > > > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c > > index 6cd3b6c559f0..f19c9beda745 100644 > > --- a/net/ipv4/cipso_ipv4.c > > +++ b/net/ipv4/cipso_ipv4.c > > @@ -2183,8 +2183,10 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, > > * that the security label is applied to the packet - we do the same > > * thing when using the socket options and it hasn't caused a problem, > > * if we need to we can always revisit this choice later */ > > - > > len_delta = opt_len - opt->optlen; > > + if ((skb->len + len_delta) > 65535) > > + return -E2BIG; > > + > > Right, looks crude. :-) Yes, but what else can we do? There is fragmentation, but that is rather ugly and we would still need a solution for when the don't fragment bit is set. I'm open to suggestions. > Note that for BIG TCP packets skb->len is bigger than 65535. > (I assume CIPSO labeling will drop BIG TCP packets.) It seems like there is still ongoing discussion about even enabling BIG TCP for IPv4, however for this discussion let's assume that BIG TCP is merged for IPv4. We really should have a solution that allows CIPSO for both normal and BIG TCP, if we don't we force distros and admins to choose between the two and that isn't good. We should do better. If skb->len > 64k in the case of BIG TCP, how is the packet eventually divided/fragmented in such a way that the total length field in the IPv4 header doesn't overflow? Or is that simply handled at the driver/device layer and we simply set skb->len to whatever the size is, regardless of the 16-bit length limit? If that is the case, does the driver/device layer handle copying the IPv4 options and setting the header/total-length fields in each packet? Or is it something else completely? > > /* if we don't ensure enough headroom we could panic on the skb_push() > > * call below so make sure we have enough, we are also "mangling" the > > * packet so we should probably do a copy-on-write call anyway */ > > > > The second step will be to add a max-length IPv4 option filled with > > IPOPT_NOOP to the local sockets in the case of > > netlbl_sock_setattr()/NETLBL_NLTYPE_ADDRSELECT. In this case we would > > either end up replacing the padding with a proper CIPSO option or > > removing it completely in netlbl_skbuff_setattr(); in either case the > > total packet length remains the same or decreases so we should be > > "safe". > > sounds better. To be clear, it's not a one-or-the-other choice, both patches would be necessary as the padding route described in the second step would only apply to traffic generated locally from a socket. We still have the ugly problem of dealing with forwarded traffic, which it looks like we discuss more on that below ... > > The forwarded packet case is still a bit of an issue, but I think the > > likelihood of someone using 64k max-size IPv4 packets on the wire > > *and* passing this traffic through a Linux cross-domain router with > > CIPSO (re)labeling is about as close to zero as one can possibly get. > > At least given the size check present in step one (patchlet above) the > > packet will be safely (?) dropped on the Linux system so an admin will > > have some idea where to start looking. > > don't know the likelihood of CIPSO (re)labeling on a Linux cross-domain router. > But 64K packets could be GRO packets merged on the interface (GRO enabled) > of the router, not directly from the wire. In the GRO case, is it safe to grow the packet such that skb->len is greater than 64k? I presume that the device/driver is going to split the packet anyway and populate the IPv4 total length fields in the header anyway, right? If we can't grow the packet beyond 64k, is there some way to signal to the driver/device at runtime that the largest packet we can process is 64k minus 40 bytes (for the IPv4 options)? > > I've got some basic patches written for both, but I need to at least > > give them a quick sanity test before posting. > > Good that you can take care of it from the security side. It's not yet clear to me that we can, see the above questions. > I think a similar problem exists in calipso_skbuff_setattr() in > net/ipv6/calipso.c. Probably, but one mess at a time, especially since several of the questions above apply to both IPv4 and IPv6.
On Mon, Jan 16, 2023 at 1:13 PM Paul Moore <paul@paul-moore.com> wrote: > > On Mon, Jan 16, 2023 at 12:37 PM Xin Long <lucien.xin@gmail.com> wrote: > > On Mon, Jan 16, 2023 at 11:46 AM Paul Moore <paul@paul-moore.com> wrote: > > > On Sat, Jan 14, 2023 at 12:54 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > On Sat, Jan 14, 2023 at 10:39 AM Paul Moore <paul@paul-moore.com> wrote: > > > > > On Fri, Jan 13, 2023 at 10:31 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > > > > > > > > > It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so > > > > > > the iph->tot_len update should use iph_set_totlen(). > > > > > > > > > > > > Note that for these non GSO packets, the new iph tot_len with extra > > > > > > iph option len added may become greater than 65535, the old process > > > > > > will cast it and set iph->tot_len to it, which is a bug. In theory, > > > > > > iph options shouldn't be added for these big packets in here, a fix > > > > > > may be needed here in the future. For now this patch is only to set > > > > > > iph->tot_len to 0 when it happens. > > > > > > > > > > I'm not entirely clear on the paragraph above, but we do need to be > > > > > able to set/modify the IP options in cipso_v4_skbuff_setattr() in > > > > > order to support CIPSO labeling. I'm open to better and/or > > > > > alternative solutions compared to what we are doing now, but I can't > > > > > support a change that is a bug waiting to bite us. My apologies if > > > > > I'm interpreting your comments incorrectly and that isn't the case > > > > > here. > > > > setting the IP options may cause the packet size to grow (both iph->tot_len > > > > and skb->len), for example: > > > > > > > > before setting it, iph->tot_len=65535, > > > > after setting it, iph->tot_len=65535 + 14 (assume the IP option len is 14) > > > > however, tot_len is 16 bit, and can't be set to "65535 + 14". > > > > > > > > Hope the above makes it clearer. > > > > > > Thanks, it does. > > > > > > > This problem exists with or without this patch. Not sure how it should > > > > be fixed in cipso_v4 ... > > > > > > > > Not sure if we can skip the big packet, or segment/fragment the big packet > > > > in cipso_v4_skbuff_setattr(). > > > > > > We can't skip the CIPSO labeling as that would be the network packet > > > equivalent of not assigning a owner/group/mode to a file on the > > > filesystem, which is a Very Bad Thing :) > > > > > > I spent a little bit of time this morning looking at the problem and I > > > think the right approach is two-fold: first introduce a simple check > > > in cipso_v4_skbuff_setattr() which returns -E2BIG if the packet length > > > grows beyond 65535. It's rather crude, but it's a tiny patch and > > > should at least ensure that the upper layers (NetLabel and SELinux) > > > don't send the packet with a bogus length field; it will result in > > > packet drops, but honestly that seems preferable to a mangled packet > > > which will likely be dropped at some point in the network anyway. > > > > > > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c > > > index 6cd3b6c559f0..f19c9beda745 100644 > > > --- a/net/ipv4/cipso_ipv4.c > > > +++ b/net/ipv4/cipso_ipv4.c > > > @@ -2183,8 +2183,10 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, > > > * that the security label is applied to the packet - we do the same > > > * thing when using the socket options and it hasn't caused a problem, > > > * if we need to we can always revisit this choice later */ > > > - > > > len_delta = opt_len - opt->optlen; > > > + if ((skb->len + len_delta) > 65535) > > > + return -E2BIG; > > > + > > > > Right, looks crude. :-) > > Yes, but what else can we do? There is fragmentation, but that is > rather ugly and we would still need a solution for when the don't > fragment bit is set. I'm open to suggestions. looking at ovs_dp_upcall(), for GSO/GRO packets it goes to queue_gso_packets() where it calls __skb_gso_segment() to segment it into small segs/skbs, then process these segs instead. I'm thinking you can try to do the same in cipso_v4_skbuff_setattr(), and I don't think 64K non-GSO packets exist in the user environment, so taking care of GSO packets should be enough. I just don't know if the security_hook will be able to process these smaller segs/skbs after the segment. > > > Note that for BIG TCP packets skb->len is bigger than 65535. > > (I assume CIPSO labeling will drop BIG TCP packets.) > > It seems like there is still ongoing discussion about even enabling > BIG TCP for IPv4, however for this discussion let's assume that BIG > TCP is merged for IPv4. > > We really should have a solution that allows CIPSO for both normal and > BIG TCP, if we don't we force distros and admins to choose between the > two and that isn't good. We should do better. If skb->len > 64k in > the case of BIG TCP, how is the packet eventually divided/fragmented > in such a way that the total length field in the IPv4 header doesn't > overflow? Or is that simply handled at the driver/device layer and we > simply set skb->len to whatever the size is, regardless of the 16-bit Yes, for BIG TCP, 16-bit length is set to 0, and it just uses skb->len as the IP packet length. > length limit? If that is the case, does the driver/device layer > handle copying the IPv4 options and setting the header/total-length > fields in each packet? Or is it something else completely? Yes, I think the driver/device layer will handle copying the IPv4 options and setting the header/total-length, and that's how it works. > > > > /* if we don't ensure enough headroom we could panic on the skb_push() > > > * call below so make sure we have enough, we are also "mangling" the > > > * packet so we should probably do a copy-on-write call anyway */ > > > > > > The second step will be to add a max-length IPv4 option filled with > > > IPOPT_NOOP to the local sockets in the case of > > > netlbl_sock_setattr()/NETLBL_NLTYPE_ADDRSELECT. In this case we would > > > either end up replacing the padding with a proper CIPSO option or > > > removing it completely in netlbl_skbuff_setattr(); in either case the > > > total packet length remains the same or decreases so we should be > > > "safe". > > > > sounds better. > > To be clear, it's not a one-or-the-other choice, both patches would be > necessary as the padding route described in the second step would only > apply to traffic generated locally from a socket. We still have the > ugly problem of dealing with forwarded traffic, which it looks like we > discuss more on that below ... I got that. > > > > The forwarded packet case is still a bit of an issue, but I think the > > > likelihood of someone using 64k max-size IPv4 packets on the wire > > > *and* passing this traffic through a Linux cross-domain router with > > > CIPSO (re)labeling is about as close to zero as one can possibly get. > > > At least given the size check present in step one (patchlet above) the > > > packet will be safely (?) dropped on the Linux system so an admin will > > > have some idea where to start looking. > > > > don't know the likelihood of CIPSO (re)labeling on a Linux cross-domain router. > > But 64K packets could be GRO packets merged on the interface (GRO enabled) > > of the router, not directly from the wire. > > In the GRO case, is it safe to grow the packet such that skb->len is > greater than 64k? I presume that the device/driver is going to split > the packet anyway and populate the IPv4 total length fields in the > header anyway, right? If we can't grow the packet beyond 64k, is > there some way to signal to the driver/device at runtime that the > largest packet we can process is 64k minus 40 bytes (for the IPv4 > options)? at runtime, not as far as I know. It's a field of the network device that can be modified by: # ip link set dev eth0 gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE > > > > I've got some basic patches written for both, but I need to at least > > > give them a quick sanity test before posting. > > > > Good that you can take care of it from the security side. > > It's not yet clear to me that we can, see the above questions. > > > I think a similar problem exists in calipso_skbuff_setattr() in > > net/ipv6/calipso.c. > > Probably, but one mess at a time, especially since several of the > questions above apply to both IPv4 and IPv6. > Just wanted you to know that IPv6 BIG TCP is something already in the kernel, unlike IPv4 BIG TCP, we can avoid this problem in advance before it's in the kernel. Thanks.
On 1/16/23 12:33 PM, Xin Long wrote: >> We really should have a solution that allows CIPSO for both normal and >> BIG TCP, if we don't we force distros and admins to choose between the >> two and that isn't good. We should do better. If skb->len > 64k in >> the case of BIG TCP, how is the packet eventually divided/fragmented >> in such a way that the total length field in the IPv4 header doesn't >> overflow? Or is that simply handled at the driver/device layer and we >> simply set skb->len to whatever the size is, regardless of the 16-bit > Yes, for BIG TCP, 16-bit length is set to 0, and it just uses skb->len > as the IP packet length. > >> length limit? If that is the case, does the driver/device layer >> handle copying the IPv4 options and setting the header/total-length >> fields in each packet? Or is it something else completely? > Yes, I think the driver/device layer will handle copying the IPv4 options > and setting the header/total-length, and that's how it works. IPv4 options, like TCP options, should be part of the header that gets replicate across GSO sliced packets by the hardware. ie., both should be transparent to well designed hardware (and for h/w that made poor choices standard 64kB GSO is the limit for its users).
On Mon, Jan 16, 2023 at 2:35 PM Xin Long <lucien.xin@gmail.com> wrote: > On Mon, Jan 16, 2023 at 1:13 PM Paul Moore <paul@paul-moore.com> wrote: > > On Mon, Jan 16, 2023 at 12:37 PM Xin Long <lucien.xin@gmail.com> wrote: > > > On Mon, Jan 16, 2023 at 11:46 AM Paul Moore <paul@paul-moore.com> wrote: > > > > On Sat, Jan 14, 2023 at 12:54 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > > On Sat, Jan 14, 2023 at 10:39 AM Paul Moore <paul@paul-moore.com> wrote: > > > > > > On Fri, Jan 13, 2023 at 10:31 PM Xin Long <lucien.xin@gmail.com> wrote: ... > > > > We can't skip the CIPSO labeling as that would be the network packet > > > > equivalent of not assigning a owner/group/mode to a file on the > > > > filesystem, which is a Very Bad Thing :) > > > > > > > > I spent a little bit of time this morning looking at the problem and I > > > > think the right approach is two-fold: first introduce a simple check > > > > in cipso_v4_skbuff_setattr() which returns -E2BIG if the packet length > > > > grows beyond 65535. It's rather crude, but it's a tiny patch and > > > > should at least ensure that the upper layers (NetLabel and SELinux) > > > > don't send the packet with a bogus length field; it will result in > > > > packet drops, but honestly that seems preferable to a mangled packet > > > > which will likely be dropped at some point in the network anyway. > > > > > > > > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c > > > > index 6cd3b6c559f0..f19c9beda745 100644 > > > > --- a/net/ipv4/cipso_ipv4.c > > > > +++ b/net/ipv4/cipso_ipv4.c > > > > @@ -2183,8 +2183,10 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, > > > > * that the security label is applied to the packet - we do the same > > > > * thing when using the socket options and it hasn't caused a problem, > > > > * if we need to we can always revisit this choice later */ > > > > - > > > > len_delta = opt_len - opt->optlen; > > > > + if ((skb->len + len_delta) > 65535) > > > > + return -E2BIG; > > > > + > > > > > > Right, looks crude. :-) > > > > Yes, but what else can we do? There is fragmentation, but that is > > rather ugly and we would still need a solution for when the don't > > fragment bit is set. I'm open to suggestions. > > looking at ovs_dp_upcall(), for GSO/GRO packets it goes to > queue_gso_packets() where it calls __skb_gso_segment() > to segment it into small segs/skbs, then process these segs instead. > > I'm thinking you can try to do the same in cipso_v4_skbuff_setattr(), > and I don't think 64K non-GSO packets exist in the user environment, > so taking care of GSO packets should be enough. Thanks, I'll take a look. > I just don't know if the security_hook will be able to process these > smaller segs/skbs after the segment. As long as the smaller, segmented packets have the IPv4 options preserved/copied on each smaller packet it should be okay. > > It seems like there is still ongoing discussion about even enabling > > BIG TCP for IPv4, however for this discussion let's assume that BIG > > TCP is merged for IPv4. > > > > We really should have a solution that allows CIPSO for both normal and > > BIG TCP, if we don't we force distros and admins to choose between the > > two and that isn't good. We should do better. If skb->len > 64k in > > the case of BIG TCP, how is the packet eventually divided/fragmented > > in such a way that the total length field in the IPv4 header doesn't > > overflow? Or is that simply handled at the driver/device layer and we > > simply set skb->len to whatever the size is, regardless of the 16-bit > > Yes, for BIG TCP, 16-bit length is set to 0, and it just uses skb->len > as the IP packet length. In the BIG TCP case, when is the IPv4 header zero'd out? Currently cipso_v4_skbuff_setattr() is called in the NF_INET_LOCAL_OUT and NF_INET_FORWARD chains, is there an easy way to distinguish between a traditional segmentation offload mechanism, e.g. GSO, and BIG TCP? If BIG TCP allows for arbitrarily large packets we can just grow the skb->len value as needed and leave the total length field in the IPv4 header untouched/zero, but we would need to be able to distinguish between a segmentation offload and BIG TCP. > > In the GRO case, is it safe to grow the packet such that skb->len is > > greater than 64k? I presume that the device/driver is going to split > > the packet anyway and populate the IPv4 total length fields in the > > header anyway, right? If we can't grow the packet beyond 64k, is > > there some way to signal to the driver/device at runtime that the > > largest packet we can process is 64k minus 40 bytes (for the IPv4 > > options)? > > at runtime, not as far as I know. > It's a field of the network device that can be modified by: > # ip link set dev eth0 gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE I need to look at the OVS case above, but one possibility would be to have the kernel adjust the GSO size down by 40 bytes when CONFIG_NETLABEL is enabled, but that isn't a great option, and not something I consider a first (or second) choice.
On Tue, Jan 17, 2023 at 2:51 PM Paul Moore <paul@paul-moore.com> wrote: > On Mon, Jan 16, 2023 at 2:35 PM Xin Long <lucien.xin@gmail.com> wrote: > > On Mon, Jan 16, 2023 at 1:13 PM Paul Moore <paul@paul-moore.com> wrote: > > > On Mon, Jan 16, 2023 at 12:37 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > On Mon, Jan 16, 2023 at 11:46 AM Paul Moore <paul@paul-moore.com> wrote: > > > > > On Sat, Jan 14, 2023 at 12:54 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > > > On Sat, Jan 14, 2023 at 10:39 AM Paul Moore <paul@paul-moore.com> wrote: > > > > > > > On Fri, Jan 13, 2023 at 10:31 PM Xin Long <lucien.xin@gmail.com> wrote: > > ... > > > > > > We can't skip the CIPSO labeling as that would be the network packet > > > > > equivalent of not assigning a owner/group/mode to a file on the > > > > > filesystem, which is a Very Bad Thing :) > > > > > > > > > > I spent a little bit of time this morning looking at the problem and I > > > > > think the right approach is two-fold: first introduce a simple check > > > > > in cipso_v4_skbuff_setattr() which returns -E2BIG if the packet length > > > > > grows beyond 65535. It's rather crude, but it's a tiny patch and > > > > > should at least ensure that the upper layers (NetLabel and SELinux) > > > > > don't send the packet with a bogus length field; it will result in > > > > > packet drops, but honestly that seems preferable to a mangled packet > > > > > which will likely be dropped at some point in the network anyway. > > > > > > > > > > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c > > > > > index 6cd3b6c559f0..f19c9beda745 100644 > > > > > --- a/net/ipv4/cipso_ipv4.c > > > > > +++ b/net/ipv4/cipso_ipv4.c > > > > > @@ -2183,8 +2183,10 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, > > > > > * that the security label is applied to the packet - we do the same > > > > > * thing when using the socket options and it hasn't caused a problem, > > > > > * if we need to we can always revisit this choice later */ > > > > > - > > > > > len_delta = opt_len - opt->optlen; > > > > > + if ((skb->len + len_delta) > 65535) > > > > > + return -E2BIG; > > > > > + > > > > > > > > Right, looks crude. :-) > > > > > > Yes, but what else can we do? There is fragmentation, but that is > > > rather ugly and we would still need a solution for when the don't > > > fragment bit is set. I'm open to suggestions. > > > > looking at ovs_dp_upcall(), for GSO/GRO packets it goes to > > queue_gso_packets() where it calls __skb_gso_segment() > > to segment it into small segs/skbs, then process these segs instead. > > > > I'm thinking you can try to do the same in cipso_v4_skbuff_setattr(), > > and I don't think 64K non-GSO packets exist in the user environment, > > so taking care of GSO packets should be enough. > > Thanks, I'll take a look. Unfortunately I don't think the ovs_dp_upcall() approach will work as that is an endpoint in the kernel which sends the GSO'd packet up to userspace in segements. In the case of cipso_v4_skbuff_setattr() we are setting an IPv4 option on a packet in either the NF_INET_LOCAL_OUT or NF_INET_FORWARD output path. I believe we can resolve the LOCAL_OUT case with the padding approach I mentioned previously, but the FORWARD path remains a challenge; I simply don't see a way to handle growing the packet beyond 64k in the forward path. I'm also realizing that we should be sending a ICMP_FRAG_NEEDED in the forward case when we have to drop the packet due to size issues, as the normal MTU/size check happens prior to the NF_INET_FORWARD hooks (and hence cipso_v4_skbuff_setattr()). > > > It seems like there is still ongoing discussion about even enabling > > > BIG TCP for IPv4, however for this discussion let's assume that BIG > > > TCP is merged for IPv4. > > > > > > We really should have a solution that allows CIPSO for both normal and > > > BIG TCP, if we don't we force distros and admins to choose between the > > > two and that isn't good. We should do better. If skb->len > 64k in > > > the case of BIG TCP, how is the packet eventually divided/fragmented > > > in such a way that the total length field in the IPv4 header doesn't > > > overflow? Or is that simply handled at the driver/device layer and we > > > simply set skb->len to whatever the size is, regardless of the 16-bit > > > > Yes, for BIG TCP, 16-bit length is set to 0, and it just uses skb->len > > as the IP packet length. > > In the BIG TCP case, when is the IPv4 header zero'd out? Currently > cipso_v4_skbuff_setattr() is called in the NF_INET_LOCAL_OUT and > NF_INET_FORWARD chains, is there an easy way to distinguish between a > traditional segmentation offload mechanism, e.g. GSO, and BIG TCP? If > BIG TCP allows for arbitrarily large packets we can just grow the > skb->len value as needed and leave the total length field in the IPv4 > header untouched/zero, but we would need to be able to distinguish > between a segmentation offload and BIG TCP. Keeping the above questions as they still apply, rather I could still use some help understanding what a BIG TCP packet would look like during LOCAL_OUT and FORWARD. > > > In the GRO case, is it safe to grow the packet such that skb->len is > > > greater than 64k? I presume that the device/driver is going to split > > > the packet anyway and populate the IPv4 total length fields in the > > > header anyway, right? If we can't grow the packet beyond 64k, is > > > there some way to signal to the driver/device at runtime that the > > > largest packet we can process is 64k minus 40 bytes (for the IPv4 > > > options)? > > > > at runtime, not as far as I know. > > It's a field of the network device that can be modified by: > > # ip link set dev eth0 gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE > > I need to look at the OVS case above, but one possibility would be to > have the kernel adjust the GSO size down by 40 bytes when > CONFIG_NETLABEL is enabled, but that isn't a great option, and not > something I consider a first (or second) choice. Looking more at the GSO related code, this isn't likely to work.
On 1/17/23 3:46 PM, Paul Moore wrote: >> >> In the BIG TCP case, when is the IPv4 header zero'd out? Currently >> cipso_v4_skbuff_setattr() is called in the NF_INET_LOCAL_OUT and >> NF_INET_FORWARD chains, is there an easy way to distinguish between a >> traditional segmentation offload mechanism, e.g. GSO, and BIG TCP? If >> BIG TCP allows for arbitrarily large packets we can just grow the >> skb->len value as needed and leave the total length field in the IPv4 >> header untouched/zero, but we would need to be able to distinguish >> between a segmentation offload and BIG TCP. > > Keeping the above questions as they still apply, rather I could still > use some help understanding what a BIG TCP packet would look like > during LOCAL_OUT and FORWARD. skb->len > 64kb. you don't typically look at the IP / IPv6 header and its total length field and I thought the first patch in the series added a handler for doing that. > >>>> In the GRO case, is it safe to grow the packet such that skb->len is >>>> greater than 64k? I presume that the device/driver is going to split >>>> the packet anyway and populate the IPv4 total length fields in the >>>> header anyway, right? If we can't grow the packet beyond 64k, is >>>> there some way to signal to the driver/device at runtime that the >>>> largest packet we can process is 64k minus 40 bytes (for the IPv4 >>>> options)? >>> >>> at runtime, not as far as I know. >>> It's a field of the network device that can be modified by: >>> # ip link set dev eth0 gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE >> >> I need to look at the OVS case above, but one possibility would be to >> have the kernel adjust the GSO size down by 40 bytes when >> CONFIG_NETLABEL is enabled, but that isn't a great option, and not >> something I consider a first (or second) choice. > > Looking more at the GSO related code, this isn't likely to work. > icsk_ext_hdr_len is adjusted by cipso for its options. Does that not cover what is needed?
On Tue, Jan 17, 2023 at 9:47 PM David Ahern <dsahern@gmail.com> wrote: > On 1/17/23 3:46 PM, Paul Moore wrote: > >> > >> In the BIG TCP case, when is the IPv4 header zero'd out? Currently > >> cipso_v4_skbuff_setattr() is called in the NF_INET_LOCAL_OUT and > >> NF_INET_FORWARD chains, is there an easy way to distinguish between a > >> traditional segmentation offload mechanism, e.g. GSO, and BIG TCP? If > >> BIG TCP allows for arbitrarily large packets we can just grow the > >> skb->len value as needed and leave the total length field in the IPv4 > >> header untouched/zero, but we would need to be able to distinguish > >> between a segmentation offload and BIG TCP. > > > > Keeping the above questions as they still apply, rather I could still > > use some help understanding what a BIG TCP packet would look like > > during LOCAL_OUT and FORWARD. > > skb->len > 64kb. you don't typically look at the IP / IPv6 header and > its total length field and I thought the first patch in the series added > a handler for doing that. Thanks, I was just curious if there was some other mechanism but that works. As of this moment, the patchset I'm working on is still independent of the BIG TCP patches, and I want to make sure I'm not doing anything that will make the BIG TCP patches any more challenging. > >>>> In the GRO case, is it safe to grow the packet such that skb->len is > >>>> greater than 64k? I presume that the device/driver is going to split > >>>> the packet anyway and populate the IPv4 total length fields in the > >>>> header anyway, right? If we can't grow the packet beyond 64k, is > >>>> there some way to signal to the driver/device at runtime that the > >>>> largest packet we can process is 64k minus 40 bytes (for the IPv4 > >>>> options)? > >>> > >>> at runtime, not as far as I know. > >>> It's a field of the network device that can be modified by: > >>> # ip link set dev eth0 gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE > >> > >> I need to look at the OVS case above, but one possibility would be to > >> have the kernel adjust the GSO size down by 40 bytes when > >> CONFIG_NETLABEL is enabled, but that isn't a great option, and not > >> something I consider a first (or second) choice. > > > > Looking more at the GSO related code, this isn't likely to work. > > icsk_ext_hdr_len is adjusted by cipso for its options. Does that not > cover what is needed? Adjusting the icsk_ext_hdr_len only applies to CIPSO labels that are attached via the associated local sock, traffic that is labeled by cipso_v4_skbuff_setattr() in the LOCAL_OUT or FORWARD netfilter hooks does not have the icsk_ext_hdr_len adjustment. Although as I mentioned earlier, I am adding a patch which would pad out the IPv4 option header in the LOCAL_OUT labeling scenario so icsk_ext_hdr_len will be adjusted for all locally generated TCP/connected/is_icsk traffic. Forwarded traffic still remains an issue; but I think the only thing we can do is drop it and send an icmp message back to the sender with an adjusted MTU value. -- paul-moore.com
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c index 6cd3b6c559f0..79ae7204e8ed 100644 --- a/net/ipv4/cipso_ipv4.c +++ b/net/ipv4/cipso_ipv4.c @@ -2222,7 +2222,7 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, memset((char *)(iph + 1) + buf_len, 0, opt_len - buf_len); if (len_delta != 0) { iph->ihl = 5 + (opt_len >> 2); - iph->tot_len = htons(skb->len); + iph_set_totlen(iph, skb->len); } ip_send_check(iph);
It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so the iph->tot_len update should use iph_set_totlen(). Note that for these non GSO packets, the new iph tot_len with extra iph option len added may become greater than 65535, the old process will cast it and set iph->tot_len to it, which is a bug. In theory, iph options shouldn't be added for these big packets in here, a fix may be needed here in the future. For now this patch is only to set iph->tot_len to 0 when it happens. Signed-off-by: Xin Long <lucien.xin@gmail.com> --- net/ipv4/cipso_ipv4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)