Message ID | 20210902193447.94039-2-willemdebruijn.kernel@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] ip_gre: validate csum_start only if CHECKSUM_PARTIAL | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 2 maintainers not CCed: yoshfuji@linux-ipv6.org dsahern@kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 12 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > From: Willem de Bruijn <willemb@google.com> > > Only test integrity of csum_start if checksum offload is configured. > > With checksum offload and GRE tunnel checksum, gre_build_header will > cheaply build the GRE checksum using local checksum offload. This > depends on inner packet csum offload, and thus that csum_start points > behind GRE. But validate this condition only with checksum offload. > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") > Signed-off-by: Willem de Bruijn <willemb@google.com> > --- > net/ipv4/ip_gre.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index 177d26d8fb9c..09311992a617 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev, > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > { > - if (csum && skb_checksum_start(skb) < skb->data) > + /* Local checksum offload requires csum offload of the inner packet */ > + if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > + skb_checksum_start(skb) < skb->data) > return -EINVAL; > + > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); > } So a few minor nits. First I think we need this for both v4 and v6 since it looks like this code is reproduced for net/ipv6/ip6_gre.c. Second I don't know if we even need to bother including the "csum" portion of the lookup since that technically is just telling us if the GRE tunnel is requesting a checksum or not and I am not sure that applies to the fact that the inner L4 header is going to be what is requesting the checksum offload most likely. Also maybe this should be triggering a WARN_ON_ONCE if we hit this as the path triggering this should be fixed rather than us silently dropping frames. We should be figuring out what cases are resulting in us getting CHECKSUM_PARTIAL without skb_checksum_start being set.
On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > From: Willem de Bruijn <willemb@google.com> > > > > Only test integrity of csum_start if checksum offload is configured. > > > > With checksum offload and GRE tunnel checksum, gre_build_header will > > cheaply build the GRE checksum using local checksum offload. This > > depends on inner packet csum offload, and thus that csum_start points > > behind GRE. But validate this condition only with checksum offload. > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > --- > > net/ipv4/ip_gre.c | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > index 177d26d8fb9c..09311992a617 100644 > > --- a/net/ipv4/ip_gre.c > > +++ b/net/ipv4/ip_gre.c > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev, > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > > { > > - if (csum && skb_checksum_start(skb) < skb->data) > > + /* Local checksum offload requires csum offload of the inner packet */ > > + if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > > + skb_checksum_start(skb) < skb->data) > > return -EINVAL; > > + > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); > > } Thanks for taking a look. > So a few minor nits. > > First I think we need this for both v4 and v6 since it looks like this > code is reproduced for net/ipv6/ip6_gre.c. I sent a separate patch for v6. Perhaps should have made it a series to make this more clear. > Second I don't know if we even need to bother including the "csum" > portion of the lookup since that technically is just telling us if the > GRE tunnel is requesting a checksum or not and I am not sure that > applies to the fact that the inner L4 header is going to be what is > requesting the checksum offload most likely. This test introduced in the original patch specifically protects the GRE tunnel checksum calculation using lco_csum. The regular inner packet path likely is already robust (as similar bug reports would be more likely for that more common case). > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as > the path triggering this should be fixed rather than us silently > dropping frames. We should be figuring out what cases are resulting in > us getting CHECKSUM_PARTIAL without skb_checksum_start being set. We already know that bad packets can enter the kernel and trigger this, using packet sockets and virtio_net_hdr. Unfortunately, this *is* the fix.
On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck > <alexander.duyck@gmail.com> wrote: > > > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > From: Willem de Bruijn <willemb@google.com> > > > > > > Only test integrity of csum_start if checksum offload is configured. > > > > > > With checksum offload and GRE tunnel checksum, gre_build_header will > > > cheaply build the GRE checksum using local checksum offload. This > > > depends on inner packet csum offload, and thus that csum_start points > > > behind GRE. But validate this condition only with checksum offload. > > > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > --- > > > net/ipv4/ip_gre.c | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > > index 177d26d8fb9c..09311992a617 100644 > > > --- a/net/ipv4/ip_gre.c > > > +++ b/net/ipv4/ip_gre.c > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev, > > > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > > > { > > > - if (csum && skb_checksum_start(skb) < skb->data) > > > + /* Local checksum offload requires csum offload of the inner packet */ > > > + if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > > > + skb_checksum_start(skb) < skb->data) > > > return -EINVAL; > > > + > > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); > > > } > > Thanks for taking a look. > > > So a few minor nits. > > > > First I think we need this for both v4 and v6 since it looks like this > > code is reproduced for net/ipv6/ip6_gre.c. > > I sent a separate patch for v6. Perhaps should have made it a series > to make this more clear. Yeah, that was part of the reason I assumed the ipv6 patch was overlooked. > > Second I don't know if we even need to bother including the "csum" > > portion of the lookup since that technically is just telling us if the > > GRE tunnel is requesting a checksum or not and I am not sure that > > applies to the fact that the inner L4 header is going to be what is > > requesting the checksum offload most likely. > > This test introduced in the original patch specifically protects the > GRE tunnel checksum calculation using lco_csum. The regular inner > packet path likely is already robust (as similar bug reports would be > more likely for that more common case). I was just thinking in terms of shaving off some extra comparisons. I suppose it depends on if this is being inlined or not. If it is being inlined there are at least 2 cases where the if statement would be dropped since csum is explicitly false. My thought was that by just jumping straight to the skb->ip_summed check we can drop the lookup for csum since it would be implied by the fact that skb->ip_summed is being checked. It would create a broader check, but at the same time it would reduce the number of comparisons in the call. > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as > > the path triggering this should be fixed rather than us silently > > dropping frames. We should be figuring out what cases are resulting in > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set. > > We already know that bad packets can enter the kernel and trigger > this, using packet sockets and virtio_net_hdr. Unfortunately, this > *is* the fix. It sounds almost like we need a CHECKSUM_DODGY to go along with the SKB_GSO_DODGY in order to resolve these kinds of issues. So essentially we have a source that we know can give us bad packets. We really should be performing some sort of validation on these much earlier in order to clean them up so that we don't have to add this sort of exception handling code all over the Tx path.
On Thu, Sep 2, 2021 at 5:17 PM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck > > <alexander.duyck@gmail.com> wrote: > > > > > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > From: Willem de Bruijn <willemb@google.com> > > > > > > > > Only test integrity of csum_start if checksum offload is configured. > > > > > > > > With checksum offload and GRE tunnel checksum, gre_build_header will > > > > cheaply build the GRE checksum using local checksum offload. This > > > > depends on inner packet csum offload, and thus that csum_start points > > > > behind GRE. But validate this condition only with checksum offload. > > > > > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ > > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > --- > > > > net/ipv4/ip_gre.c | 5 ++++- > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > > > index 177d26d8fb9c..09311992a617 100644 > > > > --- a/net/ipv4/ip_gre.c > > > > +++ b/net/ipv4/ip_gre.c > > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev, > > > > > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > > > > { > > > > - if (csum && skb_checksum_start(skb) < skb->data) > > > > + /* Local checksum offload requires csum offload of the inner packet */ > > > > + if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > > > > + skb_checksum_start(skb) < skb->data) > > > > return -EINVAL; > > > > + > > > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); > > > > } > > > > Thanks for taking a look. > > > > > So a few minor nits. > > > > > > First I think we need this for both v4 and v6 since it looks like this > > > code is reproduced for net/ipv6/ip6_gre.c. > > > > I sent a separate patch for v6. Perhaps should have made it a series > > to make this more clear. > > Yeah, that was part of the reason I assumed the ipv6 patch was overlooked. I was in two minds only because a series should come with a cover letter and thus one extra email added to the firehose. But this makes clear the value. Will just do that in the future. > > > Second I don't know if we even need to bother including the "csum" > > > portion of the lookup since that technically is just telling us if the > > > GRE tunnel is requesting a checksum or not and I am not sure that > > > applies to the fact that the inner L4 header is going to be what is > > > requesting the checksum offload most likely. > > > > This test introduced in the original patch specifically protects the > > GRE tunnel checksum calculation using lco_csum. The regular inner > > packet path likely is already robust (as similar bug reports would be > > more likely for that more common case). > > I was just thinking in terms of shaving off some extra comparisons. I > suppose it depends on if this is being inlined or not. If it is being > inlined there are at least 2 cases where the if statement would be > dropped since csum is explicitly false. My thought was that by just > jumping straight to the skb->ip_summed check we can drop the lookup > for csum since it would be implied by the fact that skb->ip_summed is > being checked. It would create a broader check, but at the same time > it would reduce the number of comparisons in the call. Most GRE tunnels don't have checksums and csum is likely in a register, as function argument, so it likely is the cheaper test? More functional argument: if !csum, the GRE tunnel does not care about the integrity of csum_start. So I think that it should not read it at all. > > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as > > > the path triggering this should be fixed rather than us silently > > > dropping frames. We should be figuring out what cases are resulting in > > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set. > > > > We already know that bad packets can enter the kernel and trigger > > this, using packet sockets and virtio_net_hdr. Unfortunately, this > > *is* the fix. > > It sounds almost like we need a CHECKSUM_DODGY to go along with the > SKB_GSO_DODGY in order to resolve these kinds of issues. > > So essentially we have a source that we know can give us bad packets. > We really should be performing some sort of validation on these much > earlier in order to clean them up so that we don't have to add this > sort of exception handling code all over the Tx path. Agreed with the concern. I've been arguing for validation at kernel entry of virtio_net_hdr. As an optional feature, if nothing else: https://patchwork.kernel.org/project/netdevbpf/patch/20210616203448.995314-3-tannerlove.kernel@gmail.com/ But unless we accept the cost of full parsing to identify the transport headers, we cannot predict at that stage whether the field is bogus, let alone whether it might trigger a bug later on. We do basic validation: csum_start is verified to be within the skb linear, so not totally out of bounds. That the offset is not just bogus, but causes a bug appears to be a rare exception peculiar to the GRE tunnel. Only it pulls the outer header (in ipgre_xmit), applies lco_csum and can so trigger negative overflow, as far as I could tell. That's why we decided to add the limited check local to that code. I'm not sure how we would use CHECKSUM_DODGY in practice.
On Thu, Sep 2, 2021 at 2:45 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, Sep 2, 2021 at 5:17 PM Alexander Duyck > <alexander.duyck@gmail.com> wrote: > > > > On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > From: Willem de Bruijn <willemb@google.com> > > > > > > > > > > Only test integrity of csum_start if checksum offload is configured. > > > > > > > > > > With checksum offload and GRE tunnel checksum, gre_build_header will > > > > > cheaply build the GRE checksum using local checksum offload. This > > > > > depends on inner packet csum offload, and thus that csum_start points > > > > > behind GRE. But validate this condition only with checksum offload. > > > > > > > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ > > > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > --- > > > > > net/ipv4/ip_gre.c | 5 ++++- > > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > > > > index 177d26d8fb9c..09311992a617 100644 > > > > > --- a/net/ipv4/ip_gre.c > > > > > +++ b/net/ipv4/ip_gre.c > > > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev, > > > > > > > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > > > > > { > > > > > - if (csum && skb_checksum_start(skb) < skb->data) > > > > > + /* Local checksum offload requires csum offload of the inner packet */ > > > > > + if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > > > > > + skb_checksum_start(skb) < skb->data) > > > > > return -EINVAL; > > > > > + > > > > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); > > > > > } > > > > > > Thanks for taking a look. > > > > > > > So a few minor nits. > > > > > > > > First I think we need this for both v4 and v6 since it looks like this > > > > code is reproduced for net/ipv6/ip6_gre.c. > > > > > > I sent a separate patch for v6. Perhaps should have made it a series > > > to make this more clear. > > > > Yeah, that was part of the reason I assumed the ipv6 patch was overlooked. > > I was in two minds only because a series should come with a cover > letter and thus one extra email added to the firehose. But this makes > clear the value. Will just do that in the future. > > > > > Second I don't know if we even need to bother including the "csum" > > > > portion of the lookup since that technically is just telling us if the > > > > GRE tunnel is requesting a checksum or not and I am not sure that > > > > applies to the fact that the inner L4 header is going to be what is > > > > requesting the checksum offload most likely. > > > > > > This test introduced in the original patch specifically protects the > > > GRE tunnel checksum calculation using lco_csum. The regular inner > > > packet path likely is already robust (as similar bug reports would be > > > more likely for that more common case). > > > > I was just thinking in terms of shaving off some extra comparisons. I > > suppose it depends on if this is being inlined or not. If it is being > > inlined there are at least 2 cases where the if statement would be > > dropped since csum is explicitly false. My thought was that by just > > jumping straight to the skb->ip_summed check we can drop the lookup > > for csum since it would be implied by the fact that skb->ip_summed is > > being checked. It would create a broader check, but at the same time > > it would reduce the number of comparisons in the call. > > Most GRE tunnels don't have checksums and csum is likely in a register, > as function argument, so it likely is the cheaper test? > > More functional argument: if !csum, the GRE tunnel does not care about > the integrity of csum_start. So I think that it should not read it at all. The problem as I see it is that it is just passing the problem on. Adding the check to the GRE drivers only really fixes one spot that exposes the issue. In this case it was triggered because the lco_csum was causing issues. What happens when one of these packets makes it down to hardware and it has to deal with the malformed csum_start? I suspect we end up potentially causing issues where Tx metadata could be malformed resulting in a dropped packet at the best case, and kernel panics at the worst. > > > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as > > > > the path triggering this should be fixed rather than us silently > > > > dropping frames. We should be figuring out what cases are resulting in > > > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set. > > > > > > We already know that bad packets can enter the kernel and trigger > > > this, using packet sockets and virtio_net_hdr. Unfortunately, this > > > *is* the fix. > > > > It sounds almost like we need a CHECKSUM_DODGY to go along with the > > SKB_GSO_DODGY in order to resolve these kinds of issues. > > > > So essentially we have a source that we know can give us bad packets. > > We really should be performing some sort of validation on these much > > earlier in order to clean them up so that we don't have to add this > > sort of exception handling code all over the Tx path. > > Agreed with the concern. I've been arguing for validation at kernel > entry of virtio_net_hdr. As an optional feature, if nothing else: > https://patchwork.kernel.org/project/netdevbpf/patch/20210616203448.995314-3-tannerlove.kernel@gmail.com/ > > But unless we accept the cost of full parsing to identify the > transport headers, we cannot predict at that stage whether the field > is bogus, let alone whether it might trigger a bug later on. We do > basic validation: csum_start is verified to be within the skb linear, > so not totally out of bounds. > > That the offset is not just bogus, but causes a bug appears to be a > rare exception peculiar to the GRE tunnel. Only it pulls the outer > header (in ipgre_xmit), applies lco_csum and can so trigger negative > overflow, as far as I could tell. That's why we decided to add the > limited check local to that code. > > I'm not sure how we would use CHECKSUM_DODGY in practice. The issue is drivers with NETIF_F_HW_CSUM would be expecting a reasonable csum_start and csum_offset. If the hardware is only advertising that and we are expecting it to offload the checksum we should probably be doing some sort of validation on user derived inputs to make sure that they aren't totally ridiculous as is the case here where the original issue was that the csum_start wasn't even in the packet data. Would it maybe make sense to look at reverting the earlier fixes and instead updating skb_partial_csum_set so that we cannot write a csum_start value that is less than the start of skb->data? That way we are addressing this at the source instead of allowing the garbage data to propagate further down the stack and having to drop it at the driver level which is going to have us playing whack a mole trying to fix it where it pops up. It seems like we are already validating the upper bounds for these values in that function so why not validate the lower bounds as well?
On Thu, Sep 2, 2021 at 10:01 PM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Thu, Sep 2, 2021 at 2:45 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Thu, Sep 2, 2021 at 5:17 PM Alexander Duyck > > <alexander.duyck@gmail.com> wrote: > > > > > > On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck > > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > From: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > Only test integrity of csum_start if checksum offload is configured. > > > > > > > > > > > > With checksum offload and GRE tunnel checksum, gre_build_header will > > > > > > cheaply build the GRE checksum using local checksum offload. This > > > > > > depends on inner packet csum offload, and thus that csum_start points > > > > > > behind GRE. But validate this condition only with checksum offload. > > > > > > > > > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ > > > > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > --- > > > > > > net/ipv4/ip_gre.c | 5 ++++- > > > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > > > > > index 177d26d8fb9c..09311992a617 100644 > > > > > > --- a/net/ipv4/ip_gre.c > > > > > > +++ b/net/ipv4/ip_gre.c > > > > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev, > > > > > > > > > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > > > > > > { > > > > > > - if (csum && skb_checksum_start(skb) < skb->data) > > > > > > + /* Local checksum offload requires csum offload of the inner packet */ > > > > > > + if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > > > > > > + skb_checksum_start(skb) < skb->data) > > > > > > return -EINVAL; > > > > > > + > > > > > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); > > > > > > } > > > > > > > > Thanks for taking a look. > > > > > > > > > So a few minor nits. > > > > > > > > > > First I think we need this for both v4 and v6 since it looks like this > > > > > code is reproduced for net/ipv6/ip6_gre.c. > > > > > > > > I sent a separate patch for v6. Perhaps should have made it a series > > > > to make this more clear. > > > > > > Yeah, that was part of the reason I assumed the ipv6 patch was overlooked. > > > > I was in two minds only because a series should come with a cover > > letter and thus one extra email added to the firehose. But this makes > > clear the value. Will just do that in the future. > > > > > > > Second I don't know if we even need to bother including the "csum" > > > > > portion of the lookup since that technically is just telling us if the > > > > > GRE tunnel is requesting a checksum or not and I am not sure that > > > > > applies to the fact that the inner L4 header is going to be what is > > > > > requesting the checksum offload most likely. > > > > > > > > This test introduced in the original patch specifically protects the > > > > GRE tunnel checksum calculation using lco_csum. The regular inner > > > > packet path likely is already robust (as similar bug reports would be > > > > more likely for that more common case). > > > > > > I was just thinking in terms of shaving off some extra comparisons. I > > > suppose it depends on if this is being inlined or not. If it is being > > > inlined there are at least 2 cases where the if statement would be > > > dropped since csum is explicitly false. My thought was that by just > > > jumping straight to the skb->ip_summed check we can drop the lookup > > > for csum since it would be implied by the fact that skb->ip_summed is > > > being checked. It would create a broader check, but at the same time > > > it would reduce the number of comparisons in the call. > > > > Most GRE tunnels don't have checksums and csum is likely in a register, > > as function argument, so it likely is the cheaper test? > > > > More functional argument: if !csum, the GRE tunnel does not care about > > the integrity of csum_start. So I think that it should not read it at all. > > The problem as I see it is that it is just passing the problem on. > Adding the check to the GRE drivers only really fixes one spot that > exposes the issue. In this case it was triggered because the lco_csum > was causing issues. What happens when one of these packets makes it > down to hardware and it has to deal with the malformed csum_start? I > suspect we end up potentially causing issues where Tx metadata could > be malformed resulting in a dropped packet at the best case, and > kernel panics at the worst. > > > > > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as > > > > > the path triggering this should be fixed rather than us silently > > > > > dropping frames. We should be figuring out what cases are resulting in > > > > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set. > > > > > > > > We already know that bad packets can enter the kernel and trigger > > > > this, using packet sockets and virtio_net_hdr. Unfortunately, this > > > > *is* the fix. > > > > > > It sounds almost like we need a CHECKSUM_DODGY to go along with the > > > SKB_GSO_DODGY in order to resolve these kinds of issues. > > > > > > So essentially we have a source that we know can give us bad packets. > > > We really should be performing some sort of validation on these much > > > earlier in order to clean them up so that we don't have to add this > > > sort of exception handling code all over the Tx path. > > > > Agreed with the concern. I've been arguing for validation at kernel > > entry of virtio_net_hdr. As an optional feature, if nothing else: > > https://patchwork.kernel.org/project/netdevbpf/patch/20210616203448.995314-3-tannerlove.kernel@gmail.com/ > > > > But unless we accept the cost of full parsing to identify the > > transport headers, we cannot predict at that stage whether the field > > is bogus, let alone whether it might trigger a bug later on. We do > > basic validation: csum_start is verified to be within the skb linear, > > so not totally out of bounds. > > > > That the offset is not just bogus, but causes a bug appears to be a > > rare exception peculiar to the GRE tunnel. Only it pulls the outer > > header (in ipgre_xmit), applies lco_csum and can so trigger negative > > overflow, as far as I could tell. That's why we decided to add the > > limited check local to that code. > > > > I'm not sure how we would use CHECKSUM_DODGY in practice. > > The issue is drivers with NETIF_F_HW_CSUM would be expecting a > reasonable csum_start and csum_offset. If the hardware is only > advertising that and we are expecting it to offload the checksum we > should probably be doing some sort of validation on user derived > inputs to make sure that they aren't totally ridiculous as is the case > here where the original issue was that the csum_start wasn't even in > the packet data. > > Would it maybe make sense to look at reverting the earlier fixes and > instead updating skb_partial_csum_set so that we cannot write a > csum_start value that is less than the start of skb->data? That way we > are addressing this at the source instead of allowing the garbage data > to propagate further down the stack and having to drop it at the > driver level which is going to have us playing whack a mole trying to > fix it where it pops up. It seems like we are already validating the > upper bounds for these values in that function so why not validate the > lower bounds as well? skb_partial_csum_set verifies that csum_start is within bounds at the time it is called. The offset passed from userspace is an unsigned int added to skb_headroom(skb), so >= skb->data. The issue is with ipgre_xmit calling skb_pull. Only then does it become out of bounds. From what I saw, pulling headers like that is very rare in the transmit path. Indeed, I did not see it in any other tunnels. We could instrument each of these directly, but at non-trivial cost. The negative underflow and kernel crash is very specific to lco_csum, which calculates the offset between csum_offset and transport_offset. Standard checksum offset doesn't. One alternative fix, then, would be to add a negative overflow check in lco_csum itself. That only has three callers and unfortunately by the time we hit that for GRE in gre_build_header, there no longer is a path to cleanly dropping the packet and returning an error. I don't find this bad input whack-a-mole elegant either. But I thought it was the least bad option..
On Thu, Sep 2, 2021 at 7:41 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, Sep 2, 2021 at 10:01 PM Alexander Duyck > <alexander.duyck@gmail.com> wrote: > > > > On Thu, Sep 2, 2021 at 2:45 PM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > On Thu, Sep 2, 2021 at 5:17 PM Alexander Duyck > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck > > > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > > > > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn > > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > > > From: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > Only test integrity of csum_start if checksum offload is configured. > > > > > > > > > > > > > > With checksum offload and GRE tunnel checksum, gre_build_header will > > > > > > > cheaply build the GRE checksum using local checksum offload. This > > > > > > > depends on inner packet csum offload, and thus that csum_start points > > > > > > > behind GRE. But validate this condition only with checksum offload. > > > > > > > > > > > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ > > > > > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > --- > > > > > > > net/ipv4/ip_gre.c | 5 ++++- > > > > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > > > > > > index 177d26d8fb9c..09311992a617 100644 > > > > > > > --- a/net/ipv4/ip_gre.c > > > > > > > +++ b/net/ipv4/ip_gre.c > > > > > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev, > > > > > > > > > > > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > > > > > > > { > > > > > > > - if (csum && skb_checksum_start(skb) < skb->data) > > > > > > > + /* Local checksum offload requires csum offload of the inner packet */ > > > > > > > + if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > > > > > > > + skb_checksum_start(skb) < skb->data) > > > > > > > return -EINVAL; > > > > > > > + > > > > > > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); > > > > > > > } > > > > > > > > > > Thanks for taking a look. > > > > > > > > > > > So a few minor nits. > > > > > > > > > > > > First I think we need this for both v4 and v6 since it looks like this > > > > > > code is reproduced for net/ipv6/ip6_gre.c. > > > > > > > > > > I sent a separate patch for v6. Perhaps should have made it a series > > > > > to make this more clear. > > > > > > > > Yeah, that was part of the reason I assumed the ipv6 patch was overlooked. > > > > > > I was in two minds only because a series should come with a cover > > > letter and thus one extra email added to the firehose. But this makes > > > clear the value. Will just do that in the future. > > > > > > > > > Second I don't know if we even need to bother including the "csum" > > > > > > portion of the lookup since that technically is just telling us if the > > > > > > GRE tunnel is requesting a checksum or not and I am not sure that > > > > > > applies to the fact that the inner L4 header is going to be what is > > > > > > requesting the checksum offload most likely. > > > > > > > > > > This test introduced in the original patch specifically protects the > > > > > GRE tunnel checksum calculation using lco_csum. The regular inner > > > > > packet path likely is already robust (as similar bug reports would be > > > > > more likely for that more common case). > > > > > > > > I was just thinking in terms of shaving off some extra comparisons. I > > > > suppose it depends on if this is being inlined or not. If it is being > > > > inlined there are at least 2 cases where the if statement would be > > > > dropped since csum is explicitly false. My thought was that by just > > > > jumping straight to the skb->ip_summed check we can drop the lookup > > > > for csum since it would be implied by the fact that skb->ip_summed is > > > > being checked. It would create a broader check, but at the same time > > > > it would reduce the number of comparisons in the call. > > > > > > Most GRE tunnels don't have checksums and csum is likely in a register, > > > as function argument, so it likely is the cheaper test? > > > > > > More functional argument: if !csum, the GRE tunnel does not care about > > > the integrity of csum_start. So I think that it should not read it at all. > > > > The problem as I see it is that it is just passing the problem on. > > Adding the check to the GRE drivers only really fixes one spot that > > exposes the issue. In this case it was triggered because the lco_csum > > was causing issues. What happens when one of these packets makes it > > down to hardware and it has to deal with the malformed csum_start? I > > suspect we end up potentially causing issues where Tx metadata could > > be malformed resulting in a dropped packet at the best case, and > > kernel panics at the worst. > > > > > > > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as > > > > > > the path triggering this should be fixed rather than us silently > > > > > > dropping frames. We should be figuring out what cases are resulting in > > > > > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set. > > > > > > > > > > We already know that bad packets can enter the kernel and trigger > > > > > this, using packet sockets and virtio_net_hdr. Unfortunately, this > > > > > *is* the fix. > > > > > > > > It sounds almost like we need a CHECKSUM_DODGY to go along with the > > > > SKB_GSO_DODGY in order to resolve these kinds of issues. > > > > > > > > So essentially we have a source that we know can give us bad packets. > > > > We really should be performing some sort of validation on these much > > > > earlier in order to clean them up so that we don't have to add this > > > > sort of exception handling code all over the Tx path. > > > > > > Agreed with the concern. I've been arguing for validation at kernel > > > entry of virtio_net_hdr. As an optional feature, if nothing else: > > > https://patchwork.kernel.org/project/netdevbpf/patch/20210616203448.995314-3-tannerlove.kernel@gmail.com/ > > > > > > But unless we accept the cost of full parsing to identify the > > > transport headers, we cannot predict at that stage whether the field > > > is bogus, let alone whether it might trigger a bug later on. We do > > > basic validation: csum_start is verified to be within the skb linear, > > > so not totally out of bounds. > > > > > > That the offset is not just bogus, but causes a bug appears to be a > > > rare exception peculiar to the GRE tunnel. Only it pulls the outer > > > header (in ipgre_xmit), applies lco_csum and can so trigger negative > > > overflow, as far as I could tell. That's why we decided to add the > > > limited check local to that code. > > > > > > I'm not sure how we would use CHECKSUM_DODGY in practice. > > > > The issue is drivers with NETIF_F_HW_CSUM would be expecting a > > reasonable csum_start and csum_offset. If the hardware is only > > advertising that and we are expecting it to offload the checksum we > > should probably be doing some sort of validation on user derived > > inputs to make sure that they aren't totally ridiculous as is the case > > here where the original issue was that the csum_start wasn't even in > > the packet data. > > > > Would it maybe make sense to look at reverting the earlier fixes and > > instead updating skb_partial_csum_set so that we cannot write a > > csum_start value that is less than the start of skb->data? That way we > > are addressing this at the source instead of allowing the garbage data > > to propagate further down the stack and having to drop it at the > > driver level which is going to have us playing whack a mole trying to > > fix it where it pops up. It seems like we are already validating the > > upper bounds for these values in that function so why not validate the > > lower bounds as well? > > skb_partial_csum_set verifies that csum_start is within bounds > at the time it is called. The offset passed from userspace is > an unsigned int added to skb_headroom(skb), so >= skb->data. > > The issue is with ipgre_xmit calling skb_pull. Only then does it > become out of bounds. From what I saw, pulling headers like > that is very rare in the transmit path. Indeed, I did not see it in > any other tunnels. We could instrument each of these directly, > but at non-trivial cost. So there are some positives and negatives to addressing this in ipgre_xmit. Specifically if we address it before the pull we could do some other quick checks to verify the offset. If the offset and start are both in the region to be pulled we could just drop the offload, whereas if the offset is stored somewhere in the unstripped data we could then drop the packet and count it as a drop without having to modify the frame via the skb_pull. > The negative underflow and kernel crash is very specific to > lco_csum, which calculates the offset between csum_offset > and transport_offset. Standard checksum offset doesn't. > > One alternative fix, then, would be to add a negative overflow > check in lco_csum itself. > That only has three callers and unfortunately by the time we hit > that for GRE in gre_build_header, there no longer is a path to > cleanly dropping the packet and returning an error. Right. We could find the problem there but it doesn't solve the issue. The problem is the expectation that the offset exists in the area after the checksum for the tunnel header, not before. > I don't find this bad input whack-a-mole elegant either. But I > thought it was the least bad option.. The part that has been confusing me is how the virtio_net_hdr could have been pointing as the IP or GRE headers since they shouldn't have been present on the frame provided by the userspace. I think I am starting to see now. So in the case of af_packet it looks like it is calling dev_hard_header which calls header_ops->create before doing the virtio_net_hdr_to_skb call. Do I have that right? Maybe for those cases we need to look at adding an unsigned int argument to virtio_net_hdr_to_skb in which we could pass 0 for the unused case or dev->hard_header_len in the cases where we have something like af_packet that is transmitting over an ipgre tunnel. The general idea is to prevent these virtio_net_hdr_to_skb calls from pointing the csum_start into headers that userspace was not responsible for populating.
> > > The issue is drivers with NETIF_F_HW_CSUM would be expecting a > > > reasonable csum_start and csum_offset. If the hardware is only > > > advertising that and we are expecting it to offload the checksum we > > > should probably be doing some sort of validation on user derived > > > inputs to make sure that they aren't totally ridiculous as is the case > > > here where the original issue was that the csum_start wasn't even in > > > the packet data. > > > > > > Would it maybe make sense to look at reverting the earlier fixes and > > > instead updating skb_partial_csum_set so that we cannot write a > > > csum_start value that is less than the start of skb->data? That way we > > > are addressing this at the source instead of allowing the garbage data > > > to propagate further down the stack and having to drop it at the > > > driver level which is going to have us playing whack a mole trying to > > > fix it where it pops up. It seems like we are already validating the > > > upper bounds for these values in that function so why not validate the > > > lower bounds as well? > > > > skb_partial_csum_set verifies that csum_start is within bounds > > at the time it is called. The offset passed from userspace is > > an unsigned int added to skb_headroom(skb), so >= skb->data. > > > > The issue is with ipgre_xmit calling skb_pull. Only then does it > > become out of bounds. From what I saw, pulling headers like > > that is very rare in the transmit path. Indeed, I did not see it in > > any other tunnels. We could instrument each of these directly, > > but at non-trivial cost. > > So there are some positives and negatives to addressing this in > ipgre_xmit. Specifically if we address it before the pull we could do > some other quick checks to verify the offset. If the offset and start > are both in the region to be pulled we could just drop the offload, These cases would currently crash the kernel. They are so clearly bad input that I would not even try to start accommodating them. Drop them. > whereas if the offset is stored somewhere in the unstripped data we > could then drop the packet and count it as a drop without having to > modify the frame via the skb_pull. This is a broader issue that userspace can pass any csum_start as long as it is within packet bounds. We could address it here specifically for the GRE header. But that still leaves many potentially bad offsets further in the packet in this case, and all the other cases. Checking that specific header seems a bit arbitrary to me, and might actually give false confidence. We could certainly move the validation from gre_handle_offloads to before skb_pull, to make it more obvious *why* the check exists. > > The negative underflow and kernel crash is very specific to > > lco_csum, which calculates the offset between csum_offset > > and transport_offset. Standard checksum offset doesn't. > > > > One alternative fix, then, would be to add a negative overflow > > check in lco_csum itself. > > That only has three callers and unfortunately by the time we hit > > that for GRE in gre_build_header, there no longer is a path to > > cleanly dropping the packet and returning an error. > > Right. We could find the problem there but it doesn't solve the issue. > The problem is the expectation that the offset exists in the area > after the checksum for the tunnel header, not before. > > > I don't find this bad input whack-a-mole elegant either. But I > > thought it was the least bad option.. > > The part that has been confusing me is how the virtio_net_hdr could > have been pointing as the IP or GRE headers since they shouldn't have > been present on the frame provided by the userspace. I think I am > starting to see now. > > So in the case of af_packet it looks like it is calling > dev_hard_header which calls header_ops->create before doing the > virtio_net_hdr_to_skb call. Do I have that right? Mostly yes. That is the case for SOCK_DGRAM. For SOCK_RAW, the caller is expected to have prepared these headers. Note that for the ip_gre device to have header_ops, this will be ipgre_header_ops, which .create member function comment starts with "/* Nice toy. Unfortunately, useless in real life :-)". We recently had a small series of fixes related to this special case and packet sockets, such as commit fdafed459998 ("ip_gre: set dev->hard_header_len and dev->needed_headroom properly"). This case of a GRE device that receives packets with outer IP + GRE headers is out of the ordinary. > Maybe for those > cases we need to look at adding an unsigned int argument to > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or > dev->hard_header_len in the cases where we have something like > af_packet that is transmitting over an ipgre tunnel. The general idea > is to prevent these virtio_net_hdr_to_skb calls from pointing the > csum_start into headers that userspace was not responsible for > populating. One issue with that is that dev->hard_header_len itself is imprecise for protocols with variable length link layer headers. There, too, we have had a variety of bug fixes in the past. It also adds cost to every user of virtio_net_hdr, while we only know one issue in a rare case of the IP_GRE device.
On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > <snip> > > whereas if the offset is stored somewhere in the unstripped data we > > could then drop the packet and count it as a drop without having to > > modify the frame via the skb_pull. > > This is a broader issue that userspace can pass any csum_start as long > as it is within packet bounds. We could address it here specifically > for the GRE header. But that still leaves many potentially bad offsets > further in the packet in this case, and all the other cases. Checking > that specific header seems a bit arbitrary to me, and might actually > give false confidence. > > We could certainly move the validation from gre_handle_offloads to > before skb_pull, to make it more obvious *why* the check exists. Agreed. My main concern is that the csum_start is able to be located somewhere where the userspace didn't write. For the most part the csum_start and csum_offset just needs to be restricted to the regions that the userspace actually wrote to. > > > The negative underflow and kernel crash is very specific to > > > lco_csum, which calculates the offset between csum_offset > > > and transport_offset. Standard checksum offset doesn't. > > > > > > One alternative fix, then, would be to add a negative overflow > > > check in lco_csum itself. > > > That only has three callers and unfortunately by the time we hit > > > that for GRE in gre_build_header, there no longer is a path to > > > cleanly dropping the packet and returning an error. > > > > Right. We could find the problem there but it doesn't solve the issue. > > The problem is the expectation that the offset exists in the area > > after the checksum for the tunnel header, not before. > > > > > I don't find this bad input whack-a-mole elegant either. But I > > > thought it was the least bad option.. > > > > The part that has been confusing me is how the virtio_net_hdr could > > have been pointing as the IP or GRE headers since they shouldn't have > > been present on the frame provided by the userspace. I think I am > > starting to see now. > > > > So in the case of af_packet it looks like it is calling > > dev_hard_header which calls header_ops->create before doing the > > virtio_net_hdr_to_skb call. Do I have that right? > > Mostly yes. That is the case for SOCK_DGRAM. For SOCK_RAW, the caller > is expected to have prepared these headers. > > Note that for the ip_gre device to have header_ops, this will be > ipgre_header_ops, which .create member function comment starts with > "/* Nice toy. Unfortunately, useless in real life :-)". We recently > had a small series of fixes related to this special case and packet > sockets, such as commit fdafed459998 ("ip_gre: set > dev->hard_header_len and dev->needed_headroom properly"). This case of > a GRE device that receives packets with outer IP + GRE headers is out > of the ordinary. I'm not sure the comment is about the functionality of the function as much as the addressing. Seems to be about setting up a multicast GRE configuration. > > Maybe for those > > cases we need to look at adding an unsigned int argument to > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or > > dev->hard_header_len in the cases where we have something like > > af_packet that is transmitting over an ipgre tunnel. The general idea > > is to prevent these virtio_net_hdr_to_skb calls from pointing the > > csum_start into headers that userspace was not responsible for > > populating. > > One issue with that is that dev->hard_header_len itself is imprecise > for protocols with variable length link layer headers. There, too, we > have had a variety of bug fixes in the past. > > It also adds cost to every user of virtio_net_hdr, while we only know > one issue in a rare case of the IP_GRE device. Quick question, the assumption is that the checksum should always be performed starting no earlier than the transport header right? Looking over virtio_net_hdr_to_skb it looks like it is already verifying the transport header is in the linear portion of the skb. I'm wondering if we couldn't just look at adding a check to verify the transport offset is <= csum start? We might also be able to get rid of one of the two calls to pskb_may_pull by doing that.
On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > <snip> > > > > whereas if the offset is stored somewhere in the unstripped data we > > > could then drop the packet and count it as a drop without having to > > > modify the frame via the skb_pull. > > > > This is a broader issue that userspace can pass any csum_start as long > > as it is within packet bounds. We could address it here specifically > > for the GRE header. But that still leaves many potentially bad offsets > > further in the packet in this case, and all the other cases. Checking > > that specific header seems a bit arbitrary to me, and might actually > > give false confidence. > > > > We could certainly move the validation from gre_handle_offloads to > > before skb_pull, to make it more obvious *why* the check exists. > > Agreed. My main concern is that the csum_start is able to be located > somewhere where the userspace didn't write. For the most part the > csum_start and csum_offset just needs to be restricted to the regions > that the userspace actually wrote to. I don't quite follow. Even with this bug, the offset is somewhere userspace wrote. That data is just pulled. > > > Maybe for those > > > cases we need to look at adding an unsigned int argument to > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or > > > dev->hard_header_len in the cases where we have something like > > > af_packet that is transmitting over an ipgre tunnel. The general idea > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the > > > csum_start into headers that userspace was not responsible for > > > populating. > > > > One issue with that is that dev->hard_header_len itself is imprecise > > for protocols with variable length link layer headers. There, too, we > > have had a variety of bug fixes in the past. > > > > It also adds cost to every user of virtio_net_hdr, while we only know > > one issue in a rare case of the IP_GRE device. > > Quick question, the assumption is that the checksum should always be > performed starting no earlier than the transport header right? Looking > over virtio_net_hdr_to_skb it looks like it is already verifying the > transport header is in the linear portion of the skb. I'm wondering if > we couldn't just look at adding a check to verify the transport offset > is <= csum start? We might also be able to get rid of one of the two > calls to pskb_may_pull by doing that. Are you referring to this part in the .._NEEDS_CSUM branch? if (!skb_partial_csum_set(skb, start, off)) return -EINVAL; p_off = skb_transport_offset(skb) + thlen; if (!pskb_may_pull(skb, p_off)) return -EINVAL; skb_partial_csum_set is actually what sets the transport offset, derived from start.
On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck > <alexander.duyck@gmail.com> wrote: > > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > <snip> > > > > > > whereas if the offset is stored somewhere in the unstripped data we > > > > could then drop the packet and count it as a drop without having to > > > > modify the frame via the skb_pull. > > > > > > This is a broader issue that userspace can pass any csum_start as long > > > as it is within packet bounds. We could address it here specifically > > > for the GRE header. But that still leaves many potentially bad offsets > > > further in the packet in this case, and all the other cases. Checking > > > that specific header seems a bit arbitrary to me, and might actually > > > give false confidence. > > > > > > We could certainly move the validation from gre_handle_offloads to > > > before skb_pull, to make it more obvious *why* the check exists. > > > > Agreed. My main concern is that the csum_start is able to be located > > somewhere where the userspace didn't write. For the most part the > > csum_start and csum_offset just needs to be restricted to the regions > > that the userspace actually wrote to. > > I don't quite follow. Even with this bug, the offset is somewhere userspace > wrote. That data is just pulled. Sorry, I was thinking of the SOCK_DGRAM case where the header is added via a call to dev_hard_header(). > > > > Maybe for those > > > > cases we need to look at adding an unsigned int argument to > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or > > > > dev->hard_header_len in the cases where we have something like > > > > af_packet that is transmitting over an ipgre tunnel. The general idea > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the > > > > csum_start into headers that userspace was not responsible for > > > > populating. > > > > > > One issue with that is that dev->hard_header_len itself is imprecise > > > for protocols with variable length link layer headers. There, too, we > > > have had a variety of bug fixes in the past. > > > > > > It also adds cost to every user of virtio_net_hdr, while we only know > > > one issue in a rare case of the IP_GRE device. > > > > Quick question, the assumption is that the checksum should always be > > performed starting no earlier than the transport header right? Looking > > over virtio_net_hdr_to_skb it looks like it is already verifying the > > transport header is in the linear portion of the skb. I'm wondering if > > we couldn't just look at adding a check to verify the transport offset > > is <= csum start? We might also be able to get rid of one of the two > > calls to pskb_may_pull by doing that. > > Are you referring to this part in the .._NEEDS_CSUM branch? > > if (!skb_partial_csum_set(skb, start, off)) > return -EINVAL; > > p_off = skb_transport_offset(skb) + thlen; > if (!pskb_may_pull(skb, p_off)) > return -EINVAL; > > skb_partial_csum_set is actually what sets the transport offset, > derived from start. Ugh, I had overlooked that as I was more focused on the skb_probe_transport_header calls in the af_packet code. So we can have both the transport offset and the csum_start in a region that gets stripped by the ipgre code. Worse yet the inner transport header will also be pointing somewhere outside of the encapsulated region when we pass it off to skb_reset_inner_headers(). Maybe it would make sense to just have the check look into the transport offset instead of csum start as that way you are essentially addressing two possible issues instead of one, and it would effectively combine multiple checks as the uninitialized value is ~0 which should always be greater than "skb_headroom + tunnel->hlen + sizeof(struct iphdr)". I think you mentioned before placing a check just before you make the call to skb_pull in the GRE transmit path. Doing that we would at least reduce the impact as it would only apply in the header_ops case in ipgre_xmit instead of being applied to all the transmit paths which don't perform the pull.
On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck > > <alexander.duyck@gmail.com> wrote: > > > > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > <snip> > > > > > > > > whereas if the offset is stored somewhere in the unstripped data we > > > > > could then drop the packet and count it as a drop without having to > > > > > modify the frame via the skb_pull. > > > > > > > > This is a broader issue that userspace can pass any csum_start as long > > > > as it is within packet bounds. We could address it here specifically > > > > for the GRE header. But that still leaves many potentially bad offsets > > > > further in the packet in this case, and all the other cases. Checking > > > > that specific header seems a bit arbitrary to me, and might actually > > > > give false confidence. > > > > > > > > We could certainly move the validation from gre_handle_offloads to > > > > before skb_pull, to make it more obvious *why* the check exists. > > > > > > Agreed. My main concern is that the csum_start is able to be located > > > somewhere where the userspace didn't write. For the most part the > > > csum_start and csum_offset just needs to be restricted to the regions > > > that the userspace actually wrote to. > > > > I don't quite follow. Even with this bug, the offset is somewhere userspace > > wrote. That data is just pulled. > > Sorry, I was thinking of the SOCK_DGRAM case where the header is added > via a call to dev_hard_header(). > > > > > > Maybe for those > > > > > cases we need to look at adding an unsigned int argument to > > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or > > > > > dev->hard_header_len in the cases where we have something like > > > > > af_packet that is transmitting over an ipgre tunnel. The general idea > > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the > > > > > csum_start into headers that userspace was not responsible for > > > > > populating. > > > > > > > > One issue with that is that dev->hard_header_len itself is imprecise > > > > for protocols with variable length link layer headers. There, too, we > > > > have had a variety of bug fixes in the past. > > > > > > > > It also adds cost to every user of virtio_net_hdr, while we only know > > > > one issue in a rare case of the IP_GRE device. > > > > > > Quick question, the assumption is that the checksum should always be > > > performed starting no earlier than the transport header right? Looking > > > over virtio_net_hdr_to_skb it looks like it is already verifying the > > > transport header is in the linear portion of the skb. I'm wondering if > > > we couldn't just look at adding a check to verify the transport offset > > > is <= csum start? We might also be able to get rid of one of the two > > > calls to pskb_may_pull by doing that. > > > > Are you referring to this part in the .._NEEDS_CSUM branch? > > > > if (!skb_partial_csum_set(skb, start, off)) > > return -EINVAL; > > > > p_off = skb_transport_offset(skb) + thlen; > > if (!pskb_may_pull(skb, p_off)) > > return -EINVAL; > > > > skb_partial_csum_set is actually what sets the transport offset, > > derived from start. > > Ugh, I had overlooked that as I was more focused on the > skb_probe_transport_header calls in the af_packet code. > > So we can have both the transport offset and the csum_start in a > region that gets stripped by the ipgre code. Worse yet the inner > transport header will also be pointing somewhere outside of the > encapsulated region when we pass it off to skb_reset_inner_headers(). > > Maybe it would make sense to just have the check look into the > transport offset instead of csum start as that way you are essentially > addressing two possible issues instead of one, and it would > effectively combine multiple checks as the uninitialized value is ~0 > which should always be greater than "skb_headroom + tunnel->hlen + > sizeof(struct iphdr)". I think you mentioned before placing a check > just before you make the call to skb_pull in the GRE transmit path. > Doing that we would at least reduce the impact as it would only apply > in the header_ops case in ipgre_xmit instead of being applied to all > the transmit paths which don't perform the pull. Do you mean if (dev->header_ops) { + int pull_len = tunnel->hlen + sizeof(struct iphdr); + if (skb_cow_head(skb, 0)) goto free_skb; tnl_params = (const struct iphdr *)skb->data; + if (pull_len > skb_transport_offset(skb)) + goto free_skb; + /* Pull skb since ip_tunnel_xmit() needs skb->data pointing * to gre header. */ - skb_pull(skb, tunnel->hlen + sizeof(struct iphdr)); + skb_pull(skb, pull_len); skb_reset_mac_header(skb); plus then static int gre_handle_offloads(struct sk_buff *skb, bool csum) { - /* Local checksum offload requires csum offload of the inner packet */ - if (csum && skb->ip_summed == CHECKSUM_PARTIAL && - skb_checksum_start(skb) < skb->data) - return -EINVAL; - return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); }
On Sat, Sep 4, 2021 at 2:40 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck > <alexander.duyck@gmail.com> wrote: > > > > On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > > <snip> > > > > > > > > > > whereas if the offset is stored somewhere in the unstripped data we > > > > > > could then drop the packet and count it as a drop without having to > > > > > > modify the frame via the skb_pull. > > > > > > > > > > This is a broader issue that userspace can pass any csum_start as long > > > > > as it is within packet bounds. We could address it here specifically > > > > > for the GRE header. But that still leaves many potentially bad offsets > > > > > further in the packet in this case, and all the other cases. Checking > > > > > that specific header seems a bit arbitrary to me, and might actually > > > > > give false confidence. > > > > > > > > > > We could certainly move the validation from gre_handle_offloads to > > > > > before skb_pull, to make it more obvious *why* the check exists. > > > > > > > > Agreed. My main concern is that the csum_start is able to be located > > > > somewhere where the userspace didn't write. For the most part the > > > > csum_start and csum_offset just needs to be restricted to the regions > > > > that the userspace actually wrote to. > > > > > > I don't quite follow. Even with this bug, the offset is somewhere userspace > > > wrote. That data is just pulled. > > > > Sorry, I was thinking of the SOCK_DGRAM case where the header is added > > via a call to dev_hard_header(). > > > > > > > > Maybe for those > > > > > > cases we need to look at adding an unsigned int argument to > > > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or > > > > > > dev->hard_header_len in the cases where we have something like > > > > > > af_packet that is transmitting over an ipgre tunnel. The general idea > > > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the > > > > > > csum_start into headers that userspace was not responsible for > > > > > > populating. > > > > > > > > > > One issue with that is that dev->hard_header_len itself is imprecise > > > > > for protocols with variable length link layer headers. There, too, we > > > > > have had a variety of bug fixes in the past. > > > > > > > > > > It also adds cost to every user of virtio_net_hdr, while we only know > > > > > one issue in a rare case of the IP_GRE device. > > > > > > > > Quick question, the assumption is that the checksum should always be > > > > performed starting no earlier than the transport header right? Looking > > > > over virtio_net_hdr_to_skb it looks like it is already verifying the > > > > transport header is in the linear portion of the skb. I'm wondering if > > > > we couldn't just look at adding a check to verify the transport offset > > > > is <= csum start? We might also be able to get rid of one of the two > > > > calls to pskb_may_pull by doing that. > > > > > > Are you referring to this part in the .._NEEDS_CSUM branch? > > > > > > if (!skb_partial_csum_set(skb, start, off)) > > > return -EINVAL; > > > > > > p_off = skb_transport_offset(skb) + thlen; > > > if (!pskb_may_pull(skb, p_off)) > > > return -EINVAL; > > > > > > skb_partial_csum_set is actually what sets the transport offset, > > > derived from start. > > > > Ugh, I had overlooked that as I was more focused on the > > skb_probe_transport_header calls in the af_packet code. > > > > So we can have both the transport offset and the csum_start in a > > region that gets stripped by the ipgre code. Worse yet the inner > > transport header will also be pointing somewhere outside of the > > encapsulated region when we pass it off to skb_reset_inner_headers(). > > > > Maybe it would make sense to just have the check look into the > > transport offset instead of csum start as that way you are essentially > > addressing two possible issues instead of one, and it would > > effectively combine multiple checks as the uninitialized value is ~0 > > which should always be greater than "skb_headroom + tunnel->hlen + > > sizeof(struct iphdr)". I think you mentioned before placing a check > > just before you make the call to skb_pull in the GRE transmit path. > > Doing that we would at least reduce the impact as it would only apply > > in the header_ops case in ipgre_xmit instead of being applied to all > > the transmit paths which don't perform the pull. > > Do you mean > > if (dev->header_ops) { > + int pull_len = tunnel->hlen + sizeof(struct iphdr); > + > if (skb_cow_head(skb, 0)) > goto free_skb; > > tnl_params = (const struct iphdr *)skb->data; > > + if (pull_len > skb_transport_offset(skb)) > + goto free_skb; > + > /* Pull skb since ip_tunnel_xmit() needs skb->data pointing > * to gre header. > */ > - skb_pull(skb, tunnel->hlen + sizeof(struct iphdr)); > + skb_pull(skb, pull_len); > skb_reset_mac_header(skb); > > plus then > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > { > - /* Local checksum offload requires csum offload of the inner packet */ > - if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > - skb_checksum_start(skb) < skb->data) > - return -EINVAL; > - > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : > SKB_GSO_GRE); > } Yes, this is what I was thinking. We will also need an IPv6 version of this as well, and may want to add a comment clarifying that this is to prevent us from pointing inner offsets at pulled headers. It lets us drop the csum, ipsummed, and csum_start checks in favor of just the skb_transport_offset comparison which should be a net win since it reduces the number of paths the code is encountered in, and reduces the number of checks to just 1.
On Sat, Sep 4, 2021 at 5:54 PM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Sat, Sep 4, 2021 at 2:40 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck > > <alexander.duyck@gmail.com> wrote: > > > > > > On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck > > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > > > > > <snip> > > > > > > > > > > > > whereas if the offset is stored somewhere in the unstripped data we > > > > > > > could then drop the packet and count it as a drop without having to > > > > > > > modify the frame via the skb_pull. > > > > > > > > > > > > This is a broader issue that userspace can pass any csum_start as long > > > > > > as it is within packet bounds. We could address it here specifically > > > > > > for the GRE header. But that still leaves many potentially bad offsets > > > > > > further in the packet in this case, and all the other cases. Checking > > > > > > that specific header seems a bit arbitrary to me, and might actually > > > > > > give false confidence. > > > > > > > > > > > > We could certainly move the validation from gre_handle_offloads to > > > > > > before skb_pull, to make it more obvious *why* the check exists. > > > > > > > > > > Agreed. My main concern is that the csum_start is able to be located > > > > > somewhere where the userspace didn't write. For the most part the > > > > > csum_start and csum_offset just needs to be restricted to the regions > > > > > that the userspace actually wrote to. > > > > > > > > I don't quite follow. Even with this bug, the offset is somewhere userspace > > > > wrote. That data is just pulled. > > > > > > Sorry, I was thinking of the SOCK_DGRAM case where the header is added > > > via a call to dev_hard_header(). > > > > > > > > > > Maybe for those > > > > > > > cases we need to look at adding an unsigned int argument to > > > > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or > > > > > > > dev->hard_header_len in the cases where we have something like > > > > > > > af_packet that is transmitting over an ipgre tunnel. The general idea > > > > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the > > > > > > > csum_start into headers that userspace was not responsible for > > > > > > > populating. > > > > > > > > > > > > One issue with that is that dev->hard_header_len itself is imprecise > > > > > > for protocols with variable length link layer headers. There, too, we > > > > > > have had a variety of bug fixes in the past. > > > > > > > > > > > > It also adds cost to every user of virtio_net_hdr, while we only know > > > > > > one issue in a rare case of the IP_GRE device. > > > > > > > > > > Quick question, the assumption is that the checksum should always be > > > > > performed starting no earlier than the transport header right? Looking > > > > > over virtio_net_hdr_to_skb it looks like it is already verifying the > > > > > transport header is in the linear portion of the skb. I'm wondering if > > > > > we couldn't just look at adding a check to verify the transport offset > > > > > is <= csum start? We might also be able to get rid of one of the two > > > > > calls to pskb_may_pull by doing that. > > > > > > > > Are you referring to this part in the .._NEEDS_CSUM branch? > > > > > > > > if (!skb_partial_csum_set(skb, start, off)) > > > > return -EINVAL; > > > > > > > > p_off = skb_transport_offset(skb) + thlen; > > > > if (!pskb_may_pull(skb, p_off)) > > > > return -EINVAL; > > > > > > > > skb_partial_csum_set is actually what sets the transport offset, > > > > derived from start. > > > > > > Ugh, I had overlooked that as I was more focused on the > > > skb_probe_transport_header calls in the af_packet code. > > > > > > So we can have both the transport offset and the csum_start in a > > > region that gets stripped by the ipgre code. Worse yet the inner > > > transport header will also be pointing somewhere outside of the > > > encapsulated region when we pass it off to skb_reset_inner_headers(). > > > > > > Maybe it would make sense to just have the check look into the > > > transport offset instead of csum start as that way you are essentially > > > addressing two possible issues instead of one, and it would > > > effectively combine multiple checks as the uninitialized value is ~0 > > > which should always be greater than "skb_headroom + tunnel->hlen + > > > sizeof(struct iphdr)". I think you mentioned before placing a check > > > just before you make the call to skb_pull in the GRE transmit path. > > > Doing that we would at least reduce the impact as it would only apply > > > in the header_ops case in ipgre_xmit instead of being applied to all > > > the transmit paths which don't perform the pull. > > > > Do you mean > > > > if (dev->header_ops) { > > + int pull_len = tunnel->hlen + sizeof(struct iphdr); > > + > > if (skb_cow_head(skb, 0)) > > goto free_skb; > > > > tnl_params = (const struct iphdr *)skb->data; > > > > + if (pull_len > skb_transport_offset(skb)) > > + goto free_skb; > > + > > /* Pull skb since ip_tunnel_xmit() needs skb->data pointing > > * to gre header. > > */ > > - skb_pull(skb, tunnel->hlen + sizeof(struct iphdr)); > > + skb_pull(skb, pull_len); > > skb_reset_mac_header(skb); > > > > plus then > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > > { > > - /* Local checksum offload requires csum offload of the inner packet */ > > - if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > > - skb_checksum_start(skb) < skb->data) > > - return -EINVAL; > > - > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : > > SKB_GSO_GRE); > > } > > Yes, this is what I was thinking. We will also need an IPv6 version of > this as well, and may want to add a comment clarifying that this is to > prevent us from pointing inner offsets at pulled headers. > > It lets us drop the csum, ipsummed, and csum_start checks in favor of > just the skb_transport_offset comparison which should be a net win > since it reduces the number of paths the code is encountered in, and > reduces the number of checks to just 1. Okay. Yes, this looks better to me too. Thanks. Do you want to submit it? Or I can do it, either way.
On Sat, Sep 4, 2021 at 3:05 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Sat, Sep 4, 2021 at 5:54 PM Alexander Duyck > <alexander.duyck@gmail.com> wrote: > > > > On Sat, Sep 4, 2021 at 2:40 PM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck > > > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > > > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn > > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > <snip> > > > > > > > > > > > > > > whereas if the offset is stored somewhere in the unstripped data we > > > > > > > > could then drop the packet and count it as a drop without having to > > > > > > > > modify the frame via the skb_pull. > > > > > > > > > > > > > > This is a broader issue that userspace can pass any csum_start as long > > > > > > > as it is within packet bounds. We could address it here specifically > > > > > > > for the GRE header. But that still leaves many potentially bad offsets > > > > > > > further in the packet in this case, and all the other cases. Checking > > > > > > > that specific header seems a bit arbitrary to me, and might actually > > > > > > > give false confidence. > > > > > > > > > > > > > > We could certainly move the validation from gre_handle_offloads to > > > > > > > before skb_pull, to make it more obvious *why* the check exists. > > > > > > > > > > > > Agreed. My main concern is that the csum_start is able to be located > > > > > > somewhere where the userspace didn't write. For the most part the > > > > > > csum_start and csum_offset just needs to be restricted to the regions > > > > > > that the userspace actually wrote to. > > > > > > > > > > I don't quite follow. Even with this bug, the offset is somewhere userspace > > > > > wrote. That data is just pulled. > > > > > > > > Sorry, I was thinking of the SOCK_DGRAM case where the header is added > > > > via a call to dev_hard_header(). > > > > > > > > > > > > Maybe for those > > > > > > > > cases we need to look at adding an unsigned int argument to > > > > > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or > > > > > > > > dev->hard_header_len in the cases where we have something like > > > > > > > > af_packet that is transmitting over an ipgre tunnel. The general idea > > > > > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the > > > > > > > > csum_start into headers that userspace was not responsible for > > > > > > > > populating. > > > > > > > > > > > > > > One issue with that is that dev->hard_header_len itself is imprecise > > > > > > > for protocols with variable length link layer headers. There, too, we > > > > > > > have had a variety of bug fixes in the past. > > > > > > > > > > > > > > It also adds cost to every user of virtio_net_hdr, while we only know > > > > > > > one issue in a rare case of the IP_GRE device. > > > > > > > > > > > > Quick question, the assumption is that the checksum should always be > > > > > > performed starting no earlier than the transport header right? Looking > > > > > > over virtio_net_hdr_to_skb it looks like it is already verifying the > > > > > > transport header is in the linear portion of the skb. I'm wondering if > > > > > > we couldn't just look at adding a check to verify the transport offset > > > > > > is <= csum start? We might also be able to get rid of one of the two > > > > > > calls to pskb_may_pull by doing that. > > > > > > > > > > Are you referring to this part in the .._NEEDS_CSUM branch? > > > > > > > > > > if (!skb_partial_csum_set(skb, start, off)) > > > > > return -EINVAL; > > > > > > > > > > p_off = skb_transport_offset(skb) + thlen; > > > > > if (!pskb_may_pull(skb, p_off)) > > > > > return -EINVAL; > > > > > > > > > > skb_partial_csum_set is actually what sets the transport offset, > > > > > derived from start. > > > > > > > > Ugh, I had overlooked that as I was more focused on the > > > > skb_probe_transport_header calls in the af_packet code. > > > > > > > > So we can have both the transport offset and the csum_start in a > > > > region that gets stripped by the ipgre code. Worse yet the inner > > > > transport header will also be pointing somewhere outside of the > > > > encapsulated region when we pass it off to skb_reset_inner_headers(). > > > > > > > > Maybe it would make sense to just have the check look into the > > > > transport offset instead of csum start as that way you are essentially > > > > addressing two possible issues instead of one, and it would > > > > effectively combine multiple checks as the uninitialized value is ~0 > > > > which should always be greater than "skb_headroom + tunnel->hlen + > > > > sizeof(struct iphdr)". I think you mentioned before placing a check > > > > just before you make the call to skb_pull in the GRE transmit path. > > > > Doing that we would at least reduce the impact as it would only apply > > > > in the header_ops case in ipgre_xmit instead of being applied to all > > > > the transmit paths which don't perform the pull. > > > > > > Do you mean > > > > > > if (dev->header_ops) { > > > + int pull_len = tunnel->hlen + sizeof(struct iphdr); > > > + > > > if (skb_cow_head(skb, 0)) > > > goto free_skb; > > > > > > tnl_params = (const struct iphdr *)skb->data; > > > > > > + if (pull_len > skb_transport_offset(skb)) > > > + goto free_skb; > > > + > > > /* Pull skb since ip_tunnel_xmit() needs skb->data pointing > > > * to gre header. > > > */ > > > - skb_pull(skb, tunnel->hlen + sizeof(struct iphdr)); > > > + skb_pull(skb, pull_len); > > > skb_reset_mac_header(skb); > > > > > > plus then > > > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > > > { > > > - /* Local checksum offload requires csum offload of the inner packet */ > > > - if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > > > - skb_checksum_start(skb) < skb->data) > > > - return -EINVAL; > > > - > > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : > > > SKB_GSO_GRE); > > > } > > > > Yes, this is what I was thinking. We will also need an IPv6 version of > > this as well, and may want to add a comment clarifying that this is to > > prevent us from pointing inner offsets at pulled headers. > > > > It lets us drop the csum, ipsummed, and csum_start checks in favor of > > just the skb_transport_offset comparison which should be a net win > > since it reduces the number of paths the code is encountered in, and > > reduces the number of checks to just 1. > > Okay. Yes, this looks better to me too. Thanks. > > Do you want to submit it? Or I can do it, either way. You can do it since you have essentially already written half the code.. :) Thanks. - Alex
On Sat, Sep 4, 2021 at 7:47 PM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Sat, Sep 4, 2021 at 3:05 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Sat, Sep 4, 2021 at 5:54 PM Alexander Duyck > > <alexander.duyck@gmail.com> wrote: > > > > > > On Sat, Sep 4, 2021 at 2:40 PM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck > > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > > > On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck > > > > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > > > > > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn > > > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > <snip> > > > > > > > > > > > > > > > > whereas if the offset is stored somewhere in the unstripped data we > > > > > > > > > could then drop the packet and count it as a drop without having to > > > > > > > > > modify the frame via the skb_pull. > > > > > > > > > > > > > > > > This is a broader issue that userspace can pass any csum_start as long > > > > > > > > as it is within packet bounds. We could address it here specifically > > > > > > > > for the GRE header. But that still leaves many potentially bad offsets > > > > > > > > further in the packet in this case, and all the other cases. Checking > > > > > > > > that specific header seems a bit arbitrary to me, and might actually > > > > > > > > give false confidence. > > > > > > > > > > > > > > > > We could certainly move the validation from gre_handle_offloads to > > > > > > > > before skb_pull, to make it more obvious *why* the check exists. > > > > > > > > > > > > > > Agreed. My main concern is that the csum_start is able to be located > > > > > > > somewhere where the userspace didn't write. For the most part the > > > > > > > csum_start and csum_offset just needs to be restricted to the regions > > > > > > > that the userspace actually wrote to. > > > > > > > > > > > > I don't quite follow. Even with this bug, the offset is somewhere userspace > > > > > > wrote. That data is just pulled. > > > > > > > > > > Sorry, I was thinking of the SOCK_DGRAM case where the header is added > > > > > via a call to dev_hard_header(). > > > > > > > > > > > > > > Maybe for those > > > > > > > > > cases we need to look at adding an unsigned int argument to > > > > > > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or > > > > > > > > > dev->hard_header_len in the cases where we have something like > > > > > > > > > af_packet that is transmitting over an ipgre tunnel. The general idea > > > > > > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the > > > > > > > > > csum_start into headers that userspace was not responsible for > > > > > > > > > populating. > > > > > > > > > > > > > > > > One issue with that is that dev->hard_header_len itself is imprecise > > > > > > > > for protocols with variable length link layer headers. There, too, we > > > > > > > > have had a variety of bug fixes in the past. > > > > > > > > > > > > > > > > It also adds cost to every user of virtio_net_hdr, while we only know > > > > > > > > one issue in a rare case of the IP_GRE device. > > > > > > > > > > > > > > Quick question, the assumption is that the checksum should always be > > > > > > > performed starting no earlier than the transport header right? Looking > > > > > > > over virtio_net_hdr_to_skb it looks like it is already verifying the > > > > > > > transport header is in the linear portion of the skb. I'm wondering if > > > > > > > we couldn't just look at adding a check to verify the transport offset > > > > > > > is <= csum start? We might also be able to get rid of one of the two > > > > > > > calls to pskb_may_pull by doing that. > > > > > > > > > > > > Are you referring to this part in the .._NEEDS_CSUM branch? > > > > > > > > > > > > if (!skb_partial_csum_set(skb, start, off)) > > > > > > return -EINVAL; > > > > > > > > > > > > p_off = skb_transport_offset(skb) + thlen; > > > > > > if (!pskb_may_pull(skb, p_off)) > > > > > > return -EINVAL; > > > > > > > > > > > > skb_partial_csum_set is actually what sets the transport offset, > > > > > > derived from start. > > > > > > > > > > Ugh, I had overlooked that as I was more focused on the > > > > > skb_probe_transport_header calls in the af_packet code. > > > > > > > > > > So we can have both the transport offset and the csum_start in a > > > > > region that gets stripped by the ipgre code. Worse yet the inner > > > > > transport header will also be pointing somewhere outside of the > > > > > encapsulated region when we pass it off to skb_reset_inner_headers(). > > > > > > > > > > Maybe it would make sense to just have the check look into the > > > > > transport offset instead of csum start as that way you are essentially > > > > > addressing two possible issues instead of one, and it would > > > > > effectively combine multiple checks as the uninitialized value is ~0 > > > > > which should always be greater than "skb_headroom + tunnel->hlen + > > > > > sizeof(struct iphdr)". I think you mentioned before placing a check > > > > > just before you make the call to skb_pull in the GRE transmit path. > > > > > Doing that we would at least reduce the impact as it would only apply > > > > > in the header_ops case in ipgre_xmit instead of being applied to all > > > > > the transmit paths which don't perform the pull. > > > > > > > > Do you mean > > > > > > > > if (dev->header_ops) { > > > > + int pull_len = tunnel->hlen + sizeof(struct iphdr); > > > > + > > > > if (skb_cow_head(skb, 0)) > > > > goto free_skb; > > > > > > > > tnl_params = (const struct iphdr *)skb->data; > > > > > > > > + if (pull_len > skb_transport_offset(skb)) > > > > + goto free_skb; > > > > + > > > > /* Pull skb since ip_tunnel_xmit() needs skb->data pointing > > > > * to gre header. > > > > */ > > > > - skb_pull(skb, tunnel->hlen + sizeof(struct iphdr)); > > > > + skb_pull(skb, pull_len); > > > > skb_reset_mac_header(skb); > > > > > > > > plus then > > > > > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum) > > > > { > > > > - /* Local checksum offload requires csum offload of the inner packet */ > > > > - if (csum && skb->ip_summed == CHECKSUM_PARTIAL && > > > > - skb_checksum_start(skb) < skb->data) > > > > - return -EINVAL; > > > > - > > > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : > > > > SKB_GSO_GRE); > > > > } > > > > > > Yes, this is what I was thinking. We will also need an IPv6 version of > > > this as well, and may want to add a comment clarifying that this is to > > > prevent us from pointing inner offsets at pulled headers. > > > > > > It lets us drop the csum, ipsummed, and csum_start checks in favor of > > > just the skb_transport_offset comparison which should be a net win > > > since it reduces the number of paths the code is encountered in, and > > > reduces the number of checks to just 1. > > > > Okay. Yes, this looks better to me too. Thanks. > > > > Do you want to submit it? Or I can do it, either way. > > You can do it since you have essentially already written half the code.. :) Sent, but only the ipv4 patch. I actually do not see an equivalent skb_pull path in ip6_gre.c. Will take a closer look later, but don't have time for that now. https://patchwork.kernel.org/project/netdevbpf/patch/20210905152109.1805619-1-willemdebruijn.kernel@gmail.com/
On Sun, Sep 5, 2021 at 8:24 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Sat, Sep 4, 2021 at 7:47 PM Alexander Duyck > <alexander.duyck@gmail.com> wrote: <snip> > > You can do it since you have essentially already written half the code.. :) > > Sent, but only the ipv4 patch. > > I actually do not see an equivalent skb_pull path in ip6_gre.c. Will > take a closer look later, but don't have time for that now. > > https://patchwork.kernel.org/project/netdevbpf/patch/20210905152109.1805619-1-willemdebruijn.kernel@gmail.com/ So does that mean that commit 9cf448c200ba ("ip6_gre: add validation for csum_start") is adding overhead that doesn't really address anything, and is introducing false positives? If so, should we just revert it? Thanks, - Alex
On Sun, Sep 5, 2021 at 11:53 AM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Sun, Sep 5, 2021 at 8:24 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Sat, Sep 4, 2021 at 7:47 PM Alexander Duyck > > <alexander.duyck@gmail.com> wrote: > > <snip> > > > > You can do it since you have essentially already written half the code.. :) > > > > Sent, but only the ipv4 patch. > > > > I actually do not see an equivalent skb_pull path in ip6_gre.c. Will > > take a closer look later, but don't have time for that now. > > > > https://patchwork.kernel.org/project/netdevbpf/patch/20210905152109.1805619-1-willemdebruijn.kernel@gmail.com/ > > So does that mean that commit 9cf448c200ba ("ip6_gre: add validation > for csum_start") is adding overhead that doesn't really address > anything, and is introducing false positives? If so, should we just > revert it? I agree. I didn't immediately understand how there can be an ip6gre_header analogous to ipgre_header (for headers_ops.create), but no skb_pull in ip6gre_tunnel_xmit equivalent to the one in ipgre_xmit. One difference between the two of them is the return value, which returns an offset: ip6gre_header returns t->hlen. ipgre_header returns t->hlen + sizeof(*iph). This extra offset was introduced in commit 77a482bdb2e6 ("ip_gre: fix ipgre_header to return correct offset") as a fix to previous commit c544193214 ("GRE: Refactor GRE tunneling code."). Before the latter commit the IPv4 header was included in t->hlen, similar to IPv6. After that commit it is no longer. The latter patch is also the one that introduces the skb_pull in ipgre_xmit. I can't say that I fully understand how this works for IPv6. But I think that is sufficient to understand that this skb_pull issue does not affect the ip6_gre path and we best just revert that commit.
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 177d26d8fb9c..09311992a617 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev, static int gre_handle_offloads(struct sk_buff *skb, bool csum) { - if (csum && skb_checksum_start(skb) < skb->data) + /* Local checksum offload requires csum offload of the inner packet */ + if (csum && skb->ip_summed == CHECKSUM_PARTIAL && + skb_checksum_start(skb) < skb->data) return -EINVAL; + return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); }