diff mbox series

[net] ip_gre: validate csum_start only if CHECKSUM_PARTIAL

Message ID 20210902193447.94039-2-willemdebruijn.kernel@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] ip_gre: validate csum_start only if CHECKSUM_PARTIAL | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers warning 2 maintainers not CCed: yoshfuji@linux-ipv6.org dsahern@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 12 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Willem de Bruijn Sept. 2, 2021, 7:34 p.m. UTC
From: Willem de Bruijn <willemb@google.com>

Only test integrity of csum_start if checksum offload is configured.

With checksum offload and GRE tunnel checksum, gre_build_header will
cheaply build the GRE checksum using local checksum offload. This
depends on inner packet csum offload, and thus that csum_start points
behind GRE. But validate this condition only with checksum offload.

Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/ip_gre.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Alexander Duyck Sept. 2, 2021, 8:25 p.m. UTC | #1
On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> From: Willem de Bruijn <willemb@google.com>
>
> Only test integrity of csum_start if checksum offload is configured.
>
> With checksum offload and GRE tunnel checksum, gre_build_header will
> cheaply build the GRE checksum using local checksum offload. This
> depends on inner packet csum offload, and thus that csum_start points
> behind GRE. But validate this condition only with checksum offload.
>
> Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
> Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/ipv4/ip_gre.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 177d26d8fb9c..09311992a617 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
>
>  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
>  {
> -       if (csum && skb_checksum_start(skb) < skb->data)
> +       /* Local checksum offload requires csum offload of the inner packet */
> +       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> +           skb_checksum_start(skb) < skb->data)
>                 return -EINVAL;
> +
>         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
>  }

So a few minor nits.

First I think we need this for both v4 and v6 since it looks like this
code is reproduced for net/ipv6/ip6_gre.c.

Second I don't know if we even need to bother including the "csum"
portion of the lookup since that technically is just telling us if the
GRE tunnel is requesting a checksum or not and I am not sure that
applies to the fact that the inner L4 header is going to be what is
requesting the checksum offload most likely.

Also maybe this should be triggering a WARN_ON_ONCE if we hit this as
the path triggering this should be fixed rather than us silently
dropping frames. We should be figuring out what cases are resulting in
us getting CHECKSUM_PARTIAL without skb_checksum_start being set.
Willem de Bruijn Sept. 2, 2021, 8:29 p.m. UTC | #2
On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Only test integrity of csum_start if checksum offload is configured.
> >
> > With checksum offload and GRE tunnel checksum, gre_build_header will
> > cheaply build the GRE checksum using local checksum offload. This
> > depends on inner packet csum offload, and thus that csum_start points
> > behind GRE. But validate this condition only with checksum offload.
> >
> > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
> > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > ---
> >  net/ipv4/ip_gre.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > index 177d26d8fb9c..09311992a617 100644
> > --- a/net/ipv4/ip_gre.c
> > +++ b/net/ipv4/ip_gre.c
> > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
> >
> >  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> >  {
> > -       if (csum && skb_checksum_start(skb) < skb->data)
> > +       /* Local checksum offload requires csum offload of the inner packet */
> > +       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > +           skb_checksum_start(skb) < skb->data)
> >                 return -EINVAL;
> > +
> >         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
> >  }

Thanks for taking a look.

> So a few minor nits.
>
> First I think we need this for both v4 and v6 since it looks like this
> code is reproduced for net/ipv6/ip6_gre.c.

I sent a separate patch for v6. Perhaps should have made it a series
to make this more clear.

> Second I don't know if we even need to bother including the "csum"
> portion of the lookup since that technically is just telling us if the
> GRE tunnel is requesting a checksum or not and I am not sure that
> applies to the fact that the inner L4 header is going to be what is
> requesting the checksum offload most likely.

This test introduced in the original patch specifically protects the
GRE tunnel checksum calculation using lco_csum. The regular inner
packet path likely is already robust (as similar bug reports would be
more likely for that more common case).

> Also maybe this should be triggering a WARN_ON_ONCE if we hit this as
> the path triggering this should be fixed rather than us silently
> dropping frames. We should be figuring out what cases are resulting in
> us getting CHECKSUM_PARTIAL without skb_checksum_start being set.

We already know that bad packets can enter the kernel and trigger
this, using packet sockets and virtio_net_hdr. Unfortunately, this
*is* the fix.
Alexander Duyck Sept. 2, 2021, 9:17 p.m. UTC | #3
On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > From: Willem de Bruijn <willemb@google.com>
> > >
> > > Only test integrity of csum_start if checksum offload is configured.
> > >
> > > With checksum offload and GRE tunnel checksum, gre_build_header will
> > > cheaply build the GRE checksum using local checksum offload. This
> > > depends on inner packet csum offload, and thus that csum_start points
> > > behind GRE. But validate this condition only with checksum offload.
> > >
> > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
> > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
> > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > ---
> > >  net/ipv4/ip_gre.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > > index 177d26d8fb9c..09311992a617 100644
> > > --- a/net/ipv4/ip_gre.c
> > > +++ b/net/ipv4/ip_gre.c
> > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
> > >
> > >  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> > >  {
> > > -       if (csum && skb_checksum_start(skb) < skb->data)
> > > +       /* Local checksum offload requires csum offload of the inner packet */
> > > +       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > > +           skb_checksum_start(skb) < skb->data)
> > >                 return -EINVAL;
> > > +
> > >         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
> > >  }
>
> Thanks for taking a look.
>
> > So a few minor nits.
> >
> > First I think we need this for both v4 and v6 since it looks like this
> > code is reproduced for net/ipv6/ip6_gre.c.
>
> I sent a separate patch for v6. Perhaps should have made it a series
> to make this more clear.

Yeah, that was part of the reason I assumed the ipv6 patch was overlooked.

> > Second I don't know if we even need to bother including the "csum"
> > portion of the lookup since that technically is just telling us if the
> > GRE tunnel is requesting a checksum or not and I am not sure that
> > applies to the fact that the inner L4 header is going to be what is
> > requesting the checksum offload most likely.
>
> This test introduced in the original patch specifically protects the
> GRE tunnel checksum calculation using lco_csum. The regular inner
> packet path likely is already robust (as similar bug reports would be
> more likely for that more common case).

I was just thinking in terms of shaving off some extra comparisons. I
suppose it depends on if this is being inlined or not. If it is being
inlined there are at least 2 cases where the if statement would be
dropped since csum is explicitly false. My thought was that by just
jumping straight to the skb->ip_summed check we can drop the lookup
for csum since it would be implied by the fact that skb->ip_summed is
being checked. It would create a broader check, but at the same time
it would reduce the number of comparisons in the call.

> > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as
> > the path triggering this should be fixed rather than us silently
> > dropping frames. We should be figuring out what cases are resulting in
> > us getting CHECKSUM_PARTIAL without skb_checksum_start being set.
>
> We already know that bad packets can enter the kernel and trigger
> this, using packet sockets and virtio_net_hdr. Unfortunately, this
> *is* the fix.

It sounds almost like we need a CHECKSUM_DODGY to go along with the
SKB_GSO_DODGY in order to resolve these kinds of issues.

So essentially we have a source that we know can give us bad packets.
We really should be performing some sort of validation on these much
earlier in order to clean them up so that we don't have to add this
sort of exception handling code all over the Tx path.
Willem de Bruijn Sept. 2, 2021, 9:44 p.m. UTC | #4
On Thu, Sep 2, 2021 at 5:17 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > From: Willem de Bruijn <willemb@google.com>
> > > >
> > > > Only test integrity of csum_start if checksum offload is configured.
> > > >
> > > > With checksum offload and GRE tunnel checksum, gre_build_header will
> > > > cheaply build the GRE checksum using local checksum offload. This
> > > > depends on inner packet csum offload, and thus that csum_start points
> > > > behind GRE. But validate this condition only with checksum offload.
> > > >
> > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
> > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > ---
> > > >  net/ipv4/ip_gre.c | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > > > index 177d26d8fb9c..09311992a617 100644
> > > > --- a/net/ipv4/ip_gre.c
> > > > +++ b/net/ipv4/ip_gre.c
> > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
> > > >
> > > >  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> > > >  {
> > > > -       if (csum && skb_checksum_start(skb) < skb->data)
> > > > +       /* Local checksum offload requires csum offload of the inner packet */
> > > > +       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > > > +           skb_checksum_start(skb) < skb->data)
> > > >                 return -EINVAL;
> > > > +
> > > >         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
> > > >  }
> >
> > Thanks for taking a look.
> >
> > > So a few minor nits.
> > >
> > > First I think we need this for both v4 and v6 since it looks like this
> > > code is reproduced for net/ipv6/ip6_gre.c.
> >
> > I sent a separate patch for v6. Perhaps should have made it a series
> > to make this more clear.
>
> Yeah, that was part of the reason I assumed the ipv6 patch was overlooked.

I was in two minds only because a series should come with a cover
letter and thus one extra email added to the firehose. But this makes
clear the value. Will just do that in the future.

> > > Second I don't know if we even need to bother including the "csum"
> > > portion of the lookup since that technically is just telling us if the
> > > GRE tunnel is requesting a checksum or not and I am not sure that
> > > applies to the fact that the inner L4 header is going to be what is
> > > requesting the checksum offload most likely.
> >
> > This test introduced in the original patch specifically protects the
> > GRE tunnel checksum calculation using lco_csum. The regular inner
> > packet path likely is already robust (as similar bug reports would be
> > more likely for that more common case).
>
> I was just thinking in terms of shaving off some extra comparisons. I
> suppose it depends on if this is being inlined or not. If it is being
> inlined there are at least 2 cases where the if statement would be
> dropped since csum is explicitly false. My thought was that by just
> jumping straight to the skb->ip_summed check we can drop the lookup
> for csum since it would be implied by the fact that skb->ip_summed is
> being checked. It would create a broader check, but at the same time
> it would reduce the number of comparisons in the call.

Most GRE tunnels don't have checksums and csum is likely in a register,
as function argument, so it likely is the cheaper test?

More functional argument: if !csum, the GRE tunnel does not care about
the integrity of csum_start. So I think that it should not read it at all.

> > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as
> > > the path triggering this should be fixed rather than us silently
> > > dropping frames. We should be figuring out what cases are resulting in
> > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set.
> >
> > We already know that bad packets can enter the kernel and trigger
> > this, using packet sockets and virtio_net_hdr. Unfortunately, this
> > *is* the fix.
>
> It sounds almost like we need a CHECKSUM_DODGY to go along with the
> SKB_GSO_DODGY in order to resolve these kinds of issues.
>
> So essentially we have a source that we know can give us bad packets.
> We really should be performing some sort of validation on these much
> earlier in order to clean them up so that we don't have to add this
> sort of exception handling code all over the Tx path.

Agreed with the concern. I've been arguing for validation at kernel
entry of virtio_net_hdr. As an optional feature, if nothing else:
https://patchwork.kernel.org/project/netdevbpf/patch/20210616203448.995314-3-tannerlove.kernel@gmail.com/

But unless we accept the cost of full parsing to identify the
transport headers, we cannot predict at that stage whether the field
is bogus, let alone whether it might trigger a bug later on. We do
basic validation: csum_start is verified to be within the skb linear,
so not totally out of bounds.

That the offset is not just bogus, but causes a bug appears to be a
rare exception peculiar to the GRE tunnel. Only it pulls the outer
header (in ipgre_xmit), applies lco_csum and can so trigger negative
overflow, as far as I could tell. That's why we decided to add the
limited check local to that code.

I'm not sure how we would use CHECKSUM_DODGY in practice.
Alexander Duyck Sept. 3, 2021, 2:01 a.m. UTC | #5
On Thu, Sep 2, 2021 at 2:45 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Sep 2, 2021 at 5:17 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > >
> > > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > > > From: Willem de Bruijn <willemb@google.com>
> > > > >
> > > > > Only test integrity of csum_start if checksum offload is configured.
> > > > >
> > > > > With checksum offload and GRE tunnel checksum, gre_build_header will
> > > > > cheaply build the GRE checksum using local checksum offload. This
> > > > > depends on inner packet csum offload, and thus that csum_start points
> > > > > behind GRE. But validate this condition only with checksum offload.
> > > > >
> > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
> > > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
> > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > ---
> > > > >  net/ipv4/ip_gre.c | 5 ++++-
> > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > > > > index 177d26d8fb9c..09311992a617 100644
> > > > > --- a/net/ipv4/ip_gre.c
> > > > > +++ b/net/ipv4/ip_gre.c
> > > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
> > > > >
> > > > >  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> > > > >  {
> > > > > -       if (csum && skb_checksum_start(skb) < skb->data)
> > > > > +       /* Local checksum offload requires csum offload of the inner packet */
> > > > > +       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > > > > +           skb_checksum_start(skb) < skb->data)
> > > > >                 return -EINVAL;
> > > > > +
> > > > >         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
> > > > >  }
> > >
> > > Thanks for taking a look.
> > >
> > > > So a few minor nits.
> > > >
> > > > First I think we need this for both v4 and v6 since it looks like this
> > > > code is reproduced for net/ipv6/ip6_gre.c.
> > >
> > > I sent a separate patch for v6. Perhaps should have made it a series
> > > to make this more clear.
> >
> > Yeah, that was part of the reason I assumed the ipv6 patch was overlooked.
>
> I was in two minds only because a series should come with a cover
> letter and thus one extra email added to the firehose. But this makes
> clear the value. Will just do that in the future.
>
> > > > Second I don't know if we even need to bother including the "csum"
> > > > portion of the lookup since that technically is just telling us if the
> > > > GRE tunnel is requesting a checksum or not and I am not sure that
> > > > applies to the fact that the inner L4 header is going to be what is
> > > > requesting the checksum offload most likely.
> > >
> > > This test introduced in the original patch specifically protects the
> > > GRE tunnel checksum calculation using lco_csum. The regular inner
> > > packet path likely is already robust (as similar bug reports would be
> > > more likely for that more common case).
> >
> > I was just thinking in terms of shaving off some extra comparisons. I
> > suppose it depends on if this is being inlined or not. If it is being
> > inlined there are at least 2 cases where the if statement would be
> > dropped since csum is explicitly false. My thought was that by just
> > jumping straight to the skb->ip_summed check we can drop the lookup
> > for csum since it would be implied by the fact that skb->ip_summed is
> > being checked. It would create a broader check, but at the same time
> > it would reduce the number of comparisons in the call.
>
> Most GRE tunnels don't have checksums and csum is likely in a register,
> as function argument, so it likely is the cheaper test?
>
> More functional argument: if !csum, the GRE tunnel does not care about
> the integrity of csum_start. So I think that it should not read it at all.

The problem as I see it is that it is just passing the problem on.
Adding the check to the GRE drivers only really fixes one spot that
exposes the issue. In this case it was triggered because the lco_csum
was causing issues. What happens when one of these packets makes it
down to hardware and it has to deal with the malformed csum_start? I
suspect we end up potentially causing issues where Tx metadata could
be malformed resulting in a dropped packet at the best case, and
kernel panics at the worst.

> > > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as
> > > > the path triggering this should be fixed rather than us silently
> > > > dropping frames. We should be figuring out what cases are resulting in
> > > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set.
> > >
> > > We already know that bad packets can enter the kernel and trigger
> > > this, using packet sockets and virtio_net_hdr. Unfortunately, this
> > > *is* the fix.
> >
> > It sounds almost like we need a CHECKSUM_DODGY to go along with the
> > SKB_GSO_DODGY in order to resolve these kinds of issues.
> >
> > So essentially we have a source that we know can give us bad packets.
> > We really should be performing some sort of validation on these much
> > earlier in order to clean them up so that we don't have to add this
> > sort of exception handling code all over the Tx path.
>
> Agreed with the concern. I've been arguing for validation at kernel
> entry of virtio_net_hdr. As an optional feature, if nothing else:
> https://patchwork.kernel.org/project/netdevbpf/patch/20210616203448.995314-3-tannerlove.kernel@gmail.com/
>
> But unless we accept the cost of full parsing to identify the
> transport headers, we cannot predict at that stage whether the field
> is bogus, let alone whether it might trigger a bug later on. We do
> basic validation: csum_start is verified to be within the skb linear,
> so not totally out of bounds.
>
> That the offset is not just bogus, but causes a bug appears to be a
> rare exception peculiar to the GRE tunnel. Only it pulls the outer
> header (in ipgre_xmit), applies lco_csum and can so trigger negative
> overflow, as far as I could tell. That's why we decided to add the
> limited check local to that code.
>
> I'm not sure how we would use CHECKSUM_DODGY in practice.

The issue is drivers with NETIF_F_HW_CSUM would be expecting a
reasonable csum_start and csum_offset. If the hardware is only
advertising that and we are expecting it to offload the checksum we
should probably be doing some sort of validation on user derived
inputs to make sure that they aren't totally ridiculous as is the case
here where the original issue was that the csum_start wasn't even in
the packet data.

Would it maybe make sense to look at reverting the earlier fixes and
instead updating skb_partial_csum_set so that we cannot write a
csum_start value that is less than the start of skb->data? That way we
are addressing this at the source instead of allowing the garbage data
to propagate further down the stack and having to drop it at the
driver level which is going to have us playing whack a mole trying to
fix it where it pops up. It seems like we are already validating the
upper bounds for these values in that function so why not validate the
lower bounds as well?
Willem de Bruijn Sept. 3, 2021, 2:40 a.m. UTC | #6
On Thu, Sep 2, 2021 at 10:01 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Thu, Sep 2, 2021 at 2:45 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Thu, Sep 2, 2021 at 5:17 PM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck
> > > > <alexander.duyck@gmail.com> wrote:
> > > > >
> > > > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn
> > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > >
> > > > > > From: Willem de Bruijn <willemb@google.com>
> > > > > >
> > > > > > Only test integrity of csum_start if checksum offload is configured.
> > > > > >
> > > > > > With checksum offload and GRE tunnel checksum, gre_build_header will
> > > > > > cheaply build the GRE checksum using local checksum offload. This
> > > > > > depends on inner packet csum offload, and thus that csum_start points
> > > > > > behind GRE. But validate this condition only with checksum offload.
> > > > > >
> > > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
> > > > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
> > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > ---
> > > > > >  net/ipv4/ip_gre.c | 5 ++++-
> > > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > > > > > index 177d26d8fb9c..09311992a617 100644
> > > > > > --- a/net/ipv4/ip_gre.c
> > > > > > +++ b/net/ipv4/ip_gre.c
> > > > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
> > > > > >
> > > > > >  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> > > > > >  {
> > > > > > -       if (csum && skb_checksum_start(skb) < skb->data)
> > > > > > +       /* Local checksum offload requires csum offload of the inner packet */
> > > > > > +       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > > > > > +           skb_checksum_start(skb) < skb->data)
> > > > > >                 return -EINVAL;
> > > > > > +
> > > > > >         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
> > > > > >  }
> > > >
> > > > Thanks for taking a look.
> > > >
> > > > > So a few minor nits.
> > > > >
> > > > > First I think we need this for both v4 and v6 since it looks like this
> > > > > code is reproduced for net/ipv6/ip6_gre.c.
> > > >
> > > > I sent a separate patch for v6. Perhaps should have made it a series
> > > > to make this more clear.
> > >
> > > Yeah, that was part of the reason I assumed the ipv6 patch was overlooked.
> >
> > I was in two minds only because a series should come with a cover
> > letter and thus one extra email added to the firehose. But this makes
> > clear the value. Will just do that in the future.
> >
> > > > > Second I don't know if we even need to bother including the "csum"
> > > > > portion of the lookup since that technically is just telling us if the
> > > > > GRE tunnel is requesting a checksum or not and I am not sure that
> > > > > applies to the fact that the inner L4 header is going to be what is
> > > > > requesting the checksum offload most likely.
> > > >
> > > > This test introduced in the original patch specifically protects the
> > > > GRE tunnel checksum calculation using lco_csum. The regular inner
> > > > packet path likely is already robust (as similar bug reports would be
> > > > more likely for that more common case).
> > >
> > > I was just thinking in terms of shaving off some extra comparisons. I
> > > suppose it depends on if this is being inlined or not. If it is being
> > > inlined there are at least 2 cases where the if statement would be
> > > dropped since csum is explicitly false. My thought was that by just
> > > jumping straight to the skb->ip_summed check we can drop the lookup
> > > for csum since it would be implied by the fact that skb->ip_summed is
> > > being checked. It would create a broader check, but at the same time
> > > it would reduce the number of comparisons in the call.
> >
> > Most GRE tunnels don't have checksums and csum is likely in a register,
> > as function argument, so it likely is the cheaper test?
> >
> > More functional argument: if !csum, the GRE tunnel does not care about
> > the integrity of csum_start. So I think that it should not read it at all.
>
> The problem as I see it is that it is just passing the problem on.
> Adding the check to the GRE drivers only really fixes one spot that
> exposes the issue. In this case it was triggered because the lco_csum
> was causing issues. What happens when one of these packets makes it
> down to hardware and it has to deal with the malformed csum_start? I
> suspect we end up potentially causing issues where Tx metadata could
> be malformed resulting in a dropped packet at the best case, and
> kernel panics at the worst.
>
> > > > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as
> > > > > the path triggering this should be fixed rather than us silently
> > > > > dropping frames. We should be figuring out what cases are resulting in
> > > > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set.
> > > >
> > > > We already know that bad packets can enter the kernel and trigger
> > > > this, using packet sockets and virtio_net_hdr. Unfortunately, this
> > > > *is* the fix.
> > >
> > > It sounds almost like we need a CHECKSUM_DODGY to go along with the
> > > SKB_GSO_DODGY in order to resolve these kinds of issues.
> > >
> > > So essentially we have a source that we know can give us bad packets.
> > > We really should be performing some sort of validation on these much
> > > earlier in order to clean them up so that we don't have to add this
> > > sort of exception handling code all over the Tx path.
> >
> > Agreed with the concern. I've been arguing for validation at kernel
> > entry of virtio_net_hdr. As an optional feature, if nothing else:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210616203448.995314-3-tannerlove.kernel@gmail.com/
> >
> > But unless we accept the cost of full parsing to identify the
> > transport headers, we cannot predict at that stage whether the field
> > is bogus, let alone whether it might trigger a bug later on. We do
> > basic validation: csum_start is verified to be within the skb linear,
> > so not totally out of bounds.
> >
> > That the offset is not just bogus, but causes a bug appears to be a
> > rare exception peculiar to the GRE tunnel. Only it pulls the outer
> > header (in ipgre_xmit), applies lco_csum and can so trigger negative
> > overflow, as far as I could tell. That's why we decided to add the
> > limited check local to that code.
> >
> > I'm not sure how we would use CHECKSUM_DODGY in practice.
>
> The issue is drivers with NETIF_F_HW_CSUM would be expecting a
> reasonable csum_start and csum_offset. If the hardware is only
> advertising that and we are expecting it to offload the checksum we
> should probably be doing some sort of validation on user derived
> inputs to make sure that they aren't totally ridiculous as is the case
> here where the original issue was that the csum_start wasn't even in
> the packet data.
>
> Would it maybe make sense to look at reverting the earlier fixes and
> instead updating skb_partial_csum_set so that we cannot write a
> csum_start value that is less than the start of skb->data? That way we
> are addressing this at the source instead of allowing the garbage data
> to propagate further down the stack and having to drop it at the
> driver level which is going to have us playing whack a mole trying to
> fix it where it pops up. It seems like we are already validating the
> upper bounds for these values in that function so why not validate the
> lower bounds as well?

skb_partial_csum_set verifies that csum_start is within bounds
at the time it is called. The offset passed from userspace is
an unsigned int added to skb_headroom(skb), so >= skb->data.

The issue is with ipgre_xmit calling skb_pull. Only then does it
become out of bounds. From what I saw, pulling headers like
that is very rare in the transmit path. Indeed, I did not see it in
any other tunnels. We could instrument each of these directly,
but at non-trivial cost.

The negative underflow and kernel crash is very specific to
lco_csum, which calculates the offset between csum_offset
and transport_offset. Standard checksum offset doesn't.

One alternative fix, then, would be to add a negative overflow
check in lco_csum itself.

That only has three callers and unfortunately by the time we hit
that for GRE in gre_build_header, there no longer is a path to
cleanly dropping the packet and returning an error.

I don't find this bad input whack-a-mole elegant either. But I
thought it was the least bad option..
Alexander Duyck Sept. 3, 2021, 4:46 p.m. UTC | #7
On Thu, Sep 2, 2021 at 7:41 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Sep 2, 2021 at 10:01 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Thu, Sep 2, 2021 at 2:45 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Thu, Sep 2, 2021 at 5:17 PM Alexander Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > >
> > > > On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > > > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck
> > > > > <alexander.duyck@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn
> > > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > >
> > > > > > > From: Willem de Bruijn <willemb@google.com>
> > > > > > >
> > > > > > > Only test integrity of csum_start if checksum offload is configured.
> > > > > > >
> > > > > > > With checksum offload and GRE tunnel checksum, gre_build_header will
> > > > > > > cheaply build the GRE checksum using local checksum offload. This
> > > > > > > depends on inner packet csum offload, and thus that csum_start points
> > > > > > > behind GRE. But validate this condition only with checksum offload.
> > > > > > >
> > > > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
> > > > > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
> > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > > ---
> > > > > > >  net/ipv4/ip_gre.c | 5 ++++-
> > > > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > > > > > > index 177d26d8fb9c..09311992a617 100644
> > > > > > > --- a/net/ipv4/ip_gre.c
> > > > > > > +++ b/net/ipv4/ip_gre.c
> > > > > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
> > > > > > >
> > > > > > >  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> > > > > > >  {
> > > > > > > -       if (csum && skb_checksum_start(skb) < skb->data)
> > > > > > > +       /* Local checksum offload requires csum offload of the inner packet */
> > > > > > > +       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > > > > > > +           skb_checksum_start(skb) < skb->data)
> > > > > > >                 return -EINVAL;
> > > > > > > +
> > > > > > >         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
> > > > > > >  }
> > > > >
> > > > > Thanks for taking a look.
> > > > >
> > > > > > So a few minor nits.
> > > > > >
> > > > > > First I think we need this for both v4 and v6 since it looks like this
> > > > > > code is reproduced for net/ipv6/ip6_gre.c.
> > > > >
> > > > > I sent a separate patch for v6. Perhaps should have made it a series
> > > > > to make this more clear.
> > > >
> > > > Yeah, that was part of the reason I assumed the ipv6 patch was overlooked.
> > >
> > > I was in two minds only because a series should come with a cover
> > > letter and thus one extra email added to the firehose. But this makes
> > > clear the value. Will just do that in the future.
> > >
> > > > > > Second I don't know if we even need to bother including the "csum"
> > > > > > portion of the lookup since that technically is just telling us if the
> > > > > > GRE tunnel is requesting a checksum or not and I am not sure that
> > > > > > applies to the fact that the inner L4 header is going to be what is
> > > > > > requesting the checksum offload most likely.
> > > > >
> > > > > This test introduced in the original patch specifically protects the
> > > > > GRE tunnel checksum calculation using lco_csum. The regular inner
> > > > > packet path likely is already robust (as similar bug reports would be
> > > > > more likely for that more common case).
> > > >
> > > > I was just thinking in terms of shaving off some extra comparisons. I
> > > > suppose it depends on if this is being inlined or not. If it is being
> > > > inlined there are at least 2 cases where the if statement would be
> > > > dropped since csum is explicitly false. My thought was that by just
> > > > jumping straight to the skb->ip_summed check we can drop the lookup
> > > > for csum since it would be implied by the fact that skb->ip_summed is
> > > > being checked. It would create a broader check, but at the same time
> > > > it would reduce the number of comparisons in the call.
> > >
> > > Most GRE tunnels don't have checksums and csum is likely in a register,
> > > as function argument, so it likely is the cheaper test?
> > >
> > > More functional argument: if !csum, the GRE tunnel does not care about
> > > the integrity of csum_start. So I think that it should not read it at all.
> >
> > The problem as I see it is that it is just passing the problem on.
> > Adding the check to the GRE drivers only really fixes one spot that
> > exposes the issue. In this case it was triggered because the lco_csum
> > was causing issues. What happens when one of these packets makes it
> > down to hardware and it has to deal with the malformed csum_start? I
> > suspect we end up potentially causing issues where Tx metadata could
> > be malformed resulting in a dropped packet at the best case, and
> > kernel panics at the worst.
> >
> > > > > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as
> > > > > > the path triggering this should be fixed rather than us silently
> > > > > > dropping frames. We should be figuring out what cases are resulting in
> > > > > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set.
> > > > >
> > > > > We already know that bad packets can enter the kernel and trigger
> > > > > this, using packet sockets and virtio_net_hdr. Unfortunately, this
> > > > > *is* the fix.
> > > >
> > > > It sounds almost like we need a CHECKSUM_DODGY to go along with the
> > > > SKB_GSO_DODGY in order to resolve these kinds of issues.
> > > >
> > > > So essentially we have a source that we know can give us bad packets.
> > > > We really should be performing some sort of validation on these much
> > > > earlier in order to clean them up so that we don't have to add this
> > > > sort of exception handling code all over the Tx path.
> > >
> > > Agreed with the concern. I've been arguing for validation at kernel
> > > entry of virtio_net_hdr. As an optional feature, if nothing else:
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20210616203448.995314-3-tannerlove.kernel@gmail.com/
> > >
> > > But unless we accept the cost of full parsing to identify the
> > > transport headers, we cannot predict at that stage whether the field
> > > is bogus, let alone whether it might trigger a bug later on. We do
> > > basic validation: csum_start is verified to be within the skb linear,
> > > so not totally out of bounds.
> > >
> > > That the offset is not just bogus, but causes a bug appears to be a
> > > rare exception peculiar to the GRE tunnel. Only it pulls the outer
> > > header (in ipgre_xmit), applies lco_csum and can so trigger negative
> > > overflow, as far as I could tell. That's why we decided to add the
> > > limited check local to that code.
> > >
> > > I'm not sure how we would use CHECKSUM_DODGY in practice.
> >
> > The issue is drivers with NETIF_F_HW_CSUM would be expecting a
> > reasonable csum_start and csum_offset. If the hardware is only
> > advertising that and we are expecting it to offload the checksum we
> > should probably be doing some sort of validation on user derived
> > inputs to make sure that they aren't totally ridiculous as is the case
> > here where the original issue was that the csum_start wasn't even in
> > the packet data.
> >
> > Would it maybe make sense to look at reverting the earlier fixes and
> > instead updating skb_partial_csum_set so that we cannot write a
> > csum_start value that is less than the start of skb->data? That way we
> > are addressing this at the source instead of allowing the garbage data
> > to propagate further down the stack and having to drop it at the
> > driver level which is going to have us playing whack a mole trying to
> > fix it where it pops up. It seems like we are already validating the
> > upper bounds for these values in that function so why not validate the
> > lower bounds as well?
>
> skb_partial_csum_set verifies that csum_start is within bounds
> at the time it is called. The offset passed from userspace is
> an unsigned int added to skb_headroom(skb), so >= skb->data.
>
> The issue is with ipgre_xmit calling skb_pull. Only then does it
> become out of bounds. From what I saw, pulling headers like
> that is very rare in the transmit path. Indeed, I did not see it in
> any other tunnels. We could instrument each of these directly,
> but at non-trivial cost.

So there are some positives and negatives to addressing this in
ipgre_xmit. Specifically if we address it before the pull we could do
some other quick checks to verify the offset. If the offset and start
are both in the region to be pulled we could just drop the offload,
whereas if the offset is stored somewhere in the unstripped data we
could then drop the packet and count it as a drop without having to
modify the frame via the skb_pull.

> The negative underflow and kernel crash is very specific to
> lco_csum, which calculates the offset between csum_offset
> and transport_offset. Standard checksum offset doesn't.
>
> One alternative fix, then, would be to add a negative overflow
> check in lco_csum itself.
> That only has three callers and unfortunately by the time we hit
> that for GRE in gre_build_header, there no longer is a path to
> cleanly dropping the packet and returning an error.

Right. We could find the problem there but it doesn't solve the issue.
The problem is the expectation that the offset exists in the area
after the checksum for the tunnel header, not before.

> I don't find this bad input whack-a-mole elegant either. But I
> thought it was the least bad option..

The part that has been confusing me is how the virtio_net_hdr could
have been pointing as the IP or GRE headers since they shouldn't have
been present on the frame provided by the userspace. I think I am
starting to see now.

So in the case of af_packet it looks like it is calling
dev_hard_header which calls header_ops->create before doing the
virtio_net_hdr_to_skb call. Do I have that right? Maybe for those
cases we need to look at adding an unsigned int argument to
virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
dev->hard_header_len in the cases where we have something like
af_packet that is transmitting over an ipgre tunnel. The general idea
is to prevent these virtio_net_hdr_to_skb calls from pointing the
csum_start into headers that userspace was not responsible for
populating.
Willem de Bruijn Sept. 3, 2021, 7:38 p.m. UTC | #8
> > > The issue is drivers with NETIF_F_HW_CSUM would be expecting a
> > > reasonable csum_start and csum_offset. If the hardware is only
> > > advertising that and we are expecting it to offload the checksum we
> > > should probably be doing some sort of validation on user derived
> > > inputs to make sure that they aren't totally ridiculous as is the case
> > > here where the original issue was that the csum_start wasn't even in
> > > the packet data.
> > >
> > > Would it maybe make sense to look at reverting the earlier fixes and
> > > instead updating skb_partial_csum_set so that we cannot write a
> > > csum_start value that is less than the start of skb->data? That way we
> > > are addressing this at the source instead of allowing the garbage data
> > > to propagate further down the stack and having to drop it at the
> > > driver level which is going to have us playing whack a mole trying to
> > > fix it where it pops up. It seems like we are already validating the
> > > upper bounds for these values in that function so why not validate the
> > > lower bounds as well?
> >
> > skb_partial_csum_set verifies that csum_start is within bounds
> > at the time it is called. The offset passed from userspace is
> > an unsigned int added to skb_headroom(skb), so >= skb->data.
> >
> > The issue is with ipgre_xmit calling skb_pull. Only then does it
> > become out of bounds. From what I saw, pulling headers like
> > that is very rare in the transmit path. Indeed, I did not see it in
> > any other tunnels. We could instrument each of these directly,
> > but at non-trivial cost.
>
> So there are some positives and negatives to addressing this in
> ipgre_xmit. Specifically if we address it before the pull we could do
> some other quick checks to verify the offset. If the offset and start
> are both in the region to be pulled we could just drop the offload,

These cases would currently crash the kernel. They are so clearly bad
input that I would not even try to start accommodating them. Drop
them.

> whereas if the offset is stored somewhere in the unstripped data we
> could then drop the packet and count it as a drop without having to
> modify the frame via the skb_pull.

This is a broader issue that userspace can pass any csum_start as long
as it is within packet bounds. We could address it here specifically
for the GRE header. But that still leaves many potentially bad offsets
further in the packet in this case, and all the other cases. Checking
that specific header seems a bit arbitrary to me, and might actually
give false confidence.

We could certainly move the validation from gre_handle_offloads to
before skb_pull, to make it more obvious *why* the check exists.

> > The negative underflow and kernel crash is very specific to
> > lco_csum, which calculates the offset between csum_offset
> > and transport_offset. Standard checksum offset doesn't.
> >
> > One alternative fix, then, would be to add a negative overflow
> > check in lco_csum itself.
> > That only has three callers and unfortunately by the time we hit
> > that for GRE in gre_build_header, there no longer is a path to
> > cleanly dropping the packet and returning an error.
>
> Right. We could find the problem there but it doesn't solve the issue.
> The problem is the expectation that the offset exists in the area
> after the checksum for the tunnel header, not before.
>
> > I don't find this bad input whack-a-mole elegant either. But I
> > thought it was the least bad option..
>
> The part that has been confusing me is how the virtio_net_hdr could
> have been pointing as the IP or GRE headers since they shouldn't have
> been present on the frame provided by the userspace. I think I am
> starting to see now.
>
> So in the case of af_packet it looks like it is calling
> dev_hard_header which calls header_ops->create before doing the
> virtio_net_hdr_to_skb call. Do I have that right?

Mostly yes. That is the case for SOCK_DGRAM. For SOCK_RAW, the caller
is expected to have prepared these headers.

Note that for the ip_gre device to have header_ops, this will be
ipgre_header_ops, which .create member function comment starts with
"/* Nice toy. Unfortunately, useless in real life :-)". We recently
had a small series of fixes related to this special case and packet
sockets, such as commit fdafed459998 ("ip_gre: set
dev->hard_header_len and dev->needed_headroom properly"). This case of
a GRE device that receives packets with outer IP + GRE headers is out
of the ordinary.

> Maybe for those
> cases we need to look at adding an unsigned int argument to
> virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> dev->hard_header_len in the cases where we have something like
> af_packet that is transmitting over an ipgre tunnel. The general idea
> is to prevent these virtio_net_hdr_to_skb calls from pointing the
> csum_start into headers that userspace was not responsible for
> populating.

One issue with that is that dev->hard_header_len itself is imprecise
for protocols with variable length link layer headers. There, too, we
have had a variety of bug fixes in the past.

It also adds cost to every user of virtio_net_hdr, while we only know
one issue in a rare case of the IP_GRE device.
Alexander Duyck Sept. 3, 2021, 11:13 p.m. UTC | #9
On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

<snip>

> > whereas if the offset is stored somewhere in the unstripped data we
> > could then drop the packet and count it as a drop without having to
> > modify the frame via the skb_pull.
>
> This is a broader issue that userspace can pass any csum_start as long
> as it is within packet bounds. We could address it here specifically
> for the GRE header. But that still leaves many potentially bad offsets
> further in the packet in this case, and all the other cases. Checking
> that specific header seems a bit arbitrary to me, and might actually
> give false confidence.
>
> We could certainly move the validation from gre_handle_offloads to
> before skb_pull, to make it more obvious *why* the check exists.

Agreed. My main concern is that the csum_start is able to be located
somewhere where the userspace didn't write. For the most part the
csum_start and csum_offset just needs to be restricted to the regions
that the userspace actually wrote to.

> > > The negative underflow and kernel crash is very specific to
> > > lco_csum, which calculates the offset between csum_offset
> > > and transport_offset. Standard checksum offset doesn't.
> > >
> > > One alternative fix, then, would be to add a negative overflow
> > > check in lco_csum itself.
> > > That only has three callers and unfortunately by the time we hit
> > > that for GRE in gre_build_header, there no longer is a path to
> > > cleanly dropping the packet and returning an error.
> >
> > Right. We could find the problem there but it doesn't solve the issue.
> > The problem is the expectation that the offset exists in the area
> > after the checksum for the tunnel header, not before.
> >
> > > I don't find this bad input whack-a-mole elegant either. But I
> > > thought it was the least bad option..
> >
> > The part that has been confusing me is how the virtio_net_hdr could
> > have been pointing as the IP or GRE headers since they shouldn't have
> > been present on the frame provided by the userspace. I think I am
> > starting to see now.
> >
> > So in the case of af_packet it looks like it is calling
> > dev_hard_header which calls header_ops->create before doing the
> > virtio_net_hdr_to_skb call. Do I have that right?
>
> Mostly yes. That is the case for SOCK_DGRAM. For SOCK_RAW, the caller
> is expected to have prepared these headers.
>
> Note that for the ip_gre device to have header_ops, this will be
> ipgre_header_ops, which .create member function comment starts with
> "/* Nice toy. Unfortunately, useless in real life :-)". We recently
> had a small series of fixes related to this special case and packet
> sockets, such as commit fdafed459998 ("ip_gre: set
> dev->hard_header_len and dev->needed_headroom properly"). This case of
> a GRE device that receives packets with outer IP + GRE headers is out
> of the ordinary.

I'm not sure the comment is about the functionality of the function as
much as the addressing. Seems to be about setting up a multicast GRE
configuration.

> > Maybe for those
> > cases we need to look at adding an unsigned int argument to
> > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> > dev->hard_header_len in the cases where we have something like
> > af_packet that is transmitting over an ipgre tunnel. The general idea
> > is to prevent these virtio_net_hdr_to_skb calls from pointing the
> > csum_start into headers that userspace was not responsible for
> > populating.
>
> One issue with that is that dev->hard_header_len itself is imprecise
> for protocols with variable length link layer headers. There, too, we
> have had a variety of bug fixes in the past.
>
> It also adds cost to every user of virtio_net_hdr, while we only know
> one issue in a rare case of the IP_GRE device.

Quick question, the assumption is that the checksum should always be
performed starting no earlier than the transport header right? Looking
over virtio_net_hdr_to_skb it looks like it is already verifying the
transport header is in the linear portion of the skb. I'm wondering if
we couldn't just look at adding a check to verify the transport offset
is <= csum start? We might also be able to get rid of one of the two
calls to pskb_may_pull by doing that.
Willem de Bruijn Sept. 4, 2021, 2:45 p.m. UTC | #10
On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
>
> <snip>
>
> > > whereas if the offset is stored somewhere in the unstripped data we
> > > could then drop the packet and count it as a drop without having to
> > > modify the frame via the skb_pull.
> >
> > This is a broader issue that userspace can pass any csum_start as long
> > as it is within packet bounds. We could address it here specifically
> > for the GRE header. But that still leaves many potentially bad offsets
> > further in the packet in this case, and all the other cases. Checking
> > that specific header seems a bit arbitrary to me, and might actually
> > give false confidence.
> >
> > We could certainly move the validation from gre_handle_offloads to
> > before skb_pull, to make it more obvious *why* the check exists.
>
> Agreed. My main concern is that the csum_start is able to be located
> somewhere where the userspace didn't write. For the most part the
> csum_start and csum_offset just needs to be restricted to the regions
> that the userspace actually wrote to.

I don't quite follow. Even with this bug, the offset is somewhere userspace
wrote. That data is just pulled.

> > > Maybe for those
> > > cases we need to look at adding an unsigned int argument to
> > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> > > dev->hard_header_len in the cases where we have something like
> > > af_packet that is transmitting over an ipgre tunnel. The general idea
> > > is to prevent these virtio_net_hdr_to_skb calls from pointing the
> > > csum_start into headers that userspace was not responsible for
> > > populating.
> >
> > One issue with that is that dev->hard_header_len itself is imprecise
> > for protocols with variable length link layer headers. There, too, we
> > have had a variety of bug fixes in the past.
> >
> > It also adds cost to every user of virtio_net_hdr, while we only know
> > one issue in a rare case of the IP_GRE device.
>
> Quick question, the assumption is that the checksum should always be
> performed starting no earlier than the transport header right? Looking
> over virtio_net_hdr_to_skb it looks like it is already verifying the
> transport header is in the linear portion of the skb. I'm wondering if
> we couldn't just look at adding a check to verify the transport offset
> is <= csum start? We might also be able to get rid of one of the two
> calls to pskb_may_pull by doing that.

Are you referring to this part in the .._NEEDS_CSUM branch?

                if (!skb_partial_csum_set(skb, start, off))
                        return -EINVAL;

                p_off = skb_transport_offset(skb) + thlen;
                if (!pskb_may_pull(skb, p_off))
                        return -EINVAL;

skb_partial_csum_set is actually what sets the transport offset,
derived from start.
Alexander Duyck Sept. 4, 2021, 3:35 p.m. UTC | #11
On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> >
> > <snip>
> >
> > > > whereas if the offset is stored somewhere in the unstripped data we
> > > > could then drop the packet and count it as a drop without having to
> > > > modify the frame via the skb_pull.
> > >
> > > This is a broader issue that userspace can pass any csum_start as long
> > > as it is within packet bounds. We could address it here specifically
> > > for the GRE header. But that still leaves many potentially bad offsets
> > > further in the packet in this case, and all the other cases. Checking
> > > that specific header seems a bit arbitrary to me, and might actually
> > > give false confidence.
> > >
> > > We could certainly move the validation from gre_handle_offloads to
> > > before skb_pull, to make it more obvious *why* the check exists.
> >
> > Agreed. My main concern is that the csum_start is able to be located
> > somewhere where the userspace didn't write. For the most part the
> > csum_start and csum_offset just needs to be restricted to the regions
> > that the userspace actually wrote to.
>
> I don't quite follow. Even with this bug, the offset is somewhere userspace
> wrote. That data is just pulled.

Sorry, I was thinking of the SOCK_DGRAM case where the header is added
via a call to dev_hard_header().

> > > > Maybe for those
> > > > cases we need to look at adding an unsigned int argument to
> > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> > > > dev->hard_header_len in the cases where we have something like
> > > > af_packet that is transmitting over an ipgre tunnel. The general idea
> > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the
> > > > csum_start into headers that userspace was not responsible for
> > > > populating.
> > >
> > > One issue with that is that dev->hard_header_len itself is imprecise
> > > for protocols with variable length link layer headers. There, too, we
> > > have had a variety of bug fixes in the past.
> > >
> > > It also adds cost to every user of virtio_net_hdr, while we only know
> > > one issue in a rare case of the IP_GRE device.
> >
> > Quick question, the assumption is that the checksum should always be
> > performed starting no earlier than the transport header right? Looking
> > over virtio_net_hdr_to_skb it looks like it is already verifying the
> > transport header is in the linear portion of the skb. I'm wondering if
> > we couldn't just look at adding a check to verify the transport offset
> > is <= csum start? We might also be able to get rid of one of the two
> > calls to pskb_may_pull by doing that.
>
> Are you referring to this part in the .._NEEDS_CSUM branch?
>
>                 if (!skb_partial_csum_set(skb, start, off))
>                         return -EINVAL;
>
>                 p_off = skb_transport_offset(skb) + thlen;
>                 if (!pskb_may_pull(skb, p_off))
>                         return -EINVAL;
>
> skb_partial_csum_set is actually what sets the transport offset,
> derived from start.

Ugh, I had overlooked that as I was more focused on the
skb_probe_transport_header calls in the af_packet code.

So we can have both the transport offset and the csum_start in a
region that gets stripped by the ipgre code. Worse yet the inner
transport header will also be pointing somewhere outside of the
encapsulated region when we pass it off to skb_reset_inner_headers().

Maybe it would make sense to just have the check look into the
transport offset instead of csum start as that way you are essentially
addressing two possible issues instead of one, and it would
effectively combine multiple checks as the uninitialized value is ~0
which should always be greater than "skb_headroom + tunnel->hlen +
sizeof(struct iphdr)". I think you mentioned before placing a check
just before you make the call to skb_pull in the GRE transmit path.
Doing that we would at least reduce the impact as it would only apply
in the header_ops case in ipgre_xmit instead of being applied to all
the transmit paths which don't perform the pull.
Willem de Bruijn Sept. 4, 2021, 9:40 p.m. UTC | #12
On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > >
> > > <snip>
> > >
> > > > > whereas if the offset is stored somewhere in the unstripped data we
> > > > > could then drop the packet and count it as a drop without having to
> > > > > modify the frame via the skb_pull.
> > > >
> > > > This is a broader issue that userspace can pass any csum_start as long
> > > > as it is within packet bounds. We could address it here specifically
> > > > for the GRE header. But that still leaves many potentially bad offsets
> > > > further in the packet in this case, and all the other cases. Checking
> > > > that specific header seems a bit arbitrary to me, and might actually
> > > > give false confidence.
> > > >
> > > > We could certainly move the validation from gre_handle_offloads to
> > > > before skb_pull, to make it more obvious *why* the check exists.
> > >
> > > Agreed. My main concern is that the csum_start is able to be located
> > > somewhere where the userspace didn't write. For the most part the
> > > csum_start and csum_offset just needs to be restricted to the regions
> > > that the userspace actually wrote to.
> >
> > I don't quite follow. Even with this bug, the offset is somewhere userspace
> > wrote. That data is just pulled.
>
> Sorry, I was thinking of the SOCK_DGRAM case where the header is added
> via a call to dev_hard_header().
>
> > > > > Maybe for those
> > > > > cases we need to look at adding an unsigned int argument to
> > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> > > > > dev->hard_header_len in the cases where we have something like
> > > > > af_packet that is transmitting over an ipgre tunnel. The general idea
> > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the
> > > > > csum_start into headers that userspace was not responsible for
> > > > > populating.
> > > >
> > > > One issue with that is that dev->hard_header_len itself is imprecise
> > > > for protocols with variable length link layer headers. There, too, we
> > > > have had a variety of bug fixes in the past.
> > > >
> > > > It also adds cost to every user of virtio_net_hdr, while we only know
> > > > one issue in a rare case of the IP_GRE device.
> > >
> > > Quick question, the assumption is that the checksum should always be
> > > performed starting no earlier than the transport header right? Looking
> > > over virtio_net_hdr_to_skb it looks like it is already verifying the
> > > transport header is in the linear portion of the skb. I'm wondering if
> > > we couldn't just look at adding a check to verify the transport offset
> > > is <= csum start? We might also be able to get rid of one of the two
> > > calls to pskb_may_pull by doing that.
> >
> > Are you referring to this part in the .._NEEDS_CSUM branch?
> >
> >                 if (!skb_partial_csum_set(skb, start, off))
> >                         return -EINVAL;
> >
> >                 p_off = skb_transport_offset(skb) + thlen;
> >                 if (!pskb_may_pull(skb, p_off))
> >                         return -EINVAL;
> >
> > skb_partial_csum_set is actually what sets the transport offset,
> > derived from start.
>
> Ugh, I had overlooked that as I was more focused on the
> skb_probe_transport_header calls in the af_packet code.
>
> So we can have both the transport offset and the csum_start in a
> region that gets stripped by the ipgre code. Worse yet the inner
> transport header will also be pointing somewhere outside of the
> encapsulated region when we pass it off to skb_reset_inner_headers().
>
> Maybe it would make sense to just have the check look into the
> transport offset instead of csum start as that way you are essentially
> addressing two possible issues instead of one, and it would
> effectively combine multiple checks as the uninitialized value is ~0
> which should always be greater than "skb_headroom + tunnel->hlen +
> sizeof(struct iphdr)". I think you mentioned before placing a check
> just before you make the call to skb_pull in the GRE transmit path.
> Doing that we would at least reduce the impact as it would only apply
> in the header_ops case in ipgre_xmit instead of being applied to all
> the transmit paths which don't perform the pull.

Do you mean

        if (dev->header_ops) {
+               int pull_len = tunnel->hlen + sizeof(struct iphdr);
+
                if (skb_cow_head(skb, 0))
                        goto free_skb;

                tnl_params = (const struct iphdr *)skb->data;

+               if (pull_len > skb_transport_offset(skb))
+                       goto free_skb;
+
                /* Pull skb since ip_tunnel_xmit() needs skb->data pointing
                 * to gre header.
                 */
-               skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
+               skb_pull(skb, pull_len);
                skb_reset_mac_header(skb);

plus then

 static int gre_handle_offloads(struct sk_buff *skb, bool csum)
 {
-       /* Local checksum offload requires csum offload of the inner packet */
-       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
-           skb_checksum_start(skb) < skb->data)
-               return -EINVAL;
-
        return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM :
SKB_GSO_GRE);
 }
Alexander Duyck Sept. 4, 2021, 9:53 p.m. UTC | #13
On Sat, Sep 4, 2021 at 2:40 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > >
> > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > >
> > > > <snip>
> > > >
> > > > > > whereas if the offset is stored somewhere in the unstripped data we
> > > > > > could then drop the packet and count it as a drop without having to
> > > > > > modify the frame via the skb_pull.
> > > > >
> > > > > This is a broader issue that userspace can pass any csum_start as long
> > > > > as it is within packet bounds. We could address it here specifically
> > > > > for the GRE header. But that still leaves many potentially bad offsets
> > > > > further in the packet in this case, and all the other cases. Checking
> > > > > that specific header seems a bit arbitrary to me, and might actually
> > > > > give false confidence.
> > > > >
> > > > > We could certainly move the validation from gre_handle_offloads to
> > > > > before skb_pull, to make it more obvious *why* the check exists.
> > > >
> > > > Agreed. My main concern is that the csum_start is able to be located
> > > > somewhere where the userspace didn't write. For the most part the
> > > > csum_start and csum_offset just needs to be restricted to the regions
> > > > that the userspace actually wrote to.
> > >
> > > I don't quite follow. Even with this bug, the offset is somewhere userspace
> > > wrote. That data is just pulled.
> >
> > Sorry, I was thinking of the SOCK_DGRAM case where the header is added
> > via a call to dev_hard_header().
> >
> > > > > > Maybe for those
> > > > > > cases we need to look at adding an unsigned int argument to
> > > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> > > > > > dev->hard_header_len in the cases where we have something like
> > > > > > af_packet that is transmitting over an ipgre tunnel. The general idea
> > > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the
> > > > > > csum_start into headers that userspace was not responsible for
> > > > > > populating.
> > > > >
> > > > > One issue with that is that dev->hard_header_len itself is imprecise
> > > > > for protocols with variable length link layer headers. There, too, we
> > > > > have had a variety of bug fixes in the past.
> > > > >
> > > > > It also adds cost to every user of virtio_net_hdr, while we only know
> > > > > one issue in a rare case of the IP_GRE device.
> > > >
> > > > Quick question, the assumption is that the checksum should always be
> > > > performed starting no earlier than the transport header right? Looking
> > > > over virtio_net_hdr_to_skb it looks like it is already verifying the
> > > > transport header is in the linear portion of the skb. I'm wondering if
> > > > we couldn't just look at adding a check to verify the transport offset
> > > > is <= csum start? We might also be able to get rid of one of the two
> > > > calls to pskb_may_pull by doing that.
> > >
> > > Are you referring to this part in the .._NEEDS_CSUM branch?
> > >
> > >                 if (!skb_partial_csum_set(skb, start, off))
> > >                         return -EINVAL;
> > >
> > >                 p_off = skb_transport_offset(skb) + thlen;
> > >                 if (!pskb_may_pull(skb, p_off))
> > >                         return -EINVAL;
> > >
> > > skb_partial_csum_set is actually what sets the transport offset,
> > > derived from start.
> >
> > Ugh, I had overlooked that as I was more focused on the
> > skb_probe_transport_header calls in the af_packet code.
> >
> > So we can have both the transport offset and the csum_start in a
> > region that gets stripped by the ipgre code. Worse yet the inner
> > transport header will also be pointing somewhere outside of the
> > encapsulated region when we pass it off to skb_reset_inner_headers().
> >
> > Maybe it would make sense to just have the check look into the
> > transport offset instead of csum start as that way you are essentially
> > addressing two possible issues instead of one, and it would
> > effectively combine multiple checks as the uninitialized value is ~0
> > which should always be greater than "skb_headroom + tunnel->hlen +
> > sizeof(struct iphdr)". I think you mentioned before placing a check
> > just before you make the call to skb_pull in the GRE transmit path.
> > Doing that we would at least reduce the impact as it would only apply
> > in the header_ops case in ipgre_xmit instead of being applied to all
> > the transmit paths which don't perform the pull.
>
> Do you mean
>
>         if (dev->header_ops) {
> +               int pull_len = tunnel->hlen + sizeof(struct iphdr);
> +
>                 if (skb_cow_head(skb, 0))
>                         goto free_skb;
>
>                 tnl_params = (const struct iphdr *)skb->data;
>
> +               if (pull_len > skb_transport_offset(skb))
> +                       goto free_skb;
> +
>                 /* Pull skb since ip_tunnel_xmit() needs skb->data pointing
>                  * to gre header.
>                  */
> -               skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
> +               skb_pull(skb, pull_len);
>                 skb_reset_mac_header(skb);
>
> plus then
>
>  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
>  {
> -       /* Local checksum offload requires csum offload of the inner packet */
> -       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> -           skb_checksum_start(skb) < skb->data)
> -               return -EINVAL;
> -
>         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM :
> SKB_GSO_GRE);
>  }

Yes, this is what I was thinking. We will also need an IPv6 version of
this as well, and may want to add a comment clarifying that this is to
prevent us from pointing inner offsets at pulled headers.

It lets us drop the csum, ipsummed, and csum_start checks in favor of
just the skb_transport_offset comparison which should be a net win
since it reduces the number of paths the code is encountered in, and
reduces the number of checks to just 1.
Willem de Bruijn Sept. 4, 2021, 10:05 p.m. UTC | #14
On Sat, Sep 4, 2021 at 5:54 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Sat, Sep 4, 2021 at 2:40 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck
> > > > <alexander.duyck@gmail.com> wrote:
> > > > >
> > > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn
> > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > >
> > > > >
> > > > > <snip>
> > > > >
> > > > > > > whereas if the offset is stored somewhere in the unstripped data we
> > > > > > > could then drop the packet and count it as a drop without having to
> > > > > > > modify the frame via the skb_pull.
> > > > > >
> > > > > > This is a broader issue that userspace can pass any csum_start as long
> > > > > > as it is within packet bounds. We could address it here specifically
> > > > > > for the GRE header. But that still leaves many potentially bad offsets
> > > > > > further in the packet in this case, and all the other cases. Checking
> > > > > > that specific header seems a bit arbitrary to me, and might actually
> > > > > > give false confidence.
> > > > > >
> > > > > > We could certainly move the validation from gre_handle_offloads to
> > > > > > before skb_pull, to make it more obvious *why* the check exists.
> > > > >
> > > > > Agreed. My main concern is that the csum_start is able to be located
> > > > > somewhere where the userspace didn't write. For the most part the
> > > > > csum_start and csum_offset just needs to be restricted to the regions
> > > > > that the userspace actually wrote to.
> > > >
> > > > I don't quite follow. Even with this bug, the offset is somewhere userspace
> > > > wrote. That data is just pulled.
> > >
> > > Sorry, I was thinking of the SOCK_DGRAM case where the header is added
> > > via a call to dev_hard_header().
> > >
> > > > > > > Maybe for those
> > > > > > > cases we need to look at adding an unsigned int argument to
> > > > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> > > > > > > dev->hard_header_len in the cases where we have something like
> > > > > > > af_packet that is transmitting over an ipgre tunnel. The general idea
> > > > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the
> > > > > > > csum_start into headers that userspace was not responsible for
> > > > > > > populating.
> > > > > >
> > > > > > One issue with that is that dev->hard_header_len itself is imprecise
> > > > > > for protocols with variable length link layer headers. There, too, we
> > > > > > have had a variety of bug fixes in the past.
> > > > > >
> > > > > > It also adds cost to every user of virtio_net_hdr, while we only know
> > > > > > one issue in a rare case of the IP_GRE device.
> > > > >
> > > > > Quick question, the assumption is that the checksum should always be
> > > > > performed starting no earlier than the transport header right? Looking
> > > > > over virtio_net_hdr_to_skb it looks like it is already verifying the
> > > > > transport header is in the linear portion of the skb. I'm wondering if
> > > > > we couldn't just look at adding a check to verify the transport offset
> > > > > is <= csum start? We might also be able to get rid of one of the two
> > > > > calls to pskb_may_pull by doing that.
> > > >
> > > > Are you referring to this part in the .._NEEDS_CSUM branch?
> > > >
> > > >                 if (!skb_partial_csum_set(skb, start, off))
> > > >                         return -EINVAL;
> > > >
> > > >                 p_off = skb_transport_offset(skb) + thlen;
> > > >                 if (!pskb_may_pull(skb, p_off))
> > > >                         return -EINVAL;
> > > >
> > > > skb_partial_csum_set is actually what sets the transport offset,
> > > > derived from start.
> > >
> > > Ugh, I had overlooked that as I was more focused on the
> > > skb_probe_transport_header calls in the af_packet code.
> > >
> > > So we can have both the transport offset and the csum_start in a
> > > region that gets stripped by the ipgre code. Worse yet the inner
> > > transport header will also be pointing somewhere outside of the
> > > encapsulated region when we pass it off to skb_reset_inner_headers().
> > >
> > > Maybe it would make sense to just have the check look into the
> > > transport offset instead of csum start as that way you are essentially
> > > addressing two possible issues instead of one, and it would
> > > effectively combine multiple checks as the uninitialized value is ~0
> > > which should always be greater than "skb_headroom + tunnel->hlen +
> > > sizeof(struct iphdr)". I think you mentioned before placing a check
> > > just before you make the call to skb_pull in the GRE transmit path.
> > > Doing that we would at least reduce the impact as it would only apply
> > > in the header_ops case in ipgre_xmit instead of being applied to all
> > > the transmit paths which don't perform the pull.
> >
> > Do you mean
> >
> >         if (dev->header_ops) {
> > +               int pull_len = tunnel->hlen + sizeof(struct iphdr);
> > +
> >                 if (skb_cow_head(skb, 0))
> >                         goto free_skb;
> >
> >                 tnl_params = (const struct iphdr *)skb->data;
> >
> > +               if (pull_len > skb_transport_offset(skb))
> > +                       goto free_skb;
> > +
> >                 /* Pull skb since ip_tunnel_xmit() needs skb->data pointing
> >                  * to gre header.
> >                  */
> > -               skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
> > +               skb_pull(skb, pull_len);
> >                 skb_reset_mac_header(skb);
> >
> > plus then
> >
> >  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> >  {
> > -       /* Local checksum offload requires csum offload of the inner packet */
> > -       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > -           skb_checksum_start(skb) < skb->data)
> > -               return -EINVAL;
> > -
> >         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM :
> > SKB_GSO_GRE);
> >  }
>
> Yes, this is what I was thinking. We will also need an IPv6 version of
> this as well, and may want to add a comment clarifying that this is to
> prevent us from pointing inner offsets at pulled headers.
>
> It lets us drop the csum, ipsummed, and csum_start checks in favor of
> just the skb_transport_offset comparison which should be a net win
> since it reduces the number of paths the code is encountered in, and
> reduces the number of checks to just 1.

Okay. Yes, this looks better to me too. Thanks.

Do you want to submit it? Or I can do it, either way.
Alexander Duyck Sept. 4, 2021, 11:47 p.m. UTC | #15
On Sat, Sep 4, 2021 at 3:05 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Sat, Sep 4, 2021 at 5:54 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Sat, Sep 4, 2021 at 2:40 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > >
> > > > On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck
> > > > > <alexander.duyck@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn
> > > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > >
> > > > > >
> > > > > > <snip>
> > > > > >
> > > > > > > > whereas if the offset is stored somewhere in the unstripped data we
> > > > > > > > could then drop the packet and count it as a drop without having to
> > > > > > > > modify the frame via the skb_pull.
> > > > > > >
> > > > > > > This is a broader issue that userspace can pass any csum_start as long
> > > > > > > as it is within packet bounds. We could address it here specifically
> > > > > > > for the GRE header. But that still leaves many potentially bad offsets
> > > > > > > further in the packet in this case, and all the other cases. Checking
> > > > > > > that specific header seems a bit arbitrary to me, and might actually
> > > > > > > give false confidence.
> > > > > > >
> > > > > > > We could certainly move the validation from gre_handle_offloads to
> > > > > > > before skb_pull, to make it more obvious *why* the check exists.
> > > > > >
> > > > > > Agreed. My main concern is that the csum_start is able to be located
> > > > > > somewhere where the userspace didn't write. For the most part the
> > > > > > csum_start and csum_offset just needs to be restricted to the regions
> > > > > > that the userspace actually wrote to.
> > > > >
> > > > > I don't quite follow. Even with this bug, the offset is somewhere userspace
> > > > > wrote. That data is just pulled.
> > > >
> > > > Sorry, I was thinking of the SOCK_DGRAM case where the header is added
> > > > via a call to dev_hard_header().
> > > >
> > > > > > > > Maybe for those
> > > > > > > > cases we need to look at adding an unsigned int argument to
> > > > > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> > > > > > > > dev->hard_header_len in the cases where we have something like
> > > > > > > > af_packet that is transmitting over an ipgre tunnel. The general idea
> > > > > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the
> > > > > > > > csum_start into headers that userspace was not responsible for
> > > > > > > > populating.
> > > > > > >
> > > > > > > One issue with that is that dev->hard_header_len itself is imprecise
> > > > > > > for protocols with variable length link layer headers. There, too, we
> > > > > > > have had a variety of bug fixes in the past.
> > > > > > >
> > > > > > > It also adds cost to every user of virtio_net_hdr, while we only know
> > > > > > > one issue in a rare case of the IP_GRE device.
> > > > > >
> > > > > > Quick question, the assumption is that the checksum should always be
> > > > > > performed starting no earlier than the transport header right? Looking
> > > > > > over virtio_net_hdr_to_skb it looks like it is already verifying the
> > > > > > transport header is in the linear portion of the skb. I'm wondering if
> > > > > > we couldn't just look at adding a check to verify the transport offset
> > > > > > is <= csum start? We might also be able to get rid of one of the two
> > > > > > calls to pskb_may_pull by doing that.
> > > > >
> > > > > Are you referring to this part in the .._NEEDS_CSUM branch?
> > > > >
> > > > >                 if (!skb_partial_csum_set(skb, start, off))
> > > > >                         return -EINVAL;
> > > > >
> > > > >                 p_off = skb_transport_offset(skb) + thlen;
> > > > >                 if (!pskb_may_pull(skb, p_off))
> > > > >                         return -EINVAL;
> > > > >
> > > > > skb_partial_csum_set is actually what sets the transport offset,
> > > > > derived from start.
> > > >
> > > > Ugh, I had overlooked that as I was more focused on the
> > > > skb_probe_transport_header calls in the af_packet code.
> > > >
> > > > So we can have both the transport offset and the csum_start in a
> > > > region that gets stripped by the ipgre code. Worse yet the inner
> > > > transport header will also be pointing somewhere outside of the
> > > > encapsulated region when we pass it off to skb_reset_inner_headers().
> > > >
> > > > Maybe it would make sense to just have the check look into the
> > > > transport offset instead of csum start as that way you are essentially
> > > > addressing two possible issues instead of one, and it would
> > > > effectively combine multiple checks as the uninitialized value is ~0
> > > > which should always be greater than "skb_headroom + tunnel->hlen +
> > > > sizeof(struct iphdr)". I think you mentioned before placing a check
> > > > just before you make the call to skb_pull in the GRE transmit path.
> > > > Doing that we would at least reduce the impact as it would only apply
> > > > in the header_ops case in ipgre_xmit instead of being applied to all
> > > > the transmit paths which don't perform the pull.
> > >
> > > Do you mean
> > >
> > >         if (dev->header_ops) {
> > > +               int pull_len = tunnel->hlen + sizeof(struct iphdr);
> > > +
> > >                 if (skb_cow_head(skb, 0))
> > >                         goto free_skb;
> > >
> > >                 tnl_params = (const struct iphdr *)skb->data;
> > >
> > > +               if (pull_len > skb_transport_offset(skb))
> > > +                       goto free_skb;
> > > +
> > >                 /* Pull skb since ip_tunnel_xmit() needs skb->data pointing
> > >                  * to gre header.
> > >                  */
> > > -               skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
> > > +               skb_pull(skb, pull_len);
> > >                 skb_reset_mac_header(skb);
> > >
> > > plus then
> > >
> > >  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> > >  {
> > > -       /* Local checksum offload requires csum offload of the inner packet */
> > > -       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > > -           skb_checksum_start(skb) < skb->data)
> > > -               return -EINVAL;
> > > -
> > >         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM :
> > > SKB_GSO_GRE);
> > >  }
> >
> > Yes, this is what I was thinking. We will also need an IPv6 version of
> > this as well, and may want to add a comment clarifying that this is to
> > prevent us from pointing inner offsets at pulled headers.
> >
> > It lets us drop the csum, ipsummed, and csum_start checks in favor of
> > just the skb_transport_offset comparison which should be a net win
> > since it reduces the number of paths the code is encountered in, and
> > reduces the number of checks to just 1.
>
> Okay. Yes, this looks better to me too. Thanks.
>
> Do you want to submit it? Or I can do it, either way.

You can do it since you have essentially already written half the code.. :)

Thanks.

- Alex
Willem de Bruijn Sept. 5, 2021, 3:24 p.m. UTC | #16
On Sat, Sep 4, 2021 at 7:47 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Sat, Sep 4, 2021 at 3:05 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Sat, Sep 4, 2021 at 5:54 PM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Sat, Sep 4, 2021 at 2:40 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > On Sat, Sep 4, 2021 at 11:37 AM Alexander Duyck
> > > > <alexander.duyck@gmail.com> wrote:
> > > > >
> > > > > On Sat, Sep 4, 2021 at 7:46 AM Willem de Bruijn
> > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Sep 3, 2021 at 7:27 PM Alexander Duyck
> > > > > > <alexander.duyck@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn
> > > > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > > >
> > > > > > >
> > > > > > > <snip>
> > > > > > >
> > > > > > > > > whereas if the offset is stored somewhere in the unstripped data we
> > > > > > > > > could then drop the packet and count it as a drop without having to
> > > > > > > > > modify the frame via the skb_pull.
> > > > > > > >
> > > > > > > > This is a broader issue that userspace can pass any csum_start as long
> > > > > > > > as it is within packet bounds. We could address it here specifically
> > > > > > > > for the GRE header. But that still leaves many potentially bad offsets
> > > > > > > > further in the packet in this case, and all the other cases. Checking
> > > > > > > > that specific header seems a bit arbitrary to me, and might actually
> > > > > > > > give false confidence.
> > > > > > > >
> > > > > > > > We could certainly move the validation from gre_handle_offloads to
> > > > > > > > before skb_pull, to make it more obvious *why* the check exists.
> > > > > > >
> > > > > > > Agreed. My main concern is that the csum_start is able to be located
> > > > > > > somewhere where the userspace didn't write. For the most part the
> > > > > > > csum_start and csum_offset just needs to be restricted to the regions
> > > > > > > that the userspace actually wrote to.
> > > > > >
> > > > > > I don't quite follow. Even with this bug, the offset is somewhere userspace
> > > > > > wrote. That data is just pulled.
> > > > >
> > > > > Sorry, I was thinking of the SOCK_DGRAM case where the header is added
> > > > > via a call to dev_hard_header().
> > > > >
> > > > > > > > > Maybe for those
> > > > > > > > > cases we need to look at adding an unsigned int argument to
> > > > > > > > > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> > > > > > > > > dev->hard_header_len in the cases where we have something like
> > > > > > > > > af_packet that is transmitting over an ipgre tunnel. The general idea
> > > > > > > > > is to prevent these virtio_net_hdr_to_skb calls from pointing the
> > > > > > > > > csum_start into headers that userspace was not responsible for
> > > > > > > > > populating.
> > > > > > > >
> > > > > > > > One issue with that is that dev->hard_header_len itself is imprecise
> > > > > > > > for protocols with variable length link layer headers. There, too, we
> > > > > > > > have had a variety of bug fixes in the past.
> > > > > > > >
> > > > > > > > It also adds cost to every user of virtio_net_hdr, while we only know
> > > > > > > > one issue in a rare case of the IP_GRE device.
> > > > > > >
> > > > > > > Quick question, the assumption is that the checksum should always be
> > > > > > > performed starting no earlier than the transport header right? Looking
> > > > > > > over virtio_net_hdr_to_skb it looks like it is already verifying the
> > > > > > > transport header is in the linear portion of the skb. I'm wondering if
> > > > > > > we couldn't just look at adding a check to verify the transport offset
> > > > > > > is <= csum start? We might also be able to get rid of one of the two
> > > > > > > calls to pskb_may_pull by doing that.
> > > > > >
> > > > > > Are you referring to this part in the .._NEEDS_CSUM branch?
> > > > > >
> > > > > >                 if (!skb_partial_csum_set(skb, start, off))
> > > > > >                         return -EINVAL;
> > > > > >
> > > > > >                 p_off = skb_transport_offset(skb) + thlen;
> > > > > >                 if (!pskb_may_pull(skb, p_off))
> > > > > >                         return -EINVAL;
> > > > > >
> > > > > > skb_partial_csum_set is actually what sets the transport offset,
> > > > > > derived from start.
> > > > >
> > > > > Ugh, I had overlooked that as I was more focused on the
> > > > > skb_probe_transport_header calls in the af_packet code.
> > > > >
> > > > > So we can have both the transport offset and the csum_start in a
> > > > > region that gets stripped by the ipgre code. Worse yet the inner
> > > > > transport header will also be pointing somewhere outside of the
> > > > > encapsulated region when we pass it off to skb_reset_inner_headers().
> > > > >
> > > > > Maybe it would make sense to just have the check look into the
> > > > > transport offset instead of csum start as that way you are essentially
> > > > > addressing two possible issues instead of one, and it would
> > > > > effectively combine multiple checks as the uninitialized value is ~0
> > > > > which should always be greater than "skb_headroom + tunnel->hlen +
> > > > > sizeof(struct iphdr)". I think you mentioned before placing a check
> > > > > just before you make the call to skb_pull in the GRE transmit path.
> > > > > Doing that we would at least reduce the impact as it would only apply
> > > > > in the header_ops case in ipgre_xmit instead of being applied to all
> > > > > the transmit paths which don't perform the pull.
> > > >
> > > > Do you mean
> > > >
> > > >         if (dev->header_ops) {
> > > > +               int pull_len = tunnel->hlen + sizeof(struct iphdr);
> > > > +
> > > >                 if (skb_cow_head(skb, 0))
> > > >                         goto free_skb;
> > > >
> > > >                 tnl_params = (const struct iphdr *)skb->data;
> > > >
> > > > +               if (pull_len > skb_transport_offset(skb))
> > > > +                       goto free_skb;
> > > > +
> > > >                 /* Pull skb since ip_tunnel_xmit() needs skb->data pointing
> > > >                  * to gre header.
> > > >                  */
> > > > -               skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
> > > > +               skb_pull(skb, pull_len);
> > > >                 skb_reset_mac_header(skb);
> > > >
> > > > plus then
> > > >
> > > >  static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> > > >  {
> > > > -       /* Local checksum offload requires csum offload of the inner packet */
> > > > -       if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > > > -           skb_checksum_start(skb) < skb->data)
> > > > -               return -EINVAL;
> > > > -
> > > >         return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM :
> > > > SKB_GSO_GRE);
> > > >  }
> > >
> > > Yes, this is what I was thinking. We will also need an IPv6 version of
> > > this as well, and may want to add a comment clarifying that this is to
> > > prevent us from pointing inner offsets at pulled headers.
> > >
> > > It lets us drop the csum, ipsummed, and csum_start checks in favor of
> > > just the skb_transport_offset comparison which should be a net win
> > > since it reduces the number of paths the code is encountered in, and
> > > reduces the number of checks to just 1.
> >
> > Okay. Yes, this looks better to me too. Thanks.
> >
> > Do you want to submit it? Or I can do it, either way.
>
> You can do it since you have essentially already written half the code.. :)

Sent, but only the ipv4 patch.

I actually do not see an equivalent skb_pull path in ip6_gre.c. Will
take a closer look later, but don't have time for that now.

https://patchwork.kernel.org/project/netdevbpf/patch/20210905152109.1805619-1-willemdebruijn.kernel@gmail.com/
Alexander Duyck Sept. 5, 2021, 3:53 p.m. UTC | #17
On Sun, Sep 5, 2021 at 8:24 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Sat, Sep 4, 2021 at 7:47 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:

<snip>

> > You can do it since you have essentially already written half the code.. :)
>
> Sent, but only the ipv4 patch.
>
> I actually do not see an equivalent skb_pull path in ip6_gre.c. Will
> take a closer look later, but don't have time for that now.
>
> https://patchwork.kernel.org/project/netdevbpf/patch/20210905152109.1805619-1-willemdebruijn.kernel@gmail.com/

So does that mean that commit 9cf448c200ba ("ip6_gre: add validation
for csum_start") is adding overhead that doesn't really address
anything, and is introducing false positives? If so, should we just
revert it?

Thanks,

- Alex
Willem de Bruijn Sept. 6, 2021, 3:02 a.m. UTC | #18
On Sun, Sep 5, 2021 at 11:53 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Sun, Sep 5, 2021 at 8:24 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Sat, Sep 4, 2021 at 7:47 PM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
>
> <snip>
>
> > > You can do it since you have essentially already written half the code.. :)
> >
> > Sent, but only the ipv4 patch.
> >
> > I actually do not see an equivalent skb_pull path in ip6_gre.c. Will
> > take a closer look later, but don't have time for that now.
> >
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210905152109.1805619-1-willemdebruijn.kernel@gmail.com/
>
> So does that mean that commit 9cf448c200ba ("ip6_gre: add validation
> for csum_start") is adding overhead that doesn't really address
> anything, and is introducing false positives? If so, should we just
> revert it?

I agree.

I didn't immediately understand how there can be an ip6gre_header
analogous to ipgre_header (for headers_ops.create), but no skb_pull
in ip6gre_tunnel_xmit equivalent to the one in ipgre_xmit.

One difference between the two of them is the return value, which
returns an offset: ip6gre_header returns t->hlen. ipgre_header returns
t->hlen + sizeof(*iph). This extra offset was introduced in commit
77a482bdb2e6 ("ip_gre: fix ipgre_header to return correct offset") as
a fix to previous commit c544193214 ("GRE: Refactor GRE tunneling
code."). Before the latter commit the IPv4 header was included in
t->hlen, similar to IPv6. After that commit it is no longer. The latter
patch is also the one that introduces the skb_pull in ipgre_xmit.

I can't say that I fully understand how this works for IPv6.

But I think that is sufficient to understand that this skb_pull issue
does not affect the ip6_gre path and we best just revert that commit.
diff mbox series

Patch

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 177d26d8fb9c..09311992a617 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -473,8 +473,11 @@  static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
 
 static int gre_handle_offloads(struct sk_buff *skb, bool csum)
 {
-	if (csum && skb_checksum_start(skb) < skb->data)
+	/* Local checksum offload requires csum offload of the inner packet */
+	if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
+	    skb_checksum_start(skb) < skb->data)
 		return -EINVAL;
+
 	return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
 }