IB/IPoIB: Check the headroom size

Message ID	1493192794.2409.3.camel@redhat.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <linux-rdma-owner@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 68A3D4E4C6 Message-ID: <1493192794.2409.3.camel@redhat.com> Subject: Re: [PATCH] IB/IPoIB: Check the headroom size From: Paolo Abeni <pabeni@redhat.com> To: Or Gerlitz <gerlitz.or@gmail.com>, Doug Ledford <dledford@redhat.com> Cc: Erez Shitrit <erezsh@dev.mellanox.co.il>, Honggang LI <honli@redhat.com>, Erez Shitrit <erezsh@mellanox.com>, "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, Linux Netdev List <netdev@vger.kernel.org>, David Miller <davem@davemloft.net> Date: Wed, 26 Apr 2017 09:46:34 +0200 In-Reply-To: <CAJ3xEMjqkJrKi+6ronuPBn2P6y8p6sdhVppDzFtMwQrhL13bzg@mail.gmail.com> References: <1493114155-12101-1-git-send-email-honli@redhat.com> <CAJ3xEMg4_2ph7QwPsUb-tX-K4d2ppkqz98sPzytsmBCK=29WHw@mail.gmail.com> <CAAk-MO8O19iC2Yn-BMn5pKTAYxaSzGPMyta=fwes3XSvzmz_cQ@mail.gmail.com> <CAJ3xEMgw=9sj3rdahPEiST_yDfDJPNSZZLRn43tnb3bK4_RPzg@mail.gmail.com> <CAAk-MO-B8Liv6RyGye_Fx-9OELCUrGMj5SwK9UsciC=khSO5fA@mail.gmail.com> <CAJ3xEMgwS1Bq8+eZC1iAr6xi8ZPHrchsOJ5r4LNJxR8P+6VipA@mail.gmail.com> <1493134815.3041.72.camel@redhat.com> <CAJ3xEMjqkJrKi+6ronuPBn2P6y8p6sdhVppDzFtMwQrhL13bzg@mail.gmail.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk

Paolo Abeni April 26, 2017, 7:46 a.m. UTC

On Wed, 2017-04-26 at 00:50 +0300, Or Gerlitz wrote:
> so maybe @ least for the time being, we should be picking Hong's patch
> with proper change log and without the giant stack dump till we have
> something better. If you agree, can you do the re-write of the change
> log?

I think that Hong's patch is following the correct way to fix the
issue: ipoib_hard_header() can't assume that the skb headroom is at
least IPOIB_HARD_LEN bytes, as wrongly implied by commit fc791b633515
(my fault, I'm sorry).

Perhaps we can make the code a little more robust with something
alongside the following (only compile tested):
---
Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Honggang LI April 26, 2017, 12:52 p.m. UTC | #1

On Wed, Apr 26, 2017 at 09:46:34AM +0200, Paolo Abeni wrote:
> On Wed, 2017-04-26 at 00:50 +0300, Or Gerlitz wrote:
> > so maybe @ least for the time being, we should be picking Hong's patch
> > with proper change log and without the giant stack dump till we have
> > something better. If you agree, can you do the re-write of the change
> > log?
> 
> I think that Hong's patch is following the correct way to fix the
> issue: ipoib_hard_header() can't assume that the skb headroom is at
> least IPOIB_HARD_LEN bytes, as wrongly implied by commit fc791b633515
> (my fault, I'm sorry).
> 
> Perhaps we can make the code a little more robust with something
> alongside the following (only compile tested):
> ---
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index d1d3fb7..d53d2e1 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1160,6 +1160,11 @@ static int ipoib_hard_header(struct sk_buff *skb,
>                              const void *daddr, const void *saddr, unsigned len)
>  {
>         struct ipoib_header *header;
> +       int ret;
> +
> +       ret = skb_cow_head(skb, IPOIB_HARD_LEN);

I don't think so. When this skb_under_panic arise, all slaves had been
removed from a busy bonding interface, so it is better to immediately
give up and return error.

> +       if (ret)
> +               return ret;
>  
>         header = (struct ipoib_header *) skb_push(skb, sizeof *header);
> ---
> 
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Abeni April 26, 2017, 1:25 p.m. UTC | #2

On Wed, 2017-04-26 at 20:52 +0800, Honggang LI wrote:
> On Wed, Apr 26, 2017 at 09:46:34AM +0200, Paolo Abeni wrote:
> > On Wed, 2017-04-26 at 00:50 +0300, Or Gerlitz wrote:
> > > so maybe @ least for the time being, we should be picking Hong's patch
> > > with proper change log and without the giant stack dump till we have
> > > something better. If you agree, can you do the re-write of the change
> > > log?
> > 
> > I think that Hong's patch is following the correct way to fix the
> > issue: ipoib_hard_header() can't assume that the skb headroom is at
> > least IPOIB_HARD_LEN bytes, as wrongly implied by commit fc791b633515
> > (my fault, I'm sorry).
> > 
> > Perhaps we can make the code a little more robust with something
> > alongside the following (only compile tested):
> > ---
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index d1d3fb7..d53d2e1 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -1160,6 +1160,11 @@ static int ipoib_hard_header(struct sk_buff *skb,
> >                              const void *daddr, const void *saddr, unsigned len)
> >  {
> >         struct ipoib_header *header;
> > +       int ret;
> > +
> > +       ret = skb_cow_head(skb, IPOIB_HARD_LEN);
> 
> I don't think so. When this skb_under_panic arise, all slaves had been
> removed from a busy bonding interface, so it is better to immediately
> give up and return error.

I fear we can hit the same condition even with other, more convoluted,
setup (e.g. some kind of ip tunnel on top of ipoib) and the
skb_cow_head() check should be cheap, in the common case.

Anyway I'm fine even with the original fix, which has the advantage to 
minimize the side effects.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Doug Ledford April 26, 2017, 1:27 p.m. UTC | #3

On Wed, 2017-04-26 at 20:52 +0800, Honggang LI wrote:
> On Wed, Apr 26, 2017 at 09:46:34AM +0200, Paolo Abeni wrote:
> > 
> > On Wed, 2017-04-26 at 00:50 +0300, Or Gerlitz wrote:
> > > 
> > > so maybe @ least for the time being, we should be picking Hong's
> > > patch
> > > with proper change log and without the giant stack dump till we
> > > have
> > > something better. If you agree, can you do the re-write of the
> > > change
> > > log?
> > 
> > I think that Hong's patch is following the correct way to fix the
> > issue: ipoib_hard_header() can't assume that the skb headroom is at
> > least IPOIB_HARD_LEN bytes, as wrongly implied by commit
> > fc791b633515
> > (my fault, I'm sorry).
> > 
> > Perhaps we can make the code a little more robust with something
> > alongside the following (only compile tested):
> > ---
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index d1d3fb7..d53d2e1 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -1160,6 +1160,11 @@ static int ipoib_hard_header(struct sk_buff
> > *skb,
> >                              const void *daddr, const void *saddr,
> > unsigned len)
> >  {
> >         struct ipoib_header *header;
> > +       int ret;
> > +
> > +       ret = skb_cow_head(skb, IPOIB_HARD_LEN);
> 
> I don't think so. When this skb_under_panic arise, all slaves had
> been
> removed from a busy bonding interface,

I'm not sure this entirely makes sense.  If all slaves had been
removed, then we should never end up in ipoib_hard_header.  That should
only be called as long as there is at least one slave still attached to
the bond.  It might be during the process of removing the final slave,
but I don't think it can be after the final slave has been fully
removed from the bond.  Paolo, what should the bond driver be doing
once the slaves are gone?  Wouldn't it just be dropping every skb on
the floor without calling anyone's hard header routine?

>  so it is better to immediately
> give up and return error.
> 
> > 
> > +       if (ret)
> > +               return ret;
> >  
> >         header = (struct ipoib_header *) skb_push(skb, sizeof
> > *header);
> > ---
> > 
> > Paolo
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-
> > rdma" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

Honggang LI April 26, 2017, 1:33 p.m. UTC | #4

On Wed, Apr 26, 2017 at 09:27:38AM -0400, Doug Ledford wrote:
> On Wed, 2017-04-26 at 20:52 +0800, Honggang LI wrote:
> > On Wed, Apr 26, 2017 at 09:46:34AM +0200, Paolo Abeni wrote:
> > > 
> > > On Wed, 2017-04-26 at 00:50 +0300, Or Gerlitz wrote:
> > > > 
> > > > so maybe @ least for the time being, we should be picking Hong's
> > > > patch
> > > > with proper change log and without the giant stack dump till we
> > > > have
> > > > something better. If you agree, can you do the re-write of the
> > > > change
> > > > log?
> > > 
> > > I think that Hong's patch is following the correct way to fix the
> > > issue: ipoib_hard_header() can't assume that the skb headroom is at
> > > least IPOIB_HARD_LEN bytes, as wrongly implied by commit
> > > fc791b633515
> > > (my fault, I'm sorry).
> > > 
> > > Perhaps we can make the code a little more robust with something
> > > alongside the following (only compile tested):
> > > ---
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > index d1d3fb7..d53d2e1 100644
> > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > @@ -1160,6 +1160,11 @@ static int ipoib_hard_header(struct sk_buff
> > > *skb,
> > >                              const void *daddr, const void *saddr,
> > > unsigned len)
> > >  {
> > >         struct ipoib_header *header;
> > > +       int ret;
> > > +
> > > +       ret = skb_cow_head(skb, IPOIB_HARD_LEN);
> > 
> > I don't think so. When this skb_under_panic arise, all slaves had
> > been
> > removed from a busy bonding interface,
> 
> I'm not sure this entirely makes sense.  If all slaves had been
> removed, then we should never end up in ipoib_hard_header.  That should
> only be called as long as there is at least one slave still attached to
> the bond.  It might be during the process of removing the final slave,

Yes, it is during the process of removing the final slave. The
reproducer looks like this:

ping remote_ip_over_bonding_interface &
while 1; do
	ifdown bond0
	ifup   bond0
done

> but I don't think it can be after the final slave has been fully
> removed from the bond.  Paolo, what should the bond driver be doing
> once the slaves are gone?  Wouldn't it just be dropping every skb on
> the floor without calling anyone's hard header routine?
> 
> >  so it is better to immediately
> > give up and return error.
> > 
> > > 
> > > +       if (ret)
> > > +               return ret;
> > >  
> > >         header = (struct ipoib_header *) skb_push(skb, sizeof
> > > *header);
> > > ---
> > > 
> > > Paolo
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-
> > > rdma" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> -- 
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>    
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Doug Ledford April 26, 2017, 1:48 p.m. UTC | #5

On Wed, 2017-04-26 at 21:33 +0800, Honggang LI wrote:
> Yes, it is during the process of removing the final slave. The
> reproducer looks like this:
> 
> ping remote_ip_over_bonding_interface &
> while 1; do
> 	ifdown bond0
> 	ifup   bond0
> done

Honestly, I would suspect the problem here is not when removing the
busy interface, but when bringing the interface back up.  IIRC, the
bonding driver defaults to assuming it will be used on an Ethernet
interface.  Once you attach an IB slave, it reconfigures itself for
running over IB instead.  But once it's configured it should stay
configured until after the last IB slave is removed (and once that
slave it removed, the bond should no longer even possess the pointer to
the ipoib_hard_header routine, so we should never call it).

The process, in the bonding driver, for going down and coming back up
needs to look something like this:

ifdown bond0:
  stop all queues
  remove all slaves
  possibly reconfigure to default state which is Ethernet suitable

ifup bond0:
  add first slave
  configure for IB instead of Ethernet
  start queues
  add additional slaves

I'm wondering if, when we have a current backlog, we aren't
accidentally doing this instead:

ifup bond0:
  add first slave
  release backlog queue
  configure for IB instead of Ethernet
  add additional slaves

Or, it might even be more subtle, such as:

ifup bond0:
  add first slave
  configure for IB instead of Ethernet
  start queues
   -> however, there was a backlog item on the queue from prior to
      the first slave being added and the port configured for IB mode,
      so the backlog skb is still configured for the default Ethernet
      mode, hence the failure
  add additional slaves

Maybe the real issue is that inside the bonding driver, when we
reconfigure the queue type from IB to Ethernet or Ethernet to IB, we
need to force either a drop or a reconfiguration of any packets already
currently in our waiting backlog queue of skbs.  Paolo?

> > 
> > but I don't think it can be after the final slave has been fully
> > removed from the bond.  Paolo, what should the bond driver be doing
> > once the slaves are gone?  Wouldn't it just be dropping every skb
> > on
> > the floor without calling anyone's hard header routine?
> > 
> > > 
> > >  so it is better to immediately
> > > give up and return error.
> > > 
> > > > 
> > > > 
> > > > +       if (ret)
> > > > +               return ret;
> > > >  
> > > >         header = (struct ipoib_header *) skb_push(skb, sizeof
> > > > *header);
> > > > ---
> > > > 
> > > > Paolo
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe
> > > > linux-
> > > > rdma" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.h
> > > > tml
> > -- 
> > Doug Ledford <dledford@redhat.com>
> >     GPG KeyID: B826A3330E572FDD
> >    
> > Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
> > 2FDD
> >

Doug Ledford April 26, 2017, 1:50 p.m. UTC | #6

On Wed, 2017-04-26 at 09:48 -0400, Doug Ledford wrote:
> On Wed, 2017-04-26 at 21:33 +0800, Honggang LI wrote:
> > 
> > Yes, it is during the process of removing the final slave. The
> > reproducer looks like this:
> > 
> > ping remote_ip_over_bonding_interface &
> > while 1; do
> > 	ifdown bond0
> > 	ifup   bond0
> > done

BTW, rerunning your test as:

ping remote_ip_over_bonding_interface &
while 1; do
    echo -n "Downing interface..."
    ifdown bond0
    echo "done."
    sleep 1
    echo -n "Bringing interface up..."
    ifup bond0
    echo "done."
done

will allow you to more precisely identify whether the failure is during
the down or the up of the interface.

Honggang LI April 26, 2017, 2:11 p.m. UTC | #7

On Wed, Apr 26, 2017 at 09:50:38AM -0400, Doug Ledford wrote:
> On Wed, 2017-04-26 at 09:48 -0400, Doug Ledford wrote:
> > On Wed, 2017-04-26 at 21:33 +0800, Honggang LI wrote:
> > > 
> > > Yes, it is during the process of removing the final slave. The
> > > reproducer looks like this:
> > > 
> > > ping remote_ip_over_bonding_interface &
> > > while 1; do
> > > 	ifdown bond0
> > > 	ifup   bond0
> > > done
> 
> BTW, rerunning your test as:
> 
> ping remote_ip_over_bonding_interface &
> while 1; do
>     echo -n "Downing interface..."
>     ifdown bond0

Confirmed panic in here.

>     echo "done."
>     sleep 1
>     echo -n "Bringing interface up..."
>     ifup bond0
>     echo "done."
> done
> 
> will allow you to more precisely identify whether the failure is during
> the down or the up of the interface.
> 
> -- 
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>    
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Doug Ledford April 26, 2017, 2:25 p.m. UTC | #8

On Wed, 2017-04-26 at 22:11 +0800, Honggang LI wrote:
> On Wed, Apr 26, 2017 at 09:50:38AM -0400, Doug Ledford wrote:
> > 
> > On Wed, 2017-04-26 at 09:48 -0400, Doug Ledford wrote:
> > > 
> > > On Wed, 2017-04-26 at 21:33 +0800, Honggang LI wrote:
> > > > 
> > > > 
> > > > Yes, it is during the process of removing the final slave. The
> > > > reproducer looks like this:
> > > > 
> > > > ping remote_ip_over_bonding_interface &
> > > > while 1; do
> > > > 	ifdown bond0
> > > > 	ifup   bond0
> > > > done
> > 
> > BTW, rerunning your test as:
> > 
> > ping remote_ip_over_bonding_interface &
> > while 1; do
> >     echo -n "Downing interface..."
> >     ifdown bond0
> 
> Confirmed panic in here.

OK, this leads me to suspect that the bonding driver is possibly
reconfiguring the skb setup to Ethernet mode before the last slave is
fully dropped from the bond, and so the slave is seeing a packet to its
hard header routine with the wrong headroom.  I think Paolo's patch is
probably the best way to go.  The reason I say that is because we can't
definitively say that this is the only pathway we will ever see that
causes us to get a packet with Ethernet headroom on our IPoIB
interface, and Paolo's patch gives us the possibility of recovering and
sending the packet out.  Even in this case, we are downing the
interface, but it isn't down entirely yet, so sending the packet out
isn't necessarily wrong.  So a method of fix that allows us to
recover/continue seems preferable to me.  Can you test Paolo's patch
and see if it resolves the problem?

Honggang LI April 26, 2017, 2:44 p.m. UTC | #9

On Wed, Apr 26, 2017 at 10:25:04AM -0400, Doug Ledford wrote:
> On Wed, 2017-04-26 at 22:11 +0800, Honggang LI wrote:
> > On Wed, Apr 26, 2017 at 09:50:38AM -0400, Doug Ledford wrote:
> > > 
> > > On Wed, 2017-04-26 at 09:48 -0400, Doug Ledford wrote:
> > > > 
> > > > On Wed, 2017-04-26 at 21:33 +0800, Honggang LI wrote:
> > > > > 
> > > > > 
> > > > > Yes, it is during the process of removing the final slave. The
> > > > > reproducer looks like this:
> > > > > 
> > > > > ping remote_ip_over_bonding_interface &
> > > > > while 1; do
> > > > > 	ifdown bond0
> > > > > 	ifup   bond0
> > > > > done
> > > 
> > > BTW, rerunning your test as:
> > > 
> > > ping remote_ip_over_bonding_interface &
> > > while 1; do
> > >     echo -n "Downing interface..."
> > >     ifdown bond0
> > 
> > Confirmed panic in here.
> 
> OK, this leads me to suspect that the bonding driver is possibly
> reconfiguring the skb setup to Ethernet mode before the last slave is
> fully dropped from the bond, and so the slave is seeing a packet to its
> hard header routine with the wrong headroom.  I think Paolo's patch is
> probably the best way to go.  The reason I say that is because we can't
> definitively say that this is the only pathway we will ever see that
> causes us to get a packet with Ethernet headroom on our IPoIB
> interface, and Paolo's patch gives us the possibility of recovering and
> sending the packet out.  Even in this case, we are downing the
> interface, but it isn't down entirely yet, so sending the packet out

Sending packet out over a removing interface looks bad. I'd like to drop
backlog packet.

> isn't necessarily wrong.  So a method of fix that allows us to
> recover/continue seems preferable to me.  Can you test Paolo's patch
> and see if it resolves the problem?

Yes, Paolo's patch fixes the panic issue too. But it seems has side
effort. For example:

520 struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
.....
538                 return NULL;
539 
540         skb_reserve(skb, hlen);
541         skb_reset_network_header(skb); <-- it update skb->network_header


But skb_cow_head seems does not touch skb->network_header. I have to say
I did not test the network_header.

> 
> -- 
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>    
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

IB/IPoIB: Check the headroom size

Commit Message

Comments

Patch