diff mbox

[rdma-next] IB/ipoib: Fix wqe initialized param on ipoib set mode to connected

Message ID 20180315160808.GG21888@ziepe.ca (mailing list archive)
State RFC
Headers show

Commit Message

Jason Gunthorpe March 15, 2018, 4:08 p.m. UTC
On Thu, Mar 15, 2018 at 05:06:08PM +0200, Erez Shitrit wrote:

> The origin issue was that after changing to CM mode traffic might
> stopped for very long time (depends of the arp time, at least 30 sec).
> Now, if we move the line after the ipoib_flush_paths() call, the
> problem is much smaller:
>     only while ipoib_flush_paths() runs, packet that sent to CM
> connection after packet from UD/GSO will be dropped.

sure, that is prett clear, and the above would be a much better commit
message.

> The question is does this something that we really need to handle?
> the error flow in CM mode already does this (and for other more often
> error flows like RNR etc.)
> and also we are talking about a case that is unlikely in the real life
> of ipoib driver, mode changing is something that done once at the
> beginning. should we add a code for this rare case?

If you don't want to deal with it then a big comment is needed here to
explain that it is racey but we don't care for XYZ reasons.

And only do that if fixing it properly is really hard.. but it doesn't
look too hard..

What about something like this.. I didn't check too carefully but it
seems to hold the right locks:


Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Erez Shitrit March 18, 2018, 9:52 a.m. UTC | #1
On Thu, Mar 15, 2018 at 6:08 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, Mar 15, 2018 at 05:06:08PM +0200, Erez Shitrit wrote:
>
>> The origin issue was that after changing to CM mode traffic might
>> stopped for very long time (depends of the arp time, at least 30 sec).
>> Now, if we move the line after the ipoib_flush_paths() call, the
>> problem is much smaller:
>>     only while ipoib_flush_paths() runs, packet that sent to CM
>> connection after packet from UD/GSO will be dropped.
>
> sure, that is prett clear, and the above would be a much better commit
> message.
>
>> The question is does this something that we really need to handle?
>> the error flow in CM mode already does this (and for other more often
>> error flows like RNR etc.)
>> and also we are talking about a case that is unlikely in the real life
>> of ipoib driver, mode changing is something that done once at the
>> beginning. should we add a code for this rare case?
>
> If you don't want to deal with it then a big comment is needed here to
> explain that it is racey but we don't care for XYZ reasons.
>
> And only do that if fixing it properly is really hard.. but it doesn't
> look too hard..
>
> What about something like this.. I didn't check too carefully but it
> seems to hold the right locks:
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 161ba8c76285cb..bdf7a723f52818 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -711,7 +711,7 @@ static void push_pseudo_header(struct sk_buff *skb, const char *daddr)
>         memcpy(phdr->hwaddr, daddr, INFINIBAND_ALEN);
>  }
>
> -void ipoib_flush_paths(struct net_device *dev)
> +void ipoib_flush_paths(struct net_device *dev, bool cm_mode)
>  {
>         struct ipoib_dev_priv *priv = ipoib_priv(dev);
>         struct ipoib_path *path, *tp;
> @@ -737,6 +737,20 @@ void ipoib_flush_paths(struct net_device *dev)
>                 spin_lock_irqsave(&priv->lock, flags);
>         }
>
> +       /*
> +        * Mode change is done atomically with list flushing, we stay in the
> +        * old mode until the list is empty then atomically switch modes for
> +        * new neighs from then on.
> +        */
> +       if (cm_mode) {
> +               priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM;
> +               priv->tx_wr.wr.opcode = IB_WR_SEND;
> +               set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
> +       } else {
> +               // .. fixme set tx_wr properly for !CM mode
> +               clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
> +       }
> +

This will work.
But, IMHO seems to me that this is not the right place, the function
is used for other flows, like taking the device down, etc. place where
the mode is not relevant.

As you suggested I will add a detailed explain for the possible race
and why we can leave with it.

>         spin_unlock_irqrestore(&priv->lock, flags);
>         netif_tx_unlock_bh(dev);
>  }
>
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe March 19, 2018, 1:50 a.m. UTC | #2
On Sun, Mar 18, 2018 at 11:52:33AM +0200, Erez Shitrit wrote:
> On Thu, Mar 15, 2018 at 6:08 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Thu, Mar 15, 2018 at 05:06:08PM +0200, Erez Shitrit wrote:
> >
> >> The origin issue was that after changing to CM mode traffic might
> >> stopped for very long time (depends of the arp time, at least 30 sec).
> >> Now, if we move the line after the ipoib_flush_paths() call, the
> >> problem is much smaller:
> >>     only while ipoib_flush_paths() runs, packet that sent to CM
> >> connection after packet from UD/GSO will be dropped.
> >
> > sure, that is prett clear, and the above would be a much better commit
> > message.
> >
> >> The question is does this something that we really need to handle?
> >> the error flow in CM mode already does this (and for other more often
> >> error flows like RNR etc.)
> >> and also we are talking about a case that is unlikely in the real life
> >> of ipoib driver, mode changing is something that done once at the
> >> beginning. should we add a code for this rare case?
> >
> > If you don't want to deal with it then a big comment is needed here to
> > explain that it is racey but we don't care for XYZ reasons.
> >
> > And only do that if fixing it properly is really hard.. but it doesn't
> > look too hard..
> >
> > What about something like this.. I didn't check too carefully but it
> > seems to hold the right locks:
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index 161ba8c76285cb..bdf7a723f52818 100644
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -711,7 +711,7 @@ static void push_pseudo_header(struct sk_buff *skb, const char *daddr)
> >         memcpy(phdr->hwaddr, daddr, INFINIBAND_ALEN);
> >  }
> >
> > -void ipoib_flush_paths(struct net_device *dev)
> > +void ipoib_flush_paths(struct net_device *dev, bool cm_mode)
> >  {
> >         struct ipoib_dev_priv *priv = ipoib_priv(dev);
> >         struct ipoib_path *path, *tp;
> > @@ -737,6 +737,20 @@ void ipoib_flush_paths(struct net_device *dev)
> >                 spin_lock_irqsave(&priv->lock, flags);
> >         }
> >
> > +       /*
> > +        * Mode change is done atomically with list flushing, we stay in the
> > +        * old mode until the list is empty then atomically switch modes for
> > +        * new neighs from then on.
> > +        */
> > +       if (cm_mode) {
> > +               priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM;
> > +               priv->tx_wr.wr.opcode = IB_WR_SEND;
> > +               set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
> > +       } else {
> > +               // .. fixme set tx_wr properly for !CM mode
> > +               clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
> > +       }
> > +
> 
> This will work.
> But, IMHO seems to me that this is not the right place, the function
> is used for other flows, like taking the device down, etc. place where
> the mode is not relevant.

It *IS* the right place because it is the only place that is atomic
with the list flush due to the locking.

It clearly cnanot be pulled out of that locking region correctly.

It seems fine to me to have the argument and default it to
test_bit(IPOIB_FLAG_ADMIN_CM), maybe with a helper/wrapper to do
it. Nothing about this is performance.

> As you suggested I will add a detailed explain for the possible race
> and why we can leave with it.

Well, now that we have an actually correct solution that is trivial,
let us just do it right please?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 161ba8c76285cb..bdf7a723f52818 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -711,7 +711,7 @@  static void push_pseudo_header(struct sk_buff *skb, const char *daddr)
        memcpy(phdr->hwaddr, daddr, INFINIBAND_ALEN);
 }
 
-void ipoib_flush_paths(struct net_device *dev)
+void ipoib_flush_paths(struct net_device *dev, bool cm_mode)
 {
        struct ipoib_dev_priv *priv = ipoib_priv(dev);
        struct ipoib_path *path, *tp;
@@ -737,6 +737,20 @@  void ipoib_flush_paths(struct net_device *dev)
                spin_lock_irqsave(&priv->lock, flags);
        }
 
+       /*
+        * Mode change is done atomically with list flushing, we stay in the
+        * old mode until the list is empty then atomically switch modes for
+        * new neighs from then on.
+        */
+       if (cm_mode) {
+               priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM;
+               priv->tx_wr.wr.opcode = IB_WR_SEND;
+               set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+       } else {
+               // .. fixme set tx_wr properly for !CM mode
+               clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+       }
+
        spin_unlock_irqrestore(&priv->lock, flags);
        netif_tx_unlock_bh(dev);
 }