diff mbox

IB/ipoib: check path validity on allocation of neigh struct

Message ID 20180504114921.16571-1-evgenii.smirnov@profitbricks.com (mailing list archive)
State Superseded
Delegated to: Doug Ledford
Headers show

Commit Message

Evgenii Smirnov May 4, 2018, 11:49 a.m. UTC
Currently, the validity of a path is checked only on
unicast ARP transmission. If Subnet Manager switchover happens and
some LIDs get reassigned, driver in a network that uses only IPv6
addresses will not try to renew the path records, despite them
being marked as invalid.

In connected mode, remote side LID change will cause send to fail,
freeing the corresponding neigh struct. Subsequent packets to this
destination will trigger allocation of a new neigh struct.

With this patch allocation of new neigh struct will also check the
validity of the associated path and renew it if necessary.

This, however, will not help in datagram mode, if the host
continuously sends data to the destination with invalid path.
The neigh struct alive timer will be updated, thus preventing
it from reallocation.

Test setup consists of two target hosts and two initiator hosts,
one of the initiators is with the patch. All hosts have only IPv6
addresses from the same subnet and initiators constantly ping targets.
In connected mode swapping the LIDs of target hosts and switching over SM
leads to the loss of connectivity for the initiator without the patch.
Initiator with the patch recovers in ~3 sec. In datagram mode initiator
with the patch is able to recover only if ping is stopped for
neigh_obsolete time.

Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dennis Dalessandro May 9, 2018, 12:53 p.m. UTC | #1
On 5/4/2018 7:49 AM, Evgenii Smirnov wrote:
> Currently, the validity of a path is checked only on
> unicast ARP transmission. If Subnet Manager switchover happens and
> some LIDs get reassigned, driver in a network that uses only IPv6
> addresses will not try to renew the path records, despite them
> being marked as invalid.
> 
> In connected mode, remote side LID change will cause send to fail,
> freeing the corresponding neigh struct. Subsequent packets to this
> destination will trigger allocation of a new neigh struct.
> 
> With this patch allocation of new neigh struct will also check the
> validity of the associated path and renew it if necessary.
> 
> This, however, will not help in datagram mode, if the host
> continuously sends data to the destination with invalid path.
> The neigh struct alive timer will be updated, thus preventing
> it from reallocation.
> 
> Test setup consists of two target hosts and two initiator hosts,
> one of the initiators is with the patch. All hosts have only IPv6
> addresses from the same subnet and initiators constantly ping targets.
> In connected mode swapping the LIDs of target hosts and switching over SM
> leads to the loss of connectivity for the initiator without the patch.
> Initiator with the patch recovers in ~3 sec. In datagram mode initiator
> with the patch is able to recover only if ping is stopped for
> neigh_obsolete time.
> 
> Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com>

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford May 9, 2018, 2:31 p.m. UTC | #2
On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote:
> Currently, the validity of a path is checked only on
> unicast ARP transmission. If Subnet Manager switchover happens and
> some LIDs get reassigned, driver in a network that uses only IPv6
> addresses will not try to renew the path records, despite them
> being marked as invalid.

Are we not getting a flush event in this case?  If we are getting a
flush event, maybe we just aren't doing a heavy enough flush?

In general I don't have a problem for this patch, but I would prefer to
find a solution that resolves the UD case too, and maybe that just needs
to flush harder on the specific event we get when we get a new SM (it's
a rereg event, yes?).

> In connected mode, remote side LID change will cause send to fail,
> freeing the corresponding neigh struct. Subsequent packets to this
> destination will trigger allocation of a new neigh struct.
> 
> With this patch allocation of new neigh struct will also check the
> validity of the associated path and renew it if necessary.
> 
> This, however, will not help in datagram mode, if the host
> continuously sends data to the destination with invalid path.
> The neigh struct alive timer will be updated, thus preventing
> it from reallocation.
> 
> Test setup consists of two target hosts and two initiator hosts,
> one of the initiators is with the patch. All hosts have only IPv6
> addresses from the same subnet and initiators constantly ping targets.
> In connected mode swapping the LIDs of target hosts and switching over SM
> leads to the loss of connectivity for the initiator without the patch.
> Initiator with the patch recovers in ~3 sec. In datagram mode initiator
> with the patch is able to recover only if ping is stopped for
> neigh_obsolete time.
> 
> Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 161ba8c76285..db5762d62aea 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr,
>  
>  	list_add_tail(&neigh->list, &path->neigh_list);
>  
> -	if (path->ah) {
> +	if (path->ah && path->valid) {
>  		kref_get(&path->ah->ref);
>  		neigh->ah = path->ah;
>
Evgenii Smirnov May 11, 2018, 2:12 p.m. UTC | #3
Yes, in this case we are getting CLIENT_REREGISTER event and will do 
light flush, which marks all path records as invalid and reconnects all 
multicast groups. I think it would be too much to do a normal flush, as 
all interfaces in the network will go down on a simple subnet manager 
change.

Light flush seems to be suitable for this case, the problem is that 
validity of path records is checked only on outgoing unicast ARPs. My 
impression is that correct solution that also covers UD case is to check 
validity also on outgoing ICMP6 Neighbor Advertisement packets. I can 
try to prepare a patch that implements that.

This patch, however, still looks useful to me, as checking path on neigh 
creation allows to detect such situations in connected mode faster, not 
waiting for ARP cache expiry time.


On 09.05.2018 16:31, Doug Ledford wrote:
> On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote:
>> Currently, the validity of a path is checked only on
>> unicast ARP transmission. If Subnet Manager switchover happens and
>> some LIDs get reassigned, driver in a network that uses only IPv6
>> addresses will not try to renew the path records, despite them
>> being marked as invalid.
> Are we not getting a flush event in this case?  If we are getting a
> flush event, maybe we just aren't doing a heavy enough flush?
>
> In general I don't have a problem for this patch, but I would prefer to
> find a solution that resolves the UD case too, and maybe that just needs
> to flush harder on the specific event we get when we get a new SM (it's
> a rereg event, yes?).
>
>> In connected mode, remote side LID change will cause send to fail,
>> freeing the corresponding neigh struct. Subsequent packets to this
>> destination will trigger allocation of a new neigh struct.
>>
>> With this patch allocation of new neigh struct will also check the
>> validity of the associated path and renew it if necessary.
>>
>> This, however, will not help in datagram mode, if the host
>> continuously sends data to the destination with invalid path.
>> The neigh struct alive timer will be updated, thus preventing
>> it from reallocation.
>>
>> Test setup consists of two target hosts and two initiator hosts,
>> one of the initiators is with the patch. All hosts have only IPv6
>> addresses from the same subnet and initiators constantly ping targets.
>> In connected mode swapping the LIDs of target hosts and switching over SM
>> leads to the loss of connectivity for the initiator without the patch.
>> Initiator with the patch recovers in ~3 sec. In datagram mode initiator
>> with the patch is able to recover only if ping is stopped for
>> neigh_obsolete time.
>>
>> Signed-off-by: Evgenii Smirnov<evgenii.smirnov@profitbricks.com>
>> ---
>>   drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
>> index 161ba8c76285..db5762d62aea 100644
>> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
>> @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr,
>>   
>>   	list_add_tail(&neigh->list, &path->neigh_list);
>>   
>> -	if (path->ah) {
>> +	if (path->ah && path->valid) {
>>   		kref_get(&path->ah->ref);
>>   		neigh->ah = path->ah;
>>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford May 11, 2018, 3:44 p.m. UTC | #4
On Fri, 2018-05-11 at 16:12 +0200, Evgenii Smirnov wrote:
> Yes, in this case we are getting CLIENT_REREGISTER event and will do 
> light flush, which marks all path records as invalid and reconnects all 
> multicast groups. I think it would be too much to do a normal flush, as 
> all interfaces in the network will go down on a simple subnet manager 
> change.
> 
> Light flush seems to be suitable for this case, the problem is that 
> validity of path records is checked only on outgoing unicast ARPs. My 
> impression is that correct solution that also covers UD case is to check 
> validity also on outgoing ICMP6 Neighbor Advertisement packets. I can 
> try to prepare a patch that implements that.
> 
> This patch, however, still looks useful to me, as checking path on neigh 
> creation allows to detect such situations in connected mode faster, not 
> waiting for ARP cache expiry time.

I don't want us waiting on any expiry time.  That's just the wrong
answer here.  If we know that the SM changed, and that an SM change can
reder our paths invalid by reassigning LIDs, then we need to be
proactive in resolving that our paths are valid.  Waiting for an expiry
timer is just the wrong thing to do IMO.

> 
> On 09.05.2018 16:31, Doug Ledford wrote:
> > On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote:
> > > Currently, the validity of a path is checked only on
> > > unicast ARP transmission. If Subnet Manager switchover happens and
> > > some LIDs get reassigned, driver in a network that uses only IPv6
> > > addresses will not try to renew the path records, despite them
> > > being marked as invalid.
> > 
> > Are we not getting a flush event in this case?  If we are getting a
> > flush event, maybe we just aren't doing a heavy enough flush?
> > 
> > In general I don't have a problem for this patch, but I would prefer to
> > find a solution that resolves the UD case too, and maybe that just needs
> > to flush harder on the specific event we get when we get a new SM (it's
> > a rereg event, yes?).
> > 
> > > In connected mode, remote side LID change will cause send to fail,
> > > freeing the corresponding neigh struct. Subsequent packets to this
> > > destination will trigger allocation of a new neigh struct.
> > > 
> > > With this patch allocation of new neigh struct will also check the
> > > validity of the associated path and renew it if necessary.
> > > 
> > > This, however, will not help in datagram mode, if the host
> > > continuously sends data to the destination with invalid path.
> > > The neigh struct alive timer will be updated, thus preventing
> > > it from reallocation.
> > > 
> > > Test setup consists of two target hosts and two initiator hosts,
> > > one of the initiators is with the patch. All hosts have only IPv6
> > > addresses from the same subnet and initiators constantly ping targets.
> > > In connected mode swapping the LIDs of target hosts and switching over SM
> > > leads to the loss of connectivity for the initiator without the patch.
> > > Initiator with the patch recovers in ~3 sec. In datagram mode initiator
> > > with the patch is able to recover only if ping is stopped for
> > > neigh_obsolete time.
> > > 
> > > Signed-off-by: Evgenii Smirnov<evgenii.smirnov@profitbricks.com>
> > > ---
> > >   drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > index 161ba8c76285..db5762d62aea 100644
> > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr,
> > >   
> > >   	list_add_tail(&neigh->list, &path->neigh_list);
> > >   
> > > -	if (path->ah) {
> > > +	if (path->ah && path->valid) {
> > >   		kref_get(&path->ah->ref);
> > >   		neigh->ah = path->ah;
> > >   
> 
>
Doug Ledford May 17, 2018, 8:16 p.m. UTC | #5
On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote:
> Currently, the validity of a path is checked only on
> unicast ARP transmission. If Subnet Manager switchover happens and
> some LIDs get reassigned, driver in a network that uses only IPv6
> addresses will not try to renew the path records, despite them
> being marked as invalid.
> 
> In connected mode, remote side LID change will cause send to fail,
> freeing the corresponding neigh struct. Subsequent packets to this
> destination will trigger allocation of a new neigh struct.
> 
> With this patch allocation of new neigh struct will also check the
> validity of the associated path and renew it if necessary.
> 
> This, however, will not help in datagram mode, if the host
> continuously sends data to the destination with invalid path.
> The neigh struct alive timer will be updated, thus preventing
> it from reallocation.
> 
> Test setup consists of two target hosts and two initiator hosts,
> one of the initiators is with the patch. All hosts have only IPv6
> addresses from the same subnet and initiators constantly ping targets.
> In connected mode swapping the LIDs of target hosts and switching over SM
> leads to the loss of connectivity for the initiator without the patch.
> Initiator with the patch recovers in ~3 sec. In datagram mode initiator
> with the patch is able to recover only if ping is stopped for
> neigh_obsolete time.
> 
> Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 161ba8c76285..db5762d62aea 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr,
>  
>  	list_add_tail(&neigh->list, &path->neigh_list);
>  
> -	if (path->ah) {
> +	if (path->ah && path->valid) {
>  		kref_get(&path->ah->ref);
>  		neigh->ah = path->ah;
>  

OK, looking at this closer, how is this patch even possibly right?

From what I can see, when we get a REREG or SM_CHANGE even we do a light
flush.  With a light flush we do this:

FLUSH_LIGHT->
        ipoib_mark_paths_invalid
        ipoib_mcast_dev_flush
        ipoib_flush_ah

In those three things, we will mark all paths as invalid (but this does
nothing to any ah's that are attached to neigh structs directly), we
flush all the multicast stuff and rejoin, and then the flush_ah only
collects ahs that are already expired, it doesn't mark any to be expired
or mark any as invalid.

Then, your patch changes neigh_add_path to check path->valid and only
use an existing path if it's marked valid.  OK, but that does nothing
for the case where we already have a neighbor and the path is already
attached to the neighbor.  In that case, we fall through the switch,
skip neigh_add_path entirely, and move on to the send.  It seems to me
that your fix only resolves the one situation where we don't have a
neighbor but we have a path.  It doesn't resolve the case where we have
a neighbor and a path that's been marked invalid, or a neighbor with a
direct ah attached to it but no path.

It would seem to me that the whole mark_paths_invalid is a mostly
useless function.  If we need to redo the paths, then we could just as
easily call flush_paths instead, which will get rid of all the paths,
attached to a neighbor or not, and then we will end up looking them all
up again on an as needed basis.  That would fix both the neighbor
present with path and neighbor absent situations.  Then we would just
need to resolve the neighbor present with ah situation, and for that it
seems we need to modify flush_ah to actually flush the ahs instead of
just collecting the already expired ones.

Is there something I'm missing here (I admit it's been a while since I
was mucking around in the ipoib transmit code)?
Doug Ledford May 18, 2018, 3:59 p.m. UTC | #6
On Thu, 2018-05-17 at 16:16 -0400, Doug Ledford wrote:
> On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote:
> > Currently, the validity of a path is checked only on
> > unicast ARP transmission. If Subnet Manager switchover happens and
> > some LIDs get reassigned, driver in a network that uses only IPv6
> > addresses will not try to renew the path records, despite them
> > being marked as invalid.
> > 
> > In connected mode, remote side LID change will cause send to fail,
> > freeing the corresponding neigh struct. Subsequent packets to this
> > destination will trigger allocation of a new neigh struct.
> > 
> > With this patch allocation of new neigh struct will also check the
> > validity of the associated path and renew it if necessary.
> > 
> > This, however, will not help in datagram mode, if the host
> > continuously sends data to the destination with invalid path.
> > The neigh struct alive timer will be updated, thus preventing
> > it from reallocation.
> > 
> > Test setup consists of two target hosts and two initiator hosts,
> > one of the initiators is with the patch. All hosts have only IPv6
> > addresses from the same subnet and initiators constantly ping targets.
> > In connected mode swapping the LIDs of target hosts and switching over SM
> > leads to the loss of connectivity for the initiator without the patch.
> > Initiator with the patch recovers in ~3 sec. In datagram mode initiator
> > with the patch is able to recover only if ping is stopped for
> > neigh_obsolete time.
> > 
> > Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com>
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index 161ba8c76285..db5762d62aea 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr,
> >  
> >  	list_add_tail(&neigh->list, &path->neigh_list);
> >  
> > -	if (path->ah) {
> > +	if (path->ah && path->valid) {
> >  		kref_get(&path->ah->ref);
> >  		neigh->ah = path->ah;
> >  
> 
> OK, looking at this closer, how is this patch even possibly right?
> 
> From what I can see, when we get a REREG or SM_CHANGE even we do a light
> flush.  With a light flush we do this:
> 
> FLUSH_LIGHT->
>         ipoib_mark_paths_invalid
>         ipoib_mcast_dev_flush
>         ipoib_flush_ah
> 
> In those three things, we will mark all paths as invalid (but this does
> nothing to any ah's that are attached to neigh structs directly), we
> flush all the multicast stuff and rejoin, and then the flush_ah only
> collects ahs that are already expired, it doesn't mark any to be expired
> or mark any as invalid.
> 
> Then, your patch changes neigh_add_path to check path->valid and only
> use an existing path if it's marked valid.  OK, but that does nothing
> for the case where we already have a neighbor and the path is already
> attached to the neighbor.  In that case, we fall through the switch,
> skip neigh_add_path entirely, and move on to the send.  It seems to me
> that your fix only resolves the one situation where we don't have a
> neighbor but we have a path.  It doesn't resolve the case where we have
> a neighbor and a path that's been marked invalid, or a neighbor with a
> direct ah attached to it but no path.
> 
> It would seem to me that the whole mark_paths_invalid is a mostly
> useless function.  If we need to redo the paths, then we could just as
> easily call flush_paths instead, which will get rid of all the paths,
> attached to a neighbor or not, and then we will end up looking them all
> up again on an as needed basis.  That would fix both the neighbor
> present with path and neighbor absent situations.  Then we would just
> need to resolve the neighbor present with ah situation, and for that it
> seems we need to modify flush_ah to actually flush the ahs instead of
> just collecting the already expired ones.
> 
> Is there something I'm missing here (I admit it's been a while since I
> was mucking around in the ipoib transmit code)?
> 
> 

After spending more time staring at this, I think I see what's going on
finally.

In a nutshell, all neighbors have paths, all paths have an AH, and once
we've established that a path is valid, we shortcut the path out of the
picture and put the ah directly on the neigh via the neigh->ah pointer. 
The problem this creates is that when we want to mark paths invalid, we
can't easily deal with the fact that the neigh is holding a direct
pointer to the path's ah.

So, since the validity is as much on the ah as it is on the path, move
path->valid to path->ah->valid so that any time we look up a neigh
struct, and neigh->ah has been set, we can check neigh->ah->valid to
make sure we should be using it.

Then, also, since neigh->ah is used without taking the priv->lock that
we use when setting path->ah and its elements, we can't just set neigh-
>ah to NULL or we can cause oopses.  That means when we have the
situation where neigh->ah->valid == 0, we need to refresh our path
(which on path rec completion will do a swap of the old ah for a new
ah), so add code to refresh the path in this situation.

I'm sending a patch under separate email that I think will resolve the
situation for all of the needed cases.  Can you please test it and let
me know if it resolves your issue (and the mentioned UD issue too if you
don't mind)?
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 161ba8c76285..db5762d62aea 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -963,7 +963,7 @@  static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr,
 
 	list_add_tail(&neigh->list, &path->neigh_list);
 
-	if (path->ah) {
+	if (path->ah && path->valid) {
 		kref_get(&path->ah->ref);
 		neigh->ah = path->ah;