Message ID | 20180504114921.16571-1-evgenii.smirnov@profitbricks.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Doug Ledford |
Headers | show |
On 5/4/2018 7:49 AM, Evgenii Smirnov wrote: > Currently, the validity of a path is checked only on > unicast ARP transmission. If Subnet Manager switchover happens and > some LIDs get reassigned, driver in a network that uses only IPv6 > addresses will not try to renew the path records, despite them > being marked as invalid. > > In connected mode, remote side LID change will cause send to fail, > freeing the corresponding neigh struct. Subsequent packets to this > destination will trigger allocation of a new neigh struct. > > With this patch allocation of new neigh struct will also check the > validity of the associated path and renew it if necessary. > > This, however, will not help in datagram mode, if the host > continuously sends data to the destination with invalid path. > The neigh struct alive timer will be updated, thus preventing > it from reallocation. > > Test setup consists of two target hosts and two initiator hosts, > one of the initiators is with the patch. All hosts have only IPv6 > addresses from the same subnet and initiators constantly ping targets. > In connected mode swapping the LIDs of target hosts and switching over SM > leads to the loss of connectivity for the initiator without the patch. > Initiator with the patch recovers in ~3 sec. In datagram mode initiator > with the patch is able to recover only if ping is stopped for > neigh_obsolete time. > > Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote: > Currently, the validity of a path is checked only on > unicast ARP transmission. If Subnet Manager switchover happens and > some LIDs get reassigned, driver in a network that uses only IPv6 > addresses will not try to renew the path records, despite them > being marked as invalid. Are we not getting a flush event in this case? If we are getting a flush event, maybe we just aren't doing a heavy enough flush? In general I don't have a problem for this patch, but I would prefer to find a solution that resolves the UD case too, and maybe that just needs to flush harder on the specific event we get when we get a new SM (it's a rereg event, yes?). > In connected mode, remote side LID change will cause send to fail, > freeing the corresponding neigh struct. Subsequent packets to this > destination will trigger allocation of a new neigh struct. > > With this patch allocation of new neigh struct will also check the > validity of the associated path and renew it if necessary. > > This, however, will not help in datagram mode, if the host > continuously sends data to the destination with invalid path. > The neigh struct alive timer will be updated, thus preventing > it from reallocation. > > Test setup consists of two target hosts and two initiator hosts, > one of the initiators is with the patch. All hosts have only IPv6 > addresses from the same subnet and initiators constantly ping targets. > In connected mode swapping the LIDs of target hosts and switching over SM > leads to the loss of connectivity for the initiator without the patch. > Initiator with the patch recovers in ~3 sec. In datagram mode initiator > with the patch is able to recover only if ping is stopped for > neigh_obsolete time. > > Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com> > --- > drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > index 161ba8c76285..db5762d62aea 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr, > > list_add_tail(&neigh->list, &path->neigh_list); > > - if (path->ah) { > + if (path->ah && path->valid) { > kref_get(&path->ah->ref); > neigh->ah = path->ah; >
Yes, in this case we are getting CLIENT_REREGISTER event and will do light flush, which marks all path records as invalid and reconnects all multicast groups. I think it would be too much to do a normal flush, as all interfaces in the network will go down on a simple subnet manager change. Light flush seems to be suitable for this case, the problem is that validity of path records is checked only on outgoing unicast ARPs. My impression is that correct solution that also covers UD case is to check validity also on outgoing ICMP6 Neighbor Advertisement packets. I can try to prepare a patch that implements that. This patch, however, still looks useful to me, as checking path on neigh creation allows to detect such situations in connected mode faster, not waiting for ARP cache expiry time. On 09.05.2018 16:31, Doug Ledford wrote: > On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote: >> Currently, the validity of a path is checked only on >> unicast ARP transmission. If Subnet Manager switchover happens and >> some LIDs get reassigned, driver in a network that uses only IPv6 >> addresses will not try to renew the path records, despite them >> being marked as invalid. > Are we not getting a flush event in this case? If we are getting a > flush event, maybe we just aren't doing a heavy enough flush? > > In general I don't have a problem for this patch, but I would prefer to > find a solution that resolves the UD case too, and maybe that just needs > to flush harder on the specific event we get when we get a new SM (it's > a rereg event, yes?). > >> In connected mode, remote side LID change will cause send to fail, >> freeing the corresponding neigh struct. Subsequent packets to this >> destination will trigger allocation of a new neigh struct. >> >> With this patch allocation of new neigh struct will also check the >> validity of the associated path and renew it if necessary. >> >> This, however, will not help in datagram mode, if the host >> continuously sends data to the destination with invalid path. >> The neigh struct alive timer will be updated, thus preventing >> it from reallocation. >> >> Test setup consists of two target hosts and two initiator hosts, >> one of the initiators is with the patch. All hosts have only IPv6 >> addresses from the same subnet and initiators constantly ping targets. >> In connected mode swapping the LIDs of target hosts and switching over SM >> leads to the loss of connectivity for the initiator without the patch. >> Initiator with the patch recovers in ~3 sec. In datagram mode initiator >> with the patch is able to recover only if ping is stopped for >> neigh_obsolete time. >> >> Signed-off-by: Evgenii Smirnov<evgenii.smirnov@profitbricks.com> >> --- >> drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c >> index 161ba8c76285..db5762d62aea 100644 >> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c >> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c >> @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr, >> >> list_add_tail(&neigh->list, &path->neigh_list); >> >> - if (path->ah) { >> + if (path->ah && path->valid) { >> kref_get(&path->ah->ref); >> neigh->ah = path->ah; >> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2018-05-11 at 16:12 +0200, Evgenii Smirnov wrote: > Yes, in this case we are getting CLIENT_REREGISTER event and will do > light flush, which marks all path records as invalid and reconnects all > multicast groups. I think it would be too much to do a normal flush, as > all interfaces in the network will go down on a simple subnet manager > change. > > Light flush seems to be suitable for this case, the problem is that > validity of path records is checked only on outgoing unicast ARPs. My > impression is that correct solution that also covers UD case is to check > validity also on outgoing ICMP6 Neighbor Advertisement packets. I can > try to prepare a patch that implements that. > > This patch, however, still looks useful to me, as checking path on neigh > creation allows to detect such situations in connected mode faster, not > waiting for ARP cache expiry time. I don't want us waiting on any expiry time. That's just the wrong answer here. If we know that the SM changed, and that an SM change can reder our paths invalid by reassigning LIDs, then we need to be proactive in resolving that our paths are valid. Waiting for an expiry timer is just the wrong thing to do IMO. > > On 09.05.2018 16:31, Doug Ledford wrote: > > On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote: > > > Currently, the validity of a path is checked only on > > > unicast ARP transmission. If Subnet Manager switchover happens and > > > some LIDs get reassigned, driver in a network that uses only IPv6 > > > addresses will not try to renew the path records, despite them > > > being marked as invalid. > > > > Are we not getting a flush event in this case? If we are getting a > > flush event, maybe we just aren't doing a heavy enough flush? > > > > In general I don't have a problem for this patch, but I would prefer to > > find a solution that resolves the UD case too, and maybe that just needs > > to flush harder on the specific event we get when we get a new SM (it's > > a rereg event, yes?). > > > > > In connected mode, remote side LID change will cause send to fail, > > > freeing the corresponding neigh struct. Subsequent packets to this > > > destination will trigger allocation of a new neigh struct. > > > > > > With this patch allocation of new neigh struct will also check the > > > validity of the associated path and renew it if necessary. > > > > > > This, however, will not help in datagram mode, if the host > > > continuously sends data to the destination with invalid path. > > > The neigh struct alive timer will be updated, thus preventing > > > it from reallocation. > > > > > > Test setup consists of two target hosts and two initiator hosts, > > > one of the initiators is with the patch. All hosts have only IPv6 > > > addresses from the same subnet and initiators constantly ping targets. > > > In connected mode swapping the LIDs of target hosts and switching over SM > > > leads to the loss of connectivity for the initiator without the patch. > > > Initiator with the patch recovers in ~3 sec. In datagram mode initiator > > > with the patch is able to recover only if ping is stopped for > > > neigh_obsolete time. > > > > > > Signed-off-by: Evgenii Smirnov<evgenii.smirnov@profitbricks.com> > > > --- > > > drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > > > index 161ba8c76285..db5762d62aea 100644 > > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > > > @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr, > > > > > > list_add_tail(&neigh->list, &path->neigh_list); > > > > > > - if (path->ah) { > > > + if (path->ah && path->valid) { > > > kref_get(&path->ah->ref); > > > neigh->ah = path->ah; > > > > >
On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote: > Currently, the validity of a path is checked only on > unicast ARP transmission. If Subnet Manager switchover happens and > some LIDs get reassigned, driver in a network that uses only IPv6 > addresses will not try to renew the path records, despite them > being marked as invalid. > > In connected mode, remote side LID change will cause send to fail, > freeing the corresponding neigh struct. Subsequent packets to this > destination will trigger allocation of a new neigh struct. > > With this patch allocation of new neigh struct will also check the > validity of the associated path and renew it if necessary. > > This, however, will not help in datagram mode, if the host > continuously sends data to the destination with invalid path. > The neigh struct alive timer will be updated, thus preventing > it from reallocation. > > Test setup consists of two target hosts and two initiator hosts, > one of the initiators is with the patch. All hosts have only IPv6 > addresses from the same subnet and initiators constantly ping targets. > In connected mode swapping the LIDs of target hosts and switching over SM > leads to the loss of connectivity for the initiator without the patch. > Initiator with the patch recovers in ~3 sec. In datagram mode initiator > with the patch is able to recover only if ping is stopped for > neigh_obsolete time. > > Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com> > --- > drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > index 161ba8c76285..db5762d62aea 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr, > > list_add_tail(&neigh->list, &path->neigh_list); > > - if (path->ah) { > + if (path->ah && path->valid) { > kref_get(&path->ah->ref); > neigh->ah = path->ah; > OK, looking at this closer, how is this patch even possibly right? From what I can see, when we get a REREG or SM_CHANGE even we do a light flush. With a light flush we do this: FLUSH_LIGHT-> ipoib_mark_paths_invalid ipoib_mcast_dev_flush ipoib_flush_ah In those three things, we will mark all paths as invalid (but this does nothing to any ah's that are attached to neigh structs directly), we flush all the multicast stuff and rejoin, and then the flush_ah only collects ahs that are already expired, it doesn't mark any to be expired or mark any as invalid. Then, your patch changes neigh_add_path to check path->valid and only use an existing path if it's marked valid. OK, but that does nothing for the case where we already have a neighbor and the path is already attached to the neighbor. In that case, we fall through the switch, skip neigh_add_path entirely, and move on to the send. It seems to me that your fix only resolves the one situation where we don't have a neighbor but we have a path. It doesn't resolve the case where we have a neighbor and a path that's been marked invalid, or a neighbor with a direct ah attached to it but no path. It would seem to me that the whole mark_paths_invalid is a mostly useless function. If we need to redo the paths, then we could just as easily call flush_paths instead, which will get rid of all the paths, attached to a neighbor or not, and then we will end up looking them all up again on an as needed basis. That would fix both the neighbor present with path and neighbor absent situations. Then we would just need to resolve the neighbor present with ah situation, and for that it seems we need to modify flush_ah to actually flush the ahs instead of just collecting the already expired ones. Is there something I'm missing here (I admit it's been a while since I was mucking around in the ipoib transmit code)?
On Thu, 2018-05-17 at 16:16 -0400, Doug Ledford wrote: > On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote: > > Currently, the validity of a path is checked only on > > unicast ARP transmission. If Subnet Manager switchover happens and > > some LIDs get reassigned, driver in a network that uses only IPv6 > > addresses will not try to renew the path records, despite them > > being marked as invalid. > > > > In connected mode, remote side LID change will cause send to fail, > > freeing the corresponding neigh struct. Subsequent packets to this > > destination will trigger allocation of a new neigh struct. > > > > With this patch allocation of new neigh struct will also check the > > validity of the associated path and renew it if necessary. > > > > This, however, will not help in datagram mode, if the host > > continuously sends data to the destination with invalid path. > > The neigh struct alive timer will be updated, thus preventing > > it from reallocation. > > > > Test setup consists of two target hosts and two initiator hosts, > > one of the initiators is with the patch. All hosts have only IPv6 > > addresses from the same subnet and initiators constantly ping targets. > > In connected mode swapping the LIDs of target hosts and switching over SM > > leads to the loss of connectivity for the initiator without the patch. > > Initiator with the patch recovers in ~3 sec. In datagram mode initiator > > with the patch is able to recover only if ping is stopped for > > neigh_obsolete time. > > > > Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com> > > --- > > drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > > index 161ba8c76285..db5762d62aea 100644 > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > > @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr, > > > > list_add_tail(&neigh->list, &path->neigh_list); > > > > - if (path->ah) { > > + if (path->ah && path->valid) { > > kref_get(&path->ah->ref); > > neigh->ah = path->ah; > > > > OK, looking at this closer, how is this patch even possibly right? > > From what I can see, when we get a REREG or SM_CHANGE even we do a light > flush. With a light flush we do this: > > FLUSH_LIGHT-> > ipoib_mark_paths_invalid > ipoib_mcast_dev_flush > ipoib_flush_ah > > In those three things, we will mark all paths as invalid (but this does > nothing to any ah's that are attached to neigh structs directly), we > flush all the multicast stuff and rejoin, and then the flush_ah only > collects ahs that are already expired, it doesn't mark any to be expired > or mark any as invalid. > > Then, your patch changes neigh_add_path to check path->valid and only > use an existing path if it's marked valid. OK, but that does nothing > for the case where we already have a neighbor and the path is already > attached to the neighbor. In that case, we fall through the switch, > skip neigh_add_path entirely, and move on to the send. It seems to me > that your fix only resolves the one situation where we don't have a > neighbor but we have a path. It doesn't resolve the case where we have > a neighbor and a path that's been marked invalid, or a neighbor with a > direct ah attached to it but no path. > > It would seem to me that the whole mark_paths_invalid is a mostly > useless function. If we need to redo the paths, then we could just as > easily call flush_paths instead, which will get rid of all the paths, > attached to a neighbor or not, and then we will end up looking them all > up again on an as needed basis. That would fix both the neighbor > present with path and neighbor absent situations. Then we would just > need to resolve the neighbor present with ah situation, and for that it > seems we need to modify flush_ah to actually flush the ahs instead of > just collecting the already expired ones. > > Is there something I'm missing here (I admit it's been a while since I > was mucking around in the ipoib transmit code)? > > After spending more time staring at this, I think I see what's going on finally. In a nutshell, all neighbors have paths, all paths have an AH, and once we've established that a path is valid, we shortcut the path out of the picture and put the ah directly on the neigh via the neigh->ah pointer. The problem this creates is that when we want to mark paths invalid, we can't easily deal with the fact that the neigh is holding a direct pointer to the path's ah. So, since the validity is as much on the ah as it is on the path, move path->valid to path->ah->valid so that any time we look up a neigh struct, and neigh->ah has been set, we can check neigh->ah->valid to make sure we should be using it. Then, also, since neigh->ah is used without taking the priv->lock that we use when setting path->ah and its elements, we can't just set neigh- >ah to NULL or we can cause oopses. That means when we have the situation where neigh->ah->valid == 0, we need to refresh our path (which on path rec completion will do a swap of the old ah for a new ah), so add code to refresh the path in this situation. I'm sending a patch under separate email that I think will resolve the situation for all of the needed cases. Can you please test it and let me know if it resolves your issue (and the mentioned UD issue too if you don't mind)?
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 161ba8c76285..db5762d62aea 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr, list_add_tail(&neigh->list, &path->neigh_list); - if (path->ah) { + if (path->ah && path->valid) { kref_get(&path->ah->ref); neigh->ah = path->ah;
Currently, the validity of a path is checked only on unicast ARP transmission. If Subnet Manager switchover happens and some LIDs get reassigned, driver in a network that uses only IPv6 addresses will not try to renew the path records, despite them being marked as invalid. In connected mode, remote side LID change will cause send to fail, freeing the corresponding neigh struct. Subsequent packets to this destination will trigger allocation of a new neigh struct. With this patch allocation of new neigh struct will also check the validity of the associated path and renew it if necessary. This, however, will not help in datagram mode, if the host continuously sends data to the destination with invalid path. The neigh struct alive timer will be updated, thus preventing it from reallocation. Test setup consists of two target hosts and two initiator hosts, one of the initiators is with the patch. All hosts have only IPv6 addresses from the same subnet and initiators constantly ping targets. In connected mode swapping the LIDs of target hosts and switching over SM leads to the loss of connectivity for the initiator without the patch. Initiator with the patch recovers in ~3 sec. In datagram mode initiator with the patch is able to recover only if ping is stopped for neigh_obsolete time. Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com> --- drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)