diff mbox series

[2/2] rpcrdma: improve handling of RDMA_CM_EVENT_ADDR_CHANGE

Message ID 20240711095908.1604235-2-dan.aloni@vastdata.com (mailing list archive)
State New
Headers show
Series [1/2] rpcrdma: fix handling for RDMA_CM_EVENT_DISCONNECTED due to address change | expand

Commit Message

Dan Aloni July 11, 2024, 9:59 a.m. UTC
It would be beneficial to implement handling similar to
RDMA_CM_EVENT_DEVICE_REMOVAL in the case where RDMA_CM_EVENT_ADDR_CHANGE
is issued for an unconnected CM.

Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
---
 net/sunrpc/xprtrdma/verbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Sagi Grimberg July 11, 2024, 10:11 a.m. UTC | #1
On 11/07/2024 12:59, Dan Aloni wrote:
> It would be beneficial to implement handling similar to
> RDMA_CM_EVENT_DEVICE_REMOVAL in the case where RDMA_CM_EVENT_ADDR_CHANGE
> is issued for an unconnected CM.

I think Chuck has a pending series for handling device removals with a 
dedicated
ib_client. So one would have to rebase against the other...

See: 
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.gittopic-device-removal

Other than that, looks good,
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Dan Aloni July 11, 2024, 12:26 p.m. UTC | #2
On 2024-07-11 13:11:11, Sagi Grimberg wrote:
> 
> 
> On 11/07/2024 12:59, Dan Aloni wrote:
> > It would be beneficial to implement handling similar to
> > RDMA_CM_EVENT_DEVICE_REMOVAL in the case where RDMA_CM_EVENT_ADDR_CHANGE
> > is issued for an unconnected CM.
> 
> I think Chuck has a pending series for handling device removals with a
> dedicated
> ib_client. So one would have to rebase against the other...
> 
> See: https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.gittopic-device-removal
> 
> Other than that, looks good,
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

I ran some testing with `device-removal` branch (merge fe4cef916d91
v6.10-rc7), and found that the `RDMA_CM_EVENT_ADDR_CHANGE` fix is still
required and independent of the branch's content.

And for the second patch I sent, which fixes only a theoretical issue, I
think we need the `case RDMA_CM_EVENT_ADDR_CHANGE:` branch to implement
the same logic like the removed `case RDMA_CM_EVENT_DEVICE_REMOVAL:`
branch.
Sagi Grimberg July 11, 2024, 12:34 p.m. UTC | #3
On 11/07/2024 15:26, Dan Aloni wrote:
> On 2024-07-11 13:11:11, Sagi Grimberg wrote:
>>
>> On 11/07/2024 12:59, Dan Aloni wrote:
>>> It would be beneficial to implement handling similar to
>>> RDMA_CM_EVENT_DEVICE_REMOVAL in the case where RDMA_CM_EVENT_ADDR_CHANGE
>>> is issued for an unconnected CM.
>> I think Chuck has a pending series for handling device removals with a
>> dedicated
>> ib_client. So one would have to rebase against the other...
>>
>> See: https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.gittopic-device-removal
>>
>> Other than that, looks good,
>> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> I ran some testing with `device-removal` branch (merge fe4cef916d91
> v6.10-rc7), and found that the `RDMA_CM_EVENT_ADDR_CHANGE` fix is still
> required and independent of the branch's content.

Yes it isn't related, was just mentioning that Chuck is touching this 
exact area
in the code, so perhaps these patches should be against this branch (if 
indeed it is
heading upstream)? Otherwise Chuck can rebase on top of these fixes, 
which I agree
are needed.
Chuck Lever July 11, 2024, 1:31 p.m. UTC | #4
> On Jul 11, 2024, at 6:11 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
> 
> On 11/07/2024 12:59, Dan Aloni wrote:
>> It would be beneficial to implement handling similar to
>> RDMA_CM_EVENT_DEVICE_REMOVAL in the case where RDMA_CM_EVENT_ADDR_CHANGE
>> is issued for an unconnected CM.
> 
> I think Chuck has a pending series for handling device removals with a dedicated
> ib_client. So one would have to rebase against the other...
> 
> See: https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.gittopic-device-removal
> 
> Other than that, looks good,
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

FYI Anna has taken my "rpcrdma ib_client" series:

http://git.linux-nfs.org/?p=anna/linux-nfs.git;a=shortlog;h=refs/heads/linux-next

PATCH 1/2 "rpcrdma: fix handling for RDMA_CM_EVENT_DISCONNECTED
due to address change" is a fix that you probably want to make
available to the LTS kernels, so it would have to be applied
before my patches in order to make that possible. I don't know
what Anna's policy is regarding rearranging the patches in her
linux-next branch.

Quite possibly she'll just have to rip out my patches so we
can give her a fresh series with your work and mine properly
ordered.


--
Chuck Lever
diff mbox series

Patch

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index e42f5664ecaf..0342a0527733 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -244,14 +244,13 @@  rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
 		pr_info("rpcrdma: removing device %s for %pISpc\n",
 			ep->re_id->device->name, sap);
+		fallthrough;
+	case RDMA_CM_EVENT_ADDR_CHANGE:
 		switch (xchg(&ep->re_connect_status, -ENODEV)) {
 		case 0: goto wake_connect_worker;
 		case 1: goto disconnected;
 		}
 		return 0;
-	case RDMA_CM_EVENT_ADDR_CHANGE:
-		ep->re_connect_status = -ENODEV;
-		goto disconnected;
 	case RDMA_CM_EVENT_ESTABLISHED:
 		rpcrdma_ep_get(ep);
 		ep->re_connect_status = 1;