Message ID | 20220613230119.73475-2-hyc.lee@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] ksmbd: remove duplicate flag set in smb2_write |
2022-06-14 8:01 GMT+09:00, Hyunchul Lee <hyc.lee@gmail.com>:
> After a QP has been disconnected, it stays
> in a timewait state for in flight packets.
> After the state has completed,
> RDMA_CM_EVENT_TIMEWAIT_EXIT is reported.
> Disconnect on RDMA_CM_EVENT_TIMEWAIT_EXIT
> so that ksmbd can restart.
>
> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>

Thanks!

On 6/13/2022 7:01 PM, Hyunchul Lee wrote:
> After a QP has been disconnected, it stays
> in a timewait state for in flight packets.
> After the state has completed,
> RDMA_CM_EVENT_TIMEWAIT_EXIT is reported.
> Disconnect on RDMA_CM_EVENT_TIMEWAIT_EXIT
> so that ksmbd can restart.
>
> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
> ---
>  fs/ksmbd/transport_rdma.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
> index d035e060c2f0..4b1a471afcd0 100644
> --- a/fs/ksmbd/transport_rdma.c
> +++ b/fs/ksmbd/transport_rdma.c
> @@ -1535,6 +1535,7 @@ static int smb_direct_cm_handler(struct rdma_cm_id *cm_id,
>  		wake_up_interruptible(&t->wait_status);
>  		break;
>  	}
> +	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
>  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
>  	case RDMA_CM_EVENT_DISCONNECTED: {
>  		t->status = SMB_DIRECT_CS_DISCONNECTED;

Is this issue seen on all RDMA providers? Because I would normally
expect that an RDMA_CM_EVENT_DISCONNECTED will precede the TIMEWAIT
event. What scenarios have you seen this not occur?

Unless ksmbd wishes to reuse its QP's, which is not currently the
case (right?), there's pretty much no reason to manage QP state and
hang around for TIMEWAIT.

Tom.

On Tue, Jun 14, 2022 at 8:56 PM, Tom Talpey <tom@talpey.com> wrote:
>
>
> On 6/13/2022 7:01 PM, Hyunchul Lee wrote:
> > After a QP has been disconnected, it stays
> > in a timewait state for in flight packets.
> > After the state has completed,
> > RDMA_CM_EVENT_TIMEWAIT_EXIT is reported.
> > Disconnect on RDMA_CM_EVENT_TIMEWAIT_EXIT
> > so that ksmbd can restart.
> >
> > Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
> > ---
> >  fs/ksmbd/transport_rdma.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
> > index d035e060c2f0..4b1a471afcd0 100644
> > --- a/fs/ksmbd/transport_rdma.c
> > +++ b/fs/ksmbd/transport_rdma.c
> > @@ -1535,6 +1535,7 @@ static int smb_direct_cm_handler(struct rdma_cm_id *cm_id,
> >  		wake_up_interruptible(&t->wait_status);
> >  		break;
> >  	}
> > +	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
> >  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
> >  	case RDMA_CM_EVENT_DISCONNECTED: {
> >  		t->status = SMB_DIRECT_CS_DISCONNECTED;
>
> Is this issue seen on all RDMA providers? Because I would normally
> expect that an RDMA_CM_EVENT_DISCONNECTED will precede the TIMEWAIT
> event. What scenarios have you seen this not occur?
>

There was an issue that ksmbd got stuck after attempting to shutdown.
We are trying to reproduce it, but we haven't reproduced it yet,
but It seems to be related to the TIMEWAIT event.

And other drivers such as nvme have disconnected on the TIMEWAIT event.

> Unless ksmbd wishes to reuse its QP's, which is not currently the
> case (right?), there's pretty much no reason to manage QP state and
> hang around for TIMEWAIT.

Right, ksmbd doesn't reuse QP.

>
> Tom.

On 6/14/2022 10:14 PM, Hyunchul Lee wrote:
> On Tue, Jun 14, 2022 at 8:56 PM, Tom Talpey <tom@talpey.com> wrote:
>>
>>
>> On 6/13/2022 7:01 PM, Hyunchul Lee wrote:
>>> After a QP has been disconnected, it stays
>>> in a timewait state for in flight packets.
>>> After the state has completed,
>>> RDMA_CM_EVENT_TIMEWAIT_EXIT is reported.
>>> Disconnect on RDMA_CM_EVENT_TIMEWAIT_EXIT
>>> so that ksmbd can restart.
>>>
>>> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
>>> ---
>>>  fs/ksmbd/transport_rdma.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
>>> index d035e060c2f0..4b1a471afcd0 100644
>>> --- a/fs/ksmbd/transport_rdma.c
>>> +++ b/fs/ksmbd/transport_rdma.c
>>> @@ -1535,6 +1535,7 @@ static int smb_direct_cm_handler(struct rdma_cm_id *cm_id,
>>>  		wake_up_interruptible(&t->wait_status);
>>>  		break;
>>>  	}
>>> +	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
>>>  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
>>>  	case RDMA_CM_EVENT_DISCONNECTED: {
>>>  		t->status = SMB_DIRECT_CS_DISCONNECTED;
>>
>> Is this issue seen on all RDMA providers? Because I would normally
>> expect that an RDMA_CM_EVENT_DISCONNECTED will precede the TIMEWAIT
>> event. What scenarios have you seen this not occur?
>>
>
> There was an issue that ksmbd got stuck after attempting to shutdown.
> We are trying to reproduce it, but we haven't reproduced it yet,
> but It seems to be related to the TIMEWAIT event.

I don't think it's appropriate to add this case to SMB. I think it's
quite unlikely that it will address anything, because an RDMA provider
must have indicated a CM_EVENT_DISCONNECTED prior to any TIMEWAIT.
So, the QP (and connection) will already have been torn down by ksmbd
at the earlier event. Perhaps ksmbd did not properly drain the QP at
the initial disconnect.

> And other drivers such as nvme have disconnected on the TIMEWAIT event.

NVME is a completely different upper layer, and has different client/
server transport behavior. The SMB session insulates its peers from
most transport errors, and should not be requesting timewait for
its connections, and definitely not waiting for timewait to expire
before initiating teardown (or recovery). The NFS/RDMA client and
server ignore this event, btw.

>> Unless ksmbd wishes to reuse its QP's, which is not currently the
>> case (right?), there's pretty much no reason to manage QP state and
>> hang around for TIMEWAIT.
>
> Right, ksmbd doesn't reuse QP.

Then there appears to be no good justification for the change. Sorry,
but it's a NAK from me.

Tom.

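For illustration only, the ordering argument in the reply above can be sketched as a small userspace model. This is not ksmbd code: the enum values, `struct transport`, and `handle_event()` are hypothetical stand-ins for the kernel's rdma_cm events and the SMB Direct transport state, and the sketch assumes the provider always reports a disconnect before any timewait exit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rdma_cm event codes discussed above. */
enum cm_event { EV_ESTABLISHED, EV_DISCONNECTED, EV_TIMEWAIT_EXIT };

/* Hypothetical connection states, loosely modeled on SMB_DIRECT_CS_*. */
enum conn_status { CS_NEW, CS_CONNECTED, CS_DISCONNECTED };

struct transport {
	enum conn_status status;
	int teardowns;	/* number of times teardown was triggered */
};

/* Returns 1 if the event changed the connection state, 0 otherwise. */
static int handle_event(struct transport *t, enum cm_event ev)
{
	assert(t != NULL);

	switch (ev) {
	case EV_ESTABLISHED:
		t->status = CS_CONNECTED;
		return 1;
	case EV_DISCONNECTED:
		if (t->status == CS_DISCONNECTED)
			return 0;	/* already torn down */
		t->status = CS_DISCONNECTED;
		t->teardowns++;
		return 1;
	case EV_TIMEWAIT_EXIT:
		/* Ignored: teardown already ran at EV_DISCONNECTED. */
		return 0;
	}
	return 0;
}
```

Under that assumed ordering, feeding the model ESTABLISHED, DISCONNECTED, TIMEWAIT_EXIT triggers exactly one teardown, and the final event has nothing left to do, which is the point being made about the proposed case label.
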
On Thu, Jun 16, 2022 at 3:53 AM, Tom Talpey <tom@talpey.com> wrote:
>
>
> On 6/14/2022 10:14 PM, Hyunchul Lee wrote:
> > On Tue, Jun 14, 2022 at 8:56 PM, Tom Talpey <tom@talpey.com> wrote:
> >>
> >>
> >> On 6/13/2022 7:01 PM, Hyunchul Lee wrote:
> >>> After a QP has been disconnected, it stays
> >>> in a timewait state for in flight packets.
> >>> After the state has completed,
> >>> RDMA_CM_EVENT_TIMEWAIT_EXIT is reported.
> >>> Disconnect on RDMA_CM_EVENT_TIMEWAIT_EXIT
> >>> so that ksmbd can restart.
> >>>
> >>> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
> >>> ---
> >>>  fs/ksmbd/transport_rdma.c | 1 +
> >>>  1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
> >>> index d035e060c2f0..4b1a471afcd0 100644
> >>> --- a/fs/ksmbd/transport_rdma.c
> >>> +++ b/fs/ksmbd/transport_rdma.c
> >>> @@ -1535,6 +1535,7 @@ static int smb_direct_cm_handler(struct rdma_cm_id *cm_id,
> >>>  		wake_up_interruptible(&t->wait_status);
> >>>  		break;
> >>>  	}
> >>> +	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
> >>>  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
> >>>  	case RDMA_CM_EVENT_DISCONNECTED: {
> >>>  		t->status = SMB_DIRECT_CS_DISCONNECTED;
> >>
> >> Is this issue seen on all RDMA providers? Because I would normally
> >> expect that an RDMA_CM_EVENT_DISCONNECTED will precede the TIMEWAIT
> >> event. What scenarios have you seen this not occur?
> >>
> >
> > There was an issue that ksmbd got stuck after attempting to shutdown.
> > We are trying to reproduce it, but we haven't reproduced it yet,
> > but It seems to be related to the TIMEWAIT event.
>
> I don't think it's appropriate to add this case to SMB. I think it's
> quite unlikely that it will address anything, because an RDMA provider
> must have indicated a CM_EVENT_DISCONNECTED prior to any TIMEWAIT.
> So, the QP (and connection) will already have been torn down by ksmbd
> at the earlier event. Perhaps ksmbd did not properly drain the QP at
> the initial disconnect.
>
> > And other drivers such as nvme have disconnected on the TIMEWAIT event.
>
> NVME is a completely different upper layer, and has different client/
> server transport behavior. The SMB session insulates its peers from
> most transport errors, and should not be requesting timewait for
> its connections, and definitely not waiting for timewait to expire
> before initiating teardown (or recovery). The NFS/RDMA client and
> server ignore this event, btw.
>

Okay, I got it. I am looking for the cause and have found some clues.

> >> Unless ksmbd wishes to reuse its QP's, which is not currently the
> >> case (right?), there's pretty much no reason to manage QP state and
> >> hang around for TIMEWAIT.
> >
> > Right, ksmbd doesn't reuse QP.
>
> Then there appears to be no good justification for the change. Sorry,
> but it's a NAK from me.
>

Really thank you for the detailed explanation.

> Tom.

diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
index d035e060c2f0..4b1a471afcd0 100644
--- a/fs/ksmbd/transport_rdma.c
+++ b/fs/ksmbd/transport_rdma.c
@@ -1535,6 +1535,7 @@ static int smb_direct_cm_handler(struct rdma_cm_id *cm_id,
 		wake_up_interruptible(&t->wait_status);
 		break;
 	}
+	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
 	case RDMA_CM_EVENT_DISCONNECTED: {
 		t->status = SMB_DIRECT_CS_DISCONNECTED;

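The one-line hunk works through C switch fall-through: a case label with no statements of its own shares the body of the labels that follow it. A minimal userspace sketch of that mechanism follows, assuming hypothetical event names and a hypothetical helper `takes_disconnect_path()` in place of the real handler.

```c
#include <assert.h>

/* Hypothetical event codes; the real ones are RDMA_CM_EVENT_* values. */
enum ev { EV_TIMEWAIT_EXIT, EV_DEVICE_REMOVAL, EV_DISCONNECTED, EV_OTHER };

/* Returns 1 when the event is routed to the shared disconnect path. */
static int takes_disconnect_path(enum ev e)
{
	switch (e) {
	case EV_TIMEWAIT_EXIT:	/* label added by the patch; falls through */
	case EV_DEVICE_REMOVAL:
	case EV_DISCONNECTED:
		return 1;
	default:
		return 0;
	}
}

/* Sanity check: all three stacked labels reach the same body. */
static void check_routing(void)
{
	assert(takes_disconnect_path(EV_TIMEWAIT_EXIT) == 1);
	assert(takes_disconnect_path(EV_DEVICE_REMOVAL) == 1);
	assert(takes_disconnect_path(EV_DISCONNECTED) == 1);
}
```

With the label in place, a timewait-exit event would be handled exactly like a disconnect or device removal; without it, the event would reach whatever the switch does for unlisted values.
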
After a QP has been disconnected, it stays
in a timewait state for in flight packets.
After the state has completed,
RDMA_CM_EVENT_TIMEWAIT_EXIT is reported.
Disconnect on RDMA_CM_EVENT_TIMEWAIT_EXIT
so that ksmbd can restart.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
---
 fs/ksmbd/transport_rdma.c | 1 +
 1 file changed, 1 insertion(+)