diff mbox

[for-next,1/2] xprtrdma: take reference of rdma provider module

Message ID 3e39e90f-7095-4eb9-a844-516672a355ad@CMEXHTCAS2.ad.emulex.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Devesh Sharma July 17, 2014, 2:01 p.m. UTC
If verndor driver is attempted for removal while xprtrdma still has an
active mount, the removal of driver may never complete and can cause
unseen races or in worst case system crash.

To solve this, xprtrdma module should get reference of struct ib_device
structure for every mount. Reference is taken after local device address
resolution is completed successfuly.

reference to the struct ib_device pointer is put just before cm_id destruction.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
---
 net/sunrpc/xprtrdma/verbs.c     |   17 +++++++++++++++--
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
 2 files changed, 16 insertions(+), 2 deletions(-)

Comments

Steve Wise July 17, 2014, 3:01 p.m. UTC | #1
On 7/17/2014 9:01 AM, Devesh Sharma wrote:
> If verndor driver is attempted for removal while xprtrdma still has an
> active mount, the removal of driver may never complete and can cause
> unseen races or in worst case system crash.
>
> To solve this, xprtrdma module should get reference of struct ib_device
> structure for every mount. Reference is taken after local device address
> resolution is completed successfuly.
>
> reference to the struct ib_device pointer is put just before cm_id destruction.
>
> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>

This seems like an issue with the rdma-cm or rdma core, not xprtrdma.  I 
see that user rdma applications cause a ref on the provider module here 
in ib_uverbs_open():

         if (!try_module_get(dev->ib_dev->owner)) {
                 ret = -ENODEV;
                 goto err;


Maybe kernel applications that allocate device resources should cause a 
ref on the provider's module.

Sean/Roland,  is there some history here as to how rdma provider module 
removal should be handled?

Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever III July 17, 2014, 3:05 p.m. UTC | #2
On Jul 17, 2014, at 11:01 AM, Steve Wise <swise@opengridcomputing.com> wrote:

> 
> 
> On 7/17/2014 9:01 AM, Devesh Sharma wrote:
>> If verndor driver is attempted for removal while xprtrdma still has an
>> active mount, the removal of driver may never complete and can cause
>> unseen races or in worst case system crash.
>> 
>> To solve this, xprtrdma module should get reference of struct ib_device
>> structure for every mount. Reference is taken after local device address
>> resolution is completed successfuly.
>> 
>> reference to the struct ib_device pointer is put just before cm_id destruction.
>> 
>> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
> 
> This seems like an issue with the rdma-cm or rdma core, not xprtrdma.

Good point. I was wondering if svcrdma might have a similar issue.

Infrequently I see a hang during shutdown that appears to be related to
the ordering of removal of one of the RDMA modules, but I’ve never had
time to chase it.

See also: https://bugzilla.linux-nfs.org/show_bug.cgi?id=252


> I see that user rdma applications cause a ref on the provider module here in ib_uverbs_open():
> 
>        if (!try_module_get(dev->ib_dev->owner)) {
>                ret = -ENODEV;
>                goto err;
> 
> 
> Maybe kernel applications that allocate device resources should cause a ref on the provider's module.
> 
> Sean/Roland,  is there some history here as to how rdma provider module removal should be handled?
> 
> Steve.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 17, 2014, 3:20 p.m. UTC | #3

Devesh Sharma July 17, 2014, 3:31 p.m. UTC | #4
Replying again, due to mail formatting issue:

> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Thursday, July 17, 2014 8:36 PM
> To: Steve Wise
> Cc: Devesh Sharma; Roland Dreier; Hefty, Sean; linux-rdma@vger.kernel.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> 
> On Jul 17, 2014, at 11:01 AM, Steve Wise <swise@opengridcomputing.com>
> wrote:
> 
> >
> >
> > On 7/17/2014 9:01 AM, Devesh Sharma wrote:
> >> If verndor driver is attempted for removal while xprtrdma still has
> >> an active mount, the removal of driver may never complete and can
> >> cause unseen races or in worst case system crash.
> >>
> >> To solve this, xprtrdma module should get reference of struct
> >> ib_device structure for every mount. Reference is taken after local
> >> device address resolution is completed successfuly.
> >>
> >> reference to the struct ib_device pointer is put just before cm_id
> destruction.
> >>
> >> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
> >
> > This seems like an issue with the rdma-cm or rdma core, not xprtrdma.
> 
> Good point. I was wondering if svcrdma might have a similar issue.
> 
> Infrequently I see a hang during shutdown that appears to be related to the
> ordering of removal of one of the RDMA modules, but I've never had time to
> chase it.
> 
> See also: https://bugzilla.linux-nfs.org/show_bug.cgi?id=252
> 
> 
> > I see that user rdma applications cause a ref on the provider module here
> in ib_uverbs_open():
> >
> >        if (!try_module_get(dev->ib_dev->owner)) {
> >                ret = -ENODEV;
> >                goto err;
> >
> >
> > Maybe kernel applications that allocate device resources should cause a ref
> on the provider's module.

Yes, To me it looks like the right place to get reference of provider driver is rdma_resolve_addr()-->acuire_ib_dev(). However this patch provides a workaround.

Let's wait for Roland/Sean's input.

> >
> > Sean/Roland,  is there some history here as to how rdma provider module
> removal should be handled?
> >
> > Steve.
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean July 17, 2014, 4:06 p.m. UTC | #5
> On 7/17/2014 9:01 AM, Devesh Sharma wrote:
> > If verndor driver is attempted for removal while xprtrdma still has an
> > active mount, the removal of driver may never complete and can cause
> > unseen races or in worst case system crash.
> >
> > To solve this, xprtrdma module should get reference of struct ib_device
> > structure for every mount. Reference is taken after local device address
> > resolution is completed successfuly.
> >
> > reference to the struct ib_device pointer is put just before cm_id
> destruction.
> >
> > Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
> 
> This seems like an issue with the rdma-cm or rdma core, not xprtrdma.  I
> see that user rdma applications cause a ref on the provider module here
> in ib_uverbs_open():
> 
>          if (!try_module_get(dev->ib_dev->owner)) {
>                  ret = -ENODEV;
>                  goto err;
> 
> 
> Maybe kernel applications that allocate device resources should cause a
> ref on the provider's module.
> 
> Sean/Roland,  is there some history here as to how rdma provider module
> removal should be handled?

The kernel modules should are not expected to access the rdma devices after their remove device callback has been invoked.  The rdma cm basically forwards the device removal on a per id basis.  Apps are expected to destroy the id after receiving that callback.  The rdma cm should block in the remove device call until all id's associated with the removed device have been destroyed.

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shirley Ma July 17, 2014, 6:57 p.m. UTC | #6
On 07/17/2014 09:06 AM, Hefty, Sean wrote:
>> On 7/17/2014 9:01 AM, Devesh Sharma wrote:
>>> If verndor driver is attempted for removal while xprtrdma still has an
>>> active mount, the removal of driver may never complete and can cause
>>> unseen races or in worst case system crash.
>>>
>>> To solve this, xprtrdma module should get reference of struct ib_device
>>> structure for every mount. Reference is taken after local device address
>>> resolution is completed successfuly.
>>>
>>> reference to the struct ib_device pointer is put just before cm_id
>> destruction.
>>>
>>> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
>>
>> This seems like an issue with the rdma-cm or rdma core, not xprtrdma.  I
>> see that user rdma applications cause a ref on the provider module here
>> in ib_uverbs_open():
>>
>>          if (!try_module_get(dev->ib_dev->owner)) {
>>                  ret = -ENODEV;
>>                  goto err;
>>
>>
>> Maybe kernel applications that allocate device resources should cause a
>> ref on the provider's module.
>>
>> Sean/Roland,  is there some history here as to how rdma provider module
>> removal should be handled?
> 
> The kernel modules should are not expected to access the rdma devices after their remove device callback has been invoked.  The rdma cm basically forwards the device removal on a per id basis.  Apps are expected to destroy the id after receiving that callback.  The rdma cm should block in the remove device call until all id's associated with the removed device have been destroyed.

So the rdma cm is expected to increase the driver reference count (try_module_get) for each new cm id, then deference count (module_put) when cm id is destroyed?

> - Sean
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise July 17, 2014, 7:07 p.m. UTC | #7
> -----Original Message-----
> From: Shirley Ma [mailto:shirley.ma@oracle.com]
> Sent: Thursday, July 17, 2014 1:58 PM
> To: Hefty, Sean; Steve Wise; Devesh Sharma; Roland Dreier
> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
> 
> 
> 
> On 07/17/2014 09:06 AM, Hefty, Sean wrote:
> >> On 7/17/2014 9:01 AM, Devesh Sharma wrote:
> >>> If verndor driver is attempted for removal while xprtrdma still has an
> >>> active mount, the removal of driver may never complete and can cause
> >>> unseen races or in worst case system crash.
> >>>
> >>> To solve this, xprtrdma module should get reference of struct ib_device
> >>> structure for every mount. Reference is taken after local device address
> >>> resolution is completed successfuly.
> >>>
> >>> reference to the struct ib_device pointer is put just before cm_id
> >> destruction.
> >>>
> >>> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
> >>
> >> This seems like an issue with the rdma-cm or rdma core, not xprtrdma.  I
> >> see that user rdma applications cause a ref on the provider module here
> >> in ib_uverbs_open():
> >>
> >>          if (!try_module_get(dev->ib_dev->owner)) {
> >>                  ret = -ENODEV;
> >>                  goto err;
> >>
> >>
> >> Maybe kernel applications that allocate device resources should cause a
> >> ref on the provider's module.
> >>
> >> Sean/Roland,  is there some history here as to how rdma provider module
> >> removal should be handled?
> >
> > The kernel modules should are not expected to access the rdma devices after their
> remove device callback has been invoked.  The rdma cm basically forwards the device
> removal on a per id basis.  Apps are expected to destroy the id after receiving that
callback.
> The rdma cm should block in the remove device call until all id's associated with the
> removed device have been destroyed.
> 
> So the rdma cm is expected to increase the driver reference count (try_module_get) for
> each new cm id, then deference count (module_put) when cm id is destroyed?
> 

No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL event  to each
application with rdmacm objects allocated, and each application is expected to destroy all
the objects it has allocated before returning from the event handler.

And I think the ib_verbs core calls each ib_client's remove handler when an rdma provider
unregisters with the core.  

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean July 17, 2014, 7:50 p.m. UTC | #8
> > So the rdma cm is expected to increase the driver reference count
> (try_module_get) for
> > each new cm id, then deference count (module_put) when cm id is
> destroyed?
> >
> 
> No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL event
> to each
> application with rdmacm objects allocated, and each application is expected
> to destroy all
> the objects it has allocated before returning from the event handler.

This is almost correct.  The applications do not have to destroy all the objects that it has allocated before returning from their event handler.  E.g. an app can queue a work item that does the destruction.  The rdmacm will block in its ib_client remove handler until all relevant rdma_cm_id's have been destroyed.

> And I think the ib_verbs core calls each ib_client's remove handler when an
> rdma provider
> unregisters with the core.

yes
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise July 17, 2014, 7:55 p.m. UTC | #9
> -----Original Message-----
> From: Hefty, Sean [mailto:sean.hefty@intel.com]
> Sent: Thursday, July 17, 2014 2:50 PM
> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
> 
> > > So the rdma cm is expected to increase the driver reference count
> > (try_module_get) for
> > > each new cm id, then deference count (module_put) when cm id is
> > destroyed?
> > >
> >
> > No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL event
> > to each
> > application with rdmacm objects allocated, and each application is expected
> > to destroy all
> > the objects it has allocated before returning from the event handler.
> 
> This is almost correct.  The applications do not have to destroy all the objects that it
has
> allocated before returning from their event handler.  E.g. an app can queue a work item
> that does the destruction.  The rdmacm will block in its ib_client remove handler until
all
> relevant rdma_cm_id's have been destroyed.
> 

Thanks for the clarification.  

> > And I think the ib_verbs core calls each ib_client's remove handler when an
> > rdma provider
> > unregisters with the core.
> 
> yes

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise July 17, 2014, 8:08 p.m. UTC | #10
> -----Original Message-----
> From: Steve Wise [mailto:swise@opengridcomputing.com]
> Sent: Thursday, July 17, 2014 2:56 PM
> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> Cc: 'linux-rdma@vger.kernel.org'; 'chuck.lever@oracle.com'
> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
> 
> 
> 
> > -----Original Message-----
> > From: Hefty, Sean [mailto:sean.hefty@intel.com]
> > Sent: Thursday, July 17, 2014 2:50 PM
> > To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> > Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
> > Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
> >
> > > > So the rdma cm is expected to increase the driver reference count
> > > (try_module_get) for
> > > > each new cm id, then deference count (module_put) when cm id is
> > > destroyed?
> > > >
> > >
> > > No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL event
> > > to each
> > > application with rdmacm objects allocated, and each application is expected
> > > to destroy all
> > > the objects it has allocated before returning from the event handler.
> >
> > This is almost correct.  The applications do not have to destroy all the objects that
it has
> > allocated before returning from their event handler.  E.g. an app can queue a work
item
> > that does the destruction.  The rdmacm will block in its ib_client remove handler
until all
> > relevant rdma_cm_id's have been destroyed.
> >
> 
> Thanks for the clarification.
> 

And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in rpcrdma_conn_upcall().
It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls rpcrdma_conn_func()
for that endpoint, which schedules rep_connect_worker...  and I gave up following the code
path at this point... :)  

For this to all work correctly, it would need to destroy all the QPs, MRs, CQs, etc for
that device _before_ destroying the rdma cm ids.  Otherwise the provider module could be
unloaded too soon...

Steve.




--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shirley Ma July 17, 2014, 8:23 p.m. UTC | #11
On 07/17/2014 12:55 PM, Steve Wise wrote:
> 
> 
>> -----Original Message-----
>> From: Hefty, Sean [mailto:sean.hefty@intel.com]
>> Sent: Thursday, July 17, 2014 2:50 PM
>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>
>>>> So the rdma cm is expected to increase the driver reference count
>>> (try_module_get) for
>>>> each new cm id, then deference count (module_put) when cm id is
>>> destroyed?
>>>>
>>>
>>> No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL event
>>> to each
>>> application with rdmacm objects allocated, and each application is expected
>>> to destroy all
>>> the objects it has allocated before returning from the event handler.
>>
>> This is almost correct.  The applications do not have to destroy all the objects that it
> has
>> allocated before returning from their event handler.  E.g. an app can queue a work item
>> that does the destruction.  The rdmacm will block in its ib_client remove handler until
> all
>> relevant rdma_cm_id's have been destroyed.
>>
> 
> Thanks for the clarification.  

Thanks, checked the cma code, the reference count maintains there.

>>> And I think the ib_verbs core calls each ib_client's remove handler when an
>>> rdma provider
>>> unregisters with the core.
>>
>> yes
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever III July 17, 2014, 8:41 p.m. UTC | #12
On Jul 17, 2014, at 4:08 PM, Steve Wise <swise@opengridcomputing.com> wrote:

> 
> 
>> -----Original Message-----
>> From: Steve Wise [mailto:swise@opengridcomputing.com]
>> Sent: Thursday, July 17, 2014 2:56 PM
>> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>> Cc: 'linux-rdma@vger.kernel.org'; 'chuck.lever@oracle.com'
>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>> 
>> 
>> 
>>> -----Original Message-----
>>> From: Hefty, Sean [mailto:sean.hefty@intel.com]
>>> Sent: Thursday, July 17, 2014 2:50 PM
>>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
>>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>> 
>>>>> So the rdma cm is expected to increase the driver reference count
>>>> (try_module_get) for
>>>>> each new cm id, then deference count (module_put) when cm id is
>>>> destroyed?
>>>>> 
>>>> 
>>>> No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL event
>>>> to each
>>>> application with rdmacm objects allocated, and each application is expected
>>>> to destroy all
>>>> the objects it has allocated before returning from the event handler.
>>> 
>>> This is almost correct.  The applications do not have to destroy all the objects that
> it has
>>> allocated before returning from their event handler.  E.g. an app can queue a work
> item
>>> that does the destruction.  The rdmacm will block in its ib_client remove handler
> until all
>>> relevant rdma_cm_id's have been destroyed.
>>> 
>> 
>> Thanks for the clarification.
>> 
> 
> And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in rpcrdma_conn_upcall().
> It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls rpcrdma_conn_func()
> for that endpoint, which schedules rep_connect_worker...  and I gave up following the code
> path at this point... :)  
> 
> For this to all work correctly, it would need to destroy all the QPs, MRs, CQs, etc for
> that device _before_ destroying the rdma cm ids.  Otherwise the provider module could be
> unloaded too soon…

We can’t really deal with a CM_DEVICE_REMOVE event while there are active
NFS mounts.

System shutdown ordering should guarantee (one would hope) that NFS
mount points are unmounted before the RDMA/IB core infrastructure is
torn down. Ordering shouldn’t matter as long all NFS activity has
ceased before the CM tries to remove the device.

So if something is hanging up the CM, there’s something xprtrdma is not
cleaning up properly.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise July 17, 2014, 8:59 p.m. UTC | #13
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On
> Behalf Of Chuck Lever
> Sent: Thursday, July 17, 2014 3:42 PM
> To: Steve Wise
> Cc: Hefty, Sean; Shirley Ma; Devesh Sharma; Roland Dreier; linux-rdma@vger.kernel.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
> 
> 
> On Jul 17, 2014, at 4:08 PM, Steve Wise <swise@opengridcomputing.com> wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Steve Wise [mailto:swise@opengridcomputing.com]
> >> Sent: Thursday, July 17, 2014 2:56 PM
> >> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> >> Cc: 'linux-rdma@vger.kernel.org'; 'chuck.lever@oracle.com'
> >> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Hefty, Sean [mailto:sean.hefty@intel.com]
> >>> Sent: Thursday, July 17, 2014 2:50 PM
> >>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> >>> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
> >>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
> >>>
> >>>>> So the rdma cm is expected to increase the driver reference count
> >>>> (try_module_get) for
> >>>>> each new cm id, then deference count (module_put) when cm id is
> >>>> destroyed?
> >>>>>
> >>>>
> >>>> No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL event
> >>>> to each
> >>>> application with rdmacm objects allocated, and each application is expected
> >>>> to destroy all
> >>>> the objects it has allocated before returning from the event handler.
> >>>
> >>> This is almost correct.  The applications do not have to destroy all the objects
that
> > it has
> >>> allocated before returning from their event handler.  E.g. an app can queue a work
> > item
> >>> that does the destruction.  The rdmacm will block in its ib_client remove handler
> > until all
> >>> relevant rdma_cm_id's have been destroyed.
> >>>
> >>
> >> Thanks for the clarification.
> >>
> >
> > And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in
> rpcrdma_conn_upcall().
> > It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls
> rpcrdma_conn_func()
> > for that endpoint, which schedules rep_connect_worker...  and I gave up following the
> code
> > path at this point... :)
> >
> > For this to all work correctly, it would need to destroy all the QPs, MRs, CQs, etc
for
> > that device _before_ destroying the rdma cm ids.  Otherwise the provider module could
> be
> > unloaded too soon.
> 
> We can't really deal with a CM_DEVICE_REMOVE event while there are active
> NFS mounts.
> 
> System shutdown ordering should guarantee (one would hope) that NFS
> mount points are unmounted before the RDMA/IB core infrastructure is
> torn down. Ordering shouldn't matter as long all NFS activity has
> ceased before the CM tries to remove the device.
> 
> So if something is hanging up the CM, there's something xprtrdma is not
> cleaning up properly.
> 


Devesh, how are you reproducing this?  Are you just rmmod'ing the ocrdma module while
there are active mounts?






--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shirley Ma July 17, 2014, 9:25 p.m. UTC | #14
On 07/17/2014 01:41 PM, Chuck Lever wrote:
> On Jul 17, 2014, at 4:08 PM, Steve Wise <swise@opengridcomputing.com> wrote:
> 
>> > 
>> > 
>>> >> -----Original Message-----
>>> >> From: Steve Wise [mailto:swise@opengridcomputing.com]
>>> >> Sent: Thursday, July 17, 2014 2:56 PM
>>> >> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>> >> Cc: 'linux-rdma@vger.kernel.org'; 'chuck.lever@oracle.com'
>>> >> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>> >> 
>>> >> 
>>> >> 
>>>> >>> -----Original Message-----
>>>> >>> From: Hefty, Sean [mailto:sean.hefty@intel.com]
>>>> >>> Sent: Thursday, July 17, 2014 2:50 PM
>>>> >>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>>> >>> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
>>>> >>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>>> >>> 
>>>>>> >>>>> So the rdma cm is expected to increase the driver reference count
>>>>> >>>> (try_module_get) for
>>>>>> >>>>> each new cm id, then deference count (module_put) when cm id is
>>>>> >>>> destroyed?
>>>>>> >>>>> 
>>>>> >>>> 
>>>>> >>>> No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL event
>>>>> >>>> to each
>>>>> >>>> application with rdmacm objects allocated, and each application is expected
>>>>> >>>> to destroy all
>>>>> >>>> the objects it has allocated before returning from the event handler.
>>>> >>> 
>>>> >>> This is almost correct.  The applications do not have to destroy all the objects that
>> > it has
>>>> >>> allocated before returning from their event handler.  E.g. an app can queue a work
>> > item
>>>> >>> that does the destruction.  The rdmacm will block in its ib_client remove handler
>> > until all
>>>> >>> relevant rdma_cm_id's have been destroyed.
>>>> >>> 
>>> >> 
>>> >> Thanks for the clarification.
>>> >> 
>> > 
>> > And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in rpcrdma_conn_upcall().
>> > It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls rpcrdma_conn_func()
>> > for that endpoint, which schedules rep_connect_worker...  and I gave up following the code
>> > path at this point... :)  
>> > 
>> > For this to all work correctly, it would need to destroy all the QPs, MRs, CQs, etc for
>> > that device _before_ destroying the rdma cm ids.  Otherwise the provider module could be
>> > unloaded too soon…
> We can’t really deal with a CM_DEVICE_REMOVE event while there are active
> NFS mounts.
> 
> System shutdown ordering should guarantee (one would hope) that NFS
> mount points are unmounted before the RDMA/IB core infrastructure is
> torn down. Ordering shouldn’t matter as long all NFS activity has
> ceased before the CM tries to remove the device.
> 
> So if something is hanging up the CM, there’s something xprtrdma is not
> cleaning up properly.

I saw a problem once, restart the system without umounting the NFS. CM was hung on waiting for completion. It looks like a  bug in xprtrdma cleanup up. I couldn't reproduce it.

Call Trace:
 [<ffffffff815c9aa9>] schedule+0x29/0x70
 [<ffffffff815c8d35>] schedule_timeout+0x165/0x200
 [<ffffffff815ca9ff>] ? wait_for_completion+0xcf/0x110
 [<ffffffff810a708e>] ? __lock_release+0x9e/0x1f0
 [<ffffffff815ca9ff>] ? wait_for_completion+0xcf/0x110
 [<ffffffff815caa07>] wait_for_completion+0xd7/0x110
 [<ffffffff8108bce0>] ? try_to_wake_up+0x260/0x260
 [<ffffffffa064cb6e>] cma_process_remove+0xee/0x110 [rdma_cm]
 [<ffffffffa064cbdc>] cma_remove_one+0x4c/0x60 [rdma_cm]
 [<ffffffffa0279e0f>] ib_unregister_device+0x4f/0x100 [ib_core]
 [<ffffffffa02f76ee>] mlx4_ib_remove+0x2e/0x260 [mlx4_ib]
 [<ffffffffa01754c9>] mlx4_remove_device+0x69/0x80 [mlx4_core]
 [<ffffffffa01755b3>] mlx4_unregister_interface+0x43/0x80 [mlx4_core]
 [<ffffffffa030970c>] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
 [<ffffffff810d9183>] SyS_delete_module+0x183/0x1e0
 [<ffffffff810f7c94>] ? __audit_syscall_entry+0x94/0x100
 [<ffffffff812c5789>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff815cec92>] system_call_fastpath+0x16/0x1b


Shirley
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 18, 2014, 5:05 a.m. UTC | #15
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Steve Wise
> Sent: Friday, July 18, 2014 2:29 AM
> To: 'Chuck Lever'
> Cc: 'Hefty, Sean'; 'Shirley Ma'; Devesh Sharma; 'Roland Dreier'; linux-
> rdma@vger.kernel.org
> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> 
> 
> > -----Original Message-----
> > From: linux-rdma-owner@vger.kernel.org
> > [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Chuck Lever
> > Sent: Thursday, July 17, 2014 3:42 PM
> > To: Steve Wise
> > Cc: Hefty, Sean; Shirley Ma; Devesh Sharma; Roland Dreier;
> > linux-rdma@vger.kernel.org
> > Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> > module
> >
> >
> > On Jul 17, 2014, at 4:08 PM, Steve Wise <swise@opengridcomputing.com>
> wrote:
> >
> > >
> > >
> > >> -----Original Message-----
> > >> From: Steve Wise [mailto:swise@opengridcomputing.com]
> > >> Sent: Thursday, July 17, 2014 2:56 PM
> > >> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> > >> Cc: 'linux-rdma@vger.kernel.org'; 'chuck.lever@oracle.com'
> > >> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma
> > >> provider module
> > >>
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: Hefty, Sean [mailto:sean.hefty@intel.com]
> > >>> Sent: Thursday, July 17, 2014 2:50 PM
> > >>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> > >>> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
> > >>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma
> > >>> provider module
> > >>>
> > >>>>> So the rdma cm is expected to increase the driver reference
> > >>>>> count
> > >>>> (try_module_get) for
> > >>>>> each new cm id, then deference count (module_put) when cm id is
> > >>>> destroyed?
> > >>>>>
> > >>>>
> > >>>> No, I think he's saying the rdma-cm posts a
> > >>>> RDMA_CM_DEVICE_REMOVAL event to each application with
> rdmacm
> > >>>> objects allocated, and each application is expected to destroy
> > >>>> all the objects it has allocated before returning from the event
> > >>>> handler.
> > >>>
> > >>> This is almost correct.  The applications do not have to destroy
> > >>> all the objects
> that
> > > it has
> > >>> allocated before returning from their event handler.  E.g. an app
> > >>> can queue a work
> > > item
> > >>> that does the destruction.  The rdmacm will block in its ib_client
> > >>> remove handler
> > > until all
> > >>> relevant rdma_cm_id's have been destroyed.
> > >>>
> > >>
> > >> Thanks for the clarification.
> > >>
> > >
> > > And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in
> > rpcrdma_conn_upcall().
> > > It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls
> > rpcrdma_conn_func()
> > > for that endpoint, which schedules rep_connect_worker...  and I gave
> > > up following the
> > code
> > > path at this point... :)
> > >
> > > For this to all work correctly, it would need to destroy all the
> > > QPs, MRs, CQs, etc
> for
> > > that device _before_ destroying the rdma cm ids.  Otherwise the
> > > provider module could
> > be
> > > unloaded too soon.
> >
> > We can't really deal with a CM_DEVICE_REMOVE event while there are
> > active NFS mounts.
> >
> > System shutdown ordering should guarantee (one would hope) that NFS
> > mount points are unmounted before the RDMA/IB core infrastructure is
> > torn down. Ordering shouldn't matter as long all NFS activity has
> > ceased before the CM tries to remove the device.
> >
> > So if something is hanging up the CM, there's something xprtrdma is
> > not cleaning up properly.
> >
> 
> 
> Devesh, how are you reproducing this?  Are you just rmmod'ing the ocrdma
> module while there are active mounts?

Yes, I am issuing rmmod while there is an active mount. In my case rmmod ocrdma remains blocked forever.
Off-the-course of this discussion: Is there a reasoning behind not using ib_register_client()/ib_unregister_client() framework?
> 
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 18, 2014, 6:19 a.m. UTC | #16
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Steve Wise
> Sent: Friday, July 18, 2014 1:39 AM
> To: 'Hefty, Sean'; 'Shirley Ma'; Devesh Sharma; 'Roland Dreier'
> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> 
> 
> > -----Original Message-----
> > From: Steve Wise [mailto:swise@opengridcomputing.com]
> > Sent: Thursday, July 17, 2014 2:56 PM
> > To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> > Cc: 'linux-rdma@vger.kernel.org'; 'chuck.lever@oracle.com'
> > Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
> > module
> >
> >
> >
> > > -----Original Message-----
> > > From: Hefty, Sean [mailto:sean.hefty@intel.com]
> > > Sent: Thursday, July 17, 2014 2:50 PM
> > > To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> > > Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
> > > Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma
> > > provider module
> > >
> > > > > So the rdma cm is expected to increase the driver reference
> > > > > count
> > > > (try_module_get) for
> > > > > each new cm id, then deference count (module_put) when cm id is
> > > > destroyed?
> > > > >
> > > >
> > > > No, I think he's saying the rdma-cm posts a
> RDMA_CM_DEVICE_REMOVAL
> > > > event to each application with rdmacm objects allocated, and each
> > > > application is expected to destroy all the objects it has
> > > > allocated before returning from the event handler.
> > >
> > > This is almost correct.  The applications do not have to destroy all
> > > the objects that
> it has
> > > allocated before returning from their event handler.  E.g. an app
> > > can queue a work
> item
> > > that does the destruction.  The rdmacm will block in its ib_client
> > > remove handler
> until all
> > > relevant rdma_cm_id's have been destroyed.
> > >
> >
> > Thanks for the clarification.
> >
> 
> And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in
> rpcrdma_conn_upcall().
> It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls
> rpcrdma_conn_func() for that endpoint, which schedules
> rep_connect_worker...  and I gave up following the code path at this point...
> :)
> 
> For this to all work correctly, it would need to destroy all the QPs, MRs, CQs,
> etc for that device _before_ destroying the rdma cm ids.  Otherwise the
> provider module could be unloaded too soon...

Okay, Should I try to handle device removal in this proposed fashion and post the v1.

> 
> Steve.
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise July 18, 2014, 1:27 p.m. UTC | #17
> > > We can't really deal with a CM_DEVICE_REMOVE event while there are
> > > active NFS mounts.
> > >
> > > System shutdown ordering should guarantee (one would hope) that NFS
> > > mount points are unmounted before the RDMA/IB core infrastructure is
> > > torn down. Ordering shouldn't matter as long all NFS activity has
> > > ceased before the CM tries to remove the device.
> > >
> > > So if something is hanging up the CM, there's something xprtrdma is
> > > not cleaning up properly.
> > >
> >
> >
> > Devesh, how are you reproducing this?  Are you just rmmod'ing the ocrdma
> > module while there are active mounts?
> 
> Yes, I am issuing rmmod while there is an active mount. In my case rmmod ocrdma remains
> blocked forever.
> Off-the-course of this discussion: Is there a reasoning behind not using
> ib_register_client()/ib_unregister_client() framework?

I think the idea is that you don't need to use it if you are transport-independent and use
the rdmacm...



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever III July 18, 2014, 3:27 p.m. UTC | #18
On Jul 18, 2014, at 2:19 AM, Devesh Sharma <Devesh.Sharma@Emulex.Com> wrote:

>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>> owner@vger.kernel.org] On Behalf Of Steve Wise
>> Sent: Friday, July 18, 2014 1:39 AM
>> To: 'Hefty, Sean'; 'Shirley Ma'; Devesh Sharma; 'Roland Dreier'
>> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
>> module
>> 
>> 
>> 
>>> -----Original Message-----
>>> From: Steve Wise [mailto:swise@opengridcomputing.com]
>>> Sent: Thursday, July 17, 2014 2:56 PM
>>> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>> Cc: 'linux-rdma@vger.kernel.org'; 'chuck.lever@oracle.com'
>>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
>>> module
>>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Hefty, Sean [mailto:sean.hefty@intel.com]
>>>> Sent: Thursday, July 17, 2014 2:50 PM
>>>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>>> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
>>>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma
>>>> provider module
>>>> 
>>>>>> So the rdma cm is expected to increase the driver reference
>>>>>> count
>>>>> (try_module_get) for
>>>>>> each new cm id, then deference count (module_put) when cm id is
>>>>> destroyed?
>>>>>> 
>>>>> 
>>>>> No, I think he's saying the rdma-cm posts a
>> RDMA_CM_DEVICE_REMOVAL
>>>>> event to each application with rdmacm objects allocated, and each
>>>>> application is expected to destroy all the objects it has
>>>>> allocated before returning from the event handler.
>>>> 
>>>> This is almost correct.  The applications do not have to destroy all
>>>> the objects that
>> it has
>>>> allocated before returning from their event handler.  E.g. an app
>>>> can queue a work
>> item
>>>> that does the destruction.  The rdmacm will block in its ib_client
>>>> remove handler
>> until all
>>>> relevant rdma_cm_id's have been destroyed.
>>>> 
>>> 
>>> Thanks for the clarification.
>>> 
>> 
>> And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in
>> rpcrdma_conn_upcall().
>> It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls
>> rpcrdma_conn_func() for that endpoint, which schedules
>> rep_connect_worker...  and I gave up following the code path at this point...
>> :)
>> 
>> For this to all work correctly, it would need to destroy all the QPs, MRs, CQs,
>> etc for that device _before_ destroying the rdma cm ids.  Otherwise the
>> provider module could be unloaded too soon...
> 
> Okay, Should I try to handle device removal in this proposed fashion and post the v1.

Hi Devesh,

To make it work, xprtrdma is going to have to allow the device to be
removed and added back while there are active NFS mounts and pending RPCs.
AFAICT the code is not structured to do that today.

Probably the place to start is to see how much work is needed to leverage
the existing logic to watch for ENODEV and do the right things to suspend
RPC activity until another device is inserted. It would have to work like
a network partition that causes a transport reconnect.

However, replacing everything, including all MRs and the PD, will require
significant code churn and additional (undesirable) serialization around
the use of QPs and cm_ids. Thus I would like to understand how much of a
priority this is.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shirley Ma July 18, 2014, 3:47 p.m. UTC | #19
On 07/18/2014 06:27 AM, Steve Wise wrote:
>>>> We can't really deal with a CM_DEVICE_REMOVE event while there are
>>>> active NFS mounts.
>>>>
>>>> System shutdown ordering should guarantee (one would hope) that NFS
>>>> mount points are unmounted before the RDMA/IB core infrastructure is
>>>> torn down. Ordering shouldn't matter as long all NFS activity has
>>>> ceased before the CM tries to remove the device.
>>>>
>>>> So if something is hanging up the CM, there's something xprtrdma is
>>>> not cleaning up properly.
>>>>
>>>
>>>
>>> Devesh, how are you reproducing this?  Are you just rmmod'ing the ocrdma
>>> module while there are active mounts?
>>
>> Yes, I am issuing rmmod while there is an active mount. In my case rmmod ocrdma remains
>> blocked forever.
Where is it blocked?

>> Off-the-course of this discussion: Is there a reasoning behind not using
>> ib_register_client()/ib_unregister_client() framework?
> 
> I think the idea is that you don't need to use it if you are transport-independent and use
> the rdmacm...
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 21, 2014, 5:23 a.m. UTC | #20
> -----Original Message-----
> From: Steve Wise [mailto:swise@opengridcomputing.com]
> Sent: Friday, July 18, 2014 6:58 PM
> To: Devesh Sharma; 'Chuck Lever'
> Cc: 'Hefty, Sean'; 'Shirley Ma'; 'Roland Dreier'; linux-rdma@vger.kernel.org
> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> > > > We can't really deal with a CM_DEVICE_REMOVE event while there are
> > > > active NFS mounts.
> > > >
> > > > System shutdown ordering should guarantee (one would hope) that
> > > > NFS mount points are unmounted before the RDMA/IB core
> > > > infrastructure is torn down. Ordering shouldn't matter as long all
> > > > NFS activity has ceased before the CM tries to remove the device.
> > > >
> > > > So if something is hanging up the CM, there's something xprtrdma
> > > > is not cleaning up properly.
> > > >
> > >
> > >
> > > Devesh, how are you reproducing this?  Are you just rmmod'ing the
> > > ocrdma module while there are active mounts?
> >
> > Yes, I am issuing rmmod while there is an active mount. In my case
> > rmmod ocrdma remains blocked forever.
> > Off-the-course of this discussion: Is there a reasoning behind not
> > using
> > ib_register_client()/ib_unregister_client() framework?
> 
> I think the idea is that you don't need to use it if you are transport-
> independent and use the rdmacm...

Okay, Makes sense..

> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 21, 2014, 5:40 a.m. UTC | #21
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Friday, July 18, 2014 8:57 PM
> To: Devesh Sharma
> Cc: Steve Wise; Hefty, Sean; Shirley Ma; Roland Dreier; linux-
> rdma@vger.kernel.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> 
> On Jul 18, 2014, at 2:19 AM, Devesh Sharma <Devesh.Sharma@Emulex.Com>
> wrote:
> 
> >> -----Original Message-----
> >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >> owner@vger.kernel.org] On Behalf Of Steve Wise
> >> Sent: Friday, July 18, 2014 1:39 AM
> >> To: 'Hefty, Sean'; 'Shirley Ma'; Devesh Sharma; 'Roland Dreier'
> >> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
> >> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
> >> module
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Steve Wise [mailto:swise@opengridcomputing.com]
> >>> Sent: Thursday, July 17, 2014 2:56 PM
> >>> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> >>> Cc: 'linux-rdma@vger.kernel.org'; 'chuck.lever@oracle.com'
> >>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma
> >>> provider module
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Hefty, Sean [mailto:sean.hefty@intel.com]
> >>>> Sent: Thursday, July 17, 2014 2:50 PM
> >>>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> >>>> Cc: linux-rdma@vger.kernel.org; chuck.lever@oracle.com
> >>>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma
> >>>> provider module
> >>>>
> >>>>>> So the rdma cm is expected to increase the driver reference count
> >>>>> (try_module_get) for
> >>>>>> each new cm id, then deference count (module_put) when cm id is
> >>>>> destroyed?
> >>>>>>
> >>>>>
> >>>>> No, I think he's saying the rdma-cm posts a
> >> RDMA_CM_DEVICE_REMOVAL
> >>>>> event to each application with rdmacm objects allocated, and each
> >>>>> application is expected to destroy all the objects it has
> >>>>> allocated before returning from the event handler.
> >>>>
> >>>> This is almost correct.  The applications do not have to destroy
> >>>> all the objects that
> >> it has
> >>>> allocated before returning from their event handler.  E.g. an app
> >>>> can queue a work
> >> item
> >>>> that does the destruction.  The rdmacm will block in its ib_client
> >>>> remove handler
> >> until all
> >>>> relevant rdma_cm_id's have been destroyed.
> >>>>
> >>>
> >>> Thanks for the clarification.
> >>>
> >>
> >> And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in
> >> rpcrdma_conn_upcall().
> >> It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls
> >> rpcrdma_conn_func() for that endpoint, which schedules
> >> rep_connect_worker...  and I gave up following the code path at this
> point...
> >> :)
> >>
> >> For this to all work correctly, it would need to destroy all the QPs,
> >> MRs, CQs, etc for that device _before_ destroying the rdma cm ids.
> >> Otherwise the provider module could be unloaded too soon...
> >
> > Okay, Should I try to handle device removal in this proposed fashion and
> post the v1.
> 
> Hi Devesh,
> 
> To make it work, xprtrdma is going to have to allow the device to be removed
> and added back while there are active NFS mounts and pending RPCs.
> AFAICT the code is not structured to do that today.
> 
> Probably the place to start is to see how much work is needed to leverage
> the existing logic to watch for ENODEV and do the right things to suspend
> RPC activity until another device is inserted. It would have to work like a
> network partition that causes a transport reconnect.
> 
> However, replacing everything, including all MRs and the PD, will require
> significant code churn and additional (undesirable) serialization around the
> use of QPs and cm_ids. Thus I would like to understand how much of a
> priority this is.

Sure, I got your point, this is good for long term stability that we understand
and do the right thing. However if it's not feasible to change the entire infrastructure
in one go OR in the near future, do we have other ideas/options to handle this problem

In the real world situations it's quite possible that someone attempts to reload/replace/unload the 
vendor driver without knowing if there is an active mount or not.

> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 21, 2014, 6:11 a.m. UTC | #22
Shirley,

Once rmmod is issued, the connection corresponding to the active mount is destroyed and all the associated resources 
Are freed. As per the processing logic of DEVICE-REMOVAL event, nfs-rdma wakes-up all the  waiters, This results into
Re-establishment efforts, since the device is not present any more, rdma_resolve_address() fails with CM resolution
Error. This loop continues forever.

I am yet to find out which part of ocrdma is blocked. I am putting some debug messages to find it out. I will get back to 
The group with an update.

-Regards
 Devesh

> -----Original Message-----
> From: Shirley Ma [mailto:shirley.ma@oracle.com]
> Sent: Friday, July 18, 2014 9:18 PM
> To: Steve Wise; Devesh Sharma; 'Chuck Lever'
> Cc: 'Hefty, Sean'; 'Roland Dreier'; linux-rdma@vger.kernel.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> 
> On 07/18/2014 06:27 AM, Steve Wise wrote:
> >>>> We can't really deal with a CM_DEVICE_REMOVE event while there are
> >>>> active NFS mounts.
> >>>>
> >>>> System shutdown ordering should guarantee (one would hope) that
> NFS
> >>>> mount points are unmounted before the RDMA/IB core infrastructure
> >>>> is torn down. Ordering shouldn't matter as long all NFS activity
> >>>> has ceased before the CM tries to remove the device.
> >>>>
> >>>> So if something is hanging up the CM, there's something xprtrdma is
> >>>> not cleaning up properly.
> >>>>
> >>>
> >>>
> >>> Devesh, how are you reproducing this?  Are you just rmmod'ing the
> >>> ocrdma module while there are active mounts?
> >>
> >> Yes, I am issuing rmmod while there is an active mount. In my case
> >> rmmod ocrdma remains blocked forever.
> Where is it blocked?
> 
> >> Off-the-course of this discussion: Is there a reasoning behind not
> >> using
> >> ib_register_client()/ib_unregister_client() framework?
> >
> > I think the idea is that you don't need to use it if you are
> > transport-independent and use the rdmacm...
> >
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > in the body of a message to majordomo@vger.kernel.org More
> majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 21, 2014, 11:48 a.m. UTC | #23
In rpcrdma_ep_connect():

write_lock(&ia->ri_qplock);
                old = ia->ri_id;
                ia->ri_id = id;
                write_unlock(&ia->ri_qplock);

                rdma_destroy_qp(old);
                rdma_destroy_id(old);  =============> Cm -id is destroyed here.


If following code fails in rpcrdma_ep_connect():
id = rpcrdma_create_id(xprt, ia,
                                (struct sockaddr *)&xprt->rx_data.addr);
                if (IS_ERR(id)) {
                        rc = -EHOSTUNREACH;
                        goto out;
                }

it leaves old cm-id still alive. This will always fail if Device is removed abruptly.

In rdma_resolve_addr()/rdma_destroy_id() cm_dev is referenced/de-referenced here (cma.c):

static int cma_acquire_dev(struct rdma_id_private *id_priv,
                           struct rdma_id_private *listen_id_priv) {
.
.
if (!ret)
                cma_attach_to_dev(id_priv, cma_dev);
}

static void cma_release_dev(struct rdma_id_private *id_priv)
{
        mutex_lock(&lock);
        list_del(&id_priv->list);
        cma_deref_dev(id_priv->cma_dev);
.
.
}

Since as per design of nfs-rdma at-least previously known good cm-id always remains live utill
another good cm-id is created, cma_dev->refcount never becomes 0 upon device removal .
Thus blocking the rmmod <vendor driver> forever.

-Regards
 Devesh

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Devesh Sharma
> Sent: Monday, July 21, 2014 11:42 AM
> To: Shirley Ma; Steve Wise; 'Chuck Lever'
> Cc: 'Hefty, Sean'; 'Roland Dreier'; linux-rdma@vger.kernel.org
> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> Shirley,
> 
> Once rmmod is issued, the connection corresponding to the active mount is
> destroyed and all the associated resources Are freed. As per the processing
> logic of DEVICE-REMOVAL event, nfs-rdma wakes-up all the  waiters, This
> results into Re-establishment efforts, since the device is not present any
> more, rdma_resolve_address() fails with CM resolution Error. This loop
> continues forever.
> 
> I am yet to find out which part of ocrdma is blocked. I am putting some debug
> messages to find it out. I will get back to The group with an update.
> 
> -Regards
>  Devesh
> 
> > -----Original Message-----
> > From: Shirley Ma [mailto:shirley.ma@oracle.com]
> > Sent: Friday, July 18, 2014 9:18 PM
> > To: Steve Wise; Devesh Sharma; 'Chuck Lever'
> > Cc: 'Hefty, Sean'; 'Roland Dreier'; linux-rdma@vger.kernel.org
> > Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> > module
> >
> >
> > On 07/18/2014 06:27 AM, Steve Wise wrote:
> > >>>> We can't really deal with a CM_DEVICE_REMOVE event while there
> > >>>> are active NFS mounts.
> > >>>>
> > >>>> System shutdown ordering should guarantee (one would hope) that
> > NFS
> > >>>> mount points are unmounted before the RDMA/IB core
> infrastructure
> > >>>> is torn down. Ordering shouldn't matter as long all NFS activity
> > >>>> has ceased before the CM tries to remove the device.
> > >>>>
> > >>>> So if something is hanging up the CM, there's something xprtrdma
> > >>>> is not cleaning up properly.
> > >>>>
> > >>>
> > >>>
> > >>> Devesh, how are you reproducing this?  Are you just rmmod'ing the
> > >>> ocrdma module while there are active mounts?
> > >>
> > >> Yes, I am issuing rmmod while there is an active mount. In my case
> > >> rmmod ocrdma remains blocked forever.
> > Where is it blocked?
> >
> > >> Off-the-course of this discussion: Is there a reasoning behind not
> > >> using
> > >> ib_register_client()/ib_unregister_client() framework?
> > >
> > > I think the idea is that you don't need to use it if you are
> > > transport-independent and use the rdmacm...
> > >
> > >
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > > in the body of a message to majordomo@vger.kernel.org More
> > majordomo
> > > info at  http://vger.kernel.org/majordomo-info.html
> > >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever III July 21, 2014, 2:53 p.m. UTC | #24
Hi Devesh-

Thanks for drilling into this further.

On Jul 21, 2014, at 7:48 AM, Devesh Sharma <Devesh.Sharma@Emulex.Com> wrote:

> In rpcrdma_ep_connect():
> 
> write_lock(&ia->ri_qplock);
>                old = ia->ri_id;
>                ia->ri_id = id;
>                write_unlock(&ia->ri_qplock);
> 
>                rdma_destroy_qp(old);
>                rdma_destroy_id(old);  =============> Cm -id is destroyed here.
> 
> 
> If following code fails in rpcrdma_ep_connect():
> id = rpcrdma_create_id(xprt, ia,
>                                (struct sockaddr *)&xprt->rx_data.addr);
>                if (IS_ERR(id)) {
>                        rc = -EHOSTUNREACH;
>                        goto out;
>                }
> 
> it leaves old cm-id still alive. This will always fail if Device is removed abruptly.

For CM_EVENT_DEVICE_REMOVAL, rpcrdma_conn_upcall() sets ep->rep_connected
to -ENODEV.

Then:

 929 int
 930 rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 931 {
 932         struct rdma_cm_id *id, *old;
 933         int rc = 0;
 934         int retry_count = 0;
 935 
 936         if (ep->rep_connected != 0) {
 937                 struct rpcrdma_xprt *xprt;
 938 retry:
 939                 dprintk("RPC:       %s: reconnecting...\n", __func__);

ep->rep_connected is probably -ENODEV after a device removal. It would be
possible for the connect worker to destroy everything associated with this
connection in that case to ensure the underlying object reference counts
are cleared.

The immediate danger is that if there are pending RPCs, they could exit while
qp/cm_id are NULL, triggering a panic in rpcrdma_deregister_frmr_external().
Checking for NULL pointers inside the ri_qplock would prevent that.

However, NFS mounts via this adapter will hang indefinitely after all
transports are torn down and the adapter is gone. The only thing that can be
done is something drastic like “echo b > /proc/sysrq_trigger” on the client.

Thus, IMO hot-plugging or passive fail-over are the only scenarios where
this makes sense. If we have an immediate problem here, is it a problem with
system shutdown ordering that can be addressed in some other way?

Until that support is in place, obviously I would prefer that the removal of
the underlying driver be prevented while there are NFS mounts in place. I
think that’s what NFS users have come to expect.

In other words, don’t allow device removal until we have support for device
insertion :-)


> In rdma_resolve_addr()/rdma_destroy_id() cm_dev is referenced/de-referenced here (cma.c):
> 
> static int cma_acquire_dev(struct rdma_id_private *id_priv,
>                           struct rdma_id_private *listen_id_priv) {
> .
> .
> if (!ret)
>                cma_attach_to_dev(id_priv, cma_dev);
> }
> 
> static void cma_release_dev(struct rdma_id_private *id_priv)
> {
>        mutex_lock(&lock);
>        list_del(&id_priv->list);
>        cma_deref_dev(id_priv->cma_dev);
> .
> .
> }
> 
> Since as per design of nfs-rdma at-least previously known good cm-id always remains live utill
> another good cm-id is created, cma_dev->refcount never becomes 0 upon device removal .
> Thus blocking the rmmod <vendor driver> forever.
> 
> -Regards
> Devesh
> 
>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>> owner@vger.kernel.org] On Behalf Of Devesh Sharma
>> Sent: Monday, July 21, 2014 11:42 AM
>> To: Shirley Ma; Steve Wise; 'Chuck Lever'
>> Cc: 'Hefty, Sean'; 'Roland Dreier'; linux-rdma@vger.kernel.org
>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
>> module
>> 
>> Shirley,
>> 
>> Once rmmod is issued, the connection corresponding to the active mount is
>> destroyed and all the associated resources Are freed. As per the processing
>> logic of DEVICE-REMOVAL event, nfs-rdma wakes-up all the  waiters, This
>> results into Re-establishment efforts, since the device is not present any
>> more, rdma_resolve_address() fails with CM resolution Error. This loop
>> continues forever.
>> 
>> I am yet to find out which part of ocrdma is blocked. I am putting some debug
>> messages to find it out. I will get back to The group with an update.
>> 
>> -Regards
>> Devesh
>> 
>>> -----Original Message-----
>>> From: Shirley Ma [mailto:shirley.ma@oracle.com]
>>> Sent: Friday, July 18, 2014 9:18 PM
>>> To: Steve Wise; Devesh Sharma; 'Chuck Lever'
>>> Cc: 'Hefty, Sean'; 'Roland Dreier'; linux-rdma@vger.kernel.org
>>> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
>>> module
>>> 
>>> 
>>> On 07/18/2014 06:27 AM, Steve Wise wrote:
>>>>>>> We can't really deal with a CM_DEVICE_REMOVE event while there
>>>>>>> are active NFS mounts.
>>>>>>> 
>>>>>>> System shutdown ordering should guarantee (one would hope) that
>>> NFS
>>>>>>> mount points are unmounted before the RDMA/IB core
>> infrastructure
>>>>>>> is torn down. Ordering shouldn't matter as long all NFS activity
>>>>>>> has ceased before the CM tries to remove the device.
>>>>>>> 
>>>>>>> So if something is hanging up the CM, there's something xprtrdma
>>>>>>> is not cleaning up properly.
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Devesh, how are you reproducing this?  Are you just rmmod'ing the
>>>>>> ocrdma module while there are active mounts?
>>>>> 
>>>>> Yes, I am issuing rmmod while there is an active mount. In my case
>>>>> rmmod ocrdma remains blocked forever.
>>> Where is it blocked?
>>> 
>>>>> Off-the-course of this discussion: Is there a reasoning behind not
>>>>> using
>>>>> ib_register_client()/ib_unregister_client() framework?
>>>> 
>>>> I think the idea is that you don't need to use it if you are
>>>> transport-independent and use the rdmacm...
>>>> 
>>>> 
>>>> 
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
>>>> in the body of a message to majordomo@vger.kernel.org More
>>> majordomo
>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
>> body of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise July 21, 2014, 3:03 p.m. UTC | #25
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Monday, July 21, 2014 9:54 AM
> To: Devesh Sharma
> Cc: Shirley Ma; Steve Wise; Hefty, Sean; Roland Dreier; linux-rdma@vger.kernel.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
> 
> Hi Devesh-
> 
> Thanks for drilling into this further.
> 
> On Jul 21, 2014, at 7:48 AM, Devesh Sharma <Devesh.Sharma@Emulex.Com> wrote:
> 
> > In rpcrdma_ep_connect():
> >
> > write_lock(&ia->ri_qplock);
> >                old = ia->ri_id;
> >                ia->ri_id = id;
> >                write_unlock(&ia->ri_qplock);
> >
> >                rdma_destroy_qp(old);
> >                rdma_destroy_id(old);  =============> Cm -id is destroyed here.
> >
> >
> > If following code fails in rpcrdma_ep_connect():
> > id = rpcrdma_create_id(xprt, ia,
> >                                (struct sockaddr *)&xprt->rx_data.addr);
> >                if (IS_ERR(id)) {
> >                        rc = -EHOSTUNREACH;
> >                        goto out;
> >                }
> >
> > it leaves old cm-id still alive. This will always fail if Device is removed abruptly.
> 
> For CM_EVENT_DEVICE_REMOVAL, rpcrdma_conn_upcall() sets ep->rep_connected
> to -ENODEV.
> 
> Then:
> 
>  929 int
>  930 rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
>  931 {
>  932         struct rdma_cm_id *id, *old;
>  933         int rc = 0;
>  934         int retry_count = 0;
>  935
>  936         if (ep->rep_connected != 0) {
>  937                 struct rpcrdma_xprt *xprt;
>  938 retry:
>  939                 dprintk("RPC:       %s: reconnecting...\n", __func__);
> 
> ep->rep_connected is probably -ENODEV after a device removal. It would be
> possible for the connect worker to destroy everything associated with this
> connection in that case to ensure the underlying object reference counts
> are cleared.
> 
> The immediate danger is that if there are pending RPCs, they could exit while
> qp/cm_id are NULL, triggering a panic in rpcrdma_deregister_frmr_external().
> Checking for NULL pointers inside the ri_qplock would prevent that.
> 
> However, NFS mounts via this adapter will hang indefinitely after all
> transports are torn down and the adapter is gone. The only thing that can be
> done is something drastic like "echo b > /proc/sysrq_trigger" on the client.
> 
> Thus, IMO hot-plugging or passive fail-over are the only scenarios where
> this makes sense. If we have an immediate problem here, is it a problem with
> system shutdown ordering that can be addressed in some other way?
> 
> Until that support is in place, obviously I would prefer that the removal of
> the underlying driver be prevented while there are NFS mounts in place. I
> think that's what NFS users have come to expect.
> 
> In other words, don't allow device removal until we have support for device
> insertion :-)
> 
> 


If we fix the above problems on provider unload, shouldn't the mount recover if the
provider module is subsequently loaded?  Or another provider configured such that
rdma_resolve_addr/route() then picks an active device?



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever III July 21, 2014, 3:20 p.m. UTC | #26
On Jul 21, 2014, at 11:03 AM, Steve Wise <swise@opengridcomputing.com> wrote:

> 
> 
>> -----Original Message-----
>> From: Chuck Lever [mailto:chuck.lever@oracle.com]
>> Sent: Monday, July 21, 2014 9:54 AM
>> To: Devesh Sharma
>> Cc: Shirley Ma; Steve Wise; Hefty, Sean; Roland Dreier; linux-rdma@vger.kernel.org
>> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
>> 
>> Hi Devesh-
>> 
>> Thanks for drilling into this further.
>> 
>> On Jul 21, 2014, at 7:48 AM, Devesh Sharma <Devesh.Sharma@Emulex.Com> wrote:
>> 
>>> In rpcrdma_ep_connect():
>>> 
>>> write_lock(&ia->ri_qplock);
>>>               old = ia->ri_id;
>>>               ia->ri_id = id;
>>>               write_unlock(&ia->ri_qplock);
>>> 
>>>               rdma_destroy_qp(old);
>>>               rdma_destroy_id(old);  =============> Cm -id is destroyed here.
>>> 
>>> 
>>> If following code fails in rpcrdma_ep_connect():
>>> id = rpcrdma_create_id(xprt, ia,
>>>                               (struct sockaddr *)&xprt->rx_data.addr);
>>>               if (IS_ERR(id)) {
>>>                       rc = -EHOSTUNREACH;
>>>                       goto out;
>>>               }
>>> 
>>> it leaves old cm-id still alive. This will always fail if Device is removed abruptly.
>> 
>> For CM_EVENT_DEVICE_REMOVAL, rpcrdma_conn_upcall() sets ep->rep_connected
>> to -ENODEV.
>> 
>> Then:
>> 
>> 929 int
>> 930 rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
>> 931 {
>> 932         struct rdma_cm_id *id, *old;
>> 933         int rc = 0;
>> 934         int retry_count = 0;
>> 935
>> 936         if (ep->rep_connected != 0) {
>> 937                 struct rpcrdma_xprt *xprt;
>> 938 retry:
>> 939                 dprintk("RPC:       %s: reconnecting...\n", __func__);
>> 
>> ep->rep_connected is probably -ENODEV after a device removal. It would be
>> possible for the connect worker to destroy everything associated with this
>> connection in that case to ensure the underlying object reference counts
>> are cleared.
>> 
>> The immediate danger is that if there are pending RPCs, they could exit while
>> qp/cm_id are NULL, triggering a panic in rpcrdma_deregister_frmr_external().
>> Checking for NULL pointers inside the ri_qplock would prevent that.
>> 
>> However, NFS mounts via this adapter will hang indefinitely after all
>> transports are torn down and the adapter is gone. The only thing that can be
>> done is something drastic like "echo b > /proc/sysrq_trigger" on the client.
>> 
>> Thus, IMO hot-plugging or passive fail-over are the only scenarios where
>> this makes sense. If we have an immediate problem here, is it a problem with
>> system shutdown ordering that can be addressed in some other way?
>> 
>> Until that support is in place, obviously I would prefer that the removal of
>> the underlying driver be prevented while there are NFS mounts in place. I
>> think that's what NFS users have come to expect.
>> 
>> In other words, don't allow device removal until we have support for device
>> insertion :-)
>> 
>> 
> 
> 
> If we fix the above problems on provider unload, shouldn't the mount recover if the
> provider module is subsequently loaded?  Or another provider configured such that
> rdma_resolve_addr/route() then picks an active device?

On device removal, we have to destroy everything.

On insertion, we’ll have to create a fresh PD and MRs, which isn’t done now
during reconnect. So, insertion needs work too.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise July 21, 2014, 3:22 p.m. UTC | #27
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Monday, July 21, 2014 10:21 AM
> To: Steve Wise
> Cc: Devesh Sharma; Shirley Ma; Hefty, Sean; Roland Dreier; linux-rdma@vger.kernel.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
> 
> 
> On Jul 21, 2014, at 11:03 AM, Steve Wise <swise@opengridcomputing.com> wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> >> Sent: Monday, July 21, 2014 9:54 AM
> >> To: Devesh Sharma
> >> Cc: Shirley Ma; Steve Wise; Hefty, Sean; Roland Dreier; linux-rdma@vger.kernel.org
> >> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
> >>
> >> Hi Devesh-
> >>
> >> Thanks for drilling into this further.
> >>
> >> On Jul 21, 2014, at 7:48 AM, Devesh Sharma <Devesh.Sharma@Emulex.Com> wrote:
> >>
> >>> In rpcrdma_ep_connect():
> >>>
> >>> write_lock(&ia->ri_qplock);
> >>>               old = ia->ri_id;
> >>>               ia->ri_id = id;
> >>>               write_unlock(&ia->ri_qplock);
> >>>
> >>>               rdma_destroy_qp(old);
> >>>               rdma_destroy_id(old);  =============> Cm -id is destroyed here.
> >>>
> >>>
> >>> If following code fails in rpcrdma_ep_connect():
> >>> id = rpcrdma_create_id(xprt, ia,
> >>>                               (struct sockaddr *)&xprt->rx_data.addr);
> >>>               if (IS_ERR(id)) {
> >>>                       rc = -EHOSTUNREACH;
> >>>                       goto out;
> >>>               }
> >>>
> >>> it leaves old cm-id still alive. This will always fail if Device is removed
abruptly.
> >>
> >> For CM_EVENT_DEVICE_REMOVAL, rpcrdma_conn_upcall() sets ep->rep_connected
> >> to -ENODEV.
> >>
> >> Then:
> >>
> >> 929 int
> >> 930 rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
> >> 931 {
> >> 932         struct rdma_cm_id *id, *old;
> >> 933         int rc = 0;
> >> 934         int retry_count = 0;
> >> 935
> >> 936         if (ep->rep_connected != 0) {
> >> 937                 struct rpcrdma_xprt *xprt;
> >> 938 retry:
> >> 939                 dprintk("RPC:       %s: reconnecting...\n", __func__);
> >>
> >> ep->rep_connected is probably -ENODEV after a device removal. It would be
> >> possible for the connect worker to destroy everything associated with this
> >> connection in that case to ensure the underlying object reference counts
> >> are cleared.
> >>
> >> The immediate danger is that if there are pending RPCs, they could exit while
> >> qp/cm_id are NULL, triggering a panic in rpcrdma_deregister_frmr_external().
> >> Checking for NULL pointers inside the ri_qplock would prevent that.
> >>
> >> However, NFS mounts via this adapter will hang indefinitely after all
> >> transports are torn down and the adapter is gone. The only thing that can be
> >> done is something drastic like "echo b > /proc/sysrq_trigger" on the client.
> >>
> >> Thus, IMO hot-plugging or passive fail-over are the only scenarios where
> >> this makes sense. If we have an immediate problem here, is it a problem with
> >> system shutdown ordering that can be addressed in some other way?
> >>
> >> Until that support is in place, obviously I would prefer that the removal of
> >> the underlying driver be prevented while there are NFS mounts in place. I
> >> think that's what NFS users have come to expect.
> >>
> >> In other words, don't allow device removal until we have support for device
> >> insertion :-)
> >>
> >>
> >
> >
> > If we fix the above problems on provider unload, shouldn't the mount recover if the
> > provider module is subsequently loaded?  Or another provider configured such that
> > rdma_resolve_addr/route() then picks an active device?
> 
> On device removal, we have to destroy everything.
> 
> On insertion, we'll have to create a fresh PD and MRs, which isn't done now
> during reconnect. So, insertion needs work too.
> 

Got it, thanks! 



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 21, 2014, 5:07 p.m. UTC | #28
> -----Original Message-----
> From: Steve Wise [mailto:swise@opengridcomputing.com]
> Sent: Monday, July 21, 2014 8:53 PM
> To: 'Chuck Lever'
> Cc: Devesh Sharma; 'Shirley Ma'; 'Hefty, Sean'; 'Roland Dreier'; linux-
> rdma@vger.kernel.org
> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> 
> 
> > -----Original Message-----
> > From: Chuck Lever [mailto:chuck.lever@oracle.com]
> > Sent: Monday, July 21, 2014 10:21 AM
> > To: Steve Wise
> > Cc: Devesh Sharma; Shirley Ma; Hefty, Sean; Roland Dreier;
> > linux-rdma@vger.kernel.org
> > Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> > module
> >
> >
> > On Jul 21, 2014, at 11:03 AM, Steve Wise <swise@opengridcomputing.com>
> wrote:
> >
> > >
> > >
> > >> -----Original Message-----
> > >> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> > >> Sent: Monday, July 21, 2014 9:54 AM
> > >> To: Devesh Sharma
> > >> Cc: Shirley Ma; Steve Wise; Hefty, Sean; Roland Dreier;
> > >> linux-rdma@vger.kernel.org
> > >> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma
> > >> provider module
> > >>
> > >> Hi Devesh-
> > >>
> > >> Thanks for drilling into this further.
> > >>
> > >> On Jul 21, 2014, at 7:48 AM, Devesh Sharma
> <Devesh.Sharma@Emulex.Com> wrote:
> > >>
> > >>> In rpcrdma_ep_connect():
> > >>>
> > >>> write_lock(&ia->ri_qplock);
> > >>>               old = ia->ri_id;
> > >>>               ia->ri_id = id;
> > >>>               write_unlock(&ia->ri_qplock);
> > >>>
> > >>>               rdma_destroy_qp(old);
> > >>>               rdma_destroy_id(old);  =============> Cm -id is destroyed
> here.
> > >>>
> > >>>
> > >>> If following code fails in rpcrdma_ep_connect():
> > >>> id = rpcrdma_create_id(xprt, ia,
> > >>>                               (struct sockaddr *)&xprt->rx_data.addr);
> > >>>               if (IS_ERR(id)) {
> > >>>                       rc = -EHOSTUNREACH;
> > >>>                       goto out;
> > >>>               }
> > >>>
> > >>> it leaves old cm-id still alive. This will always fail if Device
> > >>> is removed
> abruptly.
> > >>
> > >> For CM_EVENT_DEVICE_REMOVAL, rpcrdma_conn_upcall() sets
> > >> ep->rep_connected to -ENODEV.
> > >>
> > >> Then:
> > >>
> > >> 929 int
> > >> 930 rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia
> > >> *ia)
> > >> 931 {
> > >> 932         struct rdma_cm_id *id, *old;
> > >> 933         int rc = 0;
> > >> 934         int retry_count = 0;
> > >> 935
> > >> 936         if (ep->rep_connected != 0) {
> > >> 937                 struct rpcrdma_xprt *xprt;
> > >> 938 retry:
> > >> 939                 dprintk("RPC:       %s: reconnecting...\n", __func__);
> > >>
> > >> ep->rep_connected is probably -ENODEV after a device removal. It
> > >> ep->would be
> > >> possible for the connect worker to destroy everything associated
> > >> with this connection in that case to ensure the underlying object
> > >> reference counts are cleared.
> > >>
> > >> The immediate danger is that if there are pending RPCs, they could
> > >> exit while qp/cm_id are NULL, triggering a panic in
> rpcrdma_deregister_frmr_external().
> > >> Checking for NULL pointers inside the ri_qplock would prevent that.
> > >>
> > >> However, NFS mounts via this adapter will hang indefinitely after
> > >> all transports are torn down and the adapter is gone. The only
> > >> thing that can be done is something drastic like "echo b >
> /proc/sysrq_trigger" on the client.
> > >>
> > >> Thus, IMO hot-plugging or passive fail-over are the only scenarios
> > >> where this makes sense. If we have an immediate problem here, is it
> > >> a problem with system shutdown ordering that can be addressed in
> some other way?
> > >>
> > >> Until that support is in place, obviously I would prefer that the
> > >> removal of the underlying driver be prevented while there are NFS
> > >> mounts in place. I think that's what NFS users have come to expect.
> > >>
> > >> In other words, don't allow device removal until we have support
> > >> for device insertion :-)

This needs a fresh series of patches?

> > >>
> > >>
> > >
> > >
> > > If we fix the above problems on provider unload, shouldn't the mount
> > > recover if the provider module is subsequently loaded?  Or another
> > > provider configured such that
> > > rdma_resolve_addr/route() then picks an active device?
> >
> > On device removal, we have to destroy everything.
> >
> > On insertion, we'll have to create a fresh PD and MRs, which isn't
> > done now during reconnect. So, insertion needs work too.

Makes Sense to me too.
Thanks Chuck.

-Regards
 Devesh
> >
> 
> Got it, thanks!
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever III July 21, 2014, 5:30 p.m. UTC | #29
On Jul 21, 2014, at 1:07 PM, Devesh Sharma <Devesh.Sharma@Emulex.Com> wrote:

>>>>> Until that support is in place, obviously I would prefer that the
>>>>> removal of the underlying driver be prevented while there are NFS
>>>>> mounts in place. I think that's what NFS users have come to expect.
>>>>> 
>>>>> In other words, don't allow device removal until we have support
>>>>> for device insertion :-)
> 
> This needs a fresh series of patches?

“do not allow removal” would likely take an approach similar to what you
originally posted, unless someone has an idea how to use a CM_EVENT to
make this work, or there is an objection from the kernel RDMA folks.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 22, 2014, 5:06 a.m. UTC | #30
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Monday, July 21, 2014 11:01 PM
> To: Devesh Sharma
> Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
> rdma@vger.kernel.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> 
> On Jul 21, 2014, at 1:07 PM, Devesh Sharma <Devesh.Sharma@Emulex.Com>
> wrote:
> 
> >>>>> Until that support is in place, obviously I would prefer that the
> >>>>> removal of the underlying driver be prevented while there are NFS
> >>>>> mounts in place. I think that's what NFS users have come to expect.
> >>>>>
> >>>>> In other words, don't allow device removal until we have support
> >>>>> for device insertion :-)
> >
> > This needs a fresh series of patches?
> 
> "do not allow removal" would likely take an approach similar to what you
> originally posted, unless someone has an idea how to use a CM_EVENT to
> make this work, or there is an objection from the kernel RDMA folks.

Okay, I will re-work the patch if need be.

> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever III July 30, 2014, 6:39 p.m. UTC | #31
On Jul 22, 2014, at 1:06 AM, Devesh Sharma <Devesh.Sharma@Emulex.Com> wrote:

>> -----Original Message-----
>> From: Chuck Lever [mailto:chuck.lever@oracle.com]
>> Sent: Monday, July 21, 2014 11:01 PM
>> To: Devesh Sharma
>> Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
>> rdma@vger.kernel.org
>> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
>> module
>> 
>> 
>> On Jul 21, 2014, at 1:07 PM, Devesh Sharma <Devesh.Sharma@Emulex.Com>
>> wrote:
>> 
>>>>>>> Until that support is in place, obviously I would prefer that the
>>>>>>> removal of the underlying driver be prevented while there are NFS
>>>>>>> mounts in place. I think that's what NFS users have come to expect.
>>>>>>> 
>>>>>>> In other words, don't allow device removal until we have support
>>>>>>> for device insertion :-)
>>> 
>>> This needs a fresh series of patches?
>> 
>> "do not allow removal" would likely take an approach similar to what you
>> originally posted, unless someone has an idea how to use a CM_EVENT to
>> make this work, or there is an objection from the kernel RDMA folks.
> 
> Okay, I will re-work the patch if need be.

Devesh, if there isn’t one already, could you file an upstream bug at

  http://bugzilla.linux-nfs.org

that documents the shutdown hang and perhaps summarizes this thread?
Thanks!

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma July 31, 2014, 5:14 a.m. UTC | #32
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Thursday, July 31, 2014 12:09 AM
> To: Devesh Sharma
> Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
> rdma@vger.kernel.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> 
> On Jul 22, 2014, at 1:06 AM, Devesh Sharma <Devesh.Sharma@Emulex.Com>
> wrote:
> 
> >> -----Original Message-----
> >> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> >> Sent: Monday, July 21, 2014 11:01 PM
> >> To: Devesh Sharma
> >> Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
> >> rdma@vger.kernel.org
> >> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> >> module
> >>
> >>
> >> On Jul 21, 2014, at 1:07 PM, Devesh Sharma
> <Devesh.Sharma@Emulex.Com>
> >> wrote:
> >>
> >>>>>>> Until that support is in place, obviously I would prefer that
> >>>>>>> the removal of the underlying driver be prevented while there
> >>>>>>> are NFS mounts in place. I think that's what NFS users have come to
> expect.
> >>>>>>>
> >>>>>>> In other words, don't allow device removal until we have support
> >>>>>>> for device insertion :-)
> >>>
> >>> This needs a fresh series of patches?
> >>
> >> "do not allow removal" would likely take an approach similar to what
> >> you originally posted, unless someone has an idea how to use a
> >> CM_EVENT to make this work, or there is an objection from the kernel
> RDMA folks.
> >
> > Okay, I will re-work the patch if need be.
> 
> Devesh, if there isn't one already, could you file an upstream bug at
> 
>   http://bugzilla.linux-nfs.org
> 
> that documents the shutdown hang and perhaps summarizes this thread?
> Thanks!

A bug has been filed
https://bugzilla.linux-nfs.org/show_bug.cgi?id=266

> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Devesh Sharma Aug. 18, 2014, 9:52 a.m. UTC | #33
Hi Chuk,

Can this patch be lined up for merger. Looks like there are no oppositions to this change

-Regards
 Devesh

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Devesh Sharma
> Sent: Thursday, July 31, 2014 10:45 AM
> To: Chuck Lever
> Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
> rdma@vger.kernel.org
> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> > -----Original Message-----
> > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > owner@vger.kernel.org] On Behalf Of Chuck Lever
> > Sent: Thursday, July 31, 2014 12:09 AM
> > To: Devesh Sharma
> > Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
> > rdma@vger.kernel.org
> > Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> > module
> >
> >
> > On Jul 22, 2014, at 1:06 AM, Devesh Sharma
> <Devesh.Sharma@Emulex.Com>
> > wrote:
> >
> > >> -----Original Message-----
> > >> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> > >> Sent: Monday, July 21, 2014 11:01 PM
> > >> To: Devesh Sharma
> > >> Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
> > >> rdma@vger.kernel.org
> > >> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma
> > >> provider module
> > >>
> > >>
> > >> On Jul 21, 2014, at 1:07 PM, Devesh Sharma
> > <Devesh.Sharma@Emulex.Com>
> > >> wrote:
> > >>
> > >>>>>>> Until that support is in place, obviously I would prefer that
> > >>>>>>> the removal of the underlying driver be prevented while there
> > >>>>>>> are NFS mounts in place. I think that's what NFS users have
> > >>>>>>> come to
> > expect.
> > >>>>>>>
> > >>>>>>> In other words, don't allow device removal until we have
> > >>>>>>> support for device insertion :-)
> > >>>
> > >>> This needs a fresh series of patches?
> > >>
> > >> "do not allow removal" would likely take an approach similar to
> > >> what you originally posted, unless someone has an idea how to use a
> > >> CM_EVENT to make this work, or there is an objection from the
> > >> kernel
> > RDMA folks.
> > >
> > > Okay, I will re-work the patch if need be.
> >
> > Devesh, if there isn't one already, could you file an upstream bug at
> >
> >   http://bugzilla.linux-nfs.org
> >
> > that documents the shutdown hang and perhaps summarizes this thread?
> > Thanks!
> 
> A bug has been filed
> https://bugzilla.linux-nfs.org/show_bug.cgi?id=266
> 
> >
> > --
> > Chuck Lever
> > chuck[dot]lever[at]oracle[dot]com
> >
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > in the body of a message to majordomo@vger.kernel.org More
> majordomo
> > info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever III Aug. 18, 2014, 1:13 p.m. UTC | #34
> On Aug 18, 2014, at 5:52 AM, Devesh Sharma <Devesh.Sharma@Emulex.Com> wrote:
> 
> Hi Chuk,
> 
> Can this patch be lined up for merger. Looks like there are no oppositions to this change

Sorry, I was expecting a fresh rewritten patch. I haven't done a deep review of the original.

Anything you want to merge should be independently tested, of course. For example Steve should include it in his testing branch.


> 
> -Regards
> Devesh
> 
>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>> owner@vger.kernel.org] On Behalf Of Devesh Sharma
>> Sent: Thursday, July 31, 2014 10:45 AM
>> To: Chuck Lever
>> Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
>> rdma@vger.kernel.org
>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
>> module
>> 
>>> -----Original Message-----
>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>>> owner@vger.kernel.org] On Behalf Of Chuck Lever
>>> Sent: Thursday, July 31, 2014 12:09 AM
>>> To: Devesh Sharma
>>> Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
>>> rdma@vger.kernel.org
>>> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
>>> module
>>> 
>>> 
>>> On Jul 22, 2014, at 1:06 AM, Devesh Sharma
>> <Devesh.Sharma@Emulex.Com>
>>> wrote:
>>> 
>>>>> -----Original Message-----
>>>>> From: Chuck Lever [mailto:chuck.lever@oracle.com]
>>>>> Sent: Monday, July 21, 2014 11:01 PM
>>>>> To: Devesh Sharma
>>>>> Cc: Steve Wise; Shirley Ma; Hefty, Sean; Roland Dreier; linux-
>>>>> rdma@vger.kernel.org
>>>>> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma
>>>>> provider module
>>>>> 
>>>>> 
>>>>> On Jul 21, 2014, at 1:07 PM, Devesh Sharma
>>> <Devesh.Sharma@Emulex.Com>
>>>>> wrote:
>>>>> 
>>>>>>>>>> Until that support is in place, obviously I would prefer that
>>>>>>>>>> the removal of the underlying driver be prevented while there
>>>>>>>>>> are NFS mounts in place. I think that's what NFS users have
>>>>>>>>>> come to
>>> expect.
>>>>>>>>>> 
>>>>>>>>>> In other words, don't allow device removal until we have
>>>>>>>>>> support for device insertion :-)
>>>>>> 
>>>>>> This needs a fresh series of patches?
>>>>> 
>>>>> "do not allow removal" would likely take an approach similar to
>>>>> what you originally posted, unless someone has an idea how to use a
>>>>> CM_EVENT to make this work, or there is an objection from the
>>>>> kernel
>>> RDMA folks.
>>>> 
>>>> Okay, I will re-work the patch if need be.
>>> 
>>> Devesh, if there isn't one already, could you file an upstream bug at
>>> 
>>>  http://bugzilla.linux-nfs.org
>>> 
>>> that documents the shutdown hang and perhaps summarizes this thread?
>>> Thanks!
>> 
>> A bug has been filed
>> https://bugzilla.linux-nfs.org/show_bug.cgi?id=266
>> 
>>> 
>>> --
>>> Chuck Lever
>>> chuck[dot]lever[at]oracle[dot]com
>>> 
>>> 
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
>>> in the body of a message to majordomo@vger.kernel.org More
>> majordomo
>>> info at http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
>> body of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 6ead5df..b00e55e 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -457,6 +457,11 @@  rpcrdma_create_id(struct rpcrdma_xprt *xprt,
 	}
 	wait_for_completion_interruptible_timeout(&ia->ri_done,
 				msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
+	if (!ia->ri_async_rc && !try_module_get(id->device->owner)) {
+		dprintk("RPC:       %s: Failed to get device module\n",
+			__func__);
+		ia->ri_async_rc = -ENODEV;
+	}
 	rc = ia->ri_async_rc;
 	if (rc)
 		goto out;
@@ -466,16 +471,18 @@  rpcrdma_create_id(struct rpcrdma_xprt *xprt,
 	if (rc) {
 		dprintk("RPC:       %s: rdma_resolve_route() failed %i\n",
 			__func__, rc);
-		goto out;
+		goto out_put;
 	}
 	wait_for_completion_interruptible_timeout(&ia->ri_done,
 				msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
 	rc = ia->ri_async_rc;
 	if (rc)
-		goto out;
+		goto out_put;
 
 	return id;
 
+out_put:
+	module_put(id->device->owner);
 out:
 	rdma_destroy_id(id);
 	return ERR_PTR(rc);
@@ -613,6 +620,7 @@  rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 	rwlock_init(&ia->ri_qplock);
 	return 0;
 out2:
+	module_put(ia->ri_id->device->owner);
 	rdma_destroy_id(ia->ri_id);
 	ia->ri_id = NULL;
 out1:
@@ -638,9 +646,11 @@  rpcrdma_ia_close(struct rpcrdma_ia *ia)
 	if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
 		if (ia->ri_id->qp)
 			rdma_destroy_qp(ia->ri_id);
+		module_put(ia->ri_id->device->owner);
 		rdma_destroy_id(ia->ri_id);
 		ia->ri_id = NULL;
 	}
+
 	if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
 		rc = ib_dealloc_pd(ia->ri_pd);
 		dprintk("RPC:       %s: ib_dealloc_pd returned %i\n",
@@ -886,6 +896,7 @@  retry:
 		if (ia->ri_id->device != id->device) {
 			printk("RPC:       %s: can't reconnect on "
 				"different device!\n", __func__);
+			module_put(id->device->owner);
 			rdma_destroy_id(id);
 			rc = -ENETUNREACH;
 			goto out;
@@ -895,6 +906,7 @@  retry:
 		if (rc) {
 			dprintk("RPC:       %s: rdma_create_qp failed %i\n",
 				__func__, rc);
+			module_put(id->device->owner);
 			rdma_destroy_id(id);
 			rc = -ENETUNREACH;
 			goto out;
@@ -906,6 +918,7 @@  retry:
 		write_unlock(&ia->ri_qplock);
 
 		rdma_destroy_qp(old);
+		module_put(old->device->owner);
 		rdma_destroy_id(old);
 	} else {
 		dprintk("RPC:       %s: connecting...\n", __func__);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c419498..b35fa21 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -44,6 +44,7 @@ 
 #include <linux/spinlock.h> 		/* spinlock_t, etc */
 #include <linux/atomic.h>			/* atomic_t, etc */
 #include <linux/workqueue.h>		/* struct work_struct */
+#include <linux/module.h>		/* try_module_get()/module_put() */
 
 #include <rdma/rdma_cm.h>		/* RDMA connection api */
 #include <rdma/ib_verbs.h>		/* RDMA verbs api */