[for-next,v11,00/13] Fix race conditions in rxe_pool

Message ID 20220304000808.225811-1-rpearsonhpe@gmail.com (mailing list archive)
Message

Bob Pearson March 4, 2022, 12:07 a.m. UTC
There are several race conditions discovered in the current rdma_rxe
driver.  They mostly relate to races between normal operations and
destroying objects.  This patch series
 - Makes several minor cleanups in rxe_pool.[ch]
 - Replaces the red-black trees currently used for indices with xarrays
 - Corrects several reference counting errors
 - Adds wait for completions to the paths in verbs APIs which destroy
   objects.
 - Changes read side locking to rcu.
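
The reference counting and wait-for-completion items above can be illustrated with a small userspace sketch. This is not the rxe code: `struct elem`, `elem_get()`, `elem_put()` and `elem_destroy_wait()` are hypothetical names, and C11 atomics plus a pthread condition variable stand in for the kernel's reference counting and completion primitives. It only shows the general idea that a lookup may take a reference only while the count is non-zero, and that the destroy path drops its own reference and then blocks until every other holder has dropped theirs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical pool element; illustrative only, not the rxe structs. */
struct elem {
	atomic_int ref;        /* number of active references */
	pthread_mutex_t lock;  /* protects 'done' */
	pthread_cond_t cond;   /* signalled when the last reference is dropped */
	bool done;
};

static void elem_init(struct elem *e)
{
	atomic_init(&e->ref, 1);  /* the creator holds the first reference */
	pthread_mutex_init(&e->lock, NULL);
	pthread_cond_init(&e->cond, NULL);
	e->done = false;
}

/* Take a reference only if the object is still live (ref > 0). */
static bool elem_get(struct elem *e)
{
	int old = atomic_load(&e->ref);

	while (old > 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&e->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last put signals completion. */
static void elem_put(struct elem *e)
{
	if (atomic_fetch_sub(&e->ref, 1) == 1) {
		pthread_mutex_lock(&e->lock);
		e->done = true;
		pthread_cond_signal(&e->cond);
		pthread_mutex_unlock(&e->lock);
	}
}

/* Destroy path: drop the creator's reference, then wait for other holders. */
static void elem_destroy_wait(struct elem *e)
{
	elem_put(e);
	pthread_mutex_lock(&e->lock);
	while (!e->done)
		pthread_cond_wait(&e->cond, &e->lock);
	pthread_mutex_unlock(&e->lock);
}

/* Simulates a concurrent user releasing a reference it was handed. */
static void *worker(void *arg)
{
	elem_put(arg);
	return NULL;
}

/* Returns the final reference count, which should be zero. */
static int elem_demo(void)
{
	struct elem e;
	pthread_t t;

	elem_init(&e);
	assert(elem_get(&e));      /* second reference, handed to the worker */
	pthread_create(&t, NULL, worker, &e);
	elem_destroy_wait(&e);     /* returns only after the worker's put */
	pthread_join(t, NULL);
	assert(!elem_get(&e));     /* further lookups fail once ref is zero */
	return atomic_load(&e.ref);
}
```

Whichever thread drops the last reference signals completion, so the destroy path returns correctly in either ordering.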

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v11
  Rebased to current for-next.
  Reordered patches and made other changes to respond to issues
  reported by Jason Gunthorpe.
v10
  Rebased to current wip/jgg-for-next.
  Split some patches into smaller ones.
v9
  Corrected issues reported by Jason Gunthorpe.
  Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
  Split up the patches into smaller changes
v8
  Fixed an additional race in 3/8 which was not handled correctly.
v7
  Corrected issues reported by Jason Gunthorpe
Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
v6
  Fixed a kzalloc flags bug.
  Fixed comment bug reported by 'Kernel Test Robot'.
  Changed type of rxe_pool.c in __rxe_fini().
v5
  Removed patches already accepted into for-next and addressed comments
  from Jason Gunthorpe.
v4
  Restructured patch series to change to xarray earlier which
  greatly simplified the changes.
  Rebased to current for-next
v3
  Changed rxe_alloc to use GFP_KERNEL
  Addressed other comments by Jason Gunthorpe
  Merged the previous 06/10 and 07/10 patches into one since they overlapped
  Added some minor cleanups as 10/10
v2
  Rebased to current for-next.
  Added 4 additional patches

Bob Pearson (13):
  RDMA/rxe: Fix ref error in rxe_av.c
  RDMA/rxe: Replace mr by rkey in responder resources
  RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  RDMA/rxe: Delete _locked() APIs for pool objects
  RDMA/rxe: Replace obj by elem in declaration
  RDMA/rxe: Move max_elem into rxe_type_info
  RDMA/rxe: Shorten pool names in rxe_pool.c
  RDMA/rxe: Replace red-black trees by xarrays
  RDMA/rxe: Use standard names for ref counting
  RDMA/rxe: Stop lookup of partially built objects
  RDMA/rxe: Add wait_for_completion to pool objects
  RDMA/rxe: Convert read side locking to rcu
  RDMA/rxe: Cleanup rxe_pool.c
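
One way to read "Stop lookup of partially built objects" is as a reserve-then-publish pattern on the index table. The cover letter names the patch but does not describe its mechanism, so the following is an assumption, written as a userspace sketch with a fixed-size array in place of an xarray. All names (`table_reserve()`, `table_publish()`, `table_lookup()`) are hypothetical. An index is claimed first, and the object pointer becomes visible to lookups only once construction has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TABLE_SIZE 8

/* Sentinel for a slot whose index is reserved but not yet published. */
static int reserved_marker;
#define RESERVED ((void *)&reserved_marker)

/* Hypothetical index table; statically zero-initialized, so all slots are empty. */
static _Atomic(void *) table[TABLE_SIZE];

/* Reserve a free index without exposing any object to lookups. */
static int table_reserve(void)
{
	for (int i = 0; i < TABLE_SIZE; i++) {
		void *expected = NULL;

		if (atomic_compare_exchange_strong(&table[i], &expected, RESERVED))
			return i;
	}
	return -1;  /* table full */
}

/* Publish the fully built object under its reserved index. */
static void table_publish(int index, void *obj)
{
	atomic_store(&table[index], obj);
}

/* Lookup returns NULL for empty and for reserved-but-unpublished slots. */
static void *table_lookup(int index)
{
	void *p;

	if (index < 0 || index >= TABLE_SIZE)
		return NULL;
	p = atomic_load(&table[index]);
	return p == RESERVED ? NULL : p;
}

/* Returns the index used, or asserts if any step misbehaves. */
static int table_demo(void)
{
	static int obj = 42;
	int idx = table_reserve();

	assert(idx >= 0);
	assert(table_lookup(idx) == NULL);   /* hidden while being built */
	table_publish(idx, &obj);
	assert(table_lookup(idx) == &obj);   /* visible once finalized */
	return idx;
}
```

A lookup that races with object creation then sees either nothing or a complete object, never a half-initialized one.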

 drivers/infiniband/sw/rxe/rxe.c       |  88 +----
 drivers/infiniband/sw/rxe/rxe_av.c    |  19 +-
 drivers/infiniband/sw/rxe/rxe_comp.c  |   8 +-
 drivers/infiniband/sw/rxe/rxe_loc.h   |   5 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c |   4 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    |  17 +-
 drivers/infiniband/sw/rxe/rxe_mw.c    |  42 ++-
 drivers/infiniband/sw/rxe/rxe_net.c   |  23 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 461 ++++++++++++++------------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  75 ++---
 drivers/infiniband/sw/rxe/rxe_qp.c    |  38 +--
 drivers/infiniband/sw/rxe/rxe_recv.c  |   8 +-
 drivers/infiniband/sw/rxe/rxe_req.c   |  61 ++--
 drivers/infiniband/sw/rxe/rxe_resp.c  | 149 ++++++---
 drivers/infiniband/sw/rxe/rxe_verbs.c |  89 +++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 -
 16 files changed, 540 insertions(+), 548 deletions(-)


base-commit: a80501b89152adb29adc7ab943d75c7345f9a3fb

Comments

Jason Gunthorpe March 16, 2022, 12:25 a.m. UTC | #1
On Thu, Mar 03, 2022 at 06:07:56PM -0600, Bob Pearson wrote:
> There are several race conditions discovered in the current rdma_rxe
> driver.  They mostly relate to races between normal operations and
> destroying objects.  This patch series
>  - Makes several minor cleanups in rxe_pool.[ch]
>  - Replaces the red-black trees currently used for indices with xarrays
>  - Corrects several reference counting errors
>  - Adds wait for completions to the paths in verbs APIs which destroy
>    objects.
>  - Changes read side locking to rcu.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> v11
>   Rebased to current for-next.
>   Reordered patches and made other changes to respond to issues
>   reported by Jason Gunthorpe.
> v10
>   Rebased to current wip/jgg-for-next.
>   Split some patches into smaller ones.
> v9
>   Corrected issues reported by Jason Gunthorpe,
>   Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
>   Split up the patches into smaller changes
> v8
>   Fixed an additional race in 3/8 which was not handled correctly.
> v7
>   Corrected issues reported by Jason Gunthorpe
> Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
> Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
> Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
> v6
>   Fixed a kzalloc flags bug.
>   Fixed comment bug reported by 'Kernel Test Robot'.
>   Changed type of rxe_pool.c in __rxe_fini().
> v5
>   Removed patches already accepted into for-next and addressed comments
>   from Jason Gunthorpe.
> v4
>   Restructured patch series to change to xarray earlier which
>   greatly simplified the changes.
>   Rebased to current for-next
> v3
>   Changed rxe_alloc to use GFP_KERNEL
>   Addressed other comments by Jason Gunthorpe
>   Merged the previous 06/10 and 07/10 patches into one since they overlapped
>   Added some minor cleanups as 10/10
> v2
>   Rebased to current for-next.
>   Added 4 additional patches
> 
> Bob Pearson (13):
>   RDMA/rxe: Fix ref error in rxe_av.c
>   RDMA/rxe: Replace mr by rkey in responder resources
>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>   RDMA/rxe: Delete _locked() APIs for pool objects
>   RDMA/rxe: Replace obj by elem in declaration
>   RDMA/rxe: Move max_elem into rxe_type_info
>   RDMA/rxe: Shorten pool names in rxe_pool.c
>   RDMA/rxe: Replace red-black trees by xarrays
>   RDMA/rxe: Use standard names for ref counting

If you let me know about the WARN_ON I think up to here is good

Thanks,
Jason

Bob Pearson March 16, 2022, 4:05 a.m. UTC | #2
On 3/15/22 19:25, Jason Gunthorpe wrote:
> On Thu, Mar 03, 2022 at 06:07:56PM -0600, Bob Pearson wrote:
>> There are several race conditions discovered in the current rdma_rxe
>> driver.  They mostly relate to races between normal operations and
>> destroying objects.  This patch series
>>  - Makes several minor cleanups in rxe_pool.[ch]
>>  - Replaces the red-black trees currently used for indices with xarrays
>>  - Corrects several reference counting errors
>>  - Adds wait for completions to the paths in verbs APIs which destroy
>>    objects.
>>  - Changes read side locking to rcu.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> v11
>>   Rebased to current for-next.
>>   Reordered patches and made other changes to respond to issues
>>   reported by Jason Gunthorpe.
>> v10
>>   Rebased to current wip/jgg-for-next.
>>   Split some patches into smaller ones.
>> v9
>>   Corrected issues reported by Jason Gunthorpe,
>>   Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
>>   Split up the patches into smaller changes
>> v8
>>   Fixed an additional race in 3/8 which was not handled correctly.
>> v7
>>   Corrected issues reported by Jason Gunthorpe
>> Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
>> Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
>> Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
>> v6
>>   Fixed a kzalloc flags bug.
>>   Fixed comment bug reported by 'Kernel Test Robot'.
>>   Changed type of rxe_pool.c in __rxe_fini().
>> v5
>>   Removed patches already accepted into for-next and addressed comments
>>   from Jason Gunthorpe.
>> v4
>>   Restructured patch series to change to xarray earlier which
>>   greatly simplified the changes.
>>   Rebased to current for-next
>> v3
>>   Changed rxe_alloc to use GFP_KERNEL
>>   Addressed other comments by Jason Gunthorpe
>>   Merged the previous 06/10 and 07/10 patches into one since they overlapped
>>   Added some minor cleanups as 10/10
>> v2
>>   Rebased to current for-next.
>>   Added 4 additional patches
>>
>> Bob Pearson (13):
>>   RDMA/rxe: Fix ref error in rxe_av.c
>>   RDMA/rxe: Replace mr by rkey in responder resources
>>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>>   RDMA/rxe: Delete _locked() APIs for pool objects
>>   RDMA/rxe: Replace obj by elem in declaration
>>   RDMA/rxe: Move max_elem into rxe_type_info
>>   RDMA/rxe: Shorten pool names in rxe_pool.c
>>   RDMA/rxe: Replace red-black trees by xarrays
>>   RDMA/rxe: Use standard names for ref counting
> 
> If you let me know about the WARN_ON I think up to here is good
> 
> Thanks,
> Jason

I agreed to the change.

Jason Gunthorpe March 16, 2022, 4:08 p.m. UTC | #3
On Tue, Mar 15, 2022 at 11:05:48PM -0500, Bob Pearson wrote:
> >> Bob Pearson (13):
> >>   RDMA/rxe: Fix ref error in rxe_av.c
> >>   RDMA/rxe: Replace mr by rkey in responder resources
> >>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
> >>   RDMA/rxe: Delete _locked() APIs for pool objects
> >>   RDMA/rxe: Replace obj by elem in declaration
> >>   RDMA/rxe: Move max_elem into rxe_type_info
> >>   RDMA/rxe: Shorten pool names in rxe_pool.c
> >>   RDMA/rxe: Replace red-black trees by xarrays
> >>   RDMA/rxe: Use standard names for ref counting
> > 
> > If you let me know about the WARN_ON I think up to here is good
> 
> I agreed to the change.

Ok applied to for-next

Thanks,
Jason

Pearson, Robert B March 16, 2022, 4:09 p.m. UTC | #4
Thanks!