Message ID | 20220304000808.225811-1-rpearsonhpe@gmail.com (mailing list archive) |
---|---|
Series | Fix race conditions in rxe_pool |
On Thu, Mar 03, 2022 at 06:07:56PM -0600, Bob Pearson wrote:
> Bob Pearson (13):
>   RDMA/rxe: Fix ref error in rxe_av.c
>   RDMA/rxe: Replace mr by rkey in responder resources
>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>   RDMA/rxe: Delete _locked() APIs for pool objects
>   RDMA/rxe: Replace obj by elem in declaration
>   RDMA/rxe: Move max_elem into rxe_type_info
>   RDMA/rxe: Shorten pool names in rxe_pool.c
>   RDMA/rxe: Replace red-black trees by xarrays
>   RDMA/rxe: Use standard names for ref counting

If you let me know about the WARN_ON I think up to here is good

Thanks,
Jason
On 3/15/22 19:25, Jason Gunthorpe wrote:
> On Thu, Mar 03, 2022 at 06:07:56PM -0600, Bob Pearson wrote:
>> Bob Pearson (13):
>>   RDMA/rxe: Fix ref error in rxe_av.c
>>   RDMA/rxe: Replace mr by rkey in responder resources
>>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>>   RDMA/rxe: Delete _locked() APIs for pool objects
>>   RDMA/rxe: Replace obj by elem in declaration
>>   RDMA/rxe: Move max_elem into rxe_type_info
>>   RDMA/rxe: Shorten pool names in rxe_pool.c
>>   RDMA/rxe: Replace red-black trees by xarrays
>>   RDMA/rxe: Use standard names for ref counting
>
> If you let me know about the WARN_ON I think up to here is good
>
> Thanks,
> Jason

I agreed to the change.
On Tue, Mar 15, 2022 at 11:05:48PM -0500, Bob Pearson wrote:
> >> Bob Pearson (13):
> >>   RDMA/rxe: Fix ref error in rxe_av.c
> >>   RDMA/rxe: Replace mr by rkey in responder resources
> >>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
> >>   RDMA/rxe: Delete _locked() APIs for pool objects
> >>   RDMA/rxe: Replace obj by elem in declaration
> >>   RDMA/rxe: Move max_elem into rxe_type_info
> >>   RDMA/rxe: Shorten pool names in rxe_pool.c
> >>   RDMA/rxe: Replace red-black trees by xarrays
> >>   RDMA/rxe: Use standard names for ref counting
> >
> > If you let me know about the WARN_ON I think up to here is good
>
> I agreed to the change.

Ok applied to for-next

Thanks,
Jason
-----Original Message-----
From: Jason Gunthorpe <jgg@nvidia.com>
Sent: Wednesday, March 16, 2022 11:08 AM
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: zyjzyj2000@gmail.com; linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next v11 00/13] Fix race conditions in rxe_pool

On Tue, Mar 15, 2022 at 11:05:48PM -0500, Bob Pearson wrote:
> >> Bob Pearson (13):
> >>   RDMA/rxe: Fix ref error in rxe_av.c
> >>   RDMA/rxe: Replace mr by rkey in responder resources
> >>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
> >>   RDMA/rxe: Delete _locked() APIs for pool objects
> >>   RDMA/rxe: Replace obj by elem in declaration
> >>   RDMA/rxe: Move max_elem into rxe_type_info
> >>   RDMA/rxe: Shorten pool names in rxe_pool.c
> >>   RDMA/rxe: Replace red-black trees by xarrays
> >>   RDMA/rxe: Use standard names for ref counting
> >
> > If you let me know about the WARN_ON I think up to here is good
>
> I agreed to the change.

Ok applied to for-next

Thanks,
Jason

Thanks!
Several race conditions have been discovered in the current rdma_rxe
driver. They mostly relate to races between normal operations and
destroying objects. This patch series
 - Makes several minor cleanups in rxe_pool.[ch]
 - Replaces the red-black trees currently used for indices with xarrays
 - Corrects several reference counting errors
 - Adds waits for completion to the verbs API paths that destroy
   objects.
 - Changes read side locking to RCU.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v11
  Rebased to current for-next.
  Reordered patches and made other changes to respond to issues
  reported by Jason Gunthorpe.
v10
  Rebased to current wip/jgg-for-next.
  Split some patches into smaller ones.
v9
  Corrected issues reported by Jason Gunthorpe.
  Converted locking in rxe_mcast.c and rxe_pool.c to use RCU.
  Split up the patches into smaller changes.
v8
  Fixed an additional race in 3/8 which was not handled correctly.
v7
  Corrected issues reported by Jason Gunthorpe
Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
v6
  Fixed a kzalloc flags bug.
  Fixed comment bug reported by 'Kernel Test Robot'.
  Changed type of rxe_pool.c in __rxe_fini().
v5
  Removed patches already accepted into for-next and addressed comments
  from Jason Gunthorpe.
v4
  Restructured patch series to change to xarray earlier which
  greatly simplified the changes.
  Rebased to current for-next
v3
  Changed rxe_alloc to use GFP_KERNEL
  Addressed other comments by Jason Gunthorpe
  Merged the previous 06/10 and 07/10 patches into one since they overlapped
  Added some minor cleanups as 10/10
v2
  Rebased to current for-next.
  Added 4 additional patches

Bob Pearson (13):
  RDMA/rxe: Fix ref error in rxe_av.c
  RDMA/rxe: Replace mr by rkey in responder resources
  RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  RDMA/rxe: Delete _locked() APIs for pool objects
  RDMA/rxe: Replace obj by elem in declaration
  RDMA/rxe: Move max_elem into rxe_type_info
  RDMA/rxe: Shorten pool names in rxe_pool.c
  RDMA/rxe: Replace red-black trees by xarrays
  RDMA/rxe: Use standard names for ref counting
  RDMA/rxe: Stop lookup of partially built objects
  RDMA/rxe: Add wait_for_completion to pool objects
  RDMA/rxe: Convert read side locking to rcu
  RDMA/rxe: Cleanup rxe_pool.c

 drivers/infiniband/sw/rxe/rxe.c       |  88 +----
 drivers/infiniband/sw/rxe/rxe_av.c    |  19 +-
 drivers/infiniband/sw/rxe/rxe_comp.c  |   8 +-
 drivers/infiniband/sw/rxe/rxe_loc.h   |   5 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c |   4 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    |  17 +-
 drivers/infiniband/sw/rxe/rxe_mw.c    |  42 ++-
 drivers/infiniband/sw/rxe/rxe_net.c   |  23 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 461 ++++++++++++++------------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  75 ++---
 drivers/infiniband/sw/rxe/rxe_qp.c    |  38 +--
 drivers/infiniband/sw/rxe/rxe_recv.c  |   8 +-
 drivers/infiniband/sw/rxe/rxe_req.c   |  61 ++--
 drivers/infiniband/sw/rxe/rxe_resp.c  | 149 ++++++---
 drivers/infiniband/sw/rxe/rxe_verbs.c |  89 +++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 -
 16 files changed, 540 insertions(+), 548 deletions(-)

base-commit: a80501b89152adb29adc7ab943d75c7345f9a3fb
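
For readers unfamiliar with the pattern, here is a rough userspace sketch of two ideas named in the series summary: an index is reserved first and the object is only published for lookup once it is fully built, and a lookup takes a reference only if the count has not already reached zero. This is not code from rxe_pool.c. Every name in it (struct pool, pool_add, pool_finalize, pool_lookup, pool_remove, elem_get, elem_put, pool_demo) is hypothetical, the index table is a plain array standing in for an xarray, and the sketch is single threaded, so it shows neither RCU nor the blocking wait for completion on destroy.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_MAX 8			/* hypothetical pool size */

struct elem {
	atomic_int ref_cnt;		/* 0 means teardown has begun */
	int payload;
};

struct pool {
	bool reserved[POOL_MAX];		/* index handed out */
	_Atomic(struct elem *) slots[POOL_MAX];	/* published objects */
};

static void pool_init(struct pool *p)
{
	for (int i = 0; i < POOL_MAX; i++) {
		p->reserved[i] = false;
		atomic_init(&p->slots[i], NULL);
	}
}

/* Reserve an index but publish nothing yet, so a concurrent lookup
 * could not find the half-built object. Returns the index or -1. */
static int pool_add(struct pool *p, struct elem *e)
{
	for (int i = 0; i < POOL_MAX; i++) {
		if (!p->reserved[i]) {
			p->reserved[i] = true;
			atomic_init(&e->ref_cnt, 1);	/* creator's ref */
			return i;
		}
	}
	return -1;
}

/* Publish the fully built object so lookups can see it. */
static void pool_finalize(struct pool *p, int index, struct elem *e)
{
	atomic_store(&p->slots[index], e);
}

/* Take a reference unless the count has already dropped to zero. */
static bool elem_get(struct elem *e)
{
	int old = atomic_load(&e->ref_cnt);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&e->ref_cnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if it was the last one. */
static bool elem_put(struct elem *e)
{
	return atomic_fetch_sub(&e->ref_cnt, 1) == 1;
}

static struct elem *pool_lookup(struct pool *p, int index)
{
	struct elem *e;

	if (index < 0 || index >= POOL_MAX)
		return NULL;
	e = atomic_load(&p->slots[index]);
	return (e && elem_get(e)) ? e : NULL;
}

/* Unpublish, then drop the creator's ref. Returns true if nobody else
 * still holds the object; a real destroy path would wait otherwise. */
static bool pool_remove(struct pool *p, int index, struct elem *e)
{
	atomic_store(&p->slots[index], NULL);
	return elem_put(e);
}

/* Walks through the lifecycle; returns 0 if every step behaves as
 * described, or a negative step number on the first mismatch. */
int pool_demo(void)
{
	struct pool p;
	struct elem e = { .payload = 42 };
	struct elem *found;
	int idx;

	pool_init(&p);
	idx = pool_add(&p, &e);
	if (idx < 0)
		return -1;
	if (pool_lookup(&p, idx))		/* not published yet */
		return -2;
	pool_finalize(&p, idx, &e);
	found = pool_lookup(&p, idx);		/* takes a second ref */
	if (found != &e)
		return -3;
	if (pool_remove(&p, idx, &e))		/* lookup ref outstanding */
		return -4;
	if (pool_lookup(&p, idx))		/* no longer visible */
		return -5;
	return elem_put(found) ? 0 : -6;	/* last ref dropped */
}
```

Calling pool_demo() from a main() exercises each step in order. In a multithreaded setting the point where pool_remove() reports outstanding references is where a destroy path would have to block until the final put, which is the role the summary assigns to the added wait for completion.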