diff mbox series

[for-next,v11,08/11] RDMA/rxe: Add code to cleanup mcast memory

Message ID 20220208211644.123457-9-rpearsonhpe@gmail.com (mailing list archive)
State Superseded
Headers show
Series Move two object pools to rxe_mcast.c | expand

Commit Message

Bob Pearson Feb. 8, 2022, 9:16 p.m. UTC
Well behaved applications will free all memory allocated by multicast
but programs which do not clean up properly can leave behind allocated
memory when the rxe driver is unloaded. This patch walks the red-black
tree holding multicast group elements and then walks the list of attached
qp's freeing the mca's and finally the mcg's.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  2 ++
 drivers/infiniband/sw/rxe/rxe_loc.h   |  1 +
 drivers/infiniband/sw/rxe/rxe_mcast.c | 31 +++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

Comments

Jason Gunthorpe Feb. 11, 2022, 6:43 p.m. UTC | #1
On Tue, Feb 08, 2022 at 03:16:42PM -0600, Bob Pearson wrote:
> Well behaved applications will free all memory allocated by multicast
> but programs which do not clean up properly can leave behind allocated
> memory when the rxe driver is unloaded. This patch walks the red-black
> tree holding multicast group elements and then walks the list of attached
> qp's freeing the mca's and finally the mcg's.

How does this happen? the ib core ensures that all uobjects are
destroyed, so if something is still in the rb tree here it means that
an earlier uobject destruction leaked it

Jason
Bob Pearson Feb. 11, 2022, 7:36 p.m. UTC | #2
On 2/11/22 12:43, Jason Gunthorpe wrote:
> On Tue, Feb 08, 2022 at 03:16:42PM -0600, Bob Pearson wrote:
>> Well behaved applications will free all memory allocated by multicast
>> but programs which do not clean up properly can leave behind allocated
>> memory when the rxe driver is unloaded. This patch walks the red-black
>> tree holding multicast group elements and then walks the list of attached
>> qp's freeing the mca's and finally the mcg's.
> 
> How does this happen? the ib core ensures that all uobjects are
> destroyed, so if something is still in the rb tree here it means that
> an earlier uobject destruction leaked it
> 
> Jason

The mc_grp and mc_elem objects are not rdma-core uobjects. So their memory
is allocated by the rxe driver. They get created by ib_attach_mcast and destroyed
by ib_detach_mcast. If an application crashes without calling a matching
ib_detach_mcast for each attachment the driver would have leaked the memory.
This patch fixes that.

Bob
Jason Gunthorpe Feb. 11, 2022, 7:56 p.m. UTC | #3
On Fri, Feb 11, 2022 at 01:36:06PM -0600, Bob Pearson wrote:
> On 2/11/22 12:43, Jason Gunthorpe wrote:
> > On Tue, Feb 08, 2022 at 03:16:42PM -0600, Bob Pearson wrote:
> >> Well behaved applications will free all memory allocated by multicast
> >> but programs which do not clean up properly can leave behind allocated
> >> memory when the rxe driver is unloaded. This patch walks the red-black
> >> tree holding multicast group elements and then walks the list of attached
> >> qp's freeing the mca's and finally the mcg's.
> > 
> > How does this happen? the ib core ensures that all uobjects are
> > destroyed, so if something is still in the rb tree here it means that
> > an earlier uobject destruction leaked it
> > 
> > Jason
> 
> The mc_grp and mc_elem objects are not rdma-core uobjects. So their memory
> is allocated by the rxe driver. They get created by ib_attach_mcast and destroyed
> by ib_detach_mcast. If an application crashes without calling a matching
> ib_detach_mcast for each attachment the driver would have leaked the memory.
> This patch fixes that.

The mcast attachment is affiliated with a QP, when all the QPs are
destroyed, which are rdma-core objects, then all attachments should be
free'd as well. That should have happened by the time things get here

I would expect a simple WARN_ON that the rb tree is empty in destroy
to catch any bugs.

Jason
Bob Pearson Feb. 12, 2022, 12:37 a.m. UTC | #4
On 2/11/22 13:56, Jason Gunthorpe wrote:
> On Fri, Feb 11, 2022 at 01:36:06PM -0600, Bob Pearson wrote:
>> On 2/11/22 12:43, Jason Gunthorpe wrote:
>>> On Tue, Feb 08, 2022 at 03:16:42PM -0600, Bob Pearson wrote:
>>>> Well behaved applications will free all memory allocated by multicast
>>>> but programs which do not clean up properly can leave behind allocated
>>>> memory when the rxe driver is unloaded. This patch walks the red-black
>>>> tree holding multicast group elements and then walks the list of attached
>>>> qp's freeing the mca's and finally the mcg's.
>>>
>>> How does this happen? the ib core ensures that all uobjects are
>>> destroyed, so if something is still in the rb tree here it means that
>>> an earlier uobject destruction leaked it
>>>
>>> Jason
>>
>> The mc_grp and mc_elem objects are not rdma-core uobjects. So their memory
>> is allocated by the rxe driver. They get created by ib_attach_mcast and destroyed
>> by ib_detach_mcast. If an application crashes without calling a matching
>> ib_detach_mcast for each attachment the driver would have leaked the memory.
>> This patch fixes that.
> 
> The mcast attachment is affiliated with a QP, when all the QPs are
> destroyed, which are rdma-core objects, then all attachments should be
> free'd as well. That should have happened by the time things get here
> 
> I would expect a simple WARN_ON that the rb tree is empty in destroy
> to catch any bugs.
> 
> Jason
I wrote a simple user test case that calls ibv_attach_mcast() and then sleeps.
If I type ^C the qp is detached as you predicted so someone is counting
them somewhere. It isn't obvious in rdma-core. Not sure what you are suggesting
above. The detach code already checks to see if the attachment is *not* present.
The only time it is clear that there shouldn't be any more attachments present
is when the device is closed which is what the cleanup code is doing. I can add
a warning if there was anything there but it will likely never get called since
someone above me is actively cleaning them up from failed apps. Or we can just drop
this patch and assume it isn't needed.

Bob
Jason Gunthorpe Feb. 14, 2022, 5:41 p.m. UTC | #5
On Fri, Feb 11, 2022 at 06:37:50PM -0600, Bob Pearson wrote:
> > The mcast attachment is affiliated with a QP, when all the QPs are
> > destroyed, which are rdma-core objects, then all attachments should be
> > free'd as well. That should have happened by the time things get here
> > 
> > I would expect a simple WARN_ON that the rb tree is empty in destroy
> > to catch any bugs.
> > 
> > Jason
>
> I wrote a simple user test case that calls ibv_attach_mcast() and
> then sleeps.  If I type ^C the qp is detached as you predicted so
> someone is counting them somewhere. It isn't obvious in rdma-core.

QP destroy is done inside the kernel in all the uverbs stuff, detatch
from multicast during QP destroy should have been inside RXE's qp
destroy function.

> never get called since someone above me is actively cleaning them up
> from failed apps. Or we can just drop this patch and assume it isn't
> needed.

I mean drop this patch and just put a WARN_ON to assert the rb tree is
empty before destroying the memory that holds it, just in case someone
adds a bug someday.

Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 3520eb2db685..603b0156f889 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -29,6 +29,8 @@  void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
 
+	rxe_cleanup_mcast(rxe);
+
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 409efeecd581..0bc1b7e2877c 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -44,6 +44,7 @@  struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 void rxe_cleanup_mcg(struct kref *kref);
+void rxe_cleanup_mcast(struct rxe_dev *rxe);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 07c218788c59..846147878607 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -382,3 +382,34 @@  int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 
 	return rxe_detach_mcg(rxe, qp, mgid);
 }
+
+/**
+ * rxe_cleanup_mcast - cleanup all resources held by mcast
+ * @rxe: rxe object
+ *
+ * Called when rxe device is unloaded. Walk red-black tree to
+ * find all mcg's and then walk mcg->qp_list to find all mca's and
+ * free them. These should have been freed already if apps are
+ * well behaved.
+ */
+void rxe_cleanup_mcast(struct rxe_dev *rxe)
+{
+	struct rb_root *root = &rxe->mcg_tree;
+	struct rb_node *node, *next;
+	struct rxe_mcg *mcg;
+	struct rxe_mca *mca, *tmp;
+
+	for (node = rb_first(root); node; node = next) {
+		next = rb_next(node);
+		mcg = rb_entry(node, typeof(*mcg), node);
+
+		spin_lock_bh(&rxe->mcg_lock);
+		list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list)
+			kfree(mca);
+
+		__rxe_remove_mcg(mcg);
+		spin_unlock_bh(&rxe->mcg_lock);
+
+		kfree(mcg);
+	}
+}