diff mbox series

[for-rc] IB/mlx4: Fix leak in id_map_find_del

Message ID 20200117135622.836563-1-haakon.bugge@oracle.com (mailing list archive)
State Superseded
Headers show
Series [for-rc] IB/mlx4: Fix leak in id_map_find_del | expand

Commit Message

Haakon Bugge Jan. 17, 2020, 1:56 p.m. UTC
Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the book keeping
in a cache.

Following the RDMA Connection Manager (CM) protocol, it is clear when
an entry has to evicted from the cache. When a DREP is sent from
mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar
when a REJ is received by the mlx4_ib_demux_cm_handler(),
id_map_find_del() is called.

This function wipes out the TID in use from the IDR or XArray and
removes the id_map_entry from the table.

In short, it does everything except the topping of the cake, which is
to remove the entry from the list and free it. In other words, for the
DREP and REJ cases enumerated above, both will leak one id_map_entry.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/hw/mlx4/cm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Haakon Bugge Jan. 20, 2020, 2:36 p.m. UTC | #1
> On 17 Jan 2020, at 14:56, Håkon Bugge <haakon.bugge@oracle.com> wrote:
> 
> Using CX-3 virtual functions, either from a bare-metal machine or
> pass-through from a VM, MAD packets are proxied through the PF driver.
> 
> Since the VF drivers have separate name spaces for MAD Transaction Ids
> (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
> in a cache.
> 
> Following the RDMA Connection Manager (CM) protocol, it is clear when
> an entry has to evicted from the cache. When a DREP is sent from
> mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar
> when a REJ is received by the mlx4_ib_demux_cm_handler(),
> id_map_find_del() is called.
> 
> This function wipes out the TID in use from the IDR or XArray and
> removes the id_map_entry from the table.
> 
> In short, it does everything except the topping of the cake, which is
> to remove the entry from the list and free it. In other words, for the
> DREP and REJ cases enumerated above, both will leak one id_map_entry.
> 
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

When/if this gets merged, please add:

Cc: stable@vger.kernel.org

Thxs, Håkon

> ---
> drivers/infiniband/hw/mlx4/cm.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
> index ecd6cadd529a..1df6d3ccfc62 100644
> --- a/drivers/infiniband/hw/mlx4/cm.c
> +++ b/drivers/infiniband/hw/mlx4/cm.c
> @@ -197,8 +197,13 @@ static void id_map_find_del(struct ib_device *ibdev, int pv_cm_id)
> 	if (!ent)
> 		goto out;
> 	found_ent = id_map_find_by_sl_id(ibdev, ent->slave_id, ent->sl_cm_id);
> -	if (found_ent && found_ent == ent)
> +	if (found_ent && found_ent == ent) {
> 		rb_erase(&found_ent->node, sl_id_map);
> +		if (!ent->scheduled_delete) {
> +			list_del(&ent->list);
> +			kfree(ent);
> +		}
> +	}
> out:
> 	spin_unlock(&sriov->id_map_lock);
> }
> -- 
> 2.20.1
>
Leon Romanovsky Jan. 20, 2020, 3:06 p.m. UTC | #2
On Fri, Jan 17, 2020 at 02:56:22PM +0100, Håkon Bugge wrote:
> Using CX-3 virtual functions, either from a bare-metal machine or
> pass-through from a VM, MAD packets are proxied through the PF driver.
>
> Since the VF drivers have separate name spaces for MAD Transaction Ids
> (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
> in a cache.
>
> Following the RDMA Connection Manager (CM) protocol, it is clear when
> an entry has to evicted from the cache. When a DREP is sent from
> mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar
> when a REJ is received by the mlx4_ib_demux_cm_handler(),
> id_map_find_del() is called.
>
> This function wipes out the TID in use from the IDR or XArray and
> removes the id_map_entry from the table.
>
> In short, it does everything except the topping of the cake, which is
> to remove the entry from the list and free it. In other words, for the
> DREP and REJ cases enumerated above, both will leak one id_map_entry.
>
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> ---
>  drivers/infiniband/hw/mlx4/cm.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
> index ecd6cadd529a..1df6d3ccfc62 100644
> --- a/drivers/infiniband/hw/mlx4/cm.c
> +++ b/drivers/infiniband/hw/mlx4/cm.c
> @@ -197,8 +197,13 @@ static void id_map_find_del(struct ib_device *ibdev, int pv_cm_id)
>  	if (!ent)
>  		goto out;
>  	found_ent = id_map_find_by_sl_id(ibdev, ent->slave_id, ent->sl_cm_id);
> -	if (found_ent && found_ent == ent)
> +	if (found_ent && found_ent == ent) {
>  		rb_erase(&found_ent->node, sl_id_map);
> +		if (!ent->scheduled_delete) {

Why do we need to check scheduled_delete?

Isn't this to mark call to timeout cleanup (id_map_ent_timeout), which
can't race with id_map_find_del()? They both hold the same spinlock.

Thanks

> +			list_del(&ent->list);
> +			kfree(ent);
> +		}
> +	}
>  out:
>  	spin_unlock(&sriov->id_map_lock);
>  }
> --
> 2.20.1
>
Haakon Bugge Jan. 20, 2020, 3:49 p.m. UTC | #3
> On 20 Jan 2020, at 16:06, Leon Romanovsky <leon@kernel.org> wrote:
> 
> On Fri, Jan 17, 2020 at 02:56:22PM +0100, Håkon Bugge wrote:
>> Using CX-3 virtual functions, either from a bare-metal machine or
>> pass-through from a VM, MAD packets are proxied through the PF driver.
>> 
>> Since the VF drivers have separate name spaces for MAD Transaction Ids
>> (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
>> in a cache.
>> 
>> Following the RDMA Connection Manager (CM) protocol, it is clear when
>> an entry has to evicted from the cache. When a DREP is sent from
>> mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar
>> when a REJ is received by the mlx4_ib_demux_cm_handler(),
>> id_map_find_del() is called.
>> 
>> This function wipes out the TID in use from the IDR or XArray and
>> removes the id_map_entry from the table.
>> 
>> In short, it does everything except the topping of the cake, which is
>> to remove the entry from the list and free it. In other words, for the
>> DREP and REJ cases enumerated above, both will leak one id_map_entry.
>> 
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>> ---
>> drivers/infiniband/hw/mlx4/cm.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
>> index ecd6cadd529a..1df6d3ccfc62 100644
>> --- a/drivers/infiniband/hw/mlx4/cm.c
>> +++ b/drivers/infiniband/hw/mlx4/cm.c
>> @@ -197,8 +197,13 @@ static void id_map_find_del(struct ib_device *ibdev, int pv_cm_id)
>> 	if (!ent)
>> 		goto out;
>> 	found_ent = id_map_find_by_sl_id(ibdev, ent->slave_id, ent->sl_cm_id);
>> -	if (found_ent && found_ent == ent)
>> +	if (found_ent && found_ent == ent) {
>> 		rb_erase(&found_ent->node, sl_id_map);
>> +		if (!ent->scheduled_delete) {
> 
> Why do we need to check scheduled_delete?

1. Node receives a DREQ and mlx4_ib_demux_cm_handler() is called, which again calls schedule_delayed(), which sets scheduled_delete.

2. DREQ is proxied over to the VM, which replies with a DREP.

3. The DREP is proxied over to the PF driver, mlx4_ib_multiplex_cm_handler() is called, id_map_find_del() is called. If it is freed now (without checking scheduled_delete), it will be a double free when the delayed work kicks in.

But this documents that the commit message is not accurate, it is only the REJ case that has a leak.

> Isn't this to mark call to timeout cleanup (id_map_ent_timeout), which
> can't race with id_map_find_del()? They both hold the same spinlock.

No race, but it can be set as per the above.

If you agree, I will send a v2 with corrected commit message.


Thxs, Håkon


> 
> Thanks
> 
>> +			list_del(&ent->list);
>> +			kfree(ent);
>> +		}
>> +	}
>> out:
>> 	spin_unlock(&sriov->id_map_lock);
>> }
>> --
>> 2.20.1
>>
Leon Romanovsky Jan. 20, 2020, 4:27 p.m. UTC | #4
On Mon, Jan 20, 2020 at 04:49:56PM +0100, Håkon Bugge wrote:
>
>
> > On 20 Jan 2020, at 16:06, Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Fri, Jan 17, 2020 at 02:56:22PM +0100, Håkon Bugge wrote:
> >> Using CX-3 virtual functions, either from a bare-metal machine or
> >> pass-through from a VM, MAD packets are proxied through the PF driver.
> >>
> >> Since the VF drivers have separate name spaces for MAD Transaction Ids
> >> (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
> >> in a cache.
> >>
> >> Following the RDMA Connection Manager (CM) protocol, it is clear when
> >> an entry has to evicted from the cache. When a DREP is sent from
> >> mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar
> >> when a REJ is received by the mlx4_ib_demux_cm_handler(),
> >> id_map_find_del() is called.
> >>
> >> This function wipes out the TID in use from the IDR or XArray and
> >> removes the id_map_entry from the table.
> >>
> >> In short, it does everything except the topping of the cake, which is
> >> to remove the entry from the list and free it. In other words, for the
> >> DREP and REJ cases enumerated above, both will leak one id_map_entry.
> >>
> >> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> >> ---
> >> drivers/infiniband/hw/mlx4/cm.c | 7 ++++++-
> >> 1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
> >> index ecd6cadd529a..1df6d3ccfc62 100644
> >> --- a/drivers/infiniband/hw/mlx4/cm.c
> >> +++ b/drivers/infiniband/hw/mlx4/cm.c
> >> @@ -197,8 +197,13 @@ static void id_map_find_del(struct ib_device *ibdev, int pv_cm_id)
> >> 	if (!ent)
> >> 		goto out;
> >> 	found_ent = id_map_find_by_sl_id(ibdev, ent->slave_id, ent->sl_cm_id);
> >> -	if (found_ent && found_ent == ent)
> >> +	if (found_ent && found_ent == ent) {
> >> 		rb_erase(&found_ent->node, sl_id_map);
> >> +		if (!ent->scheduled_delete) {
> >
> > Why do we need to check scheduled_delete?
>
> 1. Node receives a DREQ and mlx4_ib_demux_cm_handler() is called, which again calls schedule_delayed(), which sets scheduled_delete.
>
> 2. DREQ is proxied over to the VM, which replies with a DREP.
>
> 3. The DREP is proxied over to the PF driver, mlx4_ib_multiplex_cm_handler() is called, id_map_find_del() is called. If it is freed now (without checking scheduled_delete), it will be a double free when the delayed work kicks in.

It will be the case if we don't cancel delayed work inside
id_map_find_del(), but it raises other question. Why do we need two
identical delete functions? Can we convert id_map_find_del() callers
to use id_map_ent_timeout() instead?

Thanks

>
> But this documents that the commit message is not accurate, it is only the REJ case that has a leak.
>
> > Isn't this to mark call to timeout cleanup (id_map_ent_timeout), which
> > can't race with id_map_find_del()? They both hold the same spinlock.
>
> No race, but it can be set as per the above.
>
> If you agree, I will send a v2 with corrected commit message.
>
>
> Thxs, Håkon
>
>
> >
> > Thanks
> >
> >> +			list_del(&ent->list);
> >> +			kfree(ent);
> >> +		}
> >> +	}
> >> out:
> >> 	spin_unlock(&sriov->id_map_lock);
> >> }
> >> --
> >> 2.20.1
> >>
>
Haakon Bugge Jan. 21, 2020, 2:26 p.m. UTC | #5
> On 20 Jan 2020, at 17:27, Leon Romanovsky <leon@kernel.org> wrote:
> 
> On Mon, Jan 20, 2020 at 04:49:56PM +0100, Håkon Bugge wrote:
>> 
>> 
>>> On 20 Jan 2020, at 16:06, Leon Romanovsky <leon@kernel.org> wrote:
>>> 
>>> On Fri, Jan 17, 2020 at 02:56:22PM +0100, Håkon Bugge wrote:
>>>> Using CX-3 virtual functions, either from a bare-metal machine or
>>>> pass-through from a VM, MAD packets are proxied through the PF driver.
>>>> 
>>>> Since the VF drivers have separate name spaces for MAD Transaction Ids
>>>> (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
>>>> in a cache.
>>>> 
>>>> Following the RDMA Connection Manager (CM) protocol, it is clear when
>>>> an entry has to evicted from the cache. When a DREP is sent from
>>>> mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar
>>>> when a REJ is received by the mlx4_ib_demux_cm_handler(),
>>>> id_map_find_del() is called.
>>>> 
>>>> This function wipes out the TID in use from the IDR or XArray and
>>>> removes the id_map_entry from the table.
>>>> 
>>>> In short, it does everything except the topping of the cake, which is
>>>> to remove the entry from the list and free it. In other words, for the
>>>> DREP and REJ cases enumerated above, both will leak one id_map_entry.
>>>> 
>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> ---
>>>> drivers/infiniband/hw/mlx4/cm.c | 7 ++++++-
>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
>>>> index ecd6cadd529a..1df6d3ccfc62 100644
>>>> --- a/drivers/infiniband/hw/mlx4/cm.c
>>>> +++ b/drivers/infiniband/hw/mlx4/cm.c
>>>> @@ -197,8 +197,13 @@ static void id_map_find_del(struct ib_device *ibdev, int pv_cm_id)
>>>> 	if (!ent)
>>>> 		goto out;
>>>> 	found_ent = id_map_find_by_sl_id(ibdev, ent->slave_id, ent->sl_cm_id);
>>>> -	if (found_ent && found_ent == ent)
>>>> +	if (found_ent && found_ent == ent) {
>>>> 		rb_erase(&found_ent->node, sl_id_map);
>>>> +		if (!ent->scheduled_delete) {
>>> 
>>> Why do we need to check scheduled_delete?
>> 
>> 1. Node receives a DREQ and mlx4_ib_demux_cm_handler() is called, which again calls schedule_delayed(), which sets scheduled_delete.
>> 
>> 2. DREQ is proxied over to the VM, which replies with a DREP.
>> 
>> 3. The DREP is proxied over to the PF driver, mlx4_ib_multiplex_cm_handler() is called, id_map_find_del() is called. If it is freed now (without checking scheduled_delete), it will be a double free when the delayed work kicks in.
> 
> It will be the case if we don't cancel delayed work inside
> id_map_find_del(), but it raises other question. Why do we need two
> identical delete functions?

Almost identical. It's a question about being evolutionary vs. revolutionary, when fixing a leak. My preference would be to fix the leak as simple as possible, without changing any logic.

> Can we convert id_map_find_del() callers
> to use id_map_ent_timeout() instead?

I did the opposite, and I will send it out as a v2 after testing. But, I didn't like it. Fewer lines, yes, but more complexity.


Thxs, Håkon

> 
> Thanks
> 
>> 
>> But this documents that the commit message is not accurate, it is only the REJ case that has a leak.
>> 
>>> Isn't this to mark call to timeout cleanup (id_map_ent_timeout), which
>>> can't race with id_map_find_del()? They both hold the same spinlock.
>> 
>> No race, but it can be set as per the above.
>> 
>> If you agree, I will send a v2 with corrected commit message.
>> 
>> 
>> Thxs, Håkon
>> 
>> 
>>> 
>>> Thanks
>>> 
>>>> +			list_del(&ent->list);
>>>> +			kfree(ent);
>>>> +		}
>>>> +	}
>>>> out:
>>>> 	spin_unlock(&sriov->id_map_lock);
>>>> }
>>>> --
>>>> 2.20.1
diff mbox series

Patch

diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
index ecd6cadd529a..1df6d3ccfc62 100644
--- a/drivers/infiniband/hw/mlx4/cm.c
+++ b/drivers/infiniband/hw/mlx4/cm.c
@@ -197,8 +197,13 @@  static void id_map_find_del(struct ib_device *ibdev, int pv_cm_id)
 	if (!ent)
 		goto out;
 	found_ent = id_map_find_by_sl_id(ibdev, ent->slave_id, ent->sl_cm_id);
-	if (found_ent && found_ent == ent)
+	if (found_ent && found_ent == ent) {
 		rb_erase(&found_ent->node, sl_id_map);
+		if (!ent->scheduled_delete) {
+			list_del(&ent->list);
+			kfree(ent);
+		}
+	}
 out:
 	spin_unlock(&sriov->id_map_lock);
 }