mbox series

[for-next,v4,0/7] RDMA/rxe: Make multicast work

Message ID 20231204200342.7125-1-rpearsonhpe@gmail.com (mailing list archive)
Headers show
Series RDMA/rxe: Make multicast work | expand

Message

Bob Pearson Dec. 4, 2023, 8:03 p.m. UTC
After developing a test program which exercises node to node
testing of RoCE multicast it became clear that there are a
number of issues with the current rdma_rxe multicast implementation.

The issues seen include:
	- There is no support for IPV4 multicast addresses.
	- Once a multicast MAC is added it is not removed.
	- Multicast packets are sent with the wrong QP number.
	- Multicast IP addresses are not created and without
	  them no multicast packets received on the interface
	  are delivered to the rdma_rxe driver.
	- The implementation in rxe_mcast.c is potentially
	  racy if multiple simultaneous attach/detach operations
	  are issued.

This patch set fixes these issues. 
---
v4:
  Corrected a lockdep bug reported by Zhu Yanjun.
v3:
  Removed rxe_loop_and_send(). It turns out it is not needed.
  Added module parameters to set mcast limits to small values when
  driver is loaded to enable mcast limit testing.
  Rebased to current for-next branch.
v2:
  Respond to comments by Zhu.
  Added more Fixes lines.
  Added some more explanation in the commit messages.
  Fixed an error in rxe_lookup_mcg. Should have checked
	the return from rxe_get_mcg().

Bob Pearson (7):
  RDMA/rxe: Cleanup rxe_ah/av_chk_attr
  RDMA/rxe: Fix sending of mcast packets
  RDMA/rxe: Register IP mcast address
  RDMA/rxe: Let rxe_lookup_mcg use rcu_read_lock
  RDMA/rxe: Split multicast lock
  RDMA/rxe: Cleanup mcg lifetime
  RDMA/rxe: Add module parameters for mcast limits

 drivers/infiniband/sw/rxe/Makefile     |   3 +-
 drivers/infiniband/sw/rxe/rxe.c        |   8 +-
 drivers/infiniband/sw/rxe/rxe_av.c     |  50 +--
 drivers/infiniband/sw/rxe/rxe_loc.h    |   6 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c  | 524 +++++++++++--------------
 drivers/infiniband/sw/rxe/rxe_net.c    |   6 +-
 drivers/infiniband/sw/rxe/rxe_net.h    |   1 +
 drivers/infiniband/sw/rxe/rxe_opcode.h |   2 +-
 drivers/infiniband/sw/rxe/rxe_param.c  |  23 ++
 drivers/infiniband/sw/rxe/rxe_param.h  |   4 +
 drivers/infiniband/sw/rxe/rxe_qp.c     |   4 +-
 drivers/infiniband/sw/rxe/rxe_recv.c   |  11 +-
 drivers/infiniband/sw/rxe/rxe_req.c    |  11 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c  |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h  |   5 +-
 15 files changed, 303 insertions(+), 360 deletions(-)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_param.c

Comments

Bob Pearson Dec. 4, 2023, 8:07 p.m. UTC | #1
Zhu,

Thanks for testing this. It turns out I needed to take the sk_lock for
ipv6_sock_mc_join/drop().

Bob

On 12/4/23 14:03, Bob Pearson wrote:
> After developing a test program which exercises node to node
> testing of RoCE multicast it became clear that there are a
> number of issues with the current rdma_rxe multicast implementation.
> 
> The issues seen include:
> 	- There is no support for IPV4 multicast addresses.
> 	- Once a multicast MAC is added it is not removed.
> 	- Multicast packets are sent with the wrong QP number.
> 	- Multicast IP addresses are not created and without
> 	  them no multicast packets received on the interface
> 	  are delivered to the rdma_rxe driver.
> 	- The implementation in rxe_mcast.c is potentially
> 	  racy if multiple simultaneous attach/detach operations
> 	  are issued.
> 
> This patch set fixes these issues.
> ---
> v4:
>    Corrected a lockdep bug reported by Zhu Yanjun.
> v3:
>    Removed rxe_loop_and_send(). It turns out it is not needed.
>    Added module parameters to set mcast limits to small values when
>    driver is loaded to enable mcast limit testing.
>    Rebased to current for-next branch.
> v2:
>    Respond to comments by Zhu.
>    Added more Fixes lines.
>    Added some more explanation in the commit messages.
>    Fixed an error in rxe_lookup_mcg. Should have checked
> 	the return from rxe_get_mcg().
> 
> Bob Pearson (7):
>    RDMA/rxe: Cleanup rxe_ah/av_chk_attr
>    RDMA/rxe: Fix sending of mcast packets
>    RDMA/rxe: Register IP mcast address
>    RDMA/rxe: Let rxe_lookup_mcg use rcu_read_lock
>    RDMA/rxe: Split multicast lock
>    RDMA/rxe: Cleanup mcg lifetime
>    RDMA/rxe: Add module parameters for mcast limits
> 
>   drivers/infiniband/sw/rxe/Makefile     |   3 +-
>   drivers/infiniband/sw/rxe/rxe.c        |   8 +-
>   drivers/infiniband/sw/rxe/rxe_av.c     |  50 +--
>   drivers/infiniband/sw/rxe/rxe_loc.h    |   6 +-
>   drivers/infiniband/sw/rxe/rxe_mcast.c  | 524 +++++++++++--------------
>   drivers/infiniband/sw/rxe/rxe_net.c    |   6 +-
>   drivers/infiniband/sw/rxe/rxe_net.h    |   1 +
>   drivers/infiniband/sw/rxe/rxe_opcode.h |   2 +-
>   drivers/infiniband/sw/rxe/rxe_param.c  |  23 ++
>   drivers/infiniband/sw/rxe/rxe_param.h  |   4 +
>   drivers/infiniband/sw/rxe/rxe_qp.c     |   4 +-
>   drivers/infiniband/sw/rxe/rxe_recv.c   |  11 +-
>   drivers/infiniband/sw/rxe/rxe_req.c    |  11 +-
>   drivers/infiniband/sw/rxe/rxe_verbs.c  |   5 +-
>   drivers/infiniband/sw/rxe/rxe_verbs.h  |   5 +-
>   15 files changed, 303 insertions(+), 360 deletions(-)
>   create mode 100644 drivers/infiniband/sw/rxe/rxe_param.c
>
Zhu Yanjun Dec. 4, 2023, 11:50 p.m. UTC | #2
+ Leon

Bob

Exactly.  Some kind of lock is needed for ipv6_sock_mc_join/drop. I will 
delve into these commits.

And this is related with mcast. We can invite netdev experts to review 
the code as well.

If you agree, I will also send your commits to netdev maillist.

And these commits are complicated. It is very good to have the 
suggestions from

the experts in netdev.

Zhu Yanjun

在 2023/12/5 4:07, Bob Pearson 写道:
> Zhu,
>
> Thanks for testing this. It turns out I needed to take the sk_lock for
> ipv6_sock_mc_join/drop().
>
> Bob
>
> On 12/4/23 14:03, Bob Pearson wrote:
>> After developing a test program which exercises node to node
>> testing of RoCE multicast it became clear that there are a
>> number of issues with the current rdma_rxe multicast implementation.
>>
>> The issues seen include:
>>     - There is no support for IPV4 multicast addresses.
>>     - Once a multicast MAC is added it is not removed.
>>     - Multicast packets are sent with the wrong QP number.
>>     - Multicast IP addresses are not created and without
>>       them no multicast packets received on the interface
>>       are delivered to the rdma_rxe driver.
>>     - The implementation in rxe_mcast.c is potentially
>>       racy if multiple simultaneous attach/detach operations
>>       are issued.
>>
>> This patch set fixes these issues.
>> ---
>> v4:
>>    Corrected a lockdep bug reported by Zhu Yanjun.
>> v3:
>>    Removed rxe_loop_and_send(). It turns out it is not needed.
>>    Added module parameters to set mcast limits to small values when
>>    driver is loaded to enable mcast limit testing.
>>    Rebased to current for-next branch.
>> v2:
>>    Respond to comments by Zhu.
>>    Added more Fixes lines.
>>    Added some more explanation in the commit messages.
>>    Fixed an error in rxe_lookup_mcg. Should have checked
>>     the return from rxe_get_mcg().
>>
>> Bob Pearson (7):
>>    RDMA/rxe: Cleanup rxe_ah/av_chk_attr
>>    RDMA/rxe: Fix sending of mcast packets
>>    RDMA/rxe: Register IP mcast address
>>    RDMA/rxe: Let rxe_lookup_mcg use rcu_read_lock
>>    RDMA/rxe: Split multicast lock
>>    RDMA/rxe: Cleanup mcg lifetime
>>    RDMA/rxe: Add module parameters for mcast limits
>>
>>   drivers/infiniband/sw/rxe/Makefile     |   3 +-
>>   drivers/infiniband/sw/rxe/rxe.c        |   8 +-
>>   drivers/infiniband/sw/rxe/rxe_av.c     |  50 +--
>>   drivers/infiniband/sw/rxe/rxe_loc.h    |   6 +-
>>   drivers/infiniband/sw/rxe/rxe_mcast.c  | 524 +++++++++++--------------
>>   drivers/infiniband/sw/rxe/rxe_net.c    |   6 +-
>>   drivers/infiniband/sw/rxe/rxe_net.h    |   1 +
>>   drivers/infiniband/sw/rxe/rxe_opcode.h |   2 +-
>>   drivers/infiniband/sw/rxe/rxe_param.c  |  23 ++
>>   drivers/infiniband/sw/rxe/rxe_param.h  |   4 +
>>   drivers/infiniband/sw/rxe/rxe_qp.c     |   4 +-
>>   drivers/infiniband/sw/rxe/rxe_recv.c   |  11 +-
>>   drivers/infiniband/sw/rxe/rxe_req.c    |  11 +-
>>   drivers/infiniband/sw/rxe/rxe_verbs.c  |   5 +-
>>   drivers/infiniband/sw/rxe/rxe_verbs.h  |   5 +-
>>   15 files changed, 303 insertions(+), 360 deletions(-)
>>   create mode 100644 drivers/infiniband/sw/rxe/rxe_param.c
>>
Bob Pearson Dec. 5, 2023, 12:02 a.m. UTC | #3
On 12/4/23 17:50, Zhu Yanjun wrote:
> + Leon
> 
> Bob
> 
> Exactly.  Some kind of lock is needed for ipv6_sock_mc_join/drop. I will 
> delve into these commits.
> 
> And this is related with mcast. We can invite netdev experts to review 
> the code as well.
> 
> If you agree, I will also send your commits to netdev maillist.
> 
> And these commits are complicated. It is very good to have the 
> suggestions from
> 
> the experts in netdev.
> 
> Zhu Yanjun
> 
> 在 2023/12/5 4:07, Bob Pearson 写道:
>> Zhu,
>>
>> Thanks for testing this. It turns out I needed to take the sk_lock for
>> ipv6_sock_mc_join/drop().
>>
>> Bob
>>
>> On 12/4/23 14:03, Bob Pearson wrote:
>>> After developing a test program which exercises node to node
>>> testing of RoCE multicast it became clear that there are a
>>> number of issues with the current rdma_rxe multicast implementation.
>>>
>>> The issues seen include:
>>>     - There is no support for IPV4 multicast addresses.
>>>     - Once a multicast MAC is added it is not removed.
>>>     - Multicast packets are sent with the wrong QP number.
>>>     - Multicast IP addresses are not created and without
>>>       them no multicast packets received on the interface
>>>       are delivered to the rdma_rxe driver.
>>>     - The implementation in rxe_mcast.c is potentially
>>>       racy if multiple simultaneous attach/detach operations
>>>       are issued.
>>>
>>> This patch set fixes these issues.
>>> ---
>>> v4:
>>>    Corrected a lockdep bug reported by Zhu Yanjun.
>>> v3:
>>>    Removed rxe_loop_and_send(). It turns out it is not needed.
>>>    Added module parameters to set mcast limits to small values when
>>>    driver is loaded to enable mcast limit testing.
>>>    Rebased to current for-next branch.
>>> v2:
>>>    Respond to comments by Zhu.
>>>    Added more Fixes lines.
>>>    Added some more explanation in the commit messages.
>>>    Fixed an error in rxe_lookup_mcg. Should have checked
>>>     the return from rxe_get_mcg().
>>>
>>> Bob Pearson (7):
>>>    RDMA/rxe: Cleanup rxe_ah/av_chk_attr
>>>    RDMA/rxe: Fix sending of mcast packets
>>>    RDMA/rxe: Register IP mcast address
>>>    RDMA/rxe: Let rxe_lookup_mcg use rcu_read_lock
>>>    RDMA/rxe: Split multicast lock
>>>    RDMA/rxe: Cleanup mcg lifetime
>>>    RDMA/rxe: Add module parameters for mcast limits
>>>
>>>   drivers/infiniband/sw/rxe/Makefile     |   3 +-
>>>   drivers/infiniband/sw/rxe/rxe.c        |   8 +-
>>>   drivers/infiniband/sw/rxe/rxe_av.c     |  50 +--
>>>   drivers/infiniband/sw/rxe/rxe_loc.h    |   6 +-
>>>   drivers/infiniband/sw/rxe/rxe_mcast.c  | 524 +++++++++++--------------
>>>   drivers/infiniband/sw/rxe/rxe_net.c    |   6 +-
>>>   drivers/infiniband/sw/rxe/rxe_net.h    |   1 +
>>>   drivers/infiniband/sw/rxe/rxe_opcode.h |   2 +-
>>>   drivers/infiniband/sw/rxe/rxe_param.c  |  23 ++
>>>   drivers/infiniband/sw/rxe/rxe_param.h  |   4 +
>>>   drivers/infiniband/sw/rxe/rxe_qp.c     |   4 +-
>>>   drivers/infiniband/sw/rxe/rxe_recv.c   |  11 +-
>>>   drivers/infiniband/sw/rxe/rxe_req.c    |  11 +-
>>>   drivers/infiniband/sw/rxe/rxe_verbs.c  |   5 +-
>>>   drivers/infiniband/sw/rxe/rxe_verbs.h  |   5 +-
>>>   15 files changed, 303 insertions(+), 360 deletions(-)
>>>   create mode 100644 drivers/infiniband/sw/rxe/rxe_param.c
>>>
Zhu,

After rereading v4 patch 0003 I noticed that I missed adding locks
around ipv6_sock_mc_drop() in the error path in rxe_mcast_add6().
Would you rather I send a new v4 patch or resend the series as v5?

Bob
Jason Gunthorpe Dec. 5, 2023, 12:08 a.m. UTC | #4
On Mon, Dec 04, 2023 at 06:02:55PM -0600, Bob Pearson wrote:

> After rereading v4 patch 0003 I noticed that I missed adding locks
> around ipv6_sock_mc_drop() in the error path in rxe_mcast_add6().
> Would you rather I send a new v4 patch or resend the series as v5?

Please send the whole thing, patchworks does not like it when
individual patches are revised

Thanks,
Jason