[for-next,00/16] Implement work queues for rdma_rxe

Message ID 20221018043345.4033-1-rpearsonhpe@gmail.com (mailing list archive)

Message

Bob Pearson Oct. 18, 2022, 4:33 a.m. UTC
This patch series implements work queues as an alternative to the
main tasklets in the rdma_rxe driver. The first few patches perform
some cleanups of the current tasklet code, followed by a patch that
makes the internal API for task execution pluggable and implements an
inline and a tasklet based set of functions. The remaining patches
clean up the qp reset and error handling code in the three tasklets
and modify the locking logic to prevent multiple calls to the tasklet
scheduling routine. Finally, after this preparation, the work queue
equivalent set of functions is added and module parameters are
implemented to allow tuning the task types.
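
To give a rough idea of the shape of the pluggable interface, here is
a simplified sketch. It is illustrative only; the struct layout, the
ops names and the module parameter shown are assumptions and do not
exactly match the patches.

===
/* Simplified sketch of a pluggable task interface (illustrative only;
 * names, fields and the module parameter are assumptions).
 */
#include <linux/interrupt.h>
#include <linux/workqueue.h>
#include <linux/spinlock.h>
#include <linux/moduleparam.h>

struct rxe_task;

struct rxe_task_ops {
	void (*run)(struct rxe_task *task);	/* try to run inline */
	void (*sched)(struct rxe_task *task);	/* defer execution */
	void (*enable)(struct rxe_task *task);
	void (*disable)(struct rxe_task *task);
	void (*cleanup)(struct rxe_task *task);
};

struct rxe_task {
	const struct rxe_task_ops *ops;
	struct tasklet_struct tasklet;	/* used by the tasklet variant */
	struct work_struct work;	/* used by the work queue variant */
	int (*func)(void *arg);		/* req/comp/resp handler */
	void *arg;
	spinlock_t lock;
	int state;
};

/* hypothetical module parameter selecting the task type */
static int rxe_task_type = 2;	/* 0=inline, 1=tasklet, 2=work queue */
module_param(rxe_task_type, int, 0444);
MODULE_PARM_DESC(rxe_task_type, "0=inline, 1=tasklet, 2=work queue");

/* callers always go through the ops table, so the three variants can
 * be selected per qp at init time without touching the callers
 */
static inline void rxe_run_task(struct rxe_task *task)
{
	task->ops->run(task);
}

static inline void rxe_sched_task(struct rxe_task *task)
{
	task->ops->sched(task);
}
===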

The main advantage of the work queue version of deferred task
execution is that it has much better scalability and overall
performance than the tasklet variant. Tasklet performance saturates
with one connected queue pair and stays constant as more are added.
Work queue performance is slightly better for one queue pair and
scales up with the number of connected queue pairs. The perftest
microbenchmarks in local loopback mode (not a very realistic test
case) reach approximately 100 Gb/sec with work queues compared to
about 16 Gb/sec with tasklets.
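
Most of the scalability difference comes from where the deferred work
runs: a tasklet executes in softirq context, serialized per tasklet
and normally on the cpu that scheduled it, while work items queued on
an unbound work queue can run concurrently on any idle cpu, so
independent queue pairs can make progress in parallel. A minimal
sketch of the work queue variant follows; it assumes the rxe_task
layout sketched above and a single shared work queue, and the names
used are assumptions rather than the actual code.

===
/* Minimal sketch of the work queue variant (assumes the rxe_task
 * sketch above; names and helpers are assumptions).
 */
#include <linux/workqueue.h>
#include <linux/errno.h>
#include <linux/container_of.h>

static struct workqueue_struct *rxe_wq;

int rxe_alloc_wq(void)
{
	/* WQ_UNBOUND lets tasks for independent qps run concurrently on
	 * any idle cpu instead of being serialized in softirq context
	 * on the scheduling cpu as tasklets are
	 */
	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
	return rxe_wq ? 0 : -ENOMEM;
}

static void rxe_task_work(struct work_struct *work)
{
	struct rxe_task *task = container_of(work, struct rxe_task, work);

	/* drive the req/comp/resp state machine until it has no work */
	while (task->func(task->arg) == 0)
		;
}

static void rxe_init_task_wq(struct rxe_task *task)
{
	INIT_WORK(&task->work, rxe_task_work);
}

static void rxe_sched_task_wq(struct rxe_task *task)
{
	queue_work(rxe_wq, &task->work);
}
===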

This patch series is derived from an earlier patch set developed by
Ian Ziemba at HPE, which is used in some Lustre storage clients
attached to Lustre servers with hardware RoCE v2 NICs.

Bob Pearson (16):
  RDMA/rxe: Remove init of task locks from rxe_qp.c
  RDMA/rxe: Removed unused name from rxe_task struct
  RDMA/rxe: Split rxe_run_task() into two subroutines
  RDMA/rxe: Make rxe_do_task static
  RDMA/rxe: Rename task->state_lock to task->lock
  RDMA/rxe: Make task interface pluggable
  RDMA/rxe: Simplify reset state handling in rxe_resp.c
  RDMA/rxe: Split rxe_drain_resp_pkts()
  RDMA/rxe: Handle qp error in rxe_resp.c
  RDMA/rxe: Cleanup comp tasks in rxe_qp.c
  RDMA/rxe: Remove __rxe_do_task()
  RDMA/rxe: Make tasks schedule each other
  RDMA/rxe: Implement disable/enable_task()
  RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE
  RDMA/rxe: Add workqueue support for tasks
  RDMA/rxe: Add parameters to control task type

 drivers/infiniband/sw/rxe/rxe.c       |   9 +-
 drivers/infiniband/sw/rxe/rxe_comp.c  |  35 ++-
 drivers/infiniband/sw/rxe/rxe_net.c   |   4 +-
 drivers/infiniband/sw/rxe/rxe_qp.c    |  87 +++----
 drivers/infiniband/sw/rxe/rxe_req.c   |  10 +-
 drivers/infiniband/sw/rxe/rxe_resp.c  |  75 ++++--
 drivers/infiniband/sw/rxe/rxe_task.c  | 354 ++++++++++++++++++++------
 drivers/infiniband/sw/rxe/rxe_task.h  |  76 +++---
 drivers/infiniband/sw/rxe/rxe_verbs.c |   8 +-
 9 files changed, 451 insertions(+), 207 deletions(-)


base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780

Comments

Md Haris Iqbal Oct. 20, 2022, 3:02 p.m. UTC | #1
On Tue, Oct 18, 2022 at 6:39 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> [...]
>
> base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780

The patch series does not apply cleanly over the mentioned commit for
me. Patch '[PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to
task->lock' fails at "drivers/infiniband/sw/rxe/rxe_task.c:103".
I corrected that manually, then it fails in the next commit; I didn't
check after that. Is it the same for others or is it just me?

> --
> 2.34.1
>
Daisuke Matsuda (Fujitsu) Oct. 21, 2022, 2:46 a.m. UTC | #2
On Fri, Oct 21, 2022 12:02 AM haris Iqbal wrote:
> 
> On Tue, Oct 18, 2022 at 6:39 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> >
> > [...]
> > base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
> 
> The patch series is not applying cleanly over the mentioned commit for
> me. Patch 'PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to
> task->lock.' fails at "drivers/infiniband/sw/rxe/rxe_task.c:103".
> I corrected that manually, then it fails in the next commit. Didn't
> check after that. Is it the same for others or is it just me?

There is a problem with the 4th patch. Its subject is different from the other patches,
so it was probably not generated at the same time as them. I could apply the rest
after adding the following change to the 4th:
===
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 097ddb16c230..a7203b93e5cc 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -28,7 +28,7 @@ int __rxe_do_task(struct rxe_task *task)
  * a second caller finds the task already running
  * but looks just after the last call to func
  */
-static void rxe_do_task(struct tasklet_struct *t)
+static void do_task(struct tasklet_struct *t)
 {
        int cont;
        int ret;
@@ -100,7 +100,7 @@ int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *))
        task->func      = func;
        task->destroyed = false;

-       tasklet_setup(&task->tasklet, rxe_do_task);
+       tasklet_setup(&task->tasklet, do_task);

        task->state = TASK_STATE_START;
        spin_lock_init(&task->state_lock);
@@ -132,7 +132,7 @@ void rxe_run_task(struct rxe_task *task)
        if (task->destroyed)
                return;

-       rxe_do_task(&task->tasklet);
+       do_task(&task->tasklet);
 }

 void rxe_sched_task(struct rxe_task *task)
===

Daisuke

> 
> > --
> > 2.34.1
> >
Bob Pearson Oct. 21, 2022, 3:40 a.m. UTC | #3
On 10/20/22 21:46, matsuda-daisuke@fujitsu.com wrote:
> On Fri, Oct 21, 2022 12:02 AM haris Iqbal wrote:
>>
>> On Tue, Oct 18, 2022 at 6:39 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>>
>>> [...]
>>> base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
>>
>> The patch series is not applying cleanly over the mentioned commit for
>> me. Patch 'PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to
>> task->lock.' fails at "drivers/infiniband/sw/rxe/rxe_task.c:103".
>> I corrected that manually, then it fails in the next commit. Didn't
>> check after that. Is it the same for others or is it just me?
> 
> There is a problem with the 4th patch. Its subject is different from other patches,
> so probably it was not generated at the same time with them. I could apply the rest
> after adding the following change to the 4th:
> ===

I messed up the subject and resent it just after the other ones.

> [...]
Bob Pearson Oct. 21, 2022, 6:02 a.m. UTC | #4
On 10/20/22 10:02, haris iqbal wrote:
> On Tue, Oct 18, 2022 at 6:39 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> [...]
>> base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
> 
> The patch series is not applying cleanly over the mentioned commit for
> me. Patch 'PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to
> task->lock.' fails at "drivers/infiniband/sw/rxe/rxe_task.c:103".
> I corrected that manually, then it fails in the next commit. Didn't
> check after that. Is it the same for others or is it just me?
> 
>> --
>> 2.34.1
>>

This worked for me. There was a botched 4/16 which I resent just after the other ones.
You may need to delete the first 4/16 and use the second one. I am going to resend the
series tomorrow; there are a couple of things folks have pointed out that I want to address.

Bob