Message ID | cover.1726480607.git.lorenzo@kernel.org (mailing list archive) |
---|---|
Series | Introduce GRO support to cpumap codebase |
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Mon, 16 Sep 2024 12:13:42 +0200

> Add GRO support to cpumap codebase moving the cpu_map_entry kthread to a
> NAPI-kthread pinned on the selected cpu.
>
> Changes in rfc v2:
> - get rid of dummy netdev dependency
>
> Lorenzo Bianconi (3):
>   net: Add napi_init_for_gro routine
>   net: add napi_threaded_poll to netdevice.h
>   bpf: cpumap: Add gro support

Oh okay, so it still uses NAPI.
When I'm back from the conferences (next week), I might rebase and send
the solution where I only use the GRO part of it, i.e. no
napi_schedule()/poll()/napi_complete() logic.

>
>  include/linux/netdevice.h |   3 +
>  kernel/bpf/cpumap.c       | 123 ++++++++++++++++----------------------
>  net/core/dev.c            |  27 ++++++---
>  3 files changed, 73 insertions(+), 80 deletions(-)

Thanks,
Olek
Hi Lorenzo,

On Mon, Sep 16, 2024 at 12:13:42PM GMT, Lorenzo Bianconi wrote:
> Add GRO support to cpumap codebase moving the cpu_map_entry kthread to a
> NAPI-kthread pinned on the selected cpu.
>
> Changes in rfc v2:
> - get rid of dummy netdev dependency
>
> Lorenzo Bianconi (3):
>   net: Add napi_init_for_gro routine
>   net: add napi_threaded_poll to netdevice.h
>   bpf: cpumap: Add gro support
>
>  include/linux/netdevice.h |   3 +
>  kernel/bpf/cpumap.c       | 123 ++++++++++++++++----------------------
>  net/core/dev.c            |  27 ++++++---
>  3 files changed, 73 insertions(+), 80 deletions(-)
>
> --
> 2.46.0
>

Sorry about the long delay - finally caught up to everything after
conferences.

I re-ran my synthetic tests (including baseline). v2 is somehow showing
2x bigger gains than v1 (~30% vs ~14%) for tcp_stream. Again, the only
variable I changed is kernel version - steering prog is active for both.


Baseline (again)

./tcp_rr -c -H $TASK_IP -p 50,90,99 -T4 -F8 -l30                          ./tcp_stream -c -H $TASK_IP -T8 -F16 -l30

         Transactions  Latency P50 (s)  Latency P90 (s)  Latency P99 (s)           Throughput (Mbit/s)
Run 1    2560252       0.00009087       0.00010495       0.00011647       Run 1    15479.31
Run 2    2665517       0.00008575       0.00010239       0.00013311       Run 2    15162.48
Run 3    2755939       0.00008191       0.00010367       0.00012287       Run 3    14709.04
Run 4    2595680       0.00008575       0.00011263       0.00012671       Run 4    15373.06
Run 5    2841865       0.00007999       0.00009471       0.00012799       Run 5    15234.91
Average  2683850.6     0.000084854      0.00010367       0.00012543       Average  15191.76

cpumap NAPI patches v2

         Transactions  Latency P50 (s)  Latency P90 (s)  Latency P99 (s)           Throughput (Mbit/s)
Run 1    2577838       0.00008575       0.00012031       0.00013695       Run 1    19914.56
Run 2    2729237       0.00007551       0.00013311       0.00017663       Run 2    20140.92
Run 3    2689442       0.00008319       0.00010495       0.00013311       Run 3    19887.48
Run 4    2862366       0.00008127       0.00009471       0.00010623       Run 4    19374.49
Run 5    2700538       0.00008319       0.00010367       0.00012799       Run 5    19784.49
Average  2711884.2     0.000081782      0.00011135       0.000136182      Average  19820.388
Delta    1.04%         -3.62%           7.41%            8.57%                     30.47%

Thanks,
Daniel
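The Delta row can be reproduced from the per-run figures. Below is a minimal Python sketch, assuming each delta is the relative change of the five-run mean against the baseline mean. The names `baseline`, `patched` and `percent_delta` are illustrative only and are not part of the test tooling used in this thread.

```python
from statistics import mean

# Per-run figures copied from the benchmark results in this message.
baseline = {
    "transactions": [2560252, 2665517, 2755939, 2595680, 2841865],
    "latency_p50_s": [0.00009087, 0.00008575, 0.00008191, 0.00008575, 0.00007999],
    "latency_p90_s": [0.00010495, 0.00010239, 0.00010367, 0.00011263, 0.00009471],
    "latency_p99_s": [0.00011647, 0.00013311, 0.00012287, 0.00012671, 0.00012799],
    "throughput_mbit_s": [15479.31, 15162.48, 14709.04, 15373.06, 15234.91],
}
patched = {
    "transactions": [2577838, 2729237, 2689442, 2862366, 2700538],
    "latency_p50_s": [0.00008575, 0.00007551, 0.00008319, 0.00008127, 0.00008319],
    "latency_p90_s": [0.00012031, 0.00013311, 0.00010495, 0.00009471, 0.00010367],
    "latency_p99_s": [0.00013695, 0.00017663, 0.00013311, 0.00010623, 0.00012799],
    "throughput_mbit_s": [19914.56, 20140.92, 19887.48, 19374.49, 19784.49],
}


def percent_delta(before, after):
    """Relative change of mean(after) versus mean(before), in percent."""
    return (mean(after) / mean(before) - 1.0) * 100.0


# One delta per metric, keyed the same way as the input tables.
deltas = {name: percent_delta(baseline[name], patched[name]) for name in baseline}

for name, delta in deltas.items():
    print(f"{name:20s} {delta:+.2f}%")
```

For the latency columns a positive delta means latency went up, whereas for transactions and throughput a positive delta is an improvement.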
> Hi Lorenzo,
>
> On Mon, Sep 16, 2024 at 12:13:42PM GMT, Lorenzo Bianconi wrote:
[...]
>
> Sorry about the long delay - finally caught up to everything after
> conferences.
>
> I re-ran my synthetic tests (including baseline). v2 is somehow showing
> 2x bigger gains than v1 (~30% vs ~14%) for tcp_stream. Again, the only
> variable I changed is kernel version - steering prog is active for both.
>
[...]
>
> Thanks,
> Daniel

Hi Daniel,

cool, thx for testing it.

@Olek: how do we want to proceed on it? Are you still working on it or do you want me
to send a regular patch for it?

Regards,
Lorenzo
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Wed, 9 Oct 2024 12:46:00 +0200

[...]

> Hi Daniel,
>
> cool, thx for testing it.
>
> @Olek: how do we want to proceed on it? Are you still working on it or do you want me
> to send a regular patch for it?

Hi,

I had a small vacation, sorry. I'm starting working on it again today.

>
> Regards,
> Lorenzo

Thanks,
Olek
> From: Lorenzo Bianconi <lorenzo@kernel.org>
> Date: Wed, 9 Oct 2024 12:46:00 +0200
>
[...]
>
> > @Olek: how do we want to proceed on it? Are you still working on it or do you want me
> > to send a regular patch for it?
>
> Hi,
>
> I had a small vacation, sorry. I'm starting working on it again today.

ack, no worries. Are you going to rebase the other patches on top of it
or are you going to try a different approach?

Regards,
Lorenzo

> >
> > Regards,
> > Lorenzo
>
> Thanks,
> Olek
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Wed, 9 Oct 2024 14:47:58 +0200

[...]

>> Hi,
>>
>> I had a small vacation, sorry. I'm starting working on it again today.
>
> ack, no worries. Are you going to rebase the other patches on top of it
> or are you going to try a different approach?

I'll try the approach without NAPI as Kuba asks and let Daniel test it,
then we'll see.

BTW I'm curious how he got this boost on v2, from what I see you didn't
change the implementation that much?

Thanks,
Olek
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Wed, 9 Oct 2024 14:50:42 +0200

> From: Lorenzo Bianconi <lorenzo@kernel.org>
> Date: Wed, 9 Oct 2024 14:47:58 +0200
>
[...]
>
>> ack, no worries. Are you going to rebase the other patches on top of it
>> or are you going to try a different approach?
>
> I'll try the approach without NAPI as Kuba asks and let Daniel test it,
> then we'll see.

For now, I have the same results without NAPI as with your series, so
I'll push it soon and let Daniel test.

(I simply decoupled GRO and NAPI and used the former in cpumap, but the
kthread logic didn't change)

>
> BTW I'm curious how he got this boost on v2, from what I see you didn't
> change the implementation that much?

Thanks,
Olek
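As a rough illustration of what the GRO step does on its own, independent of NAPI scheduling, here is a toy Python model of coalescing: in-order segments of the same flow within one batch are merged into a single larger segment before being handed up the stack. This is only a sketch under simplifying assumptions and is not the code from either series. `Segment` and `gro_coalesce` are hypothetical names, and real GRO in the kernel applies many more checks (headers, flags, size limits) that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flow: str     # hypothetical flow key, e.g. derived from the TCP 4-tuple
    seq: int      # sequence number of the first payload byte
    length: int   # payload length in bytes


def gro_coalesce(batch):
    """Merge in-order segments of the same flow within one batch (toy model)."""
    held = {}   # flow -> segment currently being aggregated
    out = []    # segments that were flushed early
    for seg in batch:
        cur = held.get(seg.flow)
        if cur is not None and cur.seq + cur.length == seg.seq:
            # Next in-order segment of a held flow: grow the aggregate.
            cur.length += seg.length
            continue
        if cur is not None:
            # Gap or reordering: flush what was held for this flow.
            out.append(cur)
        held[seg.flow] = Segment(seg.flow, seg.seq, seg.length)
    # End of batch: flush everything still held.
    out.extend(held.values())
    return out


batch = [
    Segment("flow-a", 0, 1448),
    Segment("flow-a", 1448, 1448),
    Segment("flow-b", 0, 100),
    Segment("flow-a", 2896, 1448),
]
merged = gro_coalesce(batch)
print(len(batch), "segments in,", len(merged), "segments out")
```

In this model a bulk stream with many back-to-back segments per batch collapses into far fewer units of work, while a request/response exchange with one small segment per batch has little to merge.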
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Tue, 22 Oct 2024 17:51:43 +0200

> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Date: Wed, 9 Oct 2024 14:50:42 +0200
>
[...]
>
> For now, I have the same results without NAPI as with your series, so
> I'll push it soon and let Daniel test.
>
> (I simply decoupled GRO and NAPI and used the former in cpumap, but the
> kthread logic didn't change)
>
>>
>> BTW I'm curious how he got this boost on v2, from what I see you didn't
>> change the implementation that much?

Hi Daniel,

Sorry for the delay. Please test [0].

[0] https://github.com/alobakin/linux/commits/cpumap-old

Thanks,
Olek
On Tue, Nov 12, 2024, at 9:43 AM, Alexander Lobakin wrote:
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Date: Tue, 22 Oct 2024 17:51:43 +0200
>
[...]
>
> Hi Daniel,
>
> Sorry for the delay. Please test [0].
>
> [0] https://github.com/alobakin/linux/commits/cpumap-old
>
> Thanks,
> Olek

Ack. Will do probably early next week.