Message ID | 20250110130703.3814407-1-linyunsheng@huawei.com (mailing list archive) |
---|---|
Series | fix two bugs related to page_pool |
On 10/01/2025 14.06, Yunsheng Lin wrote:
> This patchset fix a possible time window problem for page_pool and
> the dma API misuse problem as mentioned in [1], and try to avoid the
> overhead of the fixing using some optimization.
>
> From the below performance data, the overhead is not so obvious
> due to performance variations for time_bench_page_pool01_fast_path()
> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
> for time_bench_page_pool03_slow() for fixing the bug.
>

My benchmarking on x86_64 CPUs looks significantly different.
 - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz

Benchmark (bench_page_pool_simple) results from before and after patchset:

| Test name   | Cycles |       |      | Nanosec |        |        |      % |
| (tasklet_*) | Before | After | diff |  Before |  After |   diff | change |
|-------------+--------+-------+------+---------+--------+--------+--------|
| fast_path   |     19 |    24 |    5 |   5.399 |  6.928 |  1.529 |   28.3 |
| ptr_ring    |     54 |    79 |   25 |  15.090 | 21.976 |  6.886 |   45.6 |
| slow        |    238 |   299 |   61 |  66.134 | 83.298 | 17.164 |   26.0 |
#+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f

My testing above shows a clear performance regression across three
different page_pool operating modes.
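For readers who do not use Org-mode, the `#+TBLFM` line above can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and the `regression()` helper are names made up for this illustration, and the numbers are copied from the tasklet rows of the table.

```python
# Tasklet-mode measurements copied from the table: (cycles, nanoseconds).
before = {"fast_path": (19, 5.399), "ptr_ring": (54, 15.090), "slow": (238, 66.134)}
after = {"fast_path": (24, 6.928), "ptr_ring": (79, 21.976), "slow": (299, 83.298)}


def regression(test):
    """Return (cycle diff, ns diff, percent change in ns) for one test.

    Mirrors the table formula: $4=$3-$2, $7=$6-$5, $8=($7/$5)*100.
    """
    cycles_before, ns_before = before[test]
    cycles_after, ns_after = after[test]
    ns_diff = ns_after - ns_before
    return (
        cycles_after - cycles_before,
        round(ns_diff, 3),
        round(ns_diff / ns_before * 100, 1),
    )


for name in before:
    print(name, regression(name))
```

The percentage is taken relative to the "Before" nanosecond figure, so it expresses the added cost per page_pool operation rather than a throughput loss.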
Data also available in:
 - https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org

Raw data below

Before this patchset:

[ 157.186644] bench_page_pool_simple: Loaded
[ 157.475084] time_bench: Type:for_loop Per elem: 1 cycles(tsc) 0.284 ns (step:0) - (measurement period time:0.284327440 sec time_interval:284327440) - (invoke count:1000000000 tsc_interval:1023590451)
[ 162.262752] time_bench: Type:atomic_inc Per elem: 17 cycles(tsc) 4.769 ns (step:0) - (measurement period time:4.769757001 sec time_interval:4769757001) - (invoke count:1000000000 tsc_interval:17171776113)
[ 163.324091] time_bench: Type:lock Per elem: 37 cycles(tsc) 10.431 ns (step:0) - (measurement period time:1.043182161 sec time_interval:1043182161) - (invoke count:100000000 tsc_interval:3755514465)
[ 163.341702] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 163.922466] time_bench: Type:no-softirq-page_pool01 Per elem: 20 cycles(tsc) 5.713 ns (step:0) - (measurement period time:0.571357387 sec time_interval:571357387) - (invoke count:100000000 tsc_interval:2056911063)
[ 163.941429] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 165.506796] time_bench: Type:no-softirq-page_pool02 Per elem: 56 cycles(tsc) 15.560 ns (step:0) - (measurement period time:1.556080558 sec time_interval:1556080558) - (invoke count:100000000 tsc_interval:5601960921)
[ 165.525978] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 171.811289] time_bench: Type:no-softirq-page_pool03 Per elem: 225 cycles(tsc) 62.763 ns (step:0) - (measurement period time:6.276301531 sec time_interval:6276301531) - (invoke count:100000000 tsc_interval:22594974468)
[ 171.830646] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 171.838561] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 172.387597] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.399 ns (step:0) - (measurement period time:0.539904228 sec time_interval:539904228) - (invoke count:100000000 tsc_interval:1943679246)
[ 172.407130] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 173.925266] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 54 cycles(tsc) 15.090 ns (step:0) - (measurement period time:1.509075496 sec time_interval:1509075496) - (invoke count:100000000 tsc_interval:5432740575)
[ 173.944878] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 180.567094] time_bench: Type:tasklet_page_pool03_slow Per elem: 238 cycles(tsc) 66.134 ns (step:0) - (measurement period time:6.613430605 sec time_interval:6613430605) - (invoke count:100000000 tsc_interval:23808654870)

After this patchset:

[ 860.519918] bench_page_pool_simple: Loaded
[ 860.781605] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.257 ns (step:0) - (measurement period time:0.257573336 sec time_interval:257573336) - (invoke count:1000000000 tsc_interval:927275355)
[ 865.613893] time_bench: Type:atomic_inc Per elem: 17 cycles(tsc) 4.814 ns (step:0) - (measurement period time:4.814593429 sec time_interval:4814593429) - (invoke count:1000000000 tsc_interval:17332768494)
[ 866.708420] time_bench: Type:lock Per elem: 38 cycles(tsc) 10.763 ns (step:0) - (measurement period time:1.076362960 sec time_interval:1076362960) - (invoke count:100000000 tsc_interval:3874955595)
[ 866.726118] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 867.423572] time_bench: Type:no-softirq-page_pool01 Per elem: 24 cycles(tsc) 6.880 ns (step:0) - (measurement period time:0.688069107 sec time_interval:688069107) - (invoke count:100000000 tsc_interval:2477080260)
[ 867.442517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 869.436286] time_bench: Type:no-softirq-page_pool02 Per elem: 71 cycles(tsc) 19.844 ns (step:0) - (measurement period time:1.984451929 sec time_interval:1984451929) - (invoke count:100000000 tsc_interval:7144120329)
[ 869.455492] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 877.071437] time_bench: Type:no-softirq-page_pool03 Per elem: 273 cycles(tsc) 76.069 ns (step:0) - (measurement period time:7.606911291 sec time_interval:7606911291) - (invoke count:100000000 tsc_interval:27385252251)
[ 877.090762] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 877.098683] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 877.800696] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 24 cycles(tsc) 6.928 ns (step:0) - (measurement period time:0.692852876 sec time_interval:692852876) - (invoke count:100000000 tsc_interval:2494303293)
[ 877.820224] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 880.026911] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 79 cycles(tsc) 21.976 ns (step:0) - (measurement period time:2.197615122 sec time_interval:2197615122) - (invoke count:100000000 tsc_interval:7911521190)
[ 880.046528] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 888.385235] time_bench: Type:tasklet_page_pool03_slow Per elem: 299 cycles(tsc) 83.298 ns (step:0) - (measurement period time:8.329893717 sec time_interval:8329893717) - (invoke count:100000000 tsc_interval:29988024696)

> Before this patchset:
> root@(none)$ insmod bench_page_pool_simple.ko
> [ 323.367627] bench_page_pool_simple: Loaded
> [ 323.448747] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076997150 sec time_interval:76997150) - (invoke count:100000000 tsc_interval:7699707)
> [ 324.812884] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346855130 sec time_interval:1346855130) - (invoke count:100000000 tsc_interval:134685507)
> [ 324.980875] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.010 ns (step:0) - (measurement period time:0.150101270 sec time_interval:150101270) - (invoke count:10000000 tsc_interval:15010120)
> [ 325.652195] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.542 ns (step:0) - (measurement period time:0.654213000 sec time_interval:654213000) - (invoke count:100000000 tsc_interval:65421294)
> [ 325.669215] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [ 325.974848] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 29.633 ns (step:0) - (measurement period time:0.296338200 sec time_interval:296338200) - (invoke count:10000000 tsc_interval:29633814)
> [ 325.993517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [ 326.576636] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.391 ns (step:0) - (measurement period time:0.573911820 sec time_interval:573911820) - (invoke count:10000000 tsc_interval:57391174)
> [ 326.595307] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [ 328.422661] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.849 ns (step:0) - (measurement period time:1.818495880 sec time_interval:1818495880) - (invoke count:10000000 tsc_interval:181849581)
> [ 328.441681] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [ 328.449584] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [ 328.755031] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 29.632 ns (step:0) - (measurement period time:0.296327910 sec time_interval:296327910) - (invoke count:10000000 tsc_interval:29632785)
> [ 328.774308] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [ 329.578579] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 7 cycles(tsc) 79.523 ns (step:0) - (measurement period time:0.795236560 sec time_interval:795236560) - (invoke count:10000000 tsc_interval:79523650)
> [ 329.597769] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [ 331.507501] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.104 ns (step:0) - (measurement period time:1.901047510 sec time_interval:1901047510) - (invoke count:10000000 tsc_interval:190104743)
>
> After this patchset:
> root@(none)$ insmod bench_page_pool_simple.ko
> [ 138.634758] bench_page_pool_simple: Loaded
> [ 138.715879] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076972720 sec time_interval:76972720) - (invoke count:100000000 tsc_interval:7697265)
> [ 140.079897] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:1.346735370 sec time_interval:1346735370) - (invoke count:100000000 tsc_interval:134673531)
> [ 140.247841] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150055080 sec time_interval:150055080) - (invoke count:10000000 tsc_interval:15005497)
> [ 140.919072] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:0.654125000 sec time_interval:654125000) - (invoke count:100000000 tsc_interval:65412493)
> [ 140.936091] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [ 141.246985] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 30.159 ns (step:0) - (measurement period time:0.301598160 sec time_interval:301598160) - (invoke count:10000000 tsc_interval:30159812)
> [ 141.265654] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [ 141.976265] time_bench: Type:no-softirq-page_pool02 Per elem: 7 cycles(tsc) 70.140 ns (step:0) - (measurement period time:0.701405780 sec time_interval:701405780) - (invoke count:10000000 tsc_interval:70140573)
> [ 141.994933] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [ 144.018945] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.514 ns (step:0) - (measurement period time:2.015141210 sec time_interval:2015141210) - (invoke count:10000000 tsc_interval:201514113)
> [ 144.037966] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [ 144.045870] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [ 144.205045] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150056510 sec time_interval:150056510) - (invoke count:10000000 tsc_interval:15005645)
> [ 144.224320] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [ 144.916044] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 68.269 ns (step:0) - (measurement period time:0.682693070 sec time_interval:682693070) - (invoke count:10000000 tsc_interval:68269300)
> [ 144.935234] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [ 146.997684] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 205.376 ns (step:0) - (measurement period time:2.053766310 sec time_interval:2053766310) - (invoke count:10000000 tsc_interval:205376624)
>
> 1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
>
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> CC: Robin Murphy <robin.murphy@arm.com>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: IOMMU <iommu@lists.linux.dev>
> CC: MM <linux-mm@kvack.org>
>
> Change log:
> V7:
> 1. Fix a used-after-free bug reported by KASAN as mentioned by Jakub.
> 2. Fix the 'netmem' variable not setting up correctly bug as mentioned
>    by Simon.
>
> V6:
> 1. Repost based on latest net-next.
> 2. Rename page_pool_to_pp() to page_pool_get_pp().
>
> V5:
> 1. Support unlimit inflight pages.
> 2. Add some optimization to avoid the overhead of fixing bug.
>
> V4:
> 1. use scanning to do the unmapping
> 2. spilt dma sync skipping into separate patch
>
> V3:
> 1. Target net-next tree instead of net tree.
> 2. Narrow the rcu lock as the discussion in v2.
> 3. Check the ummapping cnt against the inflight cnt.
>
> V2:
> 1. Add a item_full stat.
> 2. Use container_of() for page_pool_to_pp().
>
> Yunsheng Lin (8):
>   page_pool: introduce page_pool_get_pp() API
>   page_pool: fix timing for checking and disabling napi_local
>   page_pool: fix IOMMU crash when driver has already unbound
>   page_pool: support unlimited number of inflight pages
>   page_pool: skip dma sync operation for inflight pages
>   page_pool: use list instead of ptr_ring for ring cache
>   page_pool: batch refilling pages to reduce atomic operation
>   page_pool: use list instead of array for alloc cache
>
>  drivers/net/ethernet/freescale/fec_main.c          |   8 +-
>  .../ethernet/google/gve/gve_buffer_mgmt_dqo.c      |   2 +-
>  drivers/net/ethernet/intel/iavf/iavf_txrx.c        |   6 +-
>  drivers/net/ethernet/intel/idpf/idpf_txrx.c        |  14 +-
>  drivers/net/ethernet/intel/libeth/rx.c             |   2 +-
>  .../net/ethernet/mellanox/mlx5/core/en/xdp.c       |   3 +-
>  drivers/net/netdevsim/netdev.c                     |   6 +-
>  drivers/net/wireless/mediatek/mt76/mt76.h          |   2 +-
>  include/linux/mm_types.h                           |   2 +-
>  include/linux/skbuff.h                             |   1 +
>  include/net/libeth/rx.h                            |   3 +-
>  include/net/netmem.h                               |  24 +-
>  include/net/page_pool/helpers.h                    |  11 +
>  include/net/page_pool/types.h                      |  64 +-
>  net/core/devmem.c                                  |   4 +-
>  net/core/netmem_priv.h                             |   5 +-
>  net/core/page_pool.c                               | 664 ++++++++++++++----
>  net/core/page_pool_priv.h                          |  12 +-
>  18 files changed, 675 insertions(+), 158 deletions(-)
>
On 2025/1/14 22:31, Jesper Dangaard Brouer wrote:
>
>
> On 10/01/2025 14.06, Yunsheng Lin wrote:
>> This patchset fix a possible time window problem for page_pool and
>> the dma API misuse problem as mentioned in [1], and try to avoid the
>> overhead of the fixing using some optimization.
>>
>> From the below performance data, the overhead is not so obvious
>> due to performance variations for time_bench_page_pool01_fast_path()
>> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
>> for time_bench_page_pool03_slow() for fixing the bug.
>>
>
> My benchmarking on x86_64 CPUs looks significantly different.
>  - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>
> Benchmark (bench_page_pool_simple) results from before and after patchset:
>
> | Test name   | Cycles |       |      | Nanosec |        |        |      % |
> | (tasklet_*) | Before | After | diff |  Before |  After |   diff | change |
> |-------------+--------+-------+------+---------+--------+--------+--------|
> | fast_path   |     19 |    24 |    5 |   5.399 |  6.928 |  1.529 |   28.3 |
> | ptr_ring    |     54 |    79 |   25 |  15.090 | 21.976 |  6.886 |   45.6 |
> | slow        |    238 |   299 |   61 |  66.134 | 83.298 | 17.164 |   26.0 |
> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>
> My testing above shows a clear performance regression across three
> different page_pool operating modes.

I retested it on an arm64 server patch by patch (the raw performance data
is in the attachment), and the results seem similar to before.

Before this patchset:
        fast_path    ptr_ring     slow
   1.   31.171 ns    60.980 ns    164.917 ns
   2.   28.824 ns    60.891 ns    170.241 ns
   3.   14.236 ns    60.583 ns    164.355 ns

With patch 1-4:
   4.   31.443 ns    53.242 ns    210.148 ns
   5.   31.406 ns    53.270 ns    210.189 ns

With patch 1-5:
   6.   26.163 ns    53.781 ns    189.450 ns
   7.   26.189 ns    53.798 ns    189.466 ns

With patch 1-8:
   8.   28.108 ns    68.199 ns    202.516 ns
   9.   16.128 ns    55.904 ns    202.711 ns

I am not able to get hold of an x86 server yet; I might be able to get one
during the weekend.
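The per-element figures in both the x86_64 and arm64 summaries can be re-derived from the raw `time_bench` log lines: nanoseconds per element is `time_interval / invoke count`, and the `cycles(tsc)` column is `tsc_interval / invoke count`. The sketch below is only an illustration; the regular expression and the `parse_time_bench()` helper are assumptions made for this example and are not part of the benchmark module. The two sample lines are copied from the logs in this thread.

```python
import re

# Hypothetical helper pattern for the time_bench line layout seen in this thread.
LINE_RE = re.compile(
    r"Type:(?P<type>\S+) Per elem: (?P<cycles>\d+) cycles\(tsc\) (?P<ns>[\d.]+) ns"
    r".*?time_interval:(?P<time_ns>\d+)\)"
    r".*?invoke count:(?P<count>\d+) tsc_interval:(?P<tsc>\d+)\)"
)


def parse_time_bench(line):
    """Recompute per-element cost and counter frequency from one log line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    count = int(m["count"])
    time_ns = int(m["time_ns"])
    tsc = int(m["tsc"])
    return {
        "type": m["type"],
        "cycles_per_elem": tsc // count,     # integer part, as printed in the logs
        "ns_per_elem": time_ns / count,      # unrounded nanoseconds per element
        "counter_hz": tsc / (time_ns / 1e9), # ticks per second of the cycle counter
    }


x86_line = (
    "[ 172.387597] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 "
    "cycles(tsc) 5.399 ns (step:0) - (measurement period time:0.539904228 sec "
    "time_interval:539904228) - (invoke count:100000000 tsc_interval:1943679246)"
)
arm64_line = (
    "[ 174.885885] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 "
    "cycles(tsc) 31.171 ns (step:0) - (measurement period time:3.117169710 sec "
    "time_interval:3117169710) - (invoke count:100000000 tsc_interval:311716965)"
)

x86 = parse_time_bench(x86_line)
arm64 = parse_time_bench(arm64_line)
print(x86)
print(arm64)
```

Dividing `tsc_interval` by the measurement time suggests the counter ticks at about 3.6 GHz in the x86_64 logs but only about 100 MHz in the arm64 logs, which would explain the single-digit `cycles(tsc)` values there; the nanosecond column is therefore the more meaningful one to compare across the two machines.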
Theoretically, patches 1-4 or 1-5 should not have much performance impact on
fast_path and ptr_ring, except for the rcu_lock mentioned in
page_pool_napi_local(), so it would be good if patches 1-5 were also tested
in your testlab with the rcu_lock removed from page_pool_napi_local().

>
>
> Data also available in:
>  - https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org
>
> Raw data below

[...]

As mentioned by Toke, we may be able to reduce the performance difference between
tasklet and non-tasklet testcases by removing the rcu_lock in
page_pool_napi_local() for patch 1, as the in_softirq() check in
page_pool_napi_local() should already ensure an RCU-bh read-side critical
section.

07ea810753bd Revert "page_pool: introduce page_pool_get_pp() API"
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[ 118.835127] bench_page_pool_simple: Loaded
[ 119.608858] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769587320 sec time_interval:769587320) - (invoke count:1000000000 tsc_interval:76958720)
[ 136.559273] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 16.932 ns (step:0) - (measurement period time:16.932925510 sec time_interval:16932925510) - (invoke count:1000000000 tsc_interval:1693292543)
[ 138.078107] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500666520 sec time_interval:1500666520) - (invoke count:100000000 tsc_interval:150066646)
[ 144.636732] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541323980 sec time_interval:6541323980) - (invoke count:1000000000 tsc_interval:654132391)
[ 144.653948] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 147.780571] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.173 ns (step:0) - (measurement period time:3.117359810 sec time_interval:3117359810) - (invoke count:100000000 tsc_interval:311735974)
[ 147.799427] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 153.566322] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.577 ns (step:0) - (measurement period time:5.757708010 sec time_interval:5757708010) - (invoke count:100000000 tsc_interval:575770795)
[ 153.585178] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 171.732446] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.384 ns (step:0) - (measurement period time:18.138436700 sec time_interval:18138436700) - (invoke count:100000000 tsc_interval:1813843661)
[ 171.751744] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 171.759626] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 174.885885] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.171 ns (step:0) - (measurement period time:3.117169710 sec time_interval:3117169710) - (invoke count:100000000 tsc_interval:311716965)
[ 174.905345] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 181.012397] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.980 ns (step:0) - (measurement period time:6.098047810 sec time_interval:6098047810) - (invoke count:100000000 tsc_interval:609804775)
[ 181.031770] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 197.532151] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.917 ns (step:0) - (measurement period time:16.491723510 sec time_interval:16491723510) - (invoke count:100000000 tsc_interval:1649172345)
root@(none)$
root@(none)$
root@(none)$ rmmod bench_page_pool_simple.ko
[ 209.510186] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[ 210.659129] bench_page_pool_simple: Loaded
[ 211.432882] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769610810 sec time_interval:769610810) - (invoke count:1000000000 tsc_interval:76961072)
[ 224.917831] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467473740 sec time_interval:13467473740) - (invoke count:1000000000 tsc_interval:1346747368)
[ 226.436667] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500671210 sec time_interval:1500671210) - (invoke count:100000000 tsc_interval:150067117)
[ 232.995372] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541405330 sec time_interval:6541405330) - (invoke count:1000000000 tsc_interval:654140528)
[ 233.012586] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 236.139341] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.174 ns (step:0) - (measurement period time:3.117491630 sec time_interval:3117491630) - (invoke count:100000000 tsc_interval:311749159)
[ 236.158197] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 241.926861] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.594 ns (step:0) - (measurement period time:5.759481900 sec time_interval:5759481900) - (invoke count:100000000 tsc_interval:575948185)
[ 241.945717] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 259.747779] time_bench: Type:no-softirq-page_pool03 Per elem: 17 cycles(tsc) 177.932 ns (step:0) - (measurement period time:17.793230520 sec time_interval:17793230520) - (invoke count:100000000 tsc_interval:1779323045)
[ 259.767070] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 259.774951] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 262.901276] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.172 ns (step:0) - (measurement period time:3.117235450 sec time_interval:3117235450) - (invoke count:100000000 tsc_interval:311723540)
[ 262.920737] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 269.016589] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.868 ns (step:0) - (measurement period time:6.086848810 sec time_interval:6086848810) - (invoke count:100000000 tsc_interval:608684876)
[ 269.035963] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 285.540301] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.956 ns (step:0) - (measurement period time:16.495681400 sec time_interval:16495681400) - (invoke count:100000000 tsc_interval:1649568134)
root@(none)$ cat /proc/version
Linux version 6.13.0-rc6-00905-g07ea810753bd (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #295 SMP PREEMPT Wed Jan 15 11:22:27 CST 2025
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[ 102.478309] bench_page_pool_simple: Loaded
[ 103.252061] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769609840 sec time_interval:769609840) - (invoke count:1000000000 tsc_interval:76960976)
[ 116.737122] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467584160 sec time_interval:13467584160) - (invoke count:1000000000 tsc_interval:1346758411)
[ 118.255948] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500661720 sec time_interval:1500661720) - (invoke count:100000000 tsc_interval:150066166)
[ 124.814672] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541425600 sec time_interval:6541425600) - (invoke count:1000000000 tsc_interval:654142555)
[ 124.831887] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 126.355730] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.145 ns (step:0) - (measurement period time:1.514579980 sec time_interval:1514579980) - (invoke count:100000000 tsc_interval:151457991)
[ 126.374588] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 132.139818] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.560 ns (step:0) - (measurement period time:5.756052820 sec time_interval:5756052820) - (invoke count:100000000 tsc_interval:575605276)
[ 132.158674] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 149.943233] time_bench: Type:no-softirq-page_pool03 Per elem: 17 cycles(tsc) 177.757 ns (step:0) - (measurement period time:17.775726280 sec time_interval:17775726280) - (invoke count:100000000 tsc_interval:1777572621)
[ 149.962525] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 149.970407] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 152.861903] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 28.824 ns (step:0) - (measurement period time:2.882405020 sec time_interval:2882405020) - (invoke count:100000000 tsc_interval:288240495)
[ 152.881364] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 158.979512] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.891 ns (step:0) - (measurement period time:6.089144870 sec time_interval:6089144870) - (invoke count:100000000 tsc_interval:608914482)
[ 158.998884] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 176.031659] time_bench: Type:tasklet_page_pool03_slow Per elem: 17 cycles(tsc) 170.241 ns (step:0) - (measurement period time:17.024117960 sec time_interval:17024117960) - (invoke count:100000000 tsc_interval:1702411789)
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[ 442.818325] bench_page_pool_simple: Loaded
[ 443.592055] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769610330 sec time_interval:769610330) - (invoke count:1000000000 tsc_interval:76961025)
[ 458.439817] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 14.830 ns (step:0) - (measurement period time:14.830285600 sec time_interval:14830285600) - (invoke count:1000000000 tsc_interval:1483028556)
[ 459.958698] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500714240 sec time_interval:1500714240) - (invoke count:100000000 tsc_interval:150071418)
[ 466.517515] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541516880 sec time_interval:6541516880) - (invoke count:1000000000 tsc_interval:654151682)
[ 466.534728] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 468.047027] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.030 ns (step:0) - (measurement period time:1.503035130 sec time_interval:1503035130) - (invoke count:100000000 tsc_interval:150303507)
[ 468.065883] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 473.829596] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.545 ns (step:0) - (measurement period time:5.754537290 sec time_interval:5754537290) - (invoke count:100000000 tsc_interval:575453724)
[ 473.848452] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 491.124253] time_bench: Type:no-softirq-page_pool03 Per elem: 17 cycles(tsc) 172.669 ns (step:0) - (measurement period time:17.266968680 sec time_interval:17266968680) - (invoke count:100000000 tsc_interval:1726696861)
[ 491.143550] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 491.151434] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 493.118656] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 19.581 ns (step:0) - (measurement period time:1.958131510 sec time_interval:1958131510) - (invoke count:100000000 tsc_interval:195813143)
[ 493.138115] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 499.227968] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.808 ns (step:0) - (measurement period time:6.080847450 sec time_interval:6080847450) - (invoke count:100000000 tsc_interval:608084740)
[ 499.247339] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 515.691157] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.351 ns (step:0) - (measurement period time:16.435160550 sec time_interval:16435160550) - (invoke count:100000000 tsc_interval:1643516048)
root@(none)$ rmmod bench_page_pool_simple.ko
[ 683.197394] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[ 684.374311] bench_page_pool_simple: Loaded
[ 685.148035] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769604180 sec time_interval:769604180) - (invoke count:1000000000 tsc_interval:76960410)
[ 698.632947] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467434190 sec time_interval:13467434190) - (invoke count:1000000000 tsc_interval:1346743412)
[ 700.151767] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500657020 sec time_interval:1500657020) - (invoke count:100000000 tsc_interval:150065696)
[ 706.710339] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541272330 sec time_interval:6541272330) - (invoke count:1000000000 tsc_interval:654127227)
[ 706.727553] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 709.619400] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 28.825 ns (step:0) - (measurement period time:2.882584100 sec time_interval:2882584100) - (invoke count:100000000 tsc_interval:288258403)
[ 709.638256] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 715.411633] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.642 ns (step:0) - (measurement period time:5.764201050 sec time_interval:5764201050) - (invoke count:100000000
tsc_interval:576420099) [ 715.430493] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 732.168906] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 167.295 ns (step:0) - (measurement period time:16.729578200 sec time_interval:16729578200) - (invoke count:100000000 tsc_interval:1672957815) [ 732.188197] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 732.196078] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 733.628852] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 14.236 ns (step:0) - (measurement period time:1.423682990 sec time_interval:1423682990) - (invoke count:100000000 tsc_interval:142368292) [ 733.648311] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 739.715700] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.583 ns (step:0) - (measurement period time:6.058384260 sec time_interval:6058384260) - (invoke count:100000000 tsc_interval:605838420) [ 739.735073] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 756.179270] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.355 ns (step:0) - (measurement period time:16.435539700 sec time_interval:16435539700) - (invoke count:100000000 tsc_interval:1643553963) root@(none)$ cat /proc/version Linux version 6.13.0-rc6-00905-g07ea810753bd (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #295 SMP PREEMPT Wed Jan 15 11:22:27 CST 2025 c8cd65aea46f (HEAD -> pp-inflight-fix_v6_test) Revert "page_pool: fix IOMMU crash when driver has already unbound" root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 112.284533] bench_page_pool_simple: Loaded [ 113.058250] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769595440 sec time_interval:769595440) - (invoke 
count:1000000000 tsc_interval:76959536) [ 126.543325] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467599580 sec time_interval:13467599580) - (invoke count:1000000000 tsc_interval:1346759954) [ 128.062178] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500688700 sec time_interval:1500688700) - (invoke count:100000000 tsc_interval:150068863) [ 134.620885] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541407810 sec time_interval:6541407810) - (invoke count:1000000000 tsc_interval:654140776) [ 134.638100] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 137.764295] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.169 ns (step:0) - (measurement period time:3.116932100 sec time_interval:3116932100) - (invoke count:100000000 tsc_interval:311693204) [ 137.783151] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 143.556498] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.641 ns (step:0) - (measurement period time:5.764165830 sec time_interval:5764165830) - (invoke count:100000000 tsc_interval:576416578) [ 143.575354] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 160.391936] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 168.077 ns (step:0) - (measurement period time:16.807748380 sec time_interval:16807748380) - (invoke count:100000000 tsc_interval:1680774833) [ 160.411228] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 160.419110] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 163.025216] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 25.970 ns (step:0) - (measurement period time:2.597014370 sec time_interval:2597014370) - (invoke count:100000000 
tsc_interval:259701433) [ 163.044675] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 169.169341] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 61.156 ns (step:0) - (measurement period time:6.115661410 sec time_interval:6115661410) - (invoke count:100000000 tsc_interval:611566136) [ 169.188712] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 185.721921] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 165.245 ns (step:0) - (measurement period time:16.524552130 sec time_interval:16524552130) - (invoke count:100000000 tsc_interval:1652455208) root@(none)$ rmmod bench_page_pool_simple.ko [ 228.647567] bench_page_pool_simple: Unloaded root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 229.756515] bench_page_pool_simple: Loaded [ 230.530211] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769571820 sec time_interval:769571820) - (invoke count:1000000000 tsc_interval:76957172) [ 244.015118] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467427880 sec time_interval:13467427880) - (invoke count:1000000000 tsc_interval:1346742782) [ 245.533931] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500649840 sec time_interval:1500649840) - (invoke count:100000000 tsc_interval:150064979) [ 252.092555] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541318290 sec time_interval:6541318290) - (invoke count:1000000000 tsc_interval:654131824) [ 252.109769] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 253.543110] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 14.240 ns (step:0) - (measurement period time:1.424077550 sec time_interval:1424077550) - (invoke count:100000000 tsc_interval:142407750) [ 
253.561963] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 259.320132] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.489 ns (step:0) - (measurement period time:5.748989970 sec time_interval:5748989970) - (invoke count:100000000 tsc_interval:574898993) [ 259.338990] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 276.124086] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 167.762 ns (step:0) - (measurement period time:16.776264180 sec time_interval:16776264180) - (invoke count:100000000 tsc_interval:1677626413) [ 276.143377] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 276.151259] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 277.584309] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 14.239 ns (step:0) - (measurement period time:1.423960790 sec time_interval:1423960790) - (invoke count:100000000 tsc_interval:142396074) [ 277.603769] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 283.675754] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.629 ns (step:0) - (measurement period time:6.062981570 sec time_interval:6062981570) - (invoke count:100000000 tsc_interval:606298151) [ 283.695128] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 300.180187] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.764 ns (step:0) - (measurement period time:16.476401670 sec time_interval:16476401670) - (invoke count:100000000 tsc_interval:1647640163) root@(none)$ cat /proc/version Linux version 6.13.0-rc6-00903-gc8cd65aea46f (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #296 SMP PREEMPT Wed Jan 15 11:29:54 CST 2025 d8de0484ad23------page_pool: fix IOMMU crash when driver has already unbound root@(none)$ 
insmod bench_page_pool_simple.ko loops=100000000 [ 352.981066] bench_page_pool_simple: Loaded [ 353.754833] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769612830 sec time_interval:769612830) - (invoke count:1000000000 tsc_interval:76961275) [ 367.239820] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467509700 sec time_interval:13467509700) - (invoke count:1000000000 tsc_interval:1346750932) [ 368.758688] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500703810 sec time_interval:1500703810) - (invoke count:100000000 tsc_interval:150070375) [ 375.317433] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541446010 sec time_interval:6541446010) - (invoke count:1000000000 tsc_interval:654144595) [ 375.334647] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 378.470719] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.268 ns (step:0) - (measurement period time:3.126808010 sec time_interval:3126808010) - (invoke count:100000000 tsc_interval:312680796) [ 378.489580] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 384.237992] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.392 ns (step:0) - (measurement period time:5.739235000 sec time_interval:5739235000) - (invoke count:100000000 tsc_interval:573923493) [ 384.256846] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 404.284227] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 200.185 ns (step:0) - (measurement period time:20.018549500 sec time_interval:20018549500) - (invoke count:100000000 tsc_interval:2001854942) [ 404.303523] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 404.311405] bench_page_pool_simple: 
time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 407.450798] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.303 ns (step:0) - (measurement period time:3.130301150 sec time_interval:3130301150) - (invoke count:100000000 tsc_interval:313030109) [ 407.470257] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 413.117820] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 56.385 ns (step:0) - (measurement period time:5.638558540 sec time_interval:5638558540) - (invoke count:100000000 tsc_interval:563855847) [ 413.137192] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 433.250575] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 201.047 ns (step:0) - (measurement period time:20.104725790 sec time_interval:20104725790) - (invoke count:100000000 tsc_interval:2010472573) root@(none)$ rmmod bench_page_pool_simple.ko [ 481.612067] bench_page_pool_simple: Unloaded root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 482.525041] bench_page_pool_simple: Loaded [ 483.298777] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769612290 sec time_interval:769612290) - (invoke count:1000000000 tsc_interval:76961221) [ 496.783660] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467404470 sec time_interval:13467404470) - (invoke count:1000000000 tsc_interval:1346740441) [ 498.302476] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500651360 sec time_interval:1500651360) - (invoke count:100000000 tsc_interval:150065132) [ 504.861015] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541237000 sec time_interval:6541237000) - (invoke count:1000000000 tsc_interval:654123694) [ 504.878228] bench_page_pool_simple: 
time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 508.017855] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.303 ns (step:0) - (measurement period time:3.130363490 sec time_interval:3130363490) - (invoke count:100000000 tsc_interval:313036345) [ 508.036725] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 513.777554] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.316 ns (step:0) - (measurement period time:5.731647070 sec time_interval:5731647070) - (invoke count:100000000 tsc_interval:573164701) [ 513.796408] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 533.821092] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 200.158 ns (step:0) - (measurement period time:20.015853910 sec time_interval:20015853910) - (invoke count:100000000 tsc_interval:2001585384) [ 533.840385] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 533.848266] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 536.987413] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.300 ns (step:0) - (measurement period time:3.130056990 sec time_interval:3130056990) - (invoke count:100000000 tsc_interval:313005695) [ 537.006870] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 542.553443] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 55.375 ns (step:0) - (measurement period time:5.537567730 sec time_interval:5537567730) - (invoke count:100000000 tsc_interval:553756767) [ 542.572814] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 562.622903] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 200.414 ns (step:0) - (measurement period time:20.041430960 sec time_interval:20041430960) - (invoke count:100000000 tsc_interval:2004143090) root@(none)$ 
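A note for anyone cross-checking the figures: the "Per elem" values in each time_bench record appear to follow directly from the raw counters on the same line (time_interval and tsc_interval divided by the invoke count). The sketch below is not part of the benchmark module. It is a small, hypothetical Python helper (the names parse_record and recompute are made up here) that assumes the record layout seen in this log and that the cycle count is truncated to an integer.

```python
import re

# Last record of the run above, copied verbatim from the log.
SAMPLE = (
    "time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) "
    "200.414 ns (step:0) - (measurement period time:20.041430960 sec "
    "time_interval:20041430960) - (invoke count:100000000 "
    "tsc_interval:2004143090)"
)

# Assumed record layout, inferred only from the lines in this log.
RECORD = re.compile(
    r"Type:(?P<name>\S+) Per elem: (?P<cycles>\d+) cycles\(tsc\) "
    r"(?P<ns>[\d.]+) ns .*?time_interval:(?P<time_interval>\d+)\) "
    r"- \(invoke count:(?P<count>\d+) tsc_interval:(?P<tsc_interval>\d+)\)"
)


def parse_record(line):
    """Return the fields of one time_bench record, or None if no match."""
    m = RECORD.search(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "cycles": int(m.group("cycles")),
        "ns": float(m.group("ns")),
        "time_interval": int(m.group("time_interval")),
        "count": int(m.group("count")),
        "tsc_interval": int(m.group("tsc_interval")),
    }


def recompute(rec):
    """Recompute per-element cost from the raw counters in a record."""
    ns_per_elem = rec["time_interval"] / rec["count"]
    # Assumption: the logged cycle count is an integer (truncating) division.
    cycles_per_elem = rec["tsc_interval"] // rec["count"]
    return ns_per_elem, cycles_per_elem


if __name__ == "__main__":
    rec = parse_record(SAMPLE)
    ns, cycles = recompute(rec)
    print(f"{rec['name']}: {ns:.3f} ns, {cycles} cycles per element")
```

For the sample record this reproduces the logged 200.414 ns and 20 cycles. The same helper can be run over the before and after runs to compare per-element cost without relying on the rounded "cycles" column, which is very coarse on this machine.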
b53806ee8b03 (HEAD -> pp-inflight-fix_v6_test) page_pool: support unlimited number of inflight pages
root@(none)$ insmod time_bench.ko
[ 57.826902] time_bench: loading out-of-tree module taints kernel.
[ 57.833978] time_bench: Loaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[ 66.015795] bench_page_pool_simple: Loaded
[ 66.789504] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769581100 sec time_interval:769581100) - (invoke count:1000000000 tsc_interval:76958101)
[ 85.985445] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 19.178 ns (step:0) - (measurement period time:19.178464890 sec time_interval:19178464890) - (invoke count:1000000000 tsc_interval:1917846484)
[ 87.504318] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500707820 sec time_interval:1500707820) - (invoke count:100000000 tsc_interval:150070776)
[ 94.062989] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541369880 sec time_interval:6541369880) - (invoke count:1000000000 tsc_interval:654136982)
[ 94.080203] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 97.229937] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.404 ns (step:0) - (measurement period time:3.140470140 sec time_interval:3140470140) - (invoke count:100000000 tsc_interval:314047009)
[ 97.248793] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 102.967699] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.097 ns (step:0) - (measurement period time:5.709729700 sec time_interval:5709729700) - (invoke count:100000000 tsc_interval:570972963)
[ 102.986554] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 123.332228] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 203.368 ns (step:0) - (measurement period
time:20.336842600 sec time_interval:20336842600) - (invoke count:100000000 tsc_interval:2033684253) [ 123.351522] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 123.359404] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 126.512828] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.443 ns (step:0) - (measurement period time:3.144333160 sec time_interval:3144333160) - (invoke count:100000000 tsc_interval:314433311) [ 126.532286] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 131.865545] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 53.242 ns (step:0) - (measurement period time:5.324254260 sec time_interval:5324254260) - (invoke count:100000000 tsc_interval:532425421) [ 131.884917] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 152.908467] time_bench: Type:tasklet_page_pool03_slow Per elem: 21 cycles(tsc) 210.148 ns (step:0) - (measurement period time:21.014892650 sec time_interval:21014892650) - (invoke count:100000000 tsc_interval:2101489259) root@(none)$ rmmod bench_page_pool_simple.ko [ 163.826865] bench_page_pool_simple: Unloaded root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 164.867796] bench_page_pool_simple: Loaded [ 165.641522] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769607400 sec time_interval:769607400) - (invoke count:1000000000 tsc_interval:76960732) [ 179.126540] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467542660 sec time_interval:13467542660) - (invoke count:1000000000 tsc_interval:1346754260) [ 180.645378] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500671580 sec time_interval:1500671580) - (invoke count:100000000 tsc_interval:150067152) [ 187.204029] time_bench: Type:rcu 
Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541350520 sec time_interval:6541350520) - (invoke count:1000000000 tsc_interval:654135046) [ 187.221243] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 188.577413] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346892420 sec time_interval:1346892420) - (invoke count:100000000 tsc_interval:134689236) [ 188.596268] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 194.314705] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.092 ns (step:0) - (measurement period time:5.709260290 sec time_interval:5709260290) - (invoke count:100000000 tsc_interval:570926024) [ 194.333561] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 214.660328] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 203.179 ns (step:0) - (measurement period time:20.317934940 sec time_interval:20317934940) - (invoke count:100000000 tsc_interval:2031793485) [ 214.679620] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 214.687501] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 217.837259] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.406 ns (step:0) - (measurement period time:3.140666230 sec time_interval:3140666230) - (invoke count:100000000 tsc_interval:314066616) [ 217.856720] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 223.192797] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 53.270 ns (step:0) - (measurement period time:5.327072820 sec time_interval:5327072820) - (invoke count:100000000 tsc_interval:532707276) [ 223.212169] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 244.239728] time_bench: 
Type:tasklet_page_pool03_slow Per elem: 21 cycles(tsc) 210.189 ns (step:0) - (measurement period time:21.018901830 sec time_interval:21018901830) - (invoke count:100000000 tsc_interval:2101890177) root@(none)$ cat /proc/version Linux version 6.13.0-rc6-00903-gb53806ee8b03 (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #297 SMP PREEMPT Wed Jan 15 11:43:41 CST 2025 249fa431270c (HEAD -> pp-inflight-fix_v6_test) page_pool: skip dma sync operation for inflight pages root@(none)$ cat /proc/version Linux version 6.13.0-rc6-00904-g249fa431270c (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #300 SMP PREEMPT Wed Jan 15 14:21:51 CST 2025 root@(none)$ rmmod bench_page_pool_simple.ko [ 459.241973] bench_page_pool_simple: Unloaded root@(none)$ root@(none)$ root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 462.674971] bench_page_pool_simple: Loaded [ 463.448730] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769614430 sec time_interval:769614430) - (invoke count:1000000000 tsc_interval:76961435) [ 476.933835] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467629020 sec time_interval:13467629020) - (invoke count:1000000000 tsc_interval:1346762898) [ 478.452709] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500710750 sec time_interval:1500710750) - (invoke count:100000000 tsc_interval:150071069) [ 485.011458] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541449970 sec time_interval:6541449970) - (invoke count:1000000000 tsc_interval:654144991) [ 485.028671] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 486.500170] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 14.622 ns (step:0) - (measurement period time:1.462234950 sec 
time_interval:1462234950) - (invoke count:100000000 tsc_interval:146223489) [ 486.519026] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 491.827181] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 52.989 ns (step:0) - (measurement period time:5.298974920 sec time_interval:5298974920) - (invoke count:100000000 tsc_interval:529897484) [ 491.846039] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 509.968937] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.140 ns (step:0) - (measurement period time:18.114063050 sec time_interval:18114063050) - (invoke count:100000000 tsc_interval:1811406296) [ 509.988228] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 509.996109] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 512.621549] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 26.163 ns (step:0) - (measurement period time:2.616350750 sec time_interval:2616350750) - (invoke count:100000000 tsc_interval:261635069) [ 512.641009] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 518.028167] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 53.781 ns (step:0) - (measurement period time:5.378154590 sec time_interval:5378154590) - (invoke count:100000000 tsc_interval:537815454) [ 518.047541] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 537.001263] time_bench: Type:tasklet_page_pool03_slow Per elem: 18 cycles(tsc) 189.450 ns (step:0) - (measurement period time:18.945065660 sec time_interval:18945065660) - (invoke count:100000000 tsc_interval:1894506561) root@(none)$ rmmod bench_page_pool_simple.ko [ 554.270004] bench_page_pool_simple: Unloaded root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 555.334974] bench_page_pool_simple: Loaded [ 556.108716] 
time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769622900 sec time_interval:769622900) - (invoke count:1000000000 tsc_interval:76962277) [ 569.593570] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467378920 sec time_interval:13467378920) - (invoke count:1000000000 tsc_interval:1346737886) [ 571.112408] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500672390 sec time_interval:1500672390) - (invoke count:100000000 tsc_interval:150067233) [ 577.671068] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541360400 sec time_interval:6541360400) - (invoke count:1000000000 tsc_interval:654136033) [ 577.688281] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 579.159760] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 14.622 ns (step:0) - (measurement period time:1.462214680 sec time_interval:1462214680) - (invoke count:100000000 tsc_interval:146221461) [ 579.178615] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 584.387107] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 51.993 ns (step:0) - (measurement period time:5.199315890 sec time_interval:5199315890) - (invoke count:100000000 tsc_interval:519931583) [ 584.405963] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 601.992462] time_bench: Type:no-softirq-page_pool03 Per elem: 17 cycles(tsc) 175.776 ns (step:0) - (measurement period time:17.577663130 sec time_interval:17577663130) - (invoke count:100000000 tsc_interval:1757766306) [ 602.011753] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 602.019634] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 604.647682] time_bench: 
Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 26.189 ns (step:0) - (measurement period time:2.618955910 sec time_interval:2618955910) - (invoke count:100000000 tsc_interval:261895585) [ 604.667141] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 610.055961] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 53.798 ns (step:0) - (measurement period time:5.379816080 sec time_interval:5379816080) - (invoke count:100000000 tsc_interval:537981602) [ 610.075334] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 629.030597] time_bench: Type:tasklet_page_pool03_slow Per elem: 18 cycles(tsc) 189.466 ns (step:0) - (measurement period time:18.946606280 sec time_interval:18946606280) - (invoke count:100000000 tsc_interval:1894660622) bd05af7e28d2 (HEAD -> pp-inflight-fix_v6_test) page_pool: use list instead of ptr_ring for ring cache root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 324.256893] bench_page_pool_simple: Loaded [ 325.030626] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769608510 sec time_interval:769608510) - (invoke count:1000000000 tsc_interval:76960843) [ 338.515544] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467442220 sec time_interval:13467442220) - (invoke count:1000000000 tsc_interval:1346744216) [ 340.034383] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500673080 sec time_interval:1500673080) - (invoke count:100000000 tsc_interval:150067302) [ 346.593168] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541486300 sec time_interval:6541486300) - (invoke count:1000000000 tsc_interval:654148625) [ 346.610383] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 349.198132] time_bench: 
Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 25.784 ns (step:0) - (measurement period time:2.578484390 sec time_interval:2578484390) - (invoke count:100000000 tsc_interval:257848433) [ 349.216987] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 358.266543] time_bench: Type:no-softirq-page_pool02 Per elem: 9 cycles(tsc) 90.403 ns (step:0) - (measurement period time:9.040378740 sec time_interval:9040378740) - (invoke count:100000000 tsc_interval:904037869) [ 358.285398] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 378.581275] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 202.870 ns (step:0) - (measurement period time:20.287047800 sec time_interval:20287047800) - (invoke count:100000000 tsc_interval:2028704772) [ 378.600567] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 378.608449] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 381.195830] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 25.782 ns (step:0) - (measurement period time:2.578291220 sec time_interval:2578291220) - (invoke count:100000000 tsc_interval:257829118) [ 381.215288] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 390.262793] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 9 cycles(tsc) 90.385 ns (step:0) - (measurement period time:9.038500040 sec time_interval:9038500040) - (invoke count:100000000 tsc_interval:903849999) [ 390.282165] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 410.602531] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 203.117 ns (step:0) - (measurement period time:20.311708230 sec time_interval:20311708230) - (invoke count:100000000 tsc_interval:2031170817) root@(none)$ rmmod bench_page_pool_simple.ko [ 452.799939] bench_page_pool_simple: Unloaded root@(none)$ 
insmod bench_page_pool_simple.ko loops=100000000 [ 454.932877] bench_page_pool_simple: Loaded [ 455.706590] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769596200 sec time_interval:769596200) - (invoke count:1000000000 tsc_interval:76959611) [ 469.191300] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467234550 sec time_interval:13467234550) - (invoke count:1000000000 tsc_interval:1346723449) [ 470.710117] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500652740 sec time_interval:1500652740) - (invoke count:100000000 tsc_interval:150065267) [ 477.268702] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541285540 sec time_interval:6541285540) - (invoke count:1000000000 tsc_interval:654128549) [ 477.285914] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 479.873572] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 25.783 ns (step:0) - (measurement period time:2.578394320 sec time_interval:2578394320) - (invoke count:100000000 tsc_interval:257839426) [ 479.892426] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 488.941591] time_bench: Type:no-softirq-page_pool02 Per elem: 9 cycles(tsc) 90.399 ns (step:0) - (measurement period time:9.039988700 sec time_interval:9039988700) - (invoke count:100000000 tsc_interval:903998864) [ 488.960458] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 509.252999] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 202.837 ns (step:0) - (measurement period time:20.283709920 sec time_interval:20283709920) - (invoke count:100000000 tsc_interval:2028370986) [ 509.275188] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 509.283069] bench_page_pool_simple: 
time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 511.870501] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 25.783 ns (step:0) - (measurement period time:2.578339900 sec time_interval:2578339900) - (invoke count:100000000 tsc_interval:257833985) [ 511.889959] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 520.937881] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 9 cycles(tsc) 90.389 ns (step:0) - (measurement period time:9.038917580 sec time_interval:9038917580) - (invoke count:100000000 tsc_interval:903891752) [ 520.957253] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 541.278328] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 203.124 ns (step:0) - (measurement period time:20.312417960 sec time_interval:20312417960) - (invoke count:100000000 tsc_interval:2031241790) root@(none)$ cat /proc/version Linux version 6.13.0-rc6-00905-gbd05af7e28d2 (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #301 SMP PREEMPT Wed Jan 15 14:57:40 CST 2025 e8e4ef65fd4b (HEAD -> pp-inflight-fix_v6_test) page_pool: batch refilling pages to reduce atomic operation root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 81.660612] bench_page_pool_simple: Loaded [ 82.434335] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769577370 sec time_interval:769577370) - (invoke count:1000000000 tsc_interval:76957728) [ 95.919455] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467643010 sec time_interval:13467643010) - (invoke count:1000000000 tsc_interval:1346764295) [ 97.438295] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500675620 sec time_interval:1500675620) - (invoke count:100000000 tsc_interval:150067556) [ 103.997112] time_bench: Type:rcu Per elem: 
0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541514490 sec time_interval:6541514490) - (invoke count:1000000000 tsc_interval:654151443) [ 104.014327] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 105.524295] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500704660 sec time_interval:1500704660) - (invoke count:100000000 tsc_interval:150070459) [ 105.543183] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 111.935637] time_bench: Type:no-softirq-page_pool02 Per elem: 6 cycles(tsc) 63.832 ns (step:0) - (measurement period time:6.383276590 sec time_interval:6383276590) - (invoke count:100000000 tsc_interval:638327653) [ 111.954492] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 131.007329] time_bench: Type:no-softirq-page_pool03 Per elem: 19 cycles(tsc) 190.440 ns (step:0) - (measurement period time:19.044004630 sec time_interval:19044004630) - (invoke count:100000000 tsc_interval:1904400455) [ 131.026621] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 131.034503] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 132.544154] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:1.500558810 sec time_interval:1500558810) - (invoke count:100000000 tsc_interval:150055876) [ 132.563614] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 139.007314] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 64.346 ns (step:0) - (measurement period time:6.434695610 sec time_interval:6434695610) - (invoke count:100000000 tsc_interval:643469557) [ 139.026687] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 158.093560] time_bench: 
Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.582 ns (step:0) - (measurement period time:19.058215140 sec time_interval:19058215140) - (invoke count:100000000 tsc_interval:1905821508) root@(none)$ rmmod bench_page_pool_simple.ko [ 172.671534] bench_page_pool_simple: Unloaded root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 174.012461] bench_page_pool_simple: Loaded [ 174.786162] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769579310 sec time_interval:769579310) - (invoke count:1000000000 tsc_interval:76957922) [ 188.270731] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467093170 sec time_interval:13467093170) - (invoke count:1000000000 tsc_interval:1346709310) [ 189.789532] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500638040 sec time_interval:1500638040) - (invoke count:100000000 tsc_interval:150063795) [ 196.348065] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541234660 sec time_interval:6541234660) - (invoke count:1000000000 tsc_interval:654123460) [ 196.365281] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 197.875195] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500650210 sec time_interval:1500650210) - (invoke count:100000000 tsc_interval:150065016) [ 197.894050] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 203.394345] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 54.911 ns (step:0) - (measurement period time:5.491119700 sec time_interval:5491119700) - (invoke count:100000000 tsc_interval:549111964) [ 203.413201] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 222.522015] time_bench: Type:no-softirq-page_pool03 Per 
elem: 19 cycles(tsc) 190.999 ns (step:0) - (measurement period time:19.099982300 sec time_interval:19099982300) - (invoke count:100000000 tsc_interval:1909998222) [ 222.541306] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 222.549187] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 224.058807] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:1.500531720 sec time_interval:1500531720) - (invoke count:100000000 tsc_interval:150053166) [ 224.078267] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 229.638432] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 55.511 ns (step:0) - (measurement period time:5.551160500 sec time_interval:5551160500) - (invoke count:100000000 tsc_interval:555116045) [ 229.657805] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 248.720382] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.539 ns (step:0) - (measurement period time:19.053918960 sec time_interval:19053918960) - (invoke count:100000000 tsc_interval:1905391890) root@(none)$ cat /proc/version Linux version 6.13.0-rc6-00906-ge8e4ef65fd4b (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #302 SMP PREEMPT Wed Jan 15 15:11:10 CST 2025 root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 493.008461] bench_page_pool_simple: Loaded [ 493.782195] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769607870 sec time_interval:769607870) - (invoke count:1000000000 tsc_interval:76960778) [ 507.266860] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467190060 sec time_interval:13467190060) - (invoke count:1000000000 tsc_interval:1346718999) [ 508.785667] time_bench: Type:lock Per elem: 1 
cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500643840 sec time_interval:1500643840) - (invoke count:100000000 tsc_interval:150064378) [ 515.344224] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541258530 sec time_interval:6541258530) - (invoke count:1000000000 tsc_interval:654125847) [ 515.361440] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 518.102903] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 27.321 ns (step:0) - (measurement period time:2.732199220 sec time_interval:2732199220) - (invoke count:100000000 tsc_interval:273219917) [ 518.121759] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 524.874604] time_bench: Type:no-softirq-page_pool02 Per elem: 6 cycles(tsc) 67.436 ns (step:0) - (measurement period time:6.743668740 sec time_interval:6743668740) - (invoke count:100000000 tsc_interval:674366869) [ 524.893460] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 543.980580] time_bench: Type:no-softirq-page_pool03 Per elem: 19 cycles(tsc) 190.782 ns (step:0) - (measurement period time:19.078288770 sec time_interval:19078288770) - (invoke count:100000000 tsc_interval:1907828868) [ 543.999871] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 544.007753] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 546.748829] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 27.319 ns (step:0) - (measurement period time:2.731985080 sec time_interval:2731985080) - (invoke count:100000000 tsc_interval:273198499) [ 546.768288] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 553.505522] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 67.282 ns (step:0) - (measurement period time:6.728229430 sec time_interval:6728229430) - 
(invoke count:100000000 tsc_interval:672822938) [ 553.524893] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 572.731687] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 191.981 ns (step:0) - (measurement period time:19.198137710 sec time_interval:19198137710) - (invoke count:100000000 tsc_inter root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 624.624453] bench_page_pool_simple: Loaded [ 625.398155] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769580100 sec time_interval:769580100) - (invoke count:1000000000 tsc_interval:76958003) [ 638.882758] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467127790 sec time_interval:13467127790) - (invoke count:1000000000 tsc_interval:1346712774) [ 640.401554] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500633000 sec time_interval:1500633000) - (invoke count:100000000 tsc_interval:150063294) [ 646.960100] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541244270 sec time_interval:6541244270) - (invoke count:1000000000 tsc_interval:654124421) [ 646.977313] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 649.718817] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 27.322 ns (step:0) - (measurement period time:2.732241230 sec time_interval:2732241230) - (invoke count:100000000 tsc_interval:273224117) [ 649.737673] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 656.485353] time_bench: Type:no-softirq-page_pool02 Per elem: 6 cycles(tsc) 67.385 ns (step:0) - (measurement period time:6.738504450 sec time_interval:6738504450) - (invoke count:100000000 tsc_interval:673850439) [ 656.504211] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool 
fast-path [ 675.730226] time_bench: Type:no-softirq-page_pool03 Per elem: 19 cycles(tsc) 192.171 ns (step:0) - (measurement period time:19.217181040 sec time_interval:19217181040) - (invoke count:100000000 tsc_interval:1921718097) [ 675.749517] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 675.757399] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 678.498457] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 27.319 ns (step:0) - (measurement period time:2.731969810 sec time_interval:2731969810) - (invoke count:100000000 tsc_interval:273196975) [ 678.517917] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 685.272622] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 67.457 ns (step:0) - (measurement period time:6.745701080 sec time_interval:6745701080) - (invoke count:100000000 tsc_interval:674570103) [ 685.291993] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 704.535410] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 192.347 ns (step:0) - (measurement period time:19.234760880 sec time_interval:19234760880) - (invoke count:100000000 tsc_interval:1923476080) 5760bcdd3fef (HEAD -> pp-inflight-fix_v6_test) page_pool: use list instead of array for alloc cache root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 1378.118009] bench_page_pool_simple: Loaded [ 1378.891760] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769629870 sec time_interval:769629870) - (invoke count:1000000000 tsc_interval:76962977) [ 1392.376430] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467196340 sec time_interval:13467196340) - (invoke count:1000000000 tsc_interval:1346719628) [ 1393.895253] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - 
(measurement period time:1.500659490 sec time_interval:1500659490) - (invoke count:100000000 tsc_interval:150065942) [ 1400.453791] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541237910 sec time_interval:6541237910) - (invoke count:1000000000 tsc_interval:654123784) [ 1400.471006] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 1402.135620] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 16.553 ns (step:0) - (measurement period time:1.655350930 sec time_interval:1655350930) - (invoke count:100000000 tsc_interval:165535087) [ 1402.154474] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 1407.685584] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 55.219 ns (step:0) - (measurement period time:5.521934590 sec time_interval:5521934590) - (invoke count:100000000 tsc_interval:552193452) [ 1407.704438] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 1427.906125] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.928 ns (step:0) - (measurement period time:20.192856910 sec time_interval:20192856910) - (invoke count:100000000 tsc_interval:2019285683) [ 1427.925416] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 1427.933297] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 1429.519900] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.775 ns (step:0) - (measurement period time:1.577513290 sec time_interval:1577513290) - (invoke count:100000000 tsc_interval:157751323) [ 1429.539358] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 1435.138765] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 55.904 ns (step:0) - (measurement period time:5.590404140 sec time_interval:5590404140) - (invoke 
count:100000000 tsc_interval:559040410) [ 1435.158136] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 1455.411856] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 202.450 ns (step:0) - (measurement period time:20.245062650 sec time_interval:20245062650) - (invoke count:100000000 tsc_interval:2024506258) root@(none)$ rmmod bench_page_pool_simple.ko [ 1624.116972] bench_page_pool_simple: Unloaded root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 1625.254057] bench_page_pool_simple: Loaded [ 1626.027804] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769627010 sec time_interval:769627010) - (invoke count:1000000000 tsc_interval:76962694) [ 1639.512664] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467385750 sec time_interval:13467385750) - (invoke count:1000000000 tsc_interval:1346738568) [ 1641.031493] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500664980 sec time_interval:1500664980) - (invoke count:100000000 tsc_interval:150066492) [ 1647.590116] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541324190 sec time_interval:6541324190) - (invoke count:1000000000 tsc_interval:654132413) [ 1647.607328] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 1649.211118] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.945 ns (step:0) - (measurement period time:1.594526020 sec time_interval:1594526020) - (invoke count:100000000 tsc_interval:159452596) [ 1649.229971] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 1654.761083] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 55.219 ns (step:0) - (measurement period time:5.521934830 sec time_interval:5521934830) - (invoke count:100000000 
tsc_interval:552193476) [ 1654.779937] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 1674.973459] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.846 ns (step:0) - (measurement period time:20.184690600 sec time_interval:20184690600) - (invoke count:100000000 tsc_interval:2018469053) [ 1674.992751] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 1675.000632] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 1676.622598] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 16.128 ns (step:0) - (measurement period time:1.612877140 sec time_interval:1612877140) - (invoke count:100000000 tsc_interval:161287709) [ 1676.642056] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 1682.241489] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 55.904 ns (step:0) - (measurement period time:5.590428410 sec time_interval:5590428410) - (invoke count:100000000 tsc_interval:559042835) [ 1682.260860] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 1702.540682] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 202.711 ns (step:0) - (measurement period time:20.271164760 sec time_interval:20271164760) - (invoke count:100000000 tsc_interval:2027116470) root@(none)$ rmmod bench_page_pool_simple.ko [ 3945.224975] bench_page_pool_simple: Unloaded root@(none)$ insmod bench_page_pool_simple.ko loops=100000000 [ 3946.318072] bench_page_pool_simple: Loaded [ 3947.091825] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769631280 sec time_interval:769631280) - (invoke count:1000000000 tsc_interval:76963115) [ 3960.576784] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467483140 sec time_interval:13467483140) - (invoke count:1000000000 
tsc_interval:1346748308) [ 3962.095607] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500658780 sec time_interval:1500658780) - (invoke count:100000000 tsc_interval:150065872) [ 3968.654285] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541378830 sec time_interval:6541378830) - (invoke count:1000000000 tsc_interval:654137877) [ 3968.671520] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path [ 3971.491845] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 28.110 ns (step:0) - (measurement period time:2.811058810 sec time_interval:2811058810) - (invoke count:100000000 tsc_interval:281105875) [ 3971.510703] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path [ 3978.348581] time_bench: Type:no-softirq-page_pool02 Per elem: 6 cycles(tsc) 68.287 ns (step:0) - (measurement period time:6.828701400 sec time_interval:6828701400) - (invoke count:100000000 tsc_interval:682870134) [ 3978.367435] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path [ 3998.595188] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 202.189 ns (step:0) - (measurement period time:20.218922630 sec time_interval:20218922630) - (invoke count:100000000 tsc_interval:2021892255) [ 3998.614480] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path [ 3998.622362] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path [ 4001.442253] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 28.108 ns (step:0) - (measurement period time:2.810802040 sec time_interval:2810802040) - (invoke count:100000000 tsc_interval:281080197) [ 4001.461713] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path [ 4008.290654] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 
68.199 ns (step:0) - (measurement period time:6.819937430 sec time_interval:6819937430) - (invoke count:100000000 tsc_interval:681993738) [ 4008.310026] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path [ 4028.570377] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 202.516 ns (step:0) - (measurement period time:20.251693920 sec time_interval:20251693920) - (invoke count:100000000 tsc_interval:2025169387) root@(none)$ cat /proc/version Linux version 6.13.0-rc6-00907-g5760bcdd3fef (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #303 SMP PREEMPT Wed Jan 15 15:27:07 CST 2025
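
For readers who want to tabulate the raw logs above rather than read them entry by entry, a minimal Python sketch follows. It assumes the record layout seen in this attachment (a `time_bench: Type:<name> Per elem: ... ns` record followed by the measurement period and the invoke count). The function name `parse_time_bench` and the regular expression are illustrative and are not part of the time_bench module. As a sanity check it also recomputes the nanoseconds per element from the measurement period and the invoke count.

```python
import re

# Matches one time_bench record. All whitespace is matched loosely so the
# pattern works whether a record sits on one line or is wrapped.
RECORD = re.compile(
    r"Type:(\S+)\s+Per\s+elem:\s+(\d+)\s+cycles\(tsc\)\s+([\d.]+)\s+ns\s+"
    r"\(step:\d+\)\s+-\s+\(measurement\s+period\s+time:([\d.]+)\s+sec\s+"
    r"time_interval:\d+\)\s+-\s+\(invoke\s+count:(\d+)"
)


def parse_time_bench(log_text):
    """Return one dict per time_bench record found in log_text."""
    results = []
    for name, cycles, ns, seconds, count in RECORD.findall(log_text):
        results.append({
            "test": name,
            "cycles": int(cycles),
            "ns_per_elem": float(ns),
            # Recomputed from the raw period; should agree with ns_per_elem.
            "ns_recomputed": float(seconds) * 1e9 / int(count),
        })
    return results


# One record copied from the log above.
sample = (
    "[ 4028.570377] time_bench: Type:tasklet_page_pool03_slow Per elem: "
    "20 cycles(tsc) 202.516 ns (step:0) - (measurement period "
    "time:20.251693920 sec time_interval:20251693920) - (invoke "
    "count:100000000 tsc_interval:2025169387)"
)

for row in parse_time_bench(sample):
    print(row["test"], row["ns_per_elem"], round(row["ns_recomputed"], 3))
```

Feeding the whole attachment to `parse_time_bench` and grouping the rows by the preceding commit line would give the per-patch comparison summarised in the replies below.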
On 15/01/2025 12.33, Yunsheng Lin wrote:
> On 2025/1/14 22:31, Jesper Dangaard Brouer wrote:
>>
>>
>> On 10/01/2025 14.06, Yunsheng Lin wrote:
>>> This patchset fix a possible time window problem for page_pool and
>>> the dma API misuse problem as mentioned in [1], and try to avoid the
>>> overhead of the fixing using some optimization.
>>>
>>> From the below performance data, the overhead is not so obvious
>>> due to performance variations for time_bench_page_pool01_fast_path()
>>> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
>>> for time_bench_page_pool03_slow() for fixing the bug.
>>>
>>
>> My benchmarking on x86_64 CPUs looks significantly different.
>>  - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>>
>> Benchmark (bench_page_pool_simple) results from before and after patchset:
>>
>> | Test name  | Cycles |       |    |Nanosec |        |       |      % |
>> | (tasklet_*)| Before | After |diff| Before |  After |  diff | change |
>> |------------+--------+-------+----+--------+--------+-------+--------|
>> | fast_path  |     19 |    24 |   5|  5.399 |  6.928 | 1.529 |   28.3 |
>> | ptr_ring   |     54 |    79 |  25| 15.090 | 21.976 | 6.886 |   45.6 |
>> | slow       |    238 |   299 |  61| 66.134 | 83.298 |17.164 |   26.0 |
>> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>>
>> My above testing show a clear performance regressions across three
>> different page_pool operating modes.
>
> I retested it on arm64 server patch by patch as the raw performance
> data in the attachment, it seems the result seemed similar as before.
>
> Before this patchset:
>       fast_path      ptr_ring       slow
> 1.    31.171 ns      60.980 ns      164.917 ns
> 2.    28.824 ns      60.891 ns      170.241 ns
> 3.    14.236 ns      60.583 ns      164.355 ns
>
> With patch 1-4:
> 4.    31.443 ns      53.242 ns      210.148 ns
> 5.    31.406 ns      53.270 ns      210.189 ns
>
> With patch 1-5:
> 6.    26.163 ns      53.781 ns      189.450 ns
> 7.    26.189 ns      53.798 ns      189.466 ns
>
> With patch 1-8:
> 8.    28.108 ns      68.199 ns      202.516 ns
> 9.    16.128 ns      55.904 ns      202.711 ns
>
> I am not able to get hold of a x86 server yet, I might be able
> to get one during weekend.
>
> Theoretically, patch 1-4 or 1-5 should not have much performance
> impact for fast_path and ptr_ring except for the rcu_lock mentioned
> in page_pool_napi_local(), so it would be good if patch 1-5 is also
> tested in your testlab with the rcu_lock removing in
> page_pool_napi_local().
>

What are you saying?
 - (1) test patch 1-5
 - or (2) test patch 1-5 but revert patch 2 with page_pool_napi_local()

--Jesper

>>
>>
>> Data also available in:
>>  - https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org
>>
On 2025/1/16 1:40, Jesper Dangaard Brouer wrote:
>
>
> On 15/01/2025 12.33, Yunsheng Lin wrote:
>> On 2025/1/14 22:31, Jesper Dangaard Brouer wrote:
>>>
>>>
>>> On 10/01/2025 14.06, Yunsheng Lin wrote:
>>>> This patchset fix a possible time window problem for page_pool and
>>>> the dma API misuse problem as mentioned in [1], and try to avoid the
>>>> overhead of the fixing using some optimization.
>>>>
>>>> From the below performance data, the overhead is not so obvious
>>>> due to performance variations for time_bench_page_pool01_fast_path()
>>>> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
>>>> for time_bench_page_pool03_slow() for fixing the bug.
>>>>
>>>
>>> My benchmarking on x86_64 CPUs looks significantly different.
>>>  - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>>>
>>> Benchmark (bench_page_pool_simple) results from before and after patchset:
>>>
>>> | Test name  | Cycles |       |    |Nanosec |        |       |      % |
>>> | (tasklet_*)| Before | After |diff| Before |  After |  diff | change |
>>> |------------+--------+-------+----+--------+--------+-------+--------|
>>> | fast_path  |     19 |    24 |   5|  5.399 |  6.928 | 1.529 |   28.3 |
>>> | ptr_ring   |     54 |    79 |  25| 15.090 | 21.976 | 6.886 |   45.6 |
>>> | slow       |    238 |   299 |  61| 66.134 | 83.298 |17.164 |   26.0 |
>>> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>>>
>>> My above testing show a clear performance regressions across three
>>> different page_pool operating modes.
>>
>> I retested it on arm64 server patch by patch as the raw performance
>> data in the attachment, it seems the result seemed similar as before.
>>
>> Before this patchset:
>>       fast_path      ptr_ring       slow
>> 1.    31.171 ns      60.980 ns      164.917 ns
>> 2.    28.824 ns      60.891 ns      170.241 ns
>> 3.    14.236 ns      60.583 ns      164.355 ns
>>
>> With patch 1-4:
>> 4.    31.443 ns      53.242 ns      210.148 ns
>> 5.    31.406 ns      53.270 ns      210.189 ns
>>
>> With patch 1-5:
>> 6.    26.163 ns      53.781 ns      189.450 ns
>> 7.    26.189 ns      53.798 ns      189.466 ns
>>
>> With patch 1-8:
>> 8.    28.108 ns      68.199 ns      202.516 ns
>> 9.    16.128 ns      55.904 ns      202.711 ns
>>
>> I am not able to get hold of a x86 server yet, I might be able
>> to get one during weekend.
>>
>> Theoretically, patch 1-4 or 1-5 should not have much performance
>> impact for fast_path and ptr_ring except for the rcu_lock mentioned
>> in page_pool_napi_local(), so it would be good if patch 1-5 is also
>> tested in your testlab with the rcu_lock removing in
>> page_pool_napi_local().
>>
>
> What are you saying?
>  - (1) test patch 1-5
>  - or (2) test patch 1-5 but revert patch 2 with page_pool_napi_local()

patch 1-5 with below applied.

--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1207,10 +1207,8 @@ static bool page_pool_napi_local(const struct page_pool *pool)
 	/* Synchronizated with page_pool_destory() to avoid use-after-free
 	 * for 'napi'.
 	 */
-	rcu_read_lock();
 	napi = READ_ONCE(pool->p.napi);
 	napi_local = napi && READ_ONCE(napi->list_owner) == cpuid;
-	rcu_read_unlock();
 
 	return napi_local;
 }
On 16/01/2025 13.52, Yunsheng Lin wrote:
> On 2025/1/16 1:40, Jesper Dangaard Brouer wrote:
>>
>>
>> On 15/01/2025 12.33, Yunsheng Lin wrote:
>>> On 2025/1/14 22:31, Jesper Dangaard Brouer wrote:
>>>>
>>>>
>>>> On 10/01/2025 14.06, Yunsheng Lin wrote:
>>>>> This patchset fix a possible time window problem for page_pool and
>>>>> the dma API misuse problem as mentioned in [1], and try to avoid the
>>>>> overhead of the fixing using some optimization.
>>>>>
>>>>> From the below performance data, the overhead is not so obvious
>>>>> due to performance variations for time_bench_page_pool01_fast_path()
>>>>> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
>>>>> for time_bench_page_pool03_slow() for fixing the bug.
>>>>>
>>>>
>>>> My benchmarking on x86_64 CPUs looks significantly different.
>>>>  - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>>>>
>>>> Benchmark (bench_page_pool_simple) results from before and after patchset:
>>>>
>>>> | Test name  | Cycles |       |    |Nanosec |        |       |      % |
>>>> | (tasklet_*)| Before | After |diff| Before |  After |  diff | change |
>>>> |------------+--------+-------+----+--------+--------+-------+--------|
>>>> | fast_path  |     19 |    24 |   5|  5.399 |  6.928 | 1.529 |   28.3 |
>>>> | ptr_ring   |     54 |    79 |  25| 15.090 | 21.976 | 6.886 |   45.6 |
>>>> | slow       |    238 |   299 |  61| 66.134 | 83.298 |17.164 |   26.0 |
>>>> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>>>>
>>>> My above testing show a clear performance regressions across three
>>>> different page_pool operating modes.
>>>
>>> I retested it on arm64 server patch by patch as the raw performance
>>> data in the attachment, it seems the result seemed similar as before.
>>>
>>> Before this patchset:
>>>       fast_path      ptr_ring       slow
>>> 1.    31.171 ns      60.980 ns      164.917 ns
>>> 2.    28.824 ns      60.891 ns      170.241 ns
>>> 3.    14.236 ns      60.583 ns      164.355 ns
>>>
>>> With patch 1-4:
>>> 4.    31.443 ns      53.242 ns      210.148 ns
>>> 5.    31.406 ns      53.270 ns      210.189 ns
>>>
>>> With patch 1-5:
>>> 6.    26.163 ns      53.781 ns      189.450 ns
>>> 7.    26.189 ns      53.798 ns      189.466 ns
>>>
>>> With patch 1-8:
>>> 8.    28.108 ns      68.199 ns      202.516 ns
>>> 9.    16.128 ns      55.904 ns      202.711 ns
>>>
>>> I am not able to get hold of a x86 server yet, I might be able
>>> to get one during weekend.
>>>
>>> Theoretically, patch 1-4 or 1-5 should not have much performance
>>> impact for fast_path and ptr_ring except for the rcu_lock mentioned
>>> in page_pool_napi_local(), so it would be good if patch 1-5 is also
>>> tested in your testlab with the rcu_lock removing in
>>> page_pool_napi_local().
>>>
>>
>> What are you saying?
>>  - (1) test patch 1-5
>>  - or (2) test patch 1-5 but revert patch 2 with page_pool_napi_local()
>
> patch 1-5 with below applied.
>
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1207,10 +1207,8 @@ static bool page_pool_napi_local(const struct page_pool *pool)
>  	/* Synchronizated with page_pool_destory() to avoid use-after-free
>  	 * for 'napi'.
>  	 */
> -	rcu_read_lock();
>  	napi = READ_ONCE(pool->p.napi);
>  	napi_local = napi && READ_ONCE(napi->list_owner) == cpuid;
> -	rcu_read_unlock();
>
>  	return napi_local;
>  }
>

Benchmark (bench_page_pool_simple) results from before and after
patchset with patches 1-5m and rcu lock removal as requested.

| Test name  |Cycles |   1-5 |    | Nanosec |    1-5 |        |      % |
| (tasklet_*)|Before | After |diff|  Before |  After |   diff | change |
|------------+-------+-------+----+---------+--------+--------+--------|
| fast_path  |    19 |    19 |   0|   5.399 |  5.492 |  0.093 |    1.7 |
| ptr_ring   |    54 |    57 |   3|  15.090 | 15.849 |  0.759 |    5.0 |
| slow       |   238 |   284 |  46|  66.134 | 78.909 | 12.775 |   19.3 |
#+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f

This test with patches 1-5 looks much better regarding performance.
--Jesper

https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org#e5-1650-pp01-dma-fix-v7-p1-5

Kernel:
 - 6.13.0-rc6-pp01-DMA-fix-v7-p1-5+ #5 SMP PREEMPT_DYNAMIC Thu Jan 16 18:06:53 CET 2025 x86_64 GNU/Linux

Machine: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz

modprobe bench_page_pool_simple loops=100000000

Raw data:
[ 187.309423] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 187.872849] time_bench: Type:no-softirq-page_pool01 Per elem: 19 cycles(tsc) 5.539 ns (step:0) - (measurement period time:0.553906443 sec time_interval:553906443) - (invoke count:100000000 tsc_interval:1994123064)
[ 187.892023] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 189.611070] time_bench: Type:no-softirq-page_pool02 Per elem: 61 cycles(tsc) 17.095 ns (step:0) - (measurement period time:1.709580367 sec time_interval:1709580367) - (invoke count:100000000 tsc_interval:6154679394)
[ 189.630414] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 197.222387] time_bench: Type:no-softirq-page_pool03 Per elem: 272 cycles(tsc) 75.826 ns (step:0) - (measurement period time:7.582681388 sec time_interval:7582681388) - (invoke count:100000000 tsc_interval:27298499214)
[ 197.241926] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 197.249968] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 197.808470] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.492 ns (step:0) - (measurement period time:0.549225541 sec time_interval:549225541) - (invoke count:100000000 tsc_interval:1977272238)
[ 197.828174] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 199.422305] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 57 cycles(tsc) 15.849 ns (step:0) - (measurement period time:1.584920736 sec time_interval:1584920736) - (invoke count:100000000 tsc_interval:5705890830)
[ 199.442087] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 207.342120] time_bench: Type:tasklet_page_pool03_slow Per elem: 284 cycles(tsc) 78.909 ns (step:0) - (measurement period time:7.890955151 sec time_interval:7890955151) - (invoke count:100000000 tsc_interval:28408319289)
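The "Per elem" figures in the raw data above appear to follow directly from the interval counters on the same log line: tsc_interval divided by the invoke count gives cycles per element, and time_interval (in nanoseconds) divided by the invoke count gives nanoseconds per element. A minimal sketch of that arithmetic, assuming the cycle count is truncated to a whole number; the helper names per_elem_cycles() and per_elem_ns() are hypothetical and are not taken from the time_bench module:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of time_bench: whole TSC cycles per
 * element. Integer division truncates, which is assumed to match the
 * "Per elem: N cycles(tsc)" field in the log.
 */
uint64_t per_elem_cycles(uint64_t tsc_interval, uint64_t invoke_count)
{
	assert(invoke_count > 0);
	return tsc_interval / invoke_count;
}

/* Hypothetical helper: nanoseconds per element, given the measured
 * time_interval in nanoseconds.
 */
double per_elem_ns(uint64_t time_interval_ns, uint64_t invoke_count)
{
	assert(invoke_count > 0);
	return (double)time_interval_ns / (double)invoke_count;
}
```

For the tasklet_page_pool03_slow line, per_elem_cycles(28408319289, 100000000) gives 284 and per_elem_ns(7890955151, 100000000) gives roughly 78.909, which agrees with the values printed in the log.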
On 2025/1/17 2:02, Jesper Dangaard Brouer wrote:

>
> Benchmark (bench_page_pool_simple) results from before and after
> patchset with patches 1-5m and rcu lock removal as requested.
>
> | Test name  |Cycles |   1-5 |    | Nanosec |    1-5 |        |      % |
> | (tasklet_*)|Before | After |diff|  Before |  After |   diff | change |
> |------------+-------+-------+----+---------+--------+--------+--------|
> | fast_path  |    19 |    19 |   0|   5.399 |  5.492 |  0.093 |    1.7 |
> | ptr_ring   |    54 |    57 |   3|  15.090 | 15.849 |  0.759 |    5.0 |
> | slow       |   238 |   284 |  46|  66.134 | 78.909 | 12.775 |   19.3 |
> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>
> This test with patches 1-5 looks much better regarding performance.

Thanks for the testing.

Is there any noticeable performance variation between different test runs
of the same kernel build on your machine?

>
> --Jesper
>
> https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org#e5-1650-pp01-dma-fix-v7-p1-5
>
> Kernel:
>  - 6.13.0-rc6-pp01-DMA-fix-v7-p1-5+ #5 SMP PREEMPT_DYNAMIC Thu Jan 16 18:06:53 CET 2025 x86_64 GNU/Linux
>
> Machine: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>
> modprobe bench_page_pool_simple loops=100000000
>
> Raw data:
> [ 187.309423] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [ 187.872849] time_bench: Type:no-softirq-page_pool01 Per elem: 19 cycles(tsc) 5.539 ns (step:0) - (measurement period time:0.553906443 sec time_interval:553906443) - (invoke count:100000000 tsc_interval:1994123064)
> [ 187.892023] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [ 189.611070] time_bench: Type:no-softirq-page_pool02 Per elem: 61 cycles(tsc) 17.095 ns (step:0) - (measurement period time:1.709580367 sec time_interval:1709580367) - (invoke count:100000000 tsc_interval:6154679394)
> [ 189.630414] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [ 197.222387] time_bench: Type:no-softirq-page_pool03 Per elem: 272 cycles(tsc) 75.826 ns (step:0) - (measurement period time:7.582681388 sec time_interval:7582681388) - (invoke count:100000000 tsc_interval:27298499214)
> [ 197.241926] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [ 197.249968] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [ 197.808470] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.492 ns (step:0) - (measurement period time:0.549225541 sec time_interval:549225541) - (invoke count:100000000 tsc_interval:1977272238)
> [ 197.828174] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [ 199.422305] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 57 cycles(tsc) 15.849 ns (step:0) - (measurement period time:1.584920736 sec time_interval:1584920736) - (invoke count:100000000 tsc_interval:5705890830)
> [ 199.442087] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [ 207.342120] time_bench: Type:tasklet_page_pool03_slow Per elem: 284 cycles(tsc) 78.909 ns (step:0) - (measurement period time:7.890955151 sec time_interval:7890955151) - (invoke count:100000000 tsc_interval:28408319289)
>
On 17/01/2025 12.35, Yunsheng Lin wrote:
> On 2025/1/17 2:02, Jesper Dangaard Brouer wrote:
>
>>
>> Benchmark (bench_page_pool_simple) results from before and after
>> patchset with patches 1-5m and rcu lock removal as requested.
>>
>> | Test name  |Cycles |   1-5 |    | Nanosec |    1-5 |        |      % |
>> | (tasklet_*)|Before | After |diff|  Before |  After |   diff | change |
>> |------------+-------+-------+----+---------+--------+--------+--------|
>> | fast_path  |    19 |    19 |   0|   5.399 |  5.492 |  0.093 |    1.7 |
>> | ptr_ring   |    54 |    57 |   3|  15.090 | 15.849 |  0.759 |    5.0 |
>> | slow       |   238 |   284 |  46|  66.134 | 78.909 | 12.775 |   19.3 |
>> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>>
>> This test with patches 1-5 looks much better regarding performance.
>
> Thanks for the testing.
>
> Is there any noticeable performance variation between different test runs
> of the same kernel build on your machine?
>

My machine has quite stable performance for this benchmark.

>> https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org#e5-1650-pp01-dma-fix-v7-p1-5

As documented in the link above, I have also increased the loop count for
the test to make it more stable, since the result is then measured over a
longer period.

 modprobe bench_page_pool_simple loops=100000000

>> Kernel:
>>  - 6.13.0-rc6-pp01-DMA-fix-v7-p1-5+ #5 SMP PREEMPT_DYNAMIC Thu Jan 16 18:06:53 CET 2025 x86_64 GNU/Linux
>>
>> Machine: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>>
>> modprobe bench_page_pool_simple loops=100000000
>>
>> Raw data:
>> [ 187.309423] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
>> [ 187.872849] time_bench: Type:no-softirq-page_pool01 Per elem: 19 cycles(tsc) 5.539 ns (step:0) - (measurement period time:0.553906443 sec time_interval:553906443) - (invoke count:100000000 tsc_interval:1994123064)
>> [ 187.892023] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
>> [ 189.611070] time_bench: Type:no-softirq-page_pool02 Per elem: 61 cycles(tsc) 17.095 ns (step:0) - (measurement period time:1.709580367 sec time_interval:1709580367) - (invoke count:100000000 tsc_interval:6154679394)
>> [ 189.630414] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
>> [ 197.222387] time_bench: Type:no-softirq-page_pool03 Per elem: 272 cycles(tsc) 75.826 ns (step:0) - (measurement period time:7.582681388 sec time_interval:7582681388) - (invoke count:100000000 tsc_interval:27298499214)
>> [ 197.241926] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
>> [ 197.249968] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
>> [ 197.808470] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.492 ns (step:0) - (measurement period time:0.549225541 sec time_interval:549225541) - (invoke count:100000000 tsc_interval:1977272238)
>> [ 197.828174] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
>> [ 199.422305] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 57 cycles(tsc) 15.849 ns (step:0) - (measurement period time:1.584920736 sec time_interval:1584920736) - (invoke count:100000000 tsc_interval:5705890830)
>> [ 199.442087] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
>> [ 207.342120] time_bench: Type:tasklet_page_pool03_slow Per elem: 284 cycles(tsc) 78.909 ns (step:0) - (measurement period time:7.890955151 sec time_interval:7890955151) - (invoke count:100000000 tsc_interval:28408319289)
>>
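The diff and % change columns of the table quoted in this thread come from its org-mode formula line: $7=$6-$5 is After minus Before in nanoseconds, and $8=(($7/$5)*100) is that difference relative to the Before value. A minimal sketch of the same arithmetic in C; the function names ns_diff() and pct_change() are hypothetical and only illustrate the formula, they are not part of the benchmark module:

```c
#include <assert.h>

/* Column $7 of the table: After minus Before, in nanoseconds. */
double ns_diff(double before_ns, double after_ns)
{
	return after_ns - before_ns;
}

/* Column $8 of the table: the difference as a percentage of the
 * Before value, i.e. (($7/$5)*100) in the #+TBLFM line.
 */
double pct_change(double before_ns, double after_ns)
{
	assert(before_ns > 0.0);
	return ns_diff(before_ns, after_ns) / before_ns * 100.0;
}
```

With the slow-path row from the patches 1-5 run, pct_change(66.134, 78.909) evaluates to about 19.3, and the fast_path row gives about 1.7, in line with the table.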