Message ID: 20241028115343.3405838-1-linyunsheng@huawei.com (mailing list archive)
Series: Replace page_frag with page_frag_cache (Part-1)
On Mon, Oct 28, 2024 at 5:00 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> This is part 1 of "Replace page_frag with page_frag_cache",
> which mainly contains refactoring and optimization for the
> implementation of the page_frag API before the replacement.
>
> As per the discussion in [1], it would be better to target the net-next
> tree to get more testing, as all the callers of the page_frag API are
> in networking, and the chance of conflicting with the MM tree seems
> low as the implementation of the page_frag API is quite self-contained.
>
> After [2], there are still two implementations for page frag:
>
> 1. mm/page_alloc.c: the net stack seems to be using it in the
>    rx part with 'struct page_frag_cache', the main API being
>    page_frag_alloc_align().
> 2. net/core/sock.c: the net stack seems to be using it in the
>    tx part with 'struct page_frag', the main API being
>    skb_page_frag_refill().
>
> This patchset tries to unify the page frag implementation
> by replacing page_frag with page_frag_cache for sk_page_frag()
> first. net_high_order_alloc_disable_key for the implementation
> in net/core/sock.c doesn't seem to matter that much now, as pcp
> is also supported for high-order pages:
> commit 44042b449872 ("mm/page_alloc: allow high-order pages to
> be stored on the per-cpu lists")
>
> As the change is mostly related to networking, it targets net-next.
> A follow-up patchset will try to replace the rest of the page_frag
> users.
>
> After this patchset:
> 1. The page frag implementation is unified by taking the best of the
>    two existing implementations: we are able to save some space for
>    the 'page_frag_cache' API user, and avoid 'get_page()' for the
>    old 'page_frag' API user.
> 2. Future bug fixes and performance work can be done in one place,
>    improving the maintainability of page_frag's implementation.
>
> Kernel image size change:
> Linux Kernel    total |     text      data     bss
> ------------------------------------------------------
> after        45250307 | 27274279  17209996  766032
> before       45254134 | 27278118  17209984  766032
> delta           -3827 |    -3839       +12      +0
>
> Performance validation:
> 1. Using the micro-benchmark ko added in patch 1 to test the aligned
>    and non-aligned API performance impact for the existing users,
>    there is no noticeable performance degradation. Instead there seems
>    to be a major performance boost for both the aligned and non-aligned
>    API after switching to ptr_ring for testing: about 200% and 10%
>    improvement respectively on an arm64 server, as below.
>
> 2. Using the below netcat test case, there is also a minor performance
>    boost from replacing 'page_frag' with 'page_frag_cache' after this
>    patchset.
> server: taskset -c 32 nc -l -k 1234 > /dev/null
> client: perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234
>
> In order to avoid performance noise as much as possible, the testing
> is done on a system without any other load, with enough iterations to
> show that the data is stable; the complete testing log is below:
>
> perf stat -r 200 -- insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000
> perf stat -r 200 -- insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1
> taskset -c 32 nc -l -k 1234 > /dev/null
> perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234
>
> *After* this patchset:
>
> Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):
>
>          17.758393 task-clock (msec)  #  0.004 CPUs utilized    ( +-  0.51% )
>                  5 context-switches   #  0.293 K/sec            ( +-  0.65% )
>                  0 cpu-migrations     #  0.008 K/sec            ( +- 17.21% )
>                 74 page-faults        #  0.004 M/sec            ( +-  0.12% )
>           46128650 cycles             #  2.598 GHz              ( +-  0.51% )
>           60810511 instructions       #  1.32  insn per cycle   ( +-  0.04% )
>           14764914 branches           #  831.433 M/sec          ( +-  0.04% )
>              19281 branch-misses      #  0.13% of all branches  ( +-  0.13% )
>
>        4.240273854 seconds time elapsed                         ( +-  0.13% )
>
> Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):
>
>          17.348690 task-clock (msec)  #  0.019 CPUs utilized    ( +-  0.66% )
>                  5 context-switches   #  0.310 K/sec            ( +-  0.84% )
>                  0 cpu-migrations     #  0.009 K/sec            ( +- 16.55% )
>                 74 page-faults        #  0.004 M/sec            ( +-  0.11% )
>           45065287 cycles             #  2.598 GHz              ( +-  0.66% )
>           60755389 instructions       #  1.35  insn per cycle   ( +-  0.05% )
>           14747865 branches           #  850.085 M/sec          ( +-  0.05% )
>              19272 branch-misses      #  0.13% of all branches  ( +-  0.13% )
>
>        0.935251375 seconds time elapsed                         ( +-  0.07% )
>
> Performance counter stats for 'taskset -c 0 head -c 20G /dev/zero' (200 runs):
>
>       16626.042731 task-clock (msec)  #  0.607 CPUs utilized    ( +-  0.03% )
>            3291020 context-switches   #  0.198 M/sec            ( +-  0.05% )
>                  1 cpu-migrations     #  0.000 K/sec            ( +-  0.50% )
>                 85 page-faults        #  0.005 K/sec            ( +-  0.16% )
>        30581044838 cycles             #  1.839 GHz              ( +-  0.05% )
>        34962744631 instructions       #  1.14  insn per cycle   ( +-  0.01% )
>         6483883671 branches           #  389.984 M/sec          ( +-  0.02% )
>           99624551 branch-misses      #  1.54% of all branches  ( +-  0.17% )
>
>       27.370305077 seconds time elapsed                         ( +-  0.01% )
>
> *Before* this patchset:
>
> Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):
>
>          21.587934 task-clock (msec)  #  0.005 CPUs utilized    ( +-  0.72% )
>                  6 context-switches   #  0.281 K/sec            ( +-  0.28% )
>                  1 cpu-migrations     #  0.047 K/sec            ( +-  0.50% )
>                 73 page-faults        #  0.003 M/sec            ( +-  0.12% )
>           56080697 cycles             #  2.598 GHz              ( +-  0.72% )
>           61605150 instructions       #  1.10  insn per cycle   ( +-  0.05% )
>           14950196 branches           #  692.526 M/sec          ( +-  0.05% )
>              19410 branch-misses      #  0.13% of all branches  ( +-  0.18% )
>
>        4.603530546 seconds time elapsed                         ( +-  0.11% )
>
> Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):
>
>          20.988297 task-clock (msec)  #  0.006 CPUs utilized    ( +-  0.81% )
>                  7 context-switches   #  0.316 K/sec            ( +-  0.54% )
>                  1 cpu-migrations     #  0.048 K/sec            ( +-  0.70% )
>                 73 page-faults        #  0.003 M/sec            ( +-  0.11% )
>           54512166 cycles             #  2.597 GHz              ( +-  0.81% )
>           61440941 instructions       #  1.13  insn per cycle   ( +-  0.08% )
>           14906043 branches           #  710.207 M/sec          ( +-  0.08% )
>              19927 branch-misses      #  0.13% of all branches  ( +-  0.17% )
>
>        3.438041238 seconds time elapsed                         ( +-  1.11% )
>
> Performance counter stats for 'taskset -c 0 head -c 20G /dev/zero' (200 runs):
>
>       17364.040855 task-clock (msec)  #  0.624 CPUs utilized    ( +-  0.02% )
>            3340375 context-switches   #  0.192 M/sec            ( +-  0.06% )
>                  1 cpu-migrations     #  0.000 K/sec
>                 85 page-faults        #  0.005 K/sec            ( +-  0.15% )
>        32077623335 cycles             #  1.847 GHz              ( +-  0.03% )
>        35121047596 instructions       #  1.09  insn per cycle   ( +-  0.01% )
>         6519872824 branches           #  375.481 M/sec          ( +-  0.02% )
>          101877022 branch-misses      #  1.56% of all branches  ( +-  0.14% )
>
>       27.842745343 seconds time elapsed                         ( +-  0.02% )

Are these actually the numbers for this patch set? It seems like you have
been using the same numbers for the last several releases. I can
understand the "before" being mostly the same, but since we have factored
out the refactor portion of it, the numbers for the "after" should have
deviated, as I find it highly unlikely the numbers are exactly the same,
down to the nanosecond, as the previous patch set.

Also, it wouldn't hurt to have an explanation for the 3.4->0.9 second
performance change, since the samples don't seem to match up with the
elapsed time data.
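[For context, the two page-frag API shapes the cover letter sets out to
unify look roughly as follows. This is a hedged sketch: the signatures are
paraphrased from the kernel tree around the time of this series (the align
parameter of page_frag_alloc_align() changed form across versions), and
the wrapper functions and their bodies are purely illustrative.]

/*
 * Sketch of the two page-frag API shapes being unified. Signatures are
 * paraphrased from the kernel tree around this series and may not match
 * any one release exactly; treat this as illustration, not reference.
 */
#include <linux/gfp.h>
#include <net/sock.h>

/* Rx-style user: a 'struct page_frag_cache' plus page_frag_alloc_align(). */
static void *rx_style_alloc(struct page_frag_cache *nc, unsigned int fragsz)
{
	/* Carves an aligned fragment out of the cache's current page. */
	return page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, SMP_CACHE_BYTES);
}

/* Tx-style user: the per-socket 'struct page_frag' via sk_page_frag(). */
static bool tx_style_refill(struct sock *sk, unsigned int sz)
{
	struct page_frag *pfrag = sk_page_frag(sk);

	/* Ensures pfrag has at least 'sz' bytes available to copy into. */
	return skb_page_frag_refill(sz, pfrag, sk->sk_allocation);
}

[The rx-style caller gets a raw pointer and frees fragments individually
with page_frag_free(); the tx-style caller gets a refilled page/offset
pair and takes page references itself, which is the 'get_page()' overhead
the cover letter says the unified implementation avoids.]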
On 2024/10/28 23:30, Alexander Duyck wrote:

...

> Are these actually the numbers for this patch set? It seems like you
> have been using the same numbers for the last several releases. I can

Yes. As the recent refactoring didn't seem big enough to change them, the
perf data was reused for the last several releases.

> understand the "before" being mostly the same, but since we have

As there has been rebasing onto the latest net-next tree, even the
'before' might not be the same, since the testing seems sensitive to
unrelated changes, such as binary size and page allocator changes between
versions.

So both 'before' and 'after' might need to be measured with the same
kernel and config.

> factored out the refactor portion of it, the numbers for the "after"
> should have deviated, as I find it highly unlikely the numbers are
> exactly the same, down to the nanosecond, as the previous patch set.

Below is the performance data for Part-1 with the latest net-next:

Before this patchset:
Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):

         17.990790 task-clock (msec)  #  0.003 CPUs utilized    ( +-  0.19% )
                 8 context-switches   #  0.444 K/sec            ( +-  0.09% )
                 0 cpu-migrations     #  0.000 K/sec            ( +-100.00% )
                81 page-faults        #  0.004 M/sec            ( +-  0.09% )
          46712295 cycles             #  2.596 GHz              ( +-  0.19% )
          34466157 instructions       #  0.74  insn per cycle   ( +-  0.01% )
           8011755 branches           #  445.325 M/sec          ( +-  0.01% )
             39913 branch-misses      #  0.50% of all branches  ( +-  0.07% )

       6.382252558 seconds time elapsed                         ( +-  0.07% )

Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):

         17.638466 task-clock (msec)  #  0.003 CPUs utilized    ( +-  0.01% )
                 8 context-switches   #  0.451 K/sec            ( +-  0.20% )
                 0 cpu-migrations     #  0.001 K/sec            ( +- 70.53% )
                81 page-faults        #  0.005 M/sec            ( +-  0.08% )
          45794305 cycles             #  2.596 GHz              ( +-  0.01% )
          34435077 instructions       #  0.75  insn per cycle   ( +-  0.00% )
           8004416 branches           #  453.805 M/sec          ( +-  0.00% )
             39758 branch-misses      #  0.50% of all branches  ( +-  0.06% )

       5.328976590 seconds time elapsed                         ( +-  0.60% )

After this patchset:
Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):

         18.647432 task-clock (msec)  #  0.003 CPUs utilized    ( +-  1.11% )
                 8 context-switches   #  0.422 K/sec            ( +-  0.36% )
                 0 cpu-migrations     #  0.005 K/sec            ( +- 22.54% )
                81 page-faults        #  0.004 M/sec            ( +-  0.08% )
          48418108 cycles             #  2.597 GHz              ( +-  1.11% )
          35889299 instructions       #  0.74  insn per cycle   ( +-  0.11% )
           8318363 branches           #  446.086 M/sec          ( +-  0.11% )
             19263 branch-misses      #  0.23% of all branches  ( +-  0.13% )

       5.624666079 seconds time elapsed                         ( +-  0.07% )

Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):

         18.466768 task-clock (msec)  #  0.007 CPUs utilized    ( +-  1.23% )
                 8 context-switches   #  0.428 K/sec            ( +-  0.26% )
                 0 cpu-migrations     #  0.002 K/sec            ( +- 34.73% )
                81 page-faults        #  0.004 M/sec            ( +-  0.09% )
          47949220 cycles             #  2.597 GHz              ( +-  1.23% )
          35859039 instructions       #  0.75  insn per cycle   ( +-  0.12% )
           8309086 branches           #  449.948 M/sec          ( +-  0.11% )
             19246 branch-misses      #  0.23% of all branches  ( +-  0.08% )

       2.573546035 seconds time elapsed                         ( +-  0.04% )

> Also, it wouldn't hurt to have an explanation for the 3.4->0.9 second
> performance change, since the samples don't seem to match up with the
> elapsed time data.

As there is also a 4.6->3.4 second performance change for the 'before'
part, I did not think much about it.

I am guessing some timing aspect of the ptr_ring implementation or the
CPU cache causes the above performance change?

When I used the same CPU for both the pop and push threads, the
performance change does not seem to exist anymore, and neither does the
performance improvement:

After this patchset:
Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000' (10 runs):

         13.293402 task-clock (msec)  #  0.002 CPUs utilized    ( +-  5.05% )
                 7 context-switches   #  0.534 K/sec            ( +-  1.41% )
                 0 cpu-migrations     #  0.015 K/sec            ( +-100.00% )
                80 page-faults        #  0.006 M/sec            ( +-  0.38% )
          34494793 cycles             #  2.595 GHz              ( +-  5.05% )
           9663299 instructions       #  0.28  insn per cycle   ( +-  1.45% )
           1767284 branches           #  132.944 M/sec          ( +-  1.70% )
             19798 branch-misses      #  1.12% of all branches  ( +-  1.18% )

       8.119681413 seconds time elapsed                         ( +-  0.01% )

Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000 test_align=1' (10 runs):

         12.289096 task-clock (msec)  #  0.002 CPUs utilized    ( +-  0.07% )
                 7 context-switches   #  0.570 K/sec            ( +-  2.13% )
                 0 cpu-migrations     #  0.033 K/sec            ( +- 66.67% )
                81 page-faults        #  0.007 M/sec            ( +-  0.43% )
          31886319 cycles             #  2.595 GHz              ( +-  0.07% )
           9468850 instructions       #  0.30  insn per cycle   ( +-  0.06% )
           1723487 branches           #  140.245 M/sec          ( +-  0.05% )
             19263 branch-misses      #  1.12% of all branches  ( +-  0.47% )

       8.119686950 seconds time elapsed                         ( +-  0.01% )

Before this patchset:
Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000' (10 runs):

         13.320328 task-clock (msec)  #  0.002 CPUs utilized    ( +-  5.00% )
                 7 context-switches   #  0.541 K/sec            ( +-  1.85% )
                 0 cpu-migrations     #  0.008 K/sec            ( +-100.00% )
                80 page-faults        #  0.006 M/sec            ( +-  0.36% )
          34572091 cycles             #  2.595 GHz              ( +-  5.01% )
           9664910 instructions       #  0.28  insn per cycle   ( +-  1.51% )
           1768276 branches           #  132.750 M/sec          ( +-  1.80% )
             19592 branch-misses      #  1.11% of all branches  ( +-  1.33% )

       8.119686381 seconds time elapsed                         ( +-  0.01% )

Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000 test_align=1' (10 runs):

         12.306471 task-clock (msec)  #  0.002 CPUs utilized    ( +-  0.08% )
                 7 context-switches   #  0.585 K/sec            ( +-  1.85% )
                 0 cpu-migrations     #  0.000 K/sec
                80 page-faults        #  0.007 M/sec            ( +-  0.28% )
          31937686 cycles             #  2.595 GHz              ( +-  0.08% )
           9462218 instructions       #  0.30  insn per cycle   ( +-  0.08% )
           1721989 branches           #  139.925 M/sec          ( +-  0.07% )
             19114 branch-misses      #  1.11% of all branches  ( +-  0.31% )

       8.118897296 seconds time elapsed                         ( +-  0.00% )
On Tue, Oct 29, 2024 at 2:36 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/10/28 23:30, Alexander Duyck wrote:
>
> [...]
>
> Below is the performance data for Part-1 with the latest net-next:
>
> [...]
>
>        2.573546035 seconds time elapsed                         ( +-  0.04% )

Interesting. It doesn't look like too much changed in terms of most of
the metrics, other than the fact that we reduced the number of branch
misses by just over half.

> > Also, it wouldn't hurt to have an explanation for the 3.4->0.9 second
> > performance change, since the samples don't seem to match up with the
> > elapsed time data.
>
> As there is also a 4.6->3.4 second performance change for the 'before'
> part, I did not think much about it.
>
> I am guessing some timing aspect of the ptr_ring implementation or the
> CPU cache causes the above performance change?
>
> When I used the same CPU for both the pop and push threads, the
> performance change does not seem to exist anymore, and neither does the
> performance improvement:
>
> [...]
>
>        8.118897296 seconds time elapsed                         ( +-  0.00% )

That isn't too surprising. Most likely you are at the mercy of the
scheduler, and you are just waiting for it to cycle back and forth
between producer and consumer in order to allow you to complete the test.
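[To make the scheduler point concrete, here is a minimal, hedged sketch of
the push/pop structure such a micro-benchmark might use. This is not the
actual page_frag_test.ko from patch 1: the module parameters, CPU binding,
cleanup, and timing logic are simplified away, and the names below are
invented for the example. page_frag_alloc()/page_frag_free() and the
ptr_ring calls are real kernel interfaces. With both threads bound to the
same CPU, each side only makes progress after the scheduler switches away
from the other, so elapsed time reflects scheduling rather than allocator
cost.]

#include <linux/gfp.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/ptr_ring.h>

static struct ptr_ring test_ring;
static struct page_frag_cache test_nc;	/* zeroed => empty cache */

/* Producer ("push" thread): allocate small fragments and queue them. */
static int push_thread_fn(void *arg)
{
	long remaining = 512000;	/* plays the role of nr_test */

	while (!kthread_should_stop() && remaining) {
		void *va = page_frag_alloc(&test_nc, 12, GFP_KERNEL);

		if (!va) {
			cond_resched();
			continue;
		}

		while (ptr_ring_produce(&test_ring, va)) {
			if (kthread_should_stop()) {
				page_frag_free(va);
				return 0;
			}
			cond_resched();	/* ring full: let the popper run */
		}
		remaining--;
	}
	return 0;
}

/* Consumer ("pop" thread): dequeue fragments and free them. */
static int pop_thread_fn(void *arg)
{
	while (!kthread_should_stop()) {
		void *va = ptr_ring_consume(&test_ring);

		if (va)
			page_frag_free(va);
		else
			cond_resched();	/* ring empty: let the pusher run */
	}
	return 0;
}

static int __init pf_sketch_init(void)
{
	struct task_struct *push, *pop;

	if (ptr_ring_init(&test_ring, 512, GFP_KERNEL))
		return -ENOMEM;

	/* test_push_cpu/test_pop_cpu would translate into kthread_bind()
	 * calls here; binding both to the same CPU serializes progress.
	 */
	pop = kthread_run(pop_thread_fn, NULL, "pf_pop");
	push = kthread_run(push_thread_fn, NULL, "pf_push");
	return (IS_ERR(pop) || IS_ERR(push)) ? -ENOMEM : 0;
}
module_init(pf_sketch_init);
MODULE_LICENSE("GPL");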
On Mon, 28 Oct 2024 19:53:35 +0800 Yunsheng Lin wrote:
> This is part 1 of "Replace page_frag with page_frag_cache",
> which mainly contains refactoring and optimization for the
> implementation of the page_frag API before the replacement.

Looks like Alex is happy with all of these patches. Since
page_frag_cache is primarily used in networking I think it's okay
for us to apply it, but I wanted to ask if anyone:
 - thinks this shouldn't go in;
 - needs more time to review;
 - prefers to take it via their own tree.
On Tue, Nov 5, 2024 at 3:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 28 Oct 2024 19:53:35 +0800 Yunsheng Lin wrote:
> > This is part 1 of "Replace page_frag with page_frag_cache",
> > which mainly contains refactoring and optimization for the
> > implementation of the page_frag API before the replacement.
>
> Looks like Alex is happy with all of these patches. Since
> page_frag_cache is primarily used in networking I think it's okay
> for us to apply it, but I wanted to ask if anyone:
> - thinks this shouldn't go in;
> - needs more time to review;
> - prefers to take it via their own tree.

Yeah, I was happy with the set. I was just curious about the numbers, as
they hadn't been updated, but I am satisfied with the numbers provided
after I pointed that out.

- Alex
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 28 Oct 2024 19:53:35 +0800 you wrote:
> This is part 1 of "Replace page_frag with page_frag_cache",
> which mainly contains refactoring and optimization for the
> implementation of the page_frag API before the replacement.
>
> As per the discussion in [1], it would be better to target the net-next
> tree to get more testing, as all the callers of the page_frag API are
> in networking, and the chance of conflicting with the MM tree seems
> low as the implementation of the page_frag API is quite self-contained.
>
> [...]

Here is the summary with links:
  - [net-next,v23,1/7] mm: page_frag: add a test module for page_frag
    https://git.kernel.org/netdev/net-next/c/7fef0dec415c
  - [net-next,v23,2/7] mm: move the page fragment allocator from page_alloc into its own file
    https://git.kernel.org/netdev/net-next/c/65941f10caf2
  - [net-next,v23,3/7] mm: page_frag: use initial zero offset for page_frag_alloc_align()
    https://git.kernel.org/netdev/net-next/c/8218f62c9c9b
  - [net-next,v23,4/7] mm: page_frag: avoid caller accessing 'page_frag_cache' directly
    https://git.kernel.org/netdev/net-next/c/3d18dfe69ce4
  - [net-next,v23,5/7] xtensa: remove the get_order() implementation
    https://git.kernel.org/netdev/net-next/c/49e302be73f1
  - [net-next,v23,6/7] mm: page_frag: reuse existing space for 'size' and 'pfmemalloc'
    https://git.kernel.org/netdev/net-next/c/0c3ce2f50261
  - [net-next,v23,7/7] mm: page_frag: use __alloc_pages() to replace alloc_pages_node()
    https://git.kernel.org/netdev/net-next/c/ec397ea00cb3

You are awesome, thank you!
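[As a closing note on patch 6/7 ("reuse existing space for 'size' and
'pfmemalloc'"), the general idea is that the cache stores a page-aligned
virtual address, whose low bits are always zero and can therefore carry
the page order and the pfmemalloc flag without separate fields. Below is
a hedged illustration of that kind of encoding; the helper names, masks,
and exact bit layout are invented for the example and may differ from
what the patch actually does.]

/*
 * Illustration only: pack the page order and the pfmemalloc flag into
 * the low bits of a page-aligned virtual address. The names and layout
 * here are hypothetical, not the series' actual encoding.
 */
#include <linux/mm.h>

#define PF_CACHE_ORDER_MASK	0x3UL	/* room for order 0..3 */
#define PF_CACHE_PFMEMALLOC	0x4UL	/* fits: PAGE_SHIFT >= 3 */

static inline unsigned long pf_encode(void *va, unsigned int order,
				      bool pfmemalloc)
{
	/* The va must be page aligned, so its low bits are free. */
	VM_BUG_ON((unsigned long)va & ~PAGE_MASK);
	VM_BUG_ON(order > PF_CACHE_ORDER_MASK);

	return (unsigned long)va | order |
	       (pfmemalloc ? PF_CACHE_PFMEMALLOC : 0);
}

static inline void *pf_decode_va(unsigned long encoded)
{
	return (void *)(encoded & PAGE_MASK);
}

static inline unsigned int pf_decode_order(unsigned long encoded)
{
	return encoded & PF_CACHE_ORDER_MASK;
}

static inline bool pf_decode_pfmemalloc(unsigned long encoded)
{
	return encoded & PF_CACHE_PFMEMALLOC;
}

[Dropping the separate fields this way is one plausible source of the
"save some space for the 'page_frag_cache' API user" claim in the cover
letter, at the cost of a mask operation on each access.]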