Message ID: 20241118010808.2243555-1-houtao@huaweicloud.com (mailing list archive)
Series: Fixes for LPM trie
Hi,

On 11/18/2024 9:42 AM, bot+bpf-ci@kernel.org wrote:
> Dear patch submitter,
>
> CI has tested the following submission:
> Status: SUCCESS
> Name: [bpf-next,00/10] Fixes for LPM trie
> Patchwork: https://patchwork.kernel.org/project/netdevbpf/list/?series=910440&state=*
> Matrix: https://github.com/kernel-patches/bpf/actions/runs/11884065937
>
> No further action is necessary on your part.
>
> Please note: this email is coming from an unmonitored mailbox. If you have
> questions or feedback, please reach out to the Meta Kernel CI team at
> kernel-ci@meta.com.

I am curious about the reason why test_maps on s390 is disabled. If I
remember correctly, test_maps on s390 was still enabled last year [1].

[1]: https://github.com/kernel-patches/bpf/actions/runs/7164372250
On 2024-11-18 09:07:58 [+0800], Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,

Hi,

> Please see individual patches for more details. Comments are always
> welcome.

This might be a coincidence, but it seems I get

| helper_fill_hashmap(282):FAIL:can't update hashmap err: Unknown error -12
| test_maps: test_maps.c:1379: __run_parallel: Assertion `status == 0' failed.

more often with the series when I run ./test_maps. I never managed to
pass the test with the series, while it passed on v6.12. I'm not blaming
the series, just pointing out that it might be known…

In 08/10 you switch the locks to raw_spinlock_t. I was a little worried
that a lot of elements will make the while() loop run for a long time.
Is there a test for this? I ran into "test_progs -a map_kptr" and
noticed something else…

Sebastian
Hi Hou,

On 11/17/24 22:20, Hou Tao wrote:
> Hi,
>
> On 11/18/2024 9:42 AM, bot+bpf-ci@kernel.org wrote:
>> [...]
> I am curious about the reason why test_maps on s390 is disabled. If I
> remember correctly, test_maps on s390 was still enabled last year [1].
>
> [1]: https://github.com/kernel-patches/bpf/actions/runs/7164372250

It was disabled in
https://github.com/kernel-patches/vmtest/commit/5489ef2bc1fed11e841d2d3485ab80ce4b390d94.

I recall it was kinda flaky and low signal.

Thanks,
Daniel
Hi,

On 11/19/2024 1:43 AM, Daniel Xu wrote:
> Hi Hou,
>
> On 11/17/24 22:20, Hou Tao wrote:
>> I am curious about the reason why test_maps on s390 is disabled. If I
>> remember correctly, test_maps on s390 was still enabled last year [1].
>>
>> [1]: https://github.com/kernel-patches/bpf/actions/runs/7164372250
>
> It was disabled in
> https://github.com/kernel-patches/vmtest/commit/5489ef2bc1fed11e841d2d3485ab80ce4b390d94.
>
> I recall it was kinda flaky and low signal.

I see. Thanks for the information.

> Thanks,
>
> Daniel
Hi Sebastian,

On 11/18/2024 11:39 PM, Sebastian Andrzej Siewior wrote:
> On 2024-11-18 09:07:58 [+0800], Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
> Hi,
>
>> Please see individual patches for more details. Comments are always
>> welcome.
> This might be a coincidence, but it seems I get
>
> | helper_fill_hashmap(282):FAIL:can't update hashmap err: Unknown error -12
> | test_maps: test_maps.c:1379: __run_parallel: Assertion `status == 0' failed.
>
> more often with the series when I run ./test_maps. I never managed to
> pass the test with the series, while it passed on v6.12. I'm not blaming
> the series, just pointing out that it might be known…

Thanks for the information. 12 is ENOMEM, so the hash map failed to
allocate an element. There are multiple possible reasons for ENOMEM:

1) something is wrong with the bpf mem allocator, e.g., it could not
refill the free list in time. That is possible when running under RT,
because the irq work is threaded under RT.
2) the series causes a shortage of memory (e.g., it uses a lot of
memory and then frees it, but the reclaim of that memory is slow).

Could you please share the kernel config file and the VM setup, so I
can try to reproduce the problem?

> In 08/10 you switch the locks to raw_spinlock_t. I was a little worried
> that a lot of elements will make the while() loop run for a long time.
> Is there a test for this? I ran into "test_progs -a map_kptr" and
> noticed something else…

The concern is the possibility of a hard lockup, right? The total time
used for an update or deletion is bounded by max_prefixlen. The typical
use case will use 32 or 128 as max_prefixlen. The max value of
max_prefixlen is LPM_DATA_SIZE_MAX * 8 = 2048, so I think the loop time
will be fine. Will try to construct some test cases for it.

> Sebastian
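As an aside, the odd "Unknown error -12" wording in the log is what you
get when a libbpf-style negative errno return is handed straight to
strerror(), which only knows positive values; that this is how test_maps
formats the error is an assumption, but the glibc behavior itself is
easy to check with a minimal standalone sketch:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            int err = -ENOMEM;                 /* -12, a libbpf-style return */

            printf("%s\n", strerror(err));     /* "Unknown error -12" */
            printf("%s\n", strerror(-err));    /* "Cannot allocate memory" */
            return 0;
    }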
On 2024-11-19 09:35:51 [+0800], Hou Tao wrote:
> Hi Sebastian,

Hi Hou,

> On 11/18/2024 11:39 PM, Sebastian Andrzej Siewior wrote:
> > This might be a coincidence, but it seems I get
> >
> > | helper_fill_hashmap(282):FAIL:can't update hashmap err: Unknown error -12
> > | test_maps: test_maps.c:1379: __run_parallel: Assertion `status == 0' failed.
> >
> > more often with the series when I run ./test_maps. I never managed to
> > pass the test with the series, while it passed on v6.12. I'm not blaming
> > the series, just pointing out that it might be known…
>
> Thanks for the information. 12 is ENOMEM, so the hash map failed to
> allocate an element. There are multiple possible reasons for ENOMEM:
>
> 1) something is wrong with the bpf mem allocator, e.g., it could not
> refill the free list in time. That is possible when running under RT,
> because the irq work is threaded under RT.

Right, I forgot to switch that one off. I had it for the initial test…

> 2) the series causes a shortage of memory (e.g., it uses a lot of
> memory and then frees it, but the reclaim of that memory is slow).
> Could you please share the kernel config file and the VM setup, so I
> can try to reproduce the problem?

I very much think this is due to the RT switch. The irq-work and the
test case run on different CPUs, so…

> > In 08/10 you switch the locks to raw_spinlock_t. I was a little worried
> > that a lot of elements will make the while() loop run for a long time.
> > Is there a test for this? I ran into "test_progs -a map_kptr" and
> > noticed something else…
>
> The concern is the possibility of a hard lockup, right? The total time

Not a lockup, but spending a "visible amount of time" in the lookup.

> used for an update or deletion is bounded by max_prefixlen. The typical
> use case will use 32 or 128 as max_prefixlen. The max value of
> max_prefixlen is LPM_DATA_SIZE_MAX * 8 = 2048, so I think the loop time
> will be fine. Will try to construct some test cases for it.

Okay, thank you.

Sebastian
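One way the test Hou mentions could look is sketched below: rough
userspace code, not from the series (the map name and entry count are
illustrative, and LPM_DATA_SIZE_MAX mirrors the constant in
kernel/bpf/lpm_trie.c). Inserting nested prefixes of every length from
1 to 2048 bits builds a chain of maximal depth, so the later updates
walk the longest possible descent loop:

    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <bpf/bpf.h>
    #include <linux/types.h>

    #define LPM_DATA_SIZE_MAX 256  /* mirrors kernel/bpf/lpm_trie.c */

    /* layout the kernel expects: 4-byte prefixlen followed by data */
    struct lpm_key {
            __u32 prefixlen;
            __u8 data[LPM_DATA_SIZE_MAX];
    };

    int main(void)
    {
            LIBBPF_OPTS(bpf_map_create_opts, opts,
                        .map_flags = BPF_F_NO_PREALLOC); /* mandatory for LPM trie */
            __u32 value = 0, max = LPM_DATA_SIZE_MAX * 8, i; /* 2048 bits */
            struct lpm_key key = {};
            struct timespec t0, t1;
            int fd;

            fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm_depth",
                                sizeof(key), sizeof(value), max + 1, &opts);
            if (fd < 0) {
                    perror("bpf_map_create");
                    return 1;
            }

            /* nested prefixes 1..2048 bits form a maximal-depth chain */
            memset(key.data, 0xff, sizeof(key.data));
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (i = 1; i <= max; i++) {
                    key.prefixlen = i;
                    if (bpf_map_update_elem(fd, &key, &value, BPF_ANY)) {
                            perror("update");
                            return 1;
                    }
            }
            clock_gettime(CLOCK_MONOTONIC, &t1);
            printf("%u nested updates: %.3f ms\n", max,
                   (t1.tv_sec - t0.tv_sec) * 1e3 +
                   (t1.tv_nsec - t0.tv_nsec) / 1e6);
            return 0;
    }

Running something like this on an RT kernel with patch 08/10 applied
would show whether a 2048-deep walk keeps the raw spinlock held for a
noticeable time.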
From: Hou Tao <houtao1@huawei.com>

Hi,

This patch set fixes several issues for LPM trie. These issues were
found while adding new test cases or were reported by syzbot. The
patch set is structured as follows:

Patch #1~#2 are clean-ups for lpm_trie_update_elem().
Patch #3 handles BPF_EXIST and BPF_NOEXIST correctly for LPM trie.
Patch #4 fixes the accounting of n_entries when doing an in-place
update.
Patch #5 fixes the exact match condition in trie_get_next_key(),
which could otherwise skip keys when the passed key is not found in
the map.
Patch #6~7 switch from kmalloc() to the bpf memory allocator for LPM
trie to fix several lock order warnings reported by syzbot.
Patch #8 enables raw_spinlock_t for LPM trie again after switching to
the bpf memory allocator.
Patch #9 moves test_lpm_map to map_tests to make it run regularly.
Patch #10 adds more basic test cases for LPM trie.

Please see individual patches for more details. Comments are always
welcome.

Hou Tao (10):
  bpf: Remove unnecessary check when updating LPM trie
  bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem
  bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie
  bpf: Handle in-place update for full LPM trie correctly
  bpf: Fix exact match conditions in trie_get_next_key()
  bpf: Add bpf_mem_cache_is_mergeable() helper
  bpf: Switch to bpf mem allocator for LPM trie
  bpf: Use raw_spinlock_t for LPM trie
  selftests/bpf: Move test_lpm_map.c to map_tests
  selftests/bpf: Add more test cases for LPM trie

 include/linux/bpf_mem_alloc.h          |   1 +
 kernel/bpf/lpm_trie.c                  | 158 +++++--
 kernel/bpf/memalloc.c                  |  12 +
 tools/testing/selftests/bpf/.gitignore |   1 -
 tools/testing/selftests/bpf/Makefile   |   2 +-
 .../lpm_trie_map_basic_ops.c}          | 401 +++++++++++++++++-
 6 files changed, 523 insertions(+), 52 deletions(-)
 rename tools/testing/selftests/bpf/{test_lpm_map.c => map_tests/lpm_trie_map_basic_ops.c} (64%)
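For reference on patch #3: the flag semantics it makes LPM trie honor
are the ones other map types already follow, i.e. BPF_NOEXIST fails
with -EEXIST when the prefix is already present and BPF_EXIST fails
with -ENOENT when it is absent. A minimal userspace sketch of the
expected behavior (illustrative, not from the series; assumes libbpf's
convention of returning -errno):

    #include <errno.h>
    #include <stdio.h>
    #include <bpf/bpf.h>
    #include <linux/types.h>

    struct lpm_key {
            __u32 prefixlen;
            __u32 ipv4;       /* network byte order in real use */
    };

    int main(void)
    {
            LIBBPF_OPTS(bpf_map_create_opts, opts,
                        .map_flags = BPF_F_NO_PREALLOC);
            struct lpm_key key = { .prefixlen = 24, .ipv4 = 0 };
            __u32 value = 1;
            int fd, err;

            fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm_flags",
                                sizeof(key), sizeof(value), 16, &opts);
            if (fd < 0) {
                    perror("bpf_map_create");
                    return 1;
            }

            /* BPF_EXIST on a missing prefix must fail with -ENOENT */
            err = bpf_map_update_elem(fd, &key, &value, BPF_EXIST);
            printf("BPF_EXIST on empty trie:   %d (want %d)\n", err, -ENOENT);

            /* BPF_NOEXIST on a free slot succeeds */
            err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
            printf("BPF_NOEXIST first insert:  %d (want 0)\n", err);

            /* BPF_NOEXIST on an existing prefix must fail with -EEXIST */
            err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
            printf("BPF_NOEXIST second insert: %d (want %d)\n", err, -EEXIST);
            return 0;
    }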