[bpf-next,00/10] Fixes for LPM trie

Message ID: 20241118010808.2243555-1-houtao@huaweicloud.com

Message

Hou Tao Nov. 18, 2024, 1:07 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

Hi,

This patch set fixes several issues for LPM trie. These issues were
found while adding new test cases or were reported by syzbot.

The patch set is structured as follows:

Patches #1 and #2 are clean-ups for lpm_trie_update_elem().
Patch #3 handles the BPF_EXIST and BPF_NOEXIST flags correctly for LPM
trie (a user-space sketch of the expected flag semantics follows this
list).
Patch #4 fixes the accounting of n_entries when doing an in-place update.
Patch #5 fixes the exact-match condition in trie_get_next_key(), which
otherwise may skip keys when the passed key is not found in the map.
Patches #6 and #7 switch from kmalloc() to the bpf memory allocator for
LPM trie to fix several lock order warnings reported by syzbot.
Patch #8 re-enables raw_spinlock_t for LPM trie after the switch to the
bpf memory allocator.
Patch #9 moves test_lpm_map to map_tests so that it runs regularly.
Patch #10 adds more basic test cases for LPM trie.
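
For illustration only (this snippet is not part of the series), here is a
minimal user-space sketch of the update-flag semantics that patch #3
enforces. It assumes libbpf >= 1.0 (so bpf_map_update_elem() returns
-errno), uses a hypothetical 4-byte data part (i.e., max_prefixlen = 32),
and omits error handling:

  #include <assert.h>
  #include <errno.h>
  #include <arpa/inet.h>
  #include <bpf/bpf.h>
  #include <linux/bpf.h>

  /* Hypothetical key layout: 4-byte prefixlen followed by 4 bytes of data. */
  struct lpm_key {
          __u32 prefixlen;
          __u32 data;             /* IPv4-style address, network byte order */
  };

  int main(void)
  {
          LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
          struct lpm_key key = { .prefixlen = 32, .data = htonl(0xc0a80101) };
          __u32 value = 1;
          int fd;

          fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm_flags_demo",
                              sizeof(key), sizeof(value), 128, &opts);
          assert(fd >= 0);

          /* BPF_NOEXIST must succeed only when the exact prefix is absent. */
          assert(bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST) == 0);
          assert(bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST) == -EEXIST);

          /* BPF_EXIST must succeed only when the exact prefix is present. */
          assert(bpf_map_update_elem(fd, &key, &value, BPF_EXIST) == 0);
          key.data = htonl(0xc0a80102);
          assert(bpf_map_update_elem(fd, &key, &value, BPF_EXIST) == -ENOENT);

          return 0;
  }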

Please see individual patches for more details. Comments are always
welcome.

Hou Tao (10):
  bpf: Remove unnecessary check when updating LPM trie
  bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem
  bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie
  bpf: Handle in-place update for full LPM trie correctly
  bpf: Fix exact match conditions in trie_get_next_key()
  bpf: Add bpf_mem_cache_is_mergeable() helper
  bpf: Switch to bpf mem allocator for LPM trie
  bpf: Use raw_spinlock_t for LPM trie
  selftests/bpf: Move test_lpm_map.c to map_tests
  selftests/bpf: Add more test cases for LPM trie

 include/linux/bpf_mem_alloc.h                 |   1 +
 kernel/bpf/lpm_trie.c                         | 158 +++++--
 kernel/bpf/memalloc.c                         |  12 +
 tools/testing/selftests/bpf/.gitignore        |   1 -
 tools/testing/selftests/bpf/Makefile          |   2 +-
 .../lpm_trie_map_basic_ops.c}                 | 401 +++++++++++++++++-
 6 files changed, 523 insertions(+), 52 deletions(-)
 rename tools/testing/selftests/bpf/{test_lpm_map.c => map_tests/lpm_trie_map_basic_ops.c} (64%)

Comments

Hou Tao Nov. 18, 2024, 6:20 a.m. UTC | #1
Hi,

On 11/18/2024 9:42 AM, bot+bpf-ci@kernel.org wrote:
> Dear patch submitter,
>
> CI has tested the following submission:
> Status:     SUCCESS
> Name:       [bpf-next,00/10] Fixes for LPM trie
> Patchwork:  https://patchwork.kernel.org/project/netdevbpf/list/?series=910440&state=*
> Matrix:     https://github.com/kernel-patches/bpf/actions/runs/11884065937
>
> No further action is necessary on your part.
>
>
> Please note: this email is coming from an unmonitored mailbox. If you have
> questions or feedback, please reach out to the Meta Kernel CI team at
> kernel-ci@meta.com.

I am curious why test_maps on s390 is disabled. If I remember correctly,
test_maps on s390 was still enabled last year [1].

[1]: https://github.com/kernel-patches/bpf/actions/runs/7164372250
Sebastian Andrzej Siewior Nov. 18, 2024, 3:39 p.m. UTC | #2
On 2024-11-18 09:07:58 [+0800], Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
Hi,

> Please see individual patches for more details. Comments are always
> welcome.

This might be a coincidence but it seems I get

| helper_fill_hashmap(282):FAIL:can't update hashmap err: Unknown error -12
| test_maps: test_maps.c:1379: __run_parallel: Assertion `status == 0' failed.

more often with this series when I run ./test_maps. I never managed to
pass the test with the series applied, while it passed on v6.12. I'm not
blaming the series, just pointing it out in case it is already known…

In 08/10 you switch the locks to raw_spinlock_t. I was a little worried
that a large number of elements would make the while() loop run for a
long time. Is there a test for this? I ran "test_progs -a map_kptr" and
noticed something else…

Sebastian
Daniel Xu Nov. 18, 2024, 5:43 p.m. UTC | #3
Hi Hou,

On 11/17/24 22:20, Hou Tao wrote:
> Hi,
>
> On 11/18/2024 9:42 AM, bot+bpf-ci@kernel.org wrote:
>> Dear patch submitter,
>>
>> CI has tested the following submission:
>> Status:     SUCCESS
>> Name:       [bpf-next,00/10] Fixes for LPM trie
>> Patchwork:  https://patchwork.kernel.org/project/netdevbpf/list/?series=910440&state=*
>> Matrix:     https://github.com/kernel-patches/bpf/actions/runs/11884065937
>>
>> No further action is necessary on your part.
>>
>>
>> Please note: this email is coming from an unmonitored mailbox. If you have
>> questions or feedback, please reach out to the Meta Kernel CI team at
>> kernel-ci@meta.com.
> I am curious about the reason on why test_maps on s390 is disabled. If I
> remember correctly, test_maps on s390 was still enabled last year [1].
>
> [1]: https://github.com/kernel-patches/bpf/actions/runs/7164372250
>
It was disabled in
https://github.com/kernel-patches/vmtest/commit/5489ef2bc1fed11e841d2d3485ab80ce4b390d94.

I recall it was kinda flaky and low signal.

Thanks,

Daniel
Hou Tao Nov. 19, 2024, 1:09 a.m. UTC | #4
Hi,

On 11/19/2024 1:43 AM, Daniel Xu wrote:
> Hi Hou,
>
> On 11/17/24 22:20, Hou Tao wrote:
>> Hi,
>>
>> On 11/18/2024 9:42 AM, bot+bpf-ci@kernel.org wrote:
>>> Dear patch submitter,
>>>
>>> CI has tested the following submission:
>>> Status:     SUCCESS
>>> Name:       [bpf-next,00/10] Fixes for LPM trie
>>> Patchwork:  https://patchwork.kernel.org/project/netdevbpf/list/?series=910440&state=*
>>> Matrix:     https://github.com/kernel-patches/bpf/actions/runs/11884065937
>>>
>>> No further action is necessary on your part.
>>>
>>>
>>> Please note: this email is coming from an unmonitored mailbox. If you have
>>> questions or feedback, please reach out to the Meta Kernel CI team at
>>> kernel-ci@meta.com.
>> I am curious about the reason on why test_maps on s390 is disabled. If I
>> remember correctly, test_maps on s390 was still enabled last year [1].
>>
>> [1]: https://github.com/kernel-patches/bpf/actions/runs/7164372250
>>
> It was disabled in 
> https://github.com/kernel-patches/vmtest/commit/5489ef2bc1fed11e841d2d3485ab80ce4b390d94 
> .
>
> I recall it was kinda flakey and low signal.

I see. Thanks for the information.
>
> Thanks,
>
> Daniel
>
Hou Tao Nov. 19, 2024, 1:35 a.m. UTC | #5
Hi Sebastian,

On 11/18/2024 11:39 PM, Sebastian Andrzej Siewior wrote:
> On 2024-11-18 09:07:58 [+0800], Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
> Hi,
>
>> Please see individual patches for more details. Comments are always
>> welcome.
> This might be a coincidence but it seems I get
>
> | helper_fill_hashmap(282):FAIL:can't update hashmap err: Unknown error -12
> | test_maps: test_maps.c:1379: __run_parallel: Assertion `status == 0' failed.
>
> more often with the series when I do ./test_maps. I never managed to
> pass the test with series while it passed on v6.12. I'm not blaming the
> series, just pointing this out it might be known…

Thanks for the information. 12 is ENOMEM, so the hash map failed to
allocate an element. There are multiple possible reasons for ENOMEM:

1) Something is wrong with the bpf memory allocator, e.g., it could not
refill the free list in time. That may happen when running under RT,
because the irq work is threaded under RT.
2) The series causes a shortage of memory (e.g., it allocates a lot of
memory and then frees it, but the reclaim of that memory is slow).

Could you please share the kernel config file and the VM setup, so I
can try to reproduce the problem?
>
> In 08/10 you switch the locks to raw_spinlock_t. I was a little worried
> that a lot of elements will make the while() loop go for a long time. Is
> there a test for this? I run into "test_progs -a map_kptr" and noticed
> something else…

The concern is the possibility of a hard lockup, right? The total time
used for an update or deletion is determined by max_prefixlen. A typical
use case will use 32 or 128 as the max_prefixlen. The maximum value of
max_prefixlen is LPM_DATA_SIZE_MAX * 8 = 2048, so I think the loop time
will be fine. I will try to construct some test cases for it.
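
For reference, below is a simplified user-space model of the descent loop
(it is not the kernel code and ignores the early exits driven by
longest_prefix_match()); it only illustrates that the iteration count is
bounded by max_prefixlen rather than by the number of elements:

  #include <stdio.h>
  #include <stdlib.h>

  struct node {
          unsigned int prefixlen;
          struct node *child[2];
  };

  /* Over-approximates the real walk: the kernel also stops as soon as the
   * matched length is shorter than node->prefixlen.
   */
  static unsigned int descend(struct node *node, unsigned int key_prefixlen,
                              unsigned int max_prefixlen)
  {
          unsigned int steps = 0;

          while (node) {
                  steps++;
                  if (node->prefixlen >= key_prefixlen ||
                      node->prefixlen >= max_prefixlen)
                          break;
                  /* Trie invariant: a child's prefixlen is strictly larger. */
                  node = node->child[0];
          }
          return steps;
  }

  int main(void)
  {
          unsigned int max_prefixlen = 2048;      /* LPM_DATA_SIZE_MAX * 8 */
          struct node *root = NULL;
          unsigned int i;

          /* Worst case: a degenerate chain with one node per prefix length. */
          for (i = max_prefixlen; i > 0; i--) {
                  struct node *n = calloc(1, sizeof(*n));

                  n->prefixlen = i - 1;
                  n->child[0] = root;
                  root = n;
          }
          /* Prints "steps = 2048": bounded by max_prefixlen, not n_entries. */
          printf("steps = %u\n", descend(root, max_prefixlen, max_prefixlen));
          return 0;
  }
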
>
> Sebastian
Sebastian Andrzej Siewior Nov. 19, 2024, 2:15 p.m. UTC | #6
On 2024-11-19 09:35:51 [+0800], Hou Tao wrote:
> Hi Sebastian,
Hi Hou,

> On 11/18/2024 11:39 PM, Sebastian Andrzej Siewior wrote:
> > On 2024-11-18 09:07:58 [+0800], Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Hi,
> > Hi,
> >
> >> Please see individual patches for more details. Comments are always
> >> welcome.
> > This might be a coincidence but it seems I get
> >
> > | helper_fill_hashmap(282):FAIL:can't update hashmap err: Unknown error -12
> > | test_maps: test_maps.c:1379: __run_parallel: Assertion `status == 0' failed.
> >
> > more often with the series when I do ./test_maps. I never managed to
> > pass the test with series while it passed on v6.12. I'm not blaming the
> > series, just pointing this out it might be known…
> 
> Thanks for the information. 12 is ENOMEM, so the hash map failed to
> allocate an element for it. There are multiple possible reasons for ENOMEM:
> 
> 1) something is wrong for bpf mem allocator. E.g., it could not refill
> the free list timely. It may be possible when running under RT, because
> the irq work is threaded under RT.

Right, I forgot to switch that one off. I had it for the initial test…

> 2) the series causes the shortage of memory (e.g., It uses a lot memory
> then free these memory, but the reclaim of the memory is slow)

> Could you please share the kernel config file and the VM setup, so I
> could try to reproduce the problem ?

I very much think this is due to the RT switch. The irq work and the
test case run on different CPUs, so…

> > In 08/10 you switch the locks to raw_spinlock_t. I was a little worried
> > that a lot of elements will make the while() loop go for a long time. Is
> > there a test for this? I run into "test_progs -a map_kptr" and noticed
> > something else…
> 
> The concern is the possibility of hard-lockup, right ? The total time
Not a lockup, but spending a "visible amount of time" on the lookup.

> used for update or deletion is decided by the max_prefixlen. The typical
> use case will use 32 or 128 as the max_prefixlen. The max value of
> max_prefixlen is LPM_DATA_SIZE_MAX * 8 = 2048, I think the loop time
> will be fine. Will try to construct some test cases for it.

Okay, thank you.

Sebastian