
[bpf-next,v2,2/2] selftests/bpf: Fix flaky test_btf_id test

Message ID 20231205060455.3577644-1-yonghong.song@linux.dev (mailing list archive)
State Superseded
Delegated to: BPF
Series [bpf-next,v2,1/2] bpf: Fix a race condition between btf_put() and map_free()

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-3 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers warning 12 maintainers not CCed: haoluo@google.com jolsa@kernel.org kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com mykolal@fb.com song@kernel.org shuah@kernel.org sdf@google.com lorenz.bauer@isovalent.com iii@linux.ibm.com linux-kselftest@vger.kernel.org
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-9 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-15 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-llvm-16 / build / build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-llvm-16 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-llvm-16 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-16 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-16 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-16 / veristat
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix

Commit Message

Yonghong Song Dec. 5, 2023, 6:04 a.m. UTC
With previous patch, one of subtests in test_btf_id becomes
flaky and may fail. The following is a failing example:

  Error: #26 btf
  Error: #26/174 btf/BTF ID
    Error: #26/174 btf/BTF ID
    btf_raw_create:PASS:check 0 nsec
    btf_raw_create:PASS:check 0 nsec
    test_btf_id:PASS:check 0 nsec
    ...
    test_btf_id:PASS:check 0 nsec
    test_btf_id:FAIL:check BTF lingersdo_test_get_info:FAIL:check failed: -1

The test tries to prove that a btf_id is not available after the map is closed.
But the btf_id is freed only after a workqueue and an RCU grace period, compared
to the previous case of just after an RCU grace period.

To fix the flaky test, I added a kern_sync_rcu() after closing map and
before querying btf id availability, essentially ensuring a rcu grace
period in the kernel, which seems making the test happy.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 tools/testing/selftests/bpf/prog_tests/btf.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Hou Tao Dec. 5, 2023, 6:39 a.m. UTC | #1
Hi,

On 12/5/2023 2:04 PM, Yonghong Song wrote:
> With previous patch, one of subtests in test_btf_id becomes
> flaky and may fail. The following is a failing example:
>
>   Error: #26 btf
>   Error: #26/174 btf/BTF ID
>     Error: #26/174 btf/BTF ID
>     btf_raw_create:PASS:check 0 nsec
>     btf_raw_create:PASS:check 0 nsec
>     test_btf_id:PASS:check 0 nsec
>     ...
>     test_btf_id:PASS:check 0 nsec
>     test_btf_id:FAIL:check BTF lingersdo_test_get_info:FAIL:check failed: -1
>
> The test tries to prove that a btf_id is not available after the map is closed.
> But the btf_id is freed only after a workqueue and an RCU grace period, compared
> to the previous case of just after an RCU grace period.

It is not accurate. Before applying the patch, the btf_id will be
released in btf_put() and there is no RCU grace period involved. After
applying the patch, the btf_id will be released after the running of
bpf_map_free_deferred kworker.
>
> To fix the flaky test, I added a kern_sync_rcu() after closing map and
> before querying btf id availability, essentially ensuring a rcu grace
> period in the kernel, which seems making the test happy.

kern_sync_rcu() doesn't guarantee the bpf_map_free_deferred kworker will
complete, so why not remove the test case instead ?
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  tools/testing/selftests/bpf/prog_tests/btf.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/btf.c b/tools/testing/selftests/bpf/prog_tests/btf.c
> index 8fb4a04fbbc0..7feb4223bbac 100644
> --- a/tools/testing/selftests/bpf/prog_tests/btf.c
> +++ b/tools/testing/selftests/bpf/prog_tests/btf.c
> @@ -4629,6 +4629,7 @@ static int test_btf_id(unsigned int test_num)
>  
>  	/* The map holds the last ref to BTF and its btf_id */
>  	close(map_fd);
> +	kern_sync_rcu();
>  	map_fd = -1;
>  	btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id);
>  	if (CHECK(btf_fd[0] >= 0, "BTF lingers")) {
Yonghong Song Dec. 5, 2023, 7:10 a.m. UTC | #2
On 12/5/23 1:39 AM, Hou Tao wrote:
> Hi,
>
> On 12/5/2023 2:04 PM, Yonghong Song wrote:
>> With previous patch, one of subtests in test_btf_id becomes
>> flaky and may fail. The following is a failing example:
>>
>>    Error: #26 btf
>>    Error: #26/174 btf/BTF ID
>>      Error: #26/174 btf/BTF ID
>>      btf_raw_create:PASS:check 0 nsec
>>      btf_raw_create:PASS:check 0 nsec
>>      test_btf_id:PASS:check 0 nsec
>>      ...
>>      test_btf_id:PASS:check 0 nsec
>>      test_btf_id:FAIL:check BTF lingersdo_test_get_info:FAIL:check failed: -1
>>
>> The test tries to prove that a btf_id is not available after the map is closed.
>> But the btf_id is freed only after a workqueue and an RCU grace period, compared
>> to the previous case of just after an RCU grace period.
> It is not accurate. Before applying the patch, the btf_id will be
> released in btf_put() and there is no RCU grace period involved. After

I missed that (I didn't double-check the code).
Yes, the btf_id is freed before going to the RCU grace period. So the
previously reliable test is now unreliable because of the workqueue.


> applying the patch, the btf_id will be released after the running of
> bpf_map_free_deferred kworker.
>> To fix the flaky test, I added a kern_sync_rcu() after closing map and
>> before querying btf id availability, essentially ensuring a rcu grace
>> period in the kernel, which seems making the test happy.
> kern_sync_rcu() doesn't guarantee the bpf_map_free_deferred kworker will
> complete, so why not remove the test case instead ?

Yes, I understand this. My hope is that kern_sync_rcu() can
make the test stable enough (that is why I am using 'seems making')
but no guarantees.

For this particular case, if I do refcounting for btf as mentioned
in the comments on the previous patch, we should be okay.

Will craft another version tomorrow with the btf refcount approach.

>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   tools/testing/selftests/bpf/prog_tests/btf.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/btf.c b/tools/testing/selftests/bpf/prog_tests/btf.c
>> index 8fb4a04fbbc0..7feb4223bbac 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/btf.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/btf.c
>> @@ -4629,6 +4629,7 @@ static int test_btf_id(unsigned int test_num)
>>   
>>   	/* The map holds the last ref to BTF and its btf_id */
>>   	close(map_fd);
>> +	kern_sync_rcu();
>>   	map_fd = -1;
>>   	btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id);
>>   	if (CHECK(btf_fd[0] >= 0, "BTF lingers")) {

Patch

diff --git a/tools/testing/selftests/bpf/prog_tests/btf.c b/tools/testing/selftests/bpf/prog_tests/btf.c
index 8fb4a04fbbc0..7feb4223bbac 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf.c
@@ -4629,6 +4629,7 @@  static int test_btf_id(unsigned int test_num)
 
 	/* The map holds the last ref to BTF and its btf_id */
 	close(map_fd);
+	kern_sync_rcu();
 	map_fd = -1;
 	btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id);
 	if (CHECK(btf_fd[0] >= 0, "BTF lingers")) {
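
As noted in the review above, kern_sync_rcu() does not guarantee that the bpf_map_free_deferred kworker has completed. One alternative way to wait for a deferred release, not proposed in this thread, is a bounded poll. The sketch below is only an illustration: wait_until_released(), released_fn, struct fake_btf and fake_btf_released() are hypothetical names, and the fake probe stands in for the kernel side so the snippet builds without libbpf.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical probe type: returns true once the resource is gone. */
typedef bool (*released_fn)(void *ctx);

/*
 * Call 'released' up to 'max_tries' times, sleeping 'delay_ms' after each
 * failed attempt. Returns 0 as soon as the probe reports release, or -1
 * if it never does within the budget.
 */
static int wait_until_released(released_fn released, void *ctx,
			       int max_tries, long delay_ms)
{
	struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int i;

	assert(max_tries > 0 && delay_ms >= 0);

	for (i = 0; i < max_tries; i++) {
		if (released(ctx))
			return 0;
		nanosleep(&ts, NULL);
	}
	return -1;
}

/*
 * Stand-in for the kernel side (an assumption for this sketch): the ID
 * stays visible for 'probes_until_free' more probes, imitating a kworker
 * that has not run yet.
 */
struct fake_btf {
	int probes_until_free;
};

static bool fake_btf_released(void *ctx)
{
	struct fake_btf *b = ctx;

	if (b->probes_until_free > 0) {
		b->probes_until_free--;
		return false;
	}
	return true;
}
```

In the selftest, the probe would presumably wrap bpf_btf_get_fd_by_id(map_info.btf_id), closing any fd it gets back and reporting release once the call fails. A poll like this only bounds the wait; it still cannot prove the kworker ran, so it trades flakiness for a timeout rather than removing the race.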