Context | Check | Description
bpf/vmtest-bpf-next-PR | success | PR summary
bpf/vmtest-bpf-next-VM_Test-0 | success | Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 | success | Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 | success | Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 | success | Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 | success | Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-4 | success | Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 | success | Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-9 | success | Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 | success | Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6 | success | Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 | success | Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 | success | Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 | success | Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 | success | Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 | success | Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-18 | success | Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-19 | success | Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 | success | Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-21 | success | Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 | success | Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 | success | Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 | success | Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 | success | Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-31 | success | Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-33 | success | Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-23 | success | Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 | success | Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 | success | Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-39 | success | Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-30 | success | Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-36 | success | Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-37 | success | Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-27 | success | Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 | success | Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-42 | success | Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-41 | success | Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-28 | success | Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-35 | success | Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-40 | success | Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
netdev/series_format | success | Posting correctly formatted
netdev/tree_selection | success | Clearly marked for bpf-next
netdev/ynl | success | SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present | success | Fixes tag not required for -next series
netdev/header_inline | success | No static functions without inline keyword in header files
netdev/build_32bit | fail | Errors and warnings before: 1094 this patch: 1095
netdev/build_tools | success | No tools touched, skip
netdev/cc_maintainers | success | CCed 0 of 0 maintainers
netdev/build_clang | success | Errors and warnings before: 1066 this patch: 1066
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer
netdev/deprecated_api | success | None detected
netdev/check_selftest | success | No net selftest shell script
netdev/verify_fixes | success | No Fixes tag
netdev/build_allmodconfig_warn | fail | Errors and warnings before: 1111 this patch: 1112
netdev/checkpatch | warning | WARNING: Commit log lines starting with '#' are dropped by git as comments
netdev/build_clang_rust | success | No Rust files in patch. Skipping build
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0
netdev/source_inline | success | Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-29 | success | Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-14 | success | Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 | success | Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 | success | Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc

@@ -23,6 +23,7 @@
#include <linux/btf_ids.h>
#include <linux/bpf_mem_alloc.h>
#include <linux/kasan.h>
+#include <linux/bitops.h>
#include "../../lib/kstrtox.h"
@@ -2542,6 +2543,11 @@ __bpf_kfunc void bpf_throw(u64 cookie)
WARN(1, "A call to BPF exception callback should never return\n");
}
+__bpf_kfunc unsigned long bpf_ffs64(u64 word)
+{
+ return __ffs64(word);
+}
+
__bpf_kfunc_end_defs();
BTF_SET8_START(generic_btf_ids)
@@ -2573,6 +2579,7 @@ BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
+BTF_ID_FLAGS(func, bpf_ffs64)
BTF_SET8_END(generic_btf_ids)
static const struct btf_kfunc_id_set generic_kfunc_set = {
On an XDP-based virtual network gateway, the ffs (find first set)
algorithm is used in the gateway's ACL module to find the index of the
first set bit in a bitmap, which is an array of u64.

The ACL module was designed from these two papers:

* "eBPF / XDP based firewall and packet filtering"[1]
* "Securing Linux with a Faster and Scalable Iptables"[2]

In the ACL module, the key steps are:

1. Match source address to get a bitmap.
2. Match destination address to get a bitmap.
3. Match l4 protocol to get a bitmap.
4. Match source port to get a bitmap.
5. Match destination port to get a bitmap.

Finally, traverse these 5 bitmaps together, doing a bitwise AND on the
5 u64s at each position. For every result that is not zero, do ffs to
find the index of the first set bit in it. When the index is found,
convert it to a rule index into a rule policy bpf map, whose type is
BPF_MAP_TYPE_ARRAY or BPF_MAP_TYPE_PERCPU_ARRAY.

If the __ffs64() kernel function can be reused in bpf, it can save some
time in finding the index of the first set bit in a u64. Like AVX2,
__ffs64() will be compiled to one instruction, "rep bsf", on x86.

Then, I compared a bpf-implemented __ffs64() with this kfunc
bpf_ffs64(), using the following bpf code snippet:

	#include "vmlinux.h"
	#include "bpf/bpf_helpers.h"

	unsigned long bpf_ffs64(u64 word) __ksym;

	static __noinline __u64
	__ffs64(__u64 word)
	{
		__u64 shift = 0;

		if ((word & 0xffffffff) == 0) {
			word >>= 32;
			shift += 32;
		}
		if ((word & 0xffff) == 0) {
			word >>= 16;
			shift += 16;
		}
		if ((word & 0xff) == 0) {
			word >>= 8;
			shift += 8;
		}
		if ((word & 0xf) == 0) {
			word >>= 4;
			shift += 4;
		}
		if ((word & 0x3) == 0) {
			word >>= 2;
			shift += 2;
		}
		if ((word & 0x1) == 0) {
			shift += 1;
		}

		return shift;
	}

	SEC("tc")
	int tc_ffs1(struct __sk_buff *skb)
	{
		void *data_end = (void *)(long) skb->data_end;
		u64 *data = (u64 *)(long) skb->data;

		if ((void *)(u64) (data + 1) > data_end)
			return 0;

		return __ffs64(*data);
	}

	SEC("tc")
	int tc_ffs2(struct __sk_buff *skb)
	{
		void *data_end = (void *)(long) skb->data_end;
		u64 *data = (u64 *)(long) skb->data;

		if ((void *)(u64) (data + 1) > data_end)
			return 0;

		return bpf_ffs64(*data);
	}

	char _license[] SEC("license") = "GPL";

Then, I ran them on a KVM-based VM hosted on a 48-core
"Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz" server.

With the set bit at offset 0, and with the bpf progs run 10000000 times
in each round, the average time cost of one prog run is:

+----------+---------------+-------------------+
| Nth time | bpf __ffs64() | kfunc bpf_ffs64() |
+----------+---------------+-------------------+
|        1 |         164ns |             154ns |
|        2 |         166ns |             155ns |
|        3 |         160ns |             154ns |
|        4 |         161ns |             157ns |
|        5 |         161ns |             155ns |
|        6 |         163ns |             155ns |
|        7 |         164ns |             155ns |
|        8 |         159ns |             159ns |
|        9 |         171ns |             154ns |
|       10 |         164ns |             156ns |
|       11 |         161ns |             155ns |
|       12 |         160ns |             155ns |
|       13 |         161ns |             154ns |
|       14 |         165ns |             154ns |
|       15 |         161ns |             162ns |
|       16 |         161ns |             157ns |
|       17 |         164ns |             154ns |
|       18 |         162ns |             154ns |
|       19 |         159ns |             156ns |
|       20 |         160ns |             154ns |
+----------+---------------+-------------------+

With the set bit at offset 63, and with the bpf progs run 10000000
times in each round, the average time cost of one prog run is:

+----------+---------------+-------------------+
| Nth time | bpf __ffs64() | kfunc bpf_ffs64() |
+----------+---------------+-------------------+
|        1 |         163ns |             157ns |
|        2 |         163ns |             154ns |
|        3 |         165ns |             155ns |
|        4 |         167ns |             155ns |
|        5 |         165ns |             155ns |
|        6 |         163ns |             155ns |
|        7 |         162ns |             155ns |
|        8 |         162ns |             156ns |
|        9 |         174ns |             155ns |
|       10 |         162ns |             156ns |
|       11 |         168ns |             155ns |
|       12 |         169ns |             156ns |
|       13 |         162ns |             155ns |
|       14 |         169ns |             155ns |
|       15 |         162ns |             154ns |
|       16 |         163ns |             155ns |
|       17 |         162ns |             154ns |
|       18 |         166ns |             154ns |
|       19 |         165ns |             154ns |
|       20 |         165ns |             154ns |
+----------+---------------+-------------------+

As we can see, bpf __ffs64() costs around 162-165ns per run, and kfunc
bpf_ffs64() costs around 155ns. So kfunc bpf_ffs64() saves roughly
7-10ns per run. At 1M PPS on the gateway, this saving is paid back on
every packet that reaches the ffs step.

Links:
[1] http://vger.kernel.org/lpc_net2018_talks/ebpf-firewall-paper-LPC.pdf
[2] https://mbertrone.github.io/documents/21-Securing_Linux_with_a_Faster_and_Scalable_Iptables.pdf

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
---
 kernel/bpf/helpers.c | 7 +++++++
 1 file changed, 7 insertions(+)