Message ID | 20221110184303.393179-1-hbathini@linux.ibm.com (mailing list archive) |
---|---|
Headers | show |
Series | enable bpf_prog_pack allocator for powerpc | expand |
Le 10/11/2022 à 19:43, Hari Bathini a écrit : > Most BPF programs are small, but they consume a page each. For systems > with busy traffic and many BPF programs, this may also add significant > pressure on instruction TLB. High iTLB pressure usually slows down the > whole system causing visible performance degradation for production > workloads. > > bpf_prog_pack, a customized allocator that packs multiple bpf programs > into preallocated memory chunks, was proposed [1] to address it. This > series extends this support on powerpc. > > Patches 1 & 2 add the arch specific functions needed to support this > feature. Patch 3 enables the support for powerpc. The last patch > ensures cleanup is handled racefully. > > Tested the changes successfully on a PowerVM. patch_instruction(), > needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it. > Posting the patches in the meanwhile for feedback on these changes. I did a quick test on ppc32, I don't get such a problem, only something wrong in the dump print as traps intructions only are dumped, but tcpdump works as expected: [ 55.692998] bpf_jit_enable = 2 was set! NEVER use this in production, only for JIT debugging! [ 66.279259] device eth0 entered promiscuous mode [ 67.214756] Pass 1: shrink = 0, seen = 0x1f980000 [ 67.214880] Pass 2: shrink = 0, seen = 0x1f980000 [ 67.214966] flen=5 proglen=60 pass=3 image=be7a8038 from=tcpdump pid=459 [ 67.225261] JIT code: 00000000: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.233904] JIT code: 00000010: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.242579] JIT code: 00000020: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.249694] JIT code: 00000030: 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.259255] Pass 1: shrink = 0, seen = 0x3ff801fe [ 67.259421] Pass 2: shrink = 0, seen = 0x3ff801fe [ 67.259514] flen=40 proglen=504 pass=3 image=be7a80a0 from=tcpdump pid=459 [ 67.269467] JIT code: 00000000: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.278001] JIT code: 00000010: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.286519] JIT code: 00000020: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.295041] JIT code: 00000030: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.303596] JIT code: 00000040: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.312164] JIT code: 00000050: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.319231] JIT code: 00000060: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.328822] JIT code: 00000070: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.337382] JIT code: 00000080: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.345901] JIT code: 00000090: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.354423] JIT code: 000000a0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.362941] JIT code: 000000b0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.371462] JIT code: 000000c0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.378526] JIT code: 000000d0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.388120] JIT code: 000000e0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.396680] JIT code: 000000f0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.405199] JIT code: 00000100: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.413756] JIT code: 00000110: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.422324] JIT code: 00000120: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.429389] JIT code: 00000130: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.438982] JIT code: 00000140: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.447541] JIT code: 00000150: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.456059] JIT code: 00000160: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.464578] JIT code: 00000170: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.473201] JIT code: 00000180: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.481705] JIT code: 00000190: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.488770] JIT code: 000001a0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.498359] JIT code: 000001b0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.506921] JIT code: 000001c0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.515439] JIT code: 000001d0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.523998] JIT code: 000001e0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08 [ 67.532565] JIT code: 000001f0: 7f e0 00 08 7f e0 00 08 [ 82.620898] device eth0 left promiscuous mode > > [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/ > > Hari Bathini (3): > powerpc/bpf: implement bpf_arch_text_copy > powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack > powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] > > arch/powerpc/net/bpf_jit.h | 18 +-- > arch/powerpc/net/bpf_jit_comp.c | 194 ++++++++++++++++++++++++------ > arch/powerpc/net/bpf_jit_comp32.c | 26 ++-- > arch/powerpc/net/bpf_jit_comp64.c | 32 ++--- > 4 files changed, 198 insertions(+), 72 deletions(-) >
Hi Christophe, On 11/11/22 4:55 pm, Christophe Leroy wrote: > Le 10/11/2022 à 19:43, Hari Bathini a écrit : >> Most BPF programs are small, but they consume a page each. For systems >> with busy traffic and many BPF programs, this may also add significant >> pressure on instruction TLB. High iTLB pressure usually slows down the >> whole system causing visible performance degradation for production >> workloads. >> >> bpf_prog_pack, a customized allocator that packs multiple bpf programs >> into preallocated memory chunks, was proposed [1] to address it. This >> series extends this support on powerpc. >> >> Patches 1 & 2 add the arch specific functions needed to support this >> feature. Patch 3 enables the support for powerpc. The last patch >> ensures cleanup is handled racefully. >> >> Tested the changes successfully on a PowerVM. patch_instruction(), >> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it. >> Posting the patches in the meanwhile for feedback on these changes. > > I did a quick test on ppc32, I don't get such a problem, only something > wrong in the dump print as traps intructions only are dumped, but > tcpdump works as expected: Thanks for the quick test. Could you please share the config you used. I am probably missing a few knobs in my conifg... Thanks Hari
On 16/11/22 12:14 am, Christophe Leroy wrote: > > > Le 14/11/2022 à 18:27, Christophe Leroy a écrit : >> >> >> Le 14/11/2022 à 15:47, Hari Bathini a écrit : >>> Hi Christophe, >>> >>> On 11/11/22 4:55 pm, Christophe Leroy wrote: >>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit : >>>>> Most BPF programs are small, but they consume a page each. For systems >>>>> with busy traffic and many BPF programs, this may also add significant >>>>> pressure on instruction TLB. High iTLB pressure usually slows down the >>>>> whole system causing visible performance degradation for production >>>>> workloads. >>>>> >>>>> bpf_prog_pack, a customized allocator that packs multiple bpf programs >>>>> into preallocated memory chunks, was proposed [1] to address it. This >>>>> series extends this support on powerpc. >>>>> >>>>> Patches 1 & 2 add the arch specific functions needed to support this >>>>> feature. Patch 3 enables the support for powerpc. The last patch >>>>> ensures cleanup is handled racefully. >>>>> >>> >>>>> Tested the changes successfully on a PowerVM. patch_instruction(), >>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it. >>>>> Posting the patches in the meanwhile for feedback on these changes. >>>> >>>> I did a quick test on ppc32, I don't get such a problem, only something >>>> wrong in the dump print as traps intructions only are dumped, but >>>> tcpdump works as expected: >>> >>> Thanks for the quick test. Could you please share the config you used. >>> I am probably missing a few knobs in my conifg... >>> >> > > I also managed to test it on QEMU. The config is based on pmac32_defconfig. I had the same config but hit this problem: # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf test_bpf: #0 TAX ------------[ cut here ]------------ WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 bpf_int_jit_compile+0x8a0/0x9f8 Modules linked in: test_bpf(+) CPU: 0 PID: 96 Comm: modprobe Not tainted 6.1.0-rc5+ #116 Hardware name: PowerMac3,1 750CL 0x87210 PowerMac NIP: c0038224 LR: c0037f74 CTR: 00000009 REGS: f1b41b10 TRAP: 0700 Not tainted (6.1.0-rc5+) MSR: 00029032 <EE,ME,IR,DR,RI> CR: 82004882 XER: 20000000 GPR00: c0037f74 f1b41bd0 c12333c0 ffffffea c19747e0 c0123a08 c1974a00 c12333c0 GPR08: 00000000 00000b58 00009032 00000003 82004882 100d816a f2660000 00000000 GPR16: 00000000 c0c10000 00000000 c0c10000 c1913780 00000000 0000001a 100a2ee3 GPR24: 0000014c be857000 be857020 f1b41c00 c194c180 f1b46000 ffffffea f1b46000 NIP [c0038224] bpf_int_jit_compile+0x8a0/0x9f8 LR [c0037f74] bpf_int_jit_compile+0x5f0/0x9f8 Call Trace: [f1b41bd0] [c0037f74] bpf_int_jit_compile+0x5f0/0x9f8 (unreliable) [f1b41ca0] [c0123790] bpf_prog_select_runtime+0x178/0x1c4 [f1b41cd0] [c06cc4b0] bpf_prepare_filter+0x524/0x624 [f1b41d20] [c06cc63c] bpf_prog_create+0x8c/0xd0 [f1b41d40] [be85083c] test_bpf_init+0x46c/0x11b0 [test_bpf] [f1b41df0] [c0008534] do_one_initcall+0x58/0x1b8 [f1b41e60] [c00b6c38] do_init_module+0x54/0x1e4 [f1b41e80] [c00b8f80] sys_init_module+0x10c/0x174 [f1b41f10] [c0014390] system_call_exception+0x94/0x144 [f1b41f30] [c001a1ac] ret_from_syscall+0x0/0x2c --- interrupt: c00 at 0xfef48ac NIP: 0fef48ac LR: 10015de0 CTR: 00000000 REGS: f1b41f40 TRAP: 0c00 Not tainted (6.1.0-rc5+) MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 88002224 XER: 20000000 GPR00: 00000080 aff34990 a7a8d380 a75c0010 00477e90 1051b9a0 0fedfdd8 0000d032 GPR08: 00000000 00000008 00478ffa 400f26fa 400f23c7 100d816a 00000000 00000000 GPR16: 00000000 1051b950 00000000 1051b92c 100d0000 100a2eab 100a2e97 100a2ee3 GPR24: 100a2ed5 1051b980 00000001 100d01a8 1051b920 a75c0010 1051b9a0 a7a86388 NIP [0fef48ac] 0xfef48ac LR [10015de0] 0x10015de0 --- interrupt: c00 Instruction dump: 8081001c 38a00004 7f23cb78 4bfff5d1 7f23cb78 8081001c 480eba85 82410098 8261009c 82a100a4 82e100ac 4bfffce8 <0fe00000> 92010090 92410098 92e100ac ---[ end trace 0000000000000000 ]--- jited:1 kernel tried to execute exec-protected page (be857020) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xbe857020 Vector: 400 (Instruction Access) at [f1b41c30] pc: be857020 lr: be84eaa4: __run_one+0xec/0x248 [test_bpf] sp: f1b41cf0 msr: 40009032 current = 0xc12333c0 pid = 96, comm = modprobe Linux version 6.1.0-rc5+ (root@hbathini-Standard-PC-Q35-ICH9-2009) (powerpc-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #116 Wed Nov 16 20:48:10 IST 2022 enter ? for help [link register ] be84eaa4 __run_one+0xec/0x248 [test_bpf] [f1b41cf0] be84e9dc __run_one+0x24/0x248 [test_bpf] (unreliable) [f1b41d40] be850c78 test_bpf_init+0x8a8/0x11b0 [test_bpf] [f1b41df0] c0008534 do_one_initcall+0x58/0x1b8 [f1b41e60] c00b6c38 do_init_module+0x54/0x1e4 [f1b41e80] c00b8f80 sys_init_module+0x10c/0x174 [f1b41f10] c0014390 system_call_exception+0x94/0x144 [f1b41f30] c001a1ac ret_from_syscall+0x0/0x2c --- Exception: c00 (System Call) at 0fef48ac SP (aff34990) is in userspace mon> bpf_jit_binary_pack_finalize() function failed due to patch_instruction() .. Also, I am hitting the same problem with the other config too.. Thanks Hari
Le 16/11/2022 à 18:01, Hari Bathini a écrit : >> >> I also managed to test it on QEMU. The config is based on >> pmac32_defconfig. > > I had the same config but hit this problem: > > # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf > test_bpf: #0 TAX > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 > bpf_int_jit_compile+0x8a0/0x9f8 Ok, I'll give it a try with test_bpf module. Christophe
Le 16/11/2022 à 18:01, Hari Bathini a écrit : > > > On 16/11/22 12:14 am, Christophe Leroy wrote: >> >> >> Le 14/11/2022 à 18:27, Christophe Leroy a écrit : >>> >>> >>> Le 14/11/2022 à 15:47, Hari Bathini a écrit : >>>> Hi Christophe, >>>> >>>> On 11/11/22 4:55 pm, Christophe Leroy wrote: >>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit : >>>>>> Most BPF programs are small, but they consume a page each. For >>>>>> systems >>>>>> with busy traffic and many BPF programs, this may also add >>>>>> significant >>>>>> pressure on instruction TLB. High iTLB pressure usually slows down >>>>>> the >>>>>> whole system causing visible performance degradation for production >>>>>> workloads. >>>>>> >>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf >>>>>> programs >>>>>> into preallocated memory chunks, was proposed [1] to address it. This >>>>>> series extends this support on powerpc. >>>>>> >>>>>> Patches 1 & 2 add the arch specific functions needed to support this >>>>>> feature. Patch 3 enables the support for powerpc. The last patch >>>>>> ensures cleanup is handled racefully. >>>>>> >>>> >>>>>> Tested the changes successfully on a PowerVM. patch_instruction(), >>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it. >>>>>> Posting the patches in the meanwhile for feedback on these changes. >>>>> >>>>> I did a quick test on ppc32, I don't get such a problem, only >>>>> something >>>>> wrong in the dump print as traps intructions only are dumped, but >>>>> tcpdump works as expected: >>>> >>>> Thanks for the quick test. Could you please share the config you used. >>>> I am probably missing a few knobs in my conifg... >>>> >>> >> >> I also managed to test it on QEMU. The config is based on >> pmac32_defconfig. > > I had the same config but hit this problem: > > # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf > test_bpf: #0 TAX > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 > bpf_int_jit_compile+0x8a0/0x9f8 I get no such problem, on QEMU, and I checked the .config has: CONFIG_STRICT_KERNEL_RWX=y CONFIG_STRICT_MODULE_RWX=y Boot successful. / # ifconfig eth0 10.0.2.15 e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX / # tftp -g 10.0.2.2 -r test_bpf.ko / # echo 1 > /proc/sys/net/core/bpf_jit_enable / # insmod ./test_bpf.ko test_bpf: #0 TAX jited:1 216 87 86 PASS test_bpf: #1 TXA jited:1 57 27 27 PASS test_bpf: #2 ADD_SUB_MUL_K jited:1 50 PASS test_bpf: #3 DIV_MOD_KX jited:1 110 PASS test_bpf: #4 AND_OR_LSH_K jited:1 67 26 PASS test_bpf: #5 LD_IMM_0 jited:1 77 PASS ... By the way, you can note that during the boot you get: This platform has HASH MMU, STRICT_MODULE_RWX won't work See why in 0670010f3b10 ("powerpc/32s: Enable STRICT_MODULE_RWX for the 603 core") Nevertheless it should prevent patch_instruction() to work. Could you had a pr_err() in __patch_instruction() in the failure path to print and check exec_addr and patch_addr ? > jited:1 > kernel tried to execute exec-protected page (be857020) - exploit > attempt? (uid: 0) > BUG: Unable to handle kernel instruction fetch > Faulting instruction address: 0xbe857020 I'm a bit surprised of this. On hash based book3s/32 there is no way to protect pages for exec-protection. Protection is performed at segment level, all kernel segments have the NX bit set except the segment used for module text, which is by default 0xb0000000-0xbfffffff. Or maybe this is the first time that address is accessed, and the ISI handler does the check before loading the hash table ? > > bpf_jit_binary_pack_finalize() function failed due to > patch_instruction() .. Is there a way tell BPF core that jit failed in that case to avoid that ? Christophe
On 17/11/22 12:29 pm, Christophe Leroy wrote: > > > Le 16/11/2022 à 18:01, Hari Bathini a écrit : >> >> >> On 16/11/22 12:14 am, Christophe Leroy wrote: >>> >>> >>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit : >>>> >>>> >>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit : >>>>> Hi Christophe, >>>>> >>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote: >>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit : >>>>>>> Most BPF programs are small, but they consume a page each. For >>>>>>> systems >>>>>>> with busy traffic and many BPF programs, this may also add >>>>>>> significant >>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down >>>>>>> the >>>>>>> whole system causing visible performance degradation for production >>>>>>> workloads. >>>>>>> >>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf >>>>>>> programs >>>>>>> into preallocated memory chunks, was proposed [1] to address it. This >>>>>>> series extends this support on powerpc. >>>>>>> >>>>>>> Patches 1 & 2 add the arch specific functions needed to support this >>>>>>> feature. Patch 3 enables the support for powerpc. The last patch >>>>>>> ensures cleanup is handled racefully. >>>>>>> >>>>> >>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(), >>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it. >>>>>>> Posting the patches in the meanwhile for feedback on these changes. >>>>>> >>>>>> I did a quick test on ppc32, I don't get such a problem, only >>>>>> something >>>>>> wrong in the dump print as traps intructions only are dumped, but >>>>>> tcpdump works as expected: >>>>> >>>>> Thanks for the quick test. Could you please share the config you used. >>>>> I am probably missing a few knobs in my conifg... >>>>> >>>> >>> >>> I also managed to test it on QEMU. The config is based on >>> pmac32_defconfig. >> >> I had the same config but hit this problem: >> >> # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf >> test_bpf: #0 TAX >> ------------[ cut here ]------------ >> WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 >> bpf_int_jit_compile+0x8a0/0x9f8 > > I get no such problem, on QEMU, and I checked the .config has: > CONFIG_STRICT_KERNEL_RWX=y > CONFIG_STRICT_MODULE_RWX=y Yeah. That did the trick. These options were missing in my config and the pmac config you shared. I could not run the other config you shared on QEMU. Thanks for all the pointers. - Hari
Le 18/11/2022 à 09:39, Hari Bathini a écrit : > > > On 17/11/22 12:29 pm, Christophe Leroy wrote: >> >> >> Le 16/11/2022 à 18:01, Hari Bathini a écrit : >>> >>> >>> On 16/11/22 12:14 am, Christophe Leroy wrote: >>>> >>>> >>>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit : >>>>> >>>>> >>>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit : >>>>>> Hi Christophe, >>>>>> >>>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote: >>>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit : >>>>>>>> Most BPF programs are small, but they consume a page each. For >>>>>>>> systems >>>>>>>> with busy traffic and many BPF programs, this may also add >>>>>>>> significant >>>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down >>>>>>>> the >>>>>>>> whole system causing visible performance degradation for production >>>>>>>> workloads. >>>>>>>> >>>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf >>>>>>>> programs >>>>>>>> into preallocated memory chunks, was proposed [1] to address it. >>>>>>>> This >>>>>>>> series extends this support on powerpc. >>>>>>>> >>>>>>>> Patches 1 & 2 add the arch specific functions needed to support >>>>>>>> this >>>>>>>> feature. Patch 3 enables the support for powerpc. The last patch >>>>>>>> ensures cleanup is handled racefully. >>>>>>>> >>>>>> >>>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(), >>>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging >>>>>>>> it. >>>>>>>> Posting the patches in the meanwhile for feedback on these changes. >>>>>>> >>>>>>> I did a quick test on ppc32, I don't get such a problem, only >>>>>>> something >>>>>>> wrong in the dump print as traps intructions only are dumped, but >>>>>>> tcpdump works as expected: >>>>>> >>>>>> Thanks for the quick test. Could you please share the config you >>>>>> used. >>>>>> I am probably missing a few knobs in my conifg... >>>>>> >>>>> >>>> >>>> I also managed to test it on QEMU. The config is based on >>>> pmac32_defconfig. >>> >>> I had the same config but hit this problem: >>> >>> # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf >>> test_bpf: #0 TAX >>> ------------[ cut here ]------------ >>> WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 >>> bpf_int_jit_compile+0x8a0/0x9f8 >> >> I get no such problem, on QEMU, and I checked the .config has: > >> CONFIG_STRICT_KERNEL_RWX=y >> CONFIG_STRICT_MODULE_RWX=y > > Yeah. That did the trick. Interesting. I guess we have to find out why it fails when those config are missing. Maybe module code plays with RO and NX flags even if CONFIG_STRICT_MODULE_RWX is not selected ? Christophe
On 18/11/22 2:21 pm, Christophe Leroy wrote: > > > Le 18/11/2022 à 09:39, Hari Bathini a écrit : >> >> >> On 17/11/22 12:29 pm, Christophe Leroy wrote: >>> >>> >>> Le 16/11/2022 à 18:01, Hari Bathini a écrit : >>>> >>>> >>>> On 16/11/22 12:14 am, Christophe Leroy wrote: >>>>> >>>>> >>>>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit : >>>>>> >>>>>> >>>>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit : >>>>>>> Hi Christophe, >>>>>>> >>>>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote: >>>>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit : >>>>>>>>> Most BPF programs are small, but they consume a page each. For >>>>>>>>> systems >>>>>>>>> with busy traffic and many BPF programs, this may also add >>>>>>>>> significant >>>>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down >>>>>>>>> the >>>>>>>>> whole system causing visible performance degradation for production >>>>>>>>> workloads. >>>>>>>>> >>>>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf >>>>>>>>> programs >>>>>>>>> into preallocated memory chunks, was proposed [1] to address it. >>>>>>>>> This >>>>>>>>> series extends this support on powerpc. >>>>>>>>> >>>>>>>>> Patches 1 & 2 add the arch specific functions needed to support >>>>>>>>> this >>>>>>>>> feature. Patch 3 enables the support for powerpc. The last patch >>>>>>>>> ensures cleanup is handled racefully. >>>>>>>>> >>>>>>> >>>>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(), >>>>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging >>>>>>>>> it. >>>>>>>>> Posting the patches in the meanwhile for feedback on these changes. >>>>>>>> >>>>>>>> I did a quick test on ppc32, I don't get such a problem, only >>>>>>>> something >>>>>>>> wrong in the dump print as traps intructions only are dumped, but >>>>>>>> tcpdump works as expected: >>>>>>> >>>>>>> Thanks for the quick test. Could you please share the config you >>>>>>> used. >>>>>>> I am probably missing a few knobs in my conifg... >>>>>>> >>>>>> >>>>> >>>>> I also managed to test it on QEMU. The config is based on >>>>> pmac32_defconfig. >>>> >>>> I had the same config but hit this problem: >>>> >>>> # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf >>>> test_bpf: #0 TAX >>>> ------------[ cut here ]------------ >>>> WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 >>>> bpf_int_jit_compile+0x8a0/0x9f8 >>> >>> I get no such problem, on QEMU, and I checked the .config has: >> >>> CONFIG_STRICT_KERNEL_RWX=y >>> CONFIG_STRICT_MODULE_RWX=y >> >> Yeah. That did the trick. > > Interesting. I guess we have to find out why it fails when those config > are missing. > > Maybe module code plays with RO and NX flags even if > CONFIG_STRICT_MODULE_RWX is not selected ? Need to look at the code closely but fwiw, observing same failure on 64-bit as well with !STRICT_RWX... Thanks Hari
Le 18/11/2022 à 10:39, Hari Bathini a écrit : > > > On 18/11/22 2:21 pm, Christophe Leroy wrote: >>>>> >>>>> I had the same config but hit this problem: >>>>> >>>>> # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf >>>>> test_bpf: #0 TAX >>>>> ------------[ cut here ]------------ >>>>> WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 >>>>> bpf_int_jit_compile+0x8a0/0x9f8 >>>> >>>> I get no such problem, on QEMU, and I checked the .config has: >>> >>>> CONFIG_STRICT_KERNEL_RWX=y >>>> CONFIG_STRICT_MODULE_RWX=y >>> >>> Yeah. That did the trick. >> >> Interesting. I guess we have to find out why it fails when those config >> are missing. >> >> Maybe module code plays with RO and NX flags even if >> CONFIG_STRICT_MODULE_RWX is not selected ? > > Need to look at the code closely but fwiw, observing same failure on > 64-bit as well with !STRICT_RWX... The problem is in bpf_prog_pack_alloc() and in alloc_new_pack() : They do set_memory_ro() and set_memory_x() without taking into account CONFIG_STRICT_MODULE_RWX. When CONFIG_STRICT_MODULE_RWX is selected, powerpc module_alloc() allocates PAGE_KERNEL memory, that is RW memory, and expects the user to call do set_memory_ro() and set_memory_x(). But when CONFIG_STRICT_MODULE_RWX is not selected, powerpc module_alloc() allocates PAGE_KERNEL_TEXT memory, that is RWX memory, and expects to be able to always write into it. Christophe
On Fri, Nov 18, 2022 at 3:47 AM Christophe Leroy <christophe.leroy@csgroup.eu> wrote: > > > > Le 18/11/2022 à 10:39, Hari Bathini a écrit : > > > > > > On 18/11/22 2:21 pm, Christophe Leroy wrote: >>>>> > >>>>> I had the same config but hit this problem: > >>>>> > >>>>> # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf > >>>>> test_bpf: #0 TAX > >>>>> ------------[ cut here ]------------ > >>>>> WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 > >>>>> bpf_int_jit_compile+0x8a0/0x9f8 > >>>> > >>>> I get no such problem, on QEMU, and I checked the .config has: > >>> > >>>> CONFIG_STRICT_KERNEL_RWX=y > >>>> CONFIG_STRICT_MODULE_RWX=y > >>> > >>> Yeah. That did the trick. > >> > >> Interesting. I guess we have to find out why it fails when those config > >> are missing. > >> > >> Maybe module code plays with RO and NX flags even if > >> CONFIG_STRICT_MODULE_RWX is not selected ? > > > > Need to look at the code closely but fwiw, observing same failure on > > 64-bit as well with !STRICT_RWX... > > The problem is in bpf_prog_pack_alloc() and in alloc_new_pack() : They > do set_memory_ro() and set_memory_x() without taking into account > CONFIG_STRICT_MODULE_RWX. > > When CONFIG_STRICT_MODULE_RWX is selected, powerpc module_alloc() > allocates PAGE_KERNEL memory, that is RW memory, and expects the user to > call do set_memory_ro() and set_memory_x(). > > But when CONFIG_STRICT_MODULE_RWX is not selected, powerpc > module_alloc() allocates PAGE_KERNEL_TEXT memory, that is RWX memory, > and expects to be able to always write into it. Ah, I see. x86_64 requires CONFIG_STRICT_MODULE_RWX, so this hasn't been a problem yet. Thanks, Song
Le 18/11/2022 à 18:28, Song Liu a écrit : > On Fri, Nov 18, 2022 at 3:47 AM Christophe Leroy > <christophe.leroy@csgroup.eu> wrote: >> >> >> >> Le 18/11/2022 à 10:39, Hari Bathini a écrit : >>> >>> >>> On 18/11/22 2:21 pm, Christophe Leroy wrote: >>>>> >>>>>>> I had the same config but hit this problem: >>>>>>> >>>>>>> # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf >>>>>>> test_bpf: #0 TAX >>>>>>> ------------[ cut here ]------------ >>>>>>> WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 >>>>>>> bpf_int_jit_compile+0x8a0/0x9f8 >>>>>> >>>>>> I get no such problem, on QEMU, and I checked the .config has: >>>>> >>>>>> CONFIG_STRICT_KERNEL_RWX=y >>>>>> CONFIG_STRICT_MODULE_RWX=y >>>>> >>>>> Yeah. That did the trick. >>>> >>>> Interesting. I guess we have to find out why it fails when those config >>>> are missing. >>>> >>>> Maybe module code plays with RO and NX flags even if >>>> CONFIG_STRICT_MODULE_RWX is not selected ? >>> >>> Need to look at the code closely but fwiw, observing same failure on >>> 64-bit as well with !STRICT_RWX... >> >> The problem is in bpf_prog_pack_alloc() and in alloc_new_pack() : They >> do set_memory_ro() and set_memory_x() without taking into account >> CONFIG_STRICT_MODULE_RWX. >> >> When CONFIG_STRICT_MODULE_RWX is selected, powerpc module_alloc() >> allocates PAGE_KERNEL memory, that is RW memory, and expects the user to >> call do set_memory_ro() and set_memory_x(). >> >> But when CONFIG_STRICT_MODULE_RWX is not selected, powerpc >> module_alloc() allocates PAGE_KERNEL_TEXT memory, that is RWX memory, >> and expects to be able to always write into it. > > Ah, I see. x86_64 requires CONFIG_STRICT_MODULE_RWX, so this hasn't > been a problem yet. > In fact it shouldn't be a problem for BPF on powerpc either. Because powerpc BPF expects RO at all time and today uses bpf_jit_binary_lock_ro(). It just means that we can't use patch_instruction() for that. Anyway, using patch_instruction() was sub-optimal. All we have to do I think is set a mirror of the page using vmap() then perform a memcpy() of the code then vunmap() it. Maybe a call to flush_tlb_kernel_range() will be also needed, unless BPF already does it. Christophe