mbox series

[RFC,0/3] enable bpf_prog_pack allocator for powerpc

Message ID 20221110184303.393179-1-hbathini@linux.ibm.com (mailing list archive)
Headers show
Series enable bpf_prog_pack allocator for powerpc | expand

Message

Hari Bathini Nov. 10, 2022, 6:43 p.m. UTC
Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Patches 1 & 2 add the arch specific functions needed to support this
feature. Patch 3 enables the support for powerpc. The last patch
ensures cleanup is handled racefully.

Tested the changes successfully on a PowerVM. patch_instruction(),
needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
Posting the patches in the meanwhile for feedback on these changes.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/

Hari Bathini (3):
  powerpc/bpf: implement bpf_arch_text_copy
  powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

 arch/powerpc/net/bpf_jit.h        |  18 +--
 arch/powerpc/net/bpf_jit_comp.c   | 194 ++++++++++++++++++++++++------
 arch/powerpc/net/bpf_jit_comp32.c |  26 ++--
 arch/powerpc/net/bpf_jit_comp64.c |  32 ++---
 4 files changed, 198 insertions(+), 72 deletions(-)

Comments

Christophe Leroy Nov. 11, 2022, 11:25 a.m. UTC | #1
Le 10/11/2022 à 19:43, Hari Bathini a écrit :
> Most BPF programs are small, but they consume a page each. For systems
> with busy traffic and many BPF programs, this may also add significant
> pressure on instruction TLB. High iTLB pressure usually slows down the
> whole system causing visible performance degradation for production
> workloads.
> 
> bpf_prog_pack, a customized allocator that packs multiple bpf programs
> into preallocated memory chunks, was proposed [1] to address it. This
> series extends this support on powerpc.
> 
> Patches 1 & 2 add the arch specific functions needed to support this
> feature. Patch 3 enables the support for powerpc. The last patch
> ensures cleanup is handled racefully.
> 
> Tested the changes successfully on a PowerVM. patch_instruction(),
> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
> Posting the patches in the meanwhile for feedback on these changes.

I did a quick test on ppc32, I don't get such a problem, only something 
wrong in the dump print as traps intructions only are dumped, but 
tcpdump works as expected:

[   55.692998] bpf_jit_enable = 2 was set! NEVER use this in production, 
only for JIT debugging!
[   66.279259] device eth0 entered promiscuous mode
[   67.214756] Pass 1: shrink = 0, seen = 0x1f980000
[   67.214880] Pass 2: shrink = 0, seen = 0x1f980000
[   67.214966] flen=5 proglen=60 pass=3 image=be7a8038 from=tcpdump pid=459
[   67.225261] JIT code: 00000000: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.233904] JIT code: 00000010: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.242579] JIT code: 00000020: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.249694] JIT code: 00000030: 7f e0 00 08 7f e0 00 08 7f e0 00 08
[   67.259255] Pass 1: shrink = 0, seen = 0x3ff801fe
[   67.259421] Pass 2: shrink = 0, seen = 0x3ff801fe
[   67.259514] flen=40 proglen=504 pass=3 image=be7a80a0 from=tcpdump 
pid=459
[   67.269467] JIT code: 00000000: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.278001] JIT code: 00000010: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.286519] JIT code: 00000020: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.295041] JIT code: 00000030: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.303596] JIT code: 00000040: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.312164] JIT code: 00000050: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.319231] JIT code: 00000060: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.328822] JIT code: 00000070: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.337382] JIT code: 00000080: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.345901] JIT code: 00000090: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.354423] JIT code: 000000a0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.362941] JIT code: 000000b0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.371462] JIT code: 000000c0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.378526] JIT code: 000000d0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.388120] JIT code: 000000e0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.396680] JIT code: 000000f0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.405199] JIT code: 00000100: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.413756] JIT code: 00000110: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.422324] JIT code: 00000120: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.429389] JIT code: 00000130: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.438982] JIT code: 00000140: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.447541] JIT code: 00000150: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.456059] JIT code: 00000160: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.464578] JIT code: 00000170: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.473201] JIT code: 00000180: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.481705] JIT code: 00000190: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.488770] JIT code: 000001a0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.498359] JIT code: 000001b0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.506921] JIT code: 000001c0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.515439] JIT code: 000001d0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.523998] JIT code: 000001e0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.532565] JIT code: 000001f0: 7f e0 00 08 7f e0 00 08
[   82.620898] device eth0 left promiscuous mode


> 
> [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/
> 
> Hari Bathini (3):
>    powerpc/bpf: implement bpf_arch_text_copy
>    powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
>    powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
> 
>   arch/powerpc/net/bpf_jit.h        |  18 +--
>   arch/powerpc/net/bpf_jit_comp.c   | 194 ++++++++++++++++++++++++------
>   arch/powerpc/net/bpf_jit_comp32.c |  26 ++--
>   arch/powerpc/net/bpf_jit_comp64.c |  32 ++---
>   4 files changed, 198 insertions(+), 72 deletions(-)
>
Hari Bathini Nov. 14, 2022, 2:47 p.m. UTC | #2
Hi Christophe,

On 11/11/22 4:55 pm, Christophe Leroy wrote:
> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>> Most BPF programs are small, but they consume a page each. For systems
>> with busy traffic and many BPF programs, this may also add significant
>> pressure on instruction TLB. High iTLB pressure usually slows down the
>> whole system causing visible performance degradation for production
>> workloads.
>>
>> bpf_prog_pack, a customized allocator that packs multiple bpf programs
>> into preallocated memory chunks, was proposed [1] to address it. This
>> series extends this support on powerpc.
>>
>> Patches 1 & 2 add the arch specific functions needed to support this
>> feature. Patch 3 enables the support for powerpc. The last patch
>> ensures cleanup is handled racefully.
>>

>> Tested the changes successfully on a PowerVM. patch_instruction(),
>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
>> Posting the patches in the meanwhile for feedback on these changes.
> 
> I did a quick test on ppc32, I don't get such a problem, only something
> wrong in the dump print as traps intructions only are dumped, but
> tcpdump works as expected:

Thanks for the quick test. Could you please share the config you used.
I am probably missing a few knobs in my conifg...

Thanks
Hari
Hari Bathini Nov. 16, 2022, 5:01 p.m. UTC | #3
On 16/11/22 12:14 am, Christophe Leroy wrote:
> 
> 
> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>
>>
>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>> Hi Christophe,
>>>
>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>> Most BPF programs are small, but they consume a page each. For systems
>>>>> with busy traffic and many BPF programs, this may also add significant
>>>>> pressure on instruction TLB. High iTLB pressure usually slows down the
>>>>> whole system causing visible performance degradation for production
>>>>> workloads.
>>>>>
>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf programs
>>>>> into preallocated memory chunks, was proposed [1] to address it. This
>>>>> series extends this support on powerpc.
>>>>>
>>>>> Patches 1 & 2 add the arch specific functions needed to support this
>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>> ensures cleanup is handled racefully.
>>>>>
>>>
>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>
>>>> I did a quick test on ppc32, I don't get such a problem, only something
>>>> wrong in the dump print as traps intructions only are dumped, but
>>>> tcpdump works as expected:
>>>
>>> Thanks for the quick test. Could you please share the config you used.
>>> I am probably missing a few knobs in my conifg...
>>>
>>
> 
> I also managed to test it on QEMU. The config is based on pmac32_defconfig.

I had the same config but hit this problem:

	# echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
	test_bpf: #0 TAX
	------------[ cut here ]------------
	WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 
bpf_int_jit_compile+0x8a0/0x9f8
	Modules linked in: test_bpf(+)
	CPU: 0 PID: 96 Comm: modprobe Not tainted 6.1.0-rc5+ #116
	Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
	NIP:  c0038224 LR: c0037f74 CTR: 00000009
	REGS: f1b41b10 TRAP: 0700   Not tainted  (6.1.0-rc5+)
	MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 82004882  XER: 20000000

	GPR00: c0037f74 f1b41bd0 c12333c0 ffffffea c19747e0 c0123a08 c1974a00 
c12333c0
	GPR08: 00000000 00000b58 00009032 00000003 82004882 100d816a f2660000 
00000000
	GPR16: 00000000 c0c10000 00000000 c0c10000 c1913780 00000000 0000001a 
100a2ee3
	GPR24: 0000014c be857000 be857020 f1b41c00 c194c180 f1b46000 ffffffea 
f1b46000
	NIP [c0038224] bpf_int_jit_compile+0x8a0/0x9f8
	LR [c0037f74] bpf_int_jit_compile+0x5f0/0x9f8
	Call Trace:
	[f1b41bd0] [c0037f74] bpf_int_jit_compile+0x5f0/0x9f8 (unreliable)
	[f1b41ca0] [c0123790] bpf_prog_select_runtime+0x178/0x1c4
	[f1b41cd0] [c06cc4b0] bpf_prepare_filter+0x524/0x624
	[f1b41d20] [c06cc63c] bpf_prog_create+0x8c/0xd0
	[f1b41d40] [be85083c] test_bpf_init+0x46c/0x11b0 [test_bpf]
	[f1b41df0] [c0008534] do_one_initcall+0x58/0x1b8
	[f1b41e60] [c00b6c38] do_init_module+0x54/0x1e4
	[f1b41e80] [c00b8f80] sys_init_module+0x10c/0x174
	[f1b41f10] [c0014390] system_call_exception+0x94/0x144
	[f1b41f30] [c001a1ac] ret_from_syscall+0x0/0x2c
	--- interrupt: c00 at 0xfef48ac
	NIP:  0fef48ac LR: 10015de0 CTR: 00000000
	REGS: f1b41f40 TRAP: 0c00   Not tainted  (6.1.0-rc5+)
	MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 88002224  XER: 20000000

	GPR00: 00000080 aff34990 a7a8d380 a75c0010 00477e90 1051b9a0 0fedfdd8 
0000d032
	GPR08: 00000000 00000008 00478ffa 400f26fa 400f23c7 100d816a 00000000 
00000000
	GPR16: 00000000 1051b950 00000000 1051b92c 100d0000 100a2eab 100a2e97 
100a2ee3
	GPR24: 100a2ed5 1051b980 00000001 100d01a8 1051b920 a75c0010 1051b9a0 
a7a86388
	NIP [0fef48ac] 0xfef48ac
	LR [10015de0] 0x10015de0
	--- interrupt: c00
	Instruction dump:
	8081001c 38a00004 7f23cb78 4bfff5d1 7f23cb78 8081001c 480eba85 82410098
	8261009c 82a100a4 82e100ac 4bfffce8 <0fe00000> 92010090 92410098 92e100ac
	---[ end trace 0000000000000000 ]---
	jited:1
	kernel tried to execute exec-protected page (be857020) - exploit 
attempt? (uid: 0)
	BUG: Unable to handle kernel instruction fetch
	Faulting instruction address: 0xbe857020
	Vector: 400 (Instruction Access) at [f1b41c30]
	    pc: be857020
	    lr: be84eaa4: __run_one+0xec/0x248 [test_bpf]
	    sp: f1b41cf0
	   msr: 40009032
	  current = 0xc12333c0
	    pid   = 96, comm = modprobe
	Linux version 6.1.0-rc5+ (root@hbathini-Standard-PC-Q35-ICH9-2009) 
(powerpc-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, GNU ld 
(GNU Binutils for Ubuntu) 2.38) #116 Wed Nov 16 20:48:10 IST 2022
	enter ? for help
	[link register   ] be84eaa4 __run_one+0xec/0x248 [test_bpf]
	[f1b41cf0] be84e9dc __run_one+0x24/0x248 [test_bpf] (unreliable)
	[f1b41d40] be850c78 test_bpf_init+0x8a8/0x11b0 [test_bpf]
	[f1b41df0] c0008534 do_one_initcall+0x58/0x1b8
	[f1b41e60] c00b6c38 do_init_module+0x54/0x1e4
	[f1b41e80] c00b8f80 sys_init_module+0x10c/0x174
	[f1b41f10] c0014390 system_call_exception+0x94/0x144
	[f1b41f30] c001a1ac ret_from_syscall+0x0/0x2c
	--- Exception: c00 (System Call) at 0fef48ac
	SP (aff34990) is in userspace
	mon>

bpf_jit_binary_pack_finalize() function failed due to patch_instruction() ..
Also, I am hitting the same problem with the other config too..

Thanks
Hari
Christophe Leroy Nov. 16, 2022, 5:16 p.m. UTC | #4
Le 16/11/2022 à 18:01, Hari Bathini a écrit :
>>
>> I also managed to test it on QEMU. The config is based on 
>> pmac32_defconfig.
> 
> I had the same config but hit this problem:
> 
>      # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>      test_bpf: #0 TAX
>      ------------[ cut here ]------------
>      WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 
> bpf_int_jit_compile+0x8a0/0x9f8

Ok, I'll give it a try with test_bpf module.

Christophe
Christophe Leroy Nov. 17, 2022, 6:59 a.m. UTC | #5
Le 16/11/2022 à 18:01, Hari Bathini a écrit :
> 
> 
> On 16/11/22 12:14 am, Christophe Leroy wrote:
>>
>>
>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>>
>>>
>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>>> Hi Christophe,
>>>>
>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>>> Most BPF programs are small, but they consume a page each. For 
>>>>>> systems
>>>>>> with busy traffic and many BPF programs, this may also add 
>>>>>> significant
>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down 
>>>>>> the
>>>>>> whole system causing visible performance degradation for production
>>>>>> workloads.
>>>>>>
>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf 
>>>>>> programs
>>>>>> into preallocated memory chunks, was proposed [1] to address it. This
>>>>>> series extends this support on powerpc.
>>>>>>
>>>>>> Patches 1 & 2 add the arch specific functions needed to support this
>>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>>> ensures cleanup is handled racefully.
>>>>>>
>>>>
>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
>>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>>
>>>>> I did a quick test on ppc32, I don't get such a problem, only 
>>>>> something
>>>>> wrong in the dump print as traps intructions only are dumped, but
>>>>> tcpdump works as expected:
>>>>
>>>> Thanks for the quick test. Could you please share the config you used.
>>>> I am probably missing a few knobs in my conifg...
>>>>
>>>
>>
>> I also managed to test it on QEMU. The config is based on 
>> pmac32_defconfig.
> 
> I had the same config but hit this problem:
> 
>      # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>      test_bpf: #0 TAX
>      ------------[ cut here ]------------
>      WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 
> bpf_int_jit_compile+0x8a0/0x9f8

I get no such problem, on QEMU, and I checked the .config has:
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_STRICT_MODULE_RWX=y

Boot successful.
/ # ifconfig eth0 10.0.2.15
e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
/ # tftp -g 10.0.2.2 -r test_bpf.ko
/ # echo 1 > /proc/sys/net/core/bpf_jit_enable
/ # insmod ./test_bpf.ko
test_bpf: #0 TAX jited:1 216 87 86 PASS
test_bpf: #1 TXA jited:1 57 27 27 PASS
test_bpf: #2 ADD_SUB_MUL_K jited:1 50 PASS
test_bpf: #3 DIV_MOD_KX jited:1 110 PASS
test_bpf: #4 AND_OR_LSH_K jited:1 67 26 PASS
test_bpf: #5 LD_IMM_0 jited:1 77 PASS
...

By the way, you can note that during the boot you get:

	This platform has HASH MMU, STRICT_MODULE_RWX won't work

See why in 0670010f3b10 ("powerpc/32s: Enable STRICT_MODULE_RWX for the 
603 core")

Nevertheless it should prevent patch_instruction() to work.

Could you had a pr_err() in __patch_instruction() in the failure path to 
print and check exec_addr and patch_addr ?



>      jited:1
>      kernel tried to execute exec-protected page (be857020) - exploit 
> attempt? (uid: 0)
>      BUG: Unable to handle kernel instruction fetch
>      Faulting instruction address: 0xbe857020

I'm a bit surprised of this. On hash based book3s/32 there is no way to 
protect pages for exec-protection. Protection is performed at segment 
level, all kernel segments have the NX bit set except the segment used 
for module text, which is by default 0xb0000000-0xbfffffff.

Or maybe this is the first time that address is accessed, and the ISI 
handler does the check before loading the hash table ?

> 
> bpf_jit_binary_pack_finalize() function failed due to 
> patch_instruction() ..

Is there a way tell BPF core that jit failed in that case to avoid that ?

Christophe
Hari Bathini Nov. 18, 2022, 8:39 a.m. UTC | #6
On 17/11/22 12:29 pm, Christophe Leroy wrote:
> 
> 
> Le 16/11/2022 à 18:01, Hari Bathini a écrit :
>>
>>
>> On 16/11/22 12:14 am, Christophe Leroy wrote:
>>>
>>>
>>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>>>
>>>>
>>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>>>> Hi Christophe,
>>>>>
>>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>>>> Most BPF programs are small, but they consume a page each. For
>>>>>>> systems
>>>>>>> with busy traffic and many BPF programs, this may also add
>>>>>>> significant
>>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down
>>>>>>> the
>>>>>>> whole system causing visible performance degradation for production
>>>>>>> workloads.
>>>>>>>
>>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf
>>>>>>> programs
>>>>>>> into preallocated memory chunks, was proposed [1] to address it. This
>>>>>>> series extends this support on powerpc.
>>>>>>>
>>>>>>> Patches 1 & 2 add the arch specific functions needed to support this
>>>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>>>> ensures cleanup is handled racefully.
>>>>>>>
>>>>>
>>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
>>>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>>>
>>>>>> I did a quick test on ppc32, I don't get such a problem, only
>>>>>> something
>>>>>> wrong in the dump print as traps intructions only are dumped, but
>>>>>> tcpdump works as expected:
>>>>>
>>>>> Thanks for the quick test. Could you please share the config you used.
>>>>> I am probably missing a few knobs in my conifg...
>>>>>
>>>>
>>>
>>> I also managed to test it on QEMU. The config is based on
>>> pmac32_defconfig.
>>
>> I had the same config but hit this problem:
>>
>>       # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>       test_bpf: #0 TAX
>>       ------------[ cut here ]------------
>>       WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>> bpf_int_jit_compile+0x8a0/0x9f8
> 
> I get no such problem, on QEMU, and I checked the .config has:

> CONFIG_STRICT_KERNEL_RWX=y
> CONFIG_STRICT_MODULE_RWX=y

Yeah. That did the trick. These options were missing in my config and
the pmac config you shared. I could not run the other config you shared
on QEMU. Thanks for all the pointers.

- Hari
Christophe Leroy Nov. 18, 2022, 8:51 a.m. UTC | #7
Le 18/11/2022 à 09:39, Hari Bathini a écrit :
> 
> 
> On 17/11/22 12:29 pm, Christophe Leroy wrote:
>>
>>
>> Le 16/11/2022 à 18:01, Hari Bathini a écrit :
>>>
>>>
>>> On 16/11/22 12:14 am, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>>>>
>>>>>
>>>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>>>>> Hi Christophe,
>>>>>>
>>>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>>>>> Most BPF programs are small, but they consume a page each. For
>>>>>>>> systems
>>>>>>>> with busy traffic and many BPF programs, this may also add
>>>>>>>> significant
>>>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down
>>>>>>>> the
>>>>>>>> whole system causing visible performance degradation for production
>>>>>>>> workloads.
>>>>>>>>
>>>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf
>>>>>>>> programs
>>>>>>>> into preallocated memory chunks, was proposed [1] to address it. 
>>>>>>>> This
>>>>>>>> series extends this support on powerpc.
>>>>>>>>
>>>>>>>> Patches 1 & 2 add the arch specific functions needed to support 
>>>>>>>> this
>>>>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>>>>> ensures cleanup is handled racefully.
>>>>>>>>
>>>>>>
>>>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging 
>>>>>>>> it.
>>>>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>>>>
>>>>>>> I did a quick test on ppc32, I don't get such a problem, only
>>>>>>> something
>>>>>>> wrong in the dump print as traps intructions only are dumped, but
>>>>>>> tcpdump works as expected:
>>>>>>
>>>>>> Thanks for the quick test. Could you please share the config you 
>>>>>> used.
>>>>>> I am probably missing a few knobs in my conifg...
>>>>>>
>>>>>
>>>>
>>>> I also managed to test it on QEMU. The config is based on
>>>> pmac32_defconfig.
>>>
>>> I had the same config but hit this problem:
>>>
>>>       # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>>       test_bpf: #0 TAX
>>>       ------------[ cut here ]------------
>>>       WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>>> bpf_int_jit_compile+0x8a0/0x9f8
>>
>> I get no such problem, on QEMU, and I checked the .config has:
> 
>> CONFIG_STRICT_KERNEL_RWX=y
>> CONFIG_STRICT_MODULE_RWX=y
> 
> Yeah. That did the trick.

Interesting. I guess we have to find out why it fails when those config 
are missing.

Maybe module code plays with RO and NX flags even if 
CONFIG_STRICT_MODULE_RWX is not selected ?

Christophe
Hari Bathini Nov. 18, 2022, 9:39 a.m. UTC | #8
On 18/11/22 2:21 pm, Christophe Leroy wrote:
> 
> 
> Le 18/11/2022 à 09:39, Hari Bathini a écrit :
>>
>>
>> On 17/11/22 12:29 pm, Christophe Leroy wrote:
>>>
>>>
>>> Le 16/11/2022 à 18:01, Hari Bathini a écrit :
>>>>
>>>>
>>>> On 16/11/22 12:14 am, Christophe Leroy wrote:
>>>>>
>>>>>
>>>>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>>>>>
>>>>>>
>>>>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>>>>>> Hi Christophe,
>>>>>>>
>>>>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>>>>>> Most BPF programs are small, but they consume a page each. For
>>>>>>>>> systems
>>>>>>>>> with busy traffic and many BPF programs, this may also add
>>>>>>>>> significant
>>>>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down
>>>>>>>>> the
>>>>>>>>> whole system causing visible performance degradation for production
>>>>>>>>> workloads.
>>>>>>>>>
>>>>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf
>>>>>>>>> programs
>>>>>>>>> into preallocated memory chunks, was proposed [1] to address it.
>>>>>>>>> This
>>>>>>>>> series extends this support on powerpc.
>>>>>>>>>
>>>>>>>>> Patches 1 & 2 add the arch specific functions needed to support
>>>>>>>>> this
>>>>>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>>>>>> ensures cleanup is handled racefully.
>>>>>>>>>
>>>>>>>
>>>>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging
>>>>>>>>> it.
>>>>>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>>>>>
>>>>>>>> I did a quick test on ppc32, I don't get such a problem, only
>>>>>>>> something
>>>>>>>> wrong in the dump print as traps intructions only are dumped, but
>>>>>>>> tcpdump works as expected:
>>>>>>>
>>>>>>> Thanks for the quick test. Could you please share the config you
>>>>>>> used.
>>>>>>> I am probably missing a few knobs in my conifg...
>>>>>>>
>>>>>>
>>>>>
>>>>> I also managed to test it on QEMU. The config is based on
>>>>> pmac32_defconfig.
>>>>
>>>> I had the same config but hit this problem:
>>>>
>>>>        # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>>>        test_bpf: #0 TAX
>>>>        ------------[ cut here ]------------
>>>>        WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>>>> bpf_int_jit_compile+0x8a0/0x9f8
>>>
>>> I get no such problem, on QEMU, and I checked the .config has:
>>
>>> CONFIG_STRICT_KERNEL_RWX=y
>>> CONFIG_STRICT_MODULE_RWX=y
>>
>> Yeah. That did the trick.
> 
> Interesting. I guess we have to find out why it fails when those config
> are missing.
> 
> Maybe module code plays with RO and NX flags even if
> CONFIG_STRICT_MODULE_RWX is not selected ?

Need to look at the code closely but fwiw, observing same failure on
64-bit as well with !STRICT_RWX...

Thanks
Hari
Christophe Leroy Nov. 18, 2022, 11:46 a.m. UTC | #9
Le 18/11/2022 à 10:39, Hari Bathini a écrit :
> 
> 
> On 18/11/22 2:21 pm, Christophe Leroy wrote: >>>>>
>>>>> I had the same config but hit this problem:
>>>>>
>>>>>        # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>>>>        test_bpf: #0 TAX
>>>>>        ------------[ cut here ]------------
>>>>>        WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>>>>> bpf_int_jit_compile+0x8a0/0x9f8
>>>>
>>>> I get no such problem, on QEMU, and I checked the .config has:
>>>
>>>> CONFIG_STRICT_KERNEL_RWX=y
>>>> CONFIG_STRICT_MODULE_RWX=y
>>>
>>> Yeah. That did the trick.
>>
>> Interesting. I guess we have to find out why it fails when those config
>> are missing.
>>
>> Maybe module code plays with RO and NX flags even if
>> CONFIG_STRICT_MODULE_RWX is not selected ?
> 
> Need to look at the code closely but fwiw, observing same failure on
> 64-bit as well with !STRICT_RWX...

The problem is in bpf_prog_pack_alloc() and in alloc_new_pack() : They 
do set_memory_ro() and set_memory_x() without taking into account 
CONFIG_STRICT_MODULE_RWX.

When CONFIG_STRICT_MODULE_RWX is selected, powerpc module_alloc() 
allocates PAGE_KERNEL memory, that is RW memory, and expects the user to 
call do set_memory_ro() and set_memory_x().

But when CONFIG_STRICT_MODULE_RWX is not selected, powerpc 
module_alloc() allocates PAGE_KERNEL_TEXT memory, that is RWX memory, 
and expects to be able to always write into it.

Christophe
Song Liu Nov. 18, 2022, 5:28 p.m. UTC | #10
On Fri, Nov 18, 2022 at 3:47 AM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
>
>
> Le 18/11/2022 à 10:39, Hari Bathini a écrit :
> >
> >
> > On 18/11/22 2:21 pm, Christophe Leroy wrote: >>>>>
> >>>>> I had the same config but hit this problem:
> >>>>>
> >>>>>        # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
> >>>>>        test_bpf: #0 TAX
> >>>>>        ------------[ cut here ]------------
> >>>>>        WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
> >>>>> bpf_int_jit_compile+0x8a0/0x9f8
> >>>>
> >>>> I get no such problem, on QEMU, and I checked the .config has:
> >>>
> >>>> CONFIG_STRICT_KERNEL_RWX=y
> >>>> CONFIG_STRICT_MODULE_RWX=y
> >>>
> >>> Yeah. That did the trick.
> >>
> >> Interesting. I guess we have to find out why it fails when those config
> >> are missing.
> >>
> >> Maybe module code plays with RO and NX flags even if
> >> CONFIG_STRICT_MODULE_RWX is not selected ?
> >
> > Need to look at the code closely but fwiw, observing same failure on
> > 64-bit as well with !STRICT_RWX...
>
> The problem is in bpf_prog_pack_alloc() and in alloc_new_pack() : They
> do set_memory_ro() and set_memory_x() without taking into account
> CONFIG_STRICT_MODULE_RWX.
>
> When CONFIG_STRICT_MODULE_RWX is selected, powerpc module_alloc()
> allocates PAGE_KERNEL memory, that is RW memory, and expects the user to
> call do set_memory_ro() and set_memory_x().
>
> But when CONFIG_STRICT_MODULE_RWX is not selected, powerpc
> module_alloc() allocates PAGE_KERNEL_TEXT memory, that is RWX memory,
> and expects to be able to always write into it.

Ah, I see. x86_64 requires CONFIG_STRICT_MODULE_RWX, so this hasn't
been a problem yet.

Thanks,
Song
Christophe Leroy Nov. 18, 2022, 6:05 p.m. UTC | #11
Le 18/11/2022 à 18:28, Song Liu a écrit :
> On Fri, Nov 18, 2022 at 3:47 AM Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>>
>>
>> Le 18/11/2022 à 10:39, Hari Bathini a écrit :
>>>
>>>
>>> On 18/11/22 2:21 pm, Christophe Leroy wrote: >>>>>
>>>>>>> I had the same config but hit this problem:
>>>>>>>
>>>>>>>         # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>>>>>>         test_bpf: #0 TAX
>>>>>>>         ------------[ cut here ]------------
>>>>>>>         WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>>>>>>> bpf_int_jit_compile+0x8a0/0x9f8
>>>>>>
>>>>>> I get no such problem, on QEMU, and I checked the .config has:
>>>>>
>>>>>> CONFIG_STRICT_KERNEL_RWX=y
>>>>>> CONFIG_STRICT_MODULE_RWX=y
>>>>>
>>>>> Yeah. That did the trick.
>>>>
>>>> Interesting. I guess we have to find out why it fails when those config
>>>> are missing.
>>>>
>>>> Maybe module code plays with RO and NX flags even if
>>>> CONFIG_STRICT_MODULE_RWX is not selected ?
>>>
>>> Need to look at the code closely but fwiw, observing same failure on
>>> 64-bit as well with !STRICT_RWX...
>>
>> The problem is in bpf_prog_pack_alloc() and in alloc_new_pack() : They
>> do set_memory_ro() and set_memory_x() without taking into account
>> CONFIG_STRICT_MODULE_RWX.
>>
>> When CONFIG_STRICT_MODULE_RWX is selected, powerpc module_alloc()
>> allocates PAGE_KERNEL memory, that is RW memory, and expects the user to
>> call do set_memory_ro() and set_memory_x().
>>
>> But when CONFIG_STRICT_MODULE_RWX is not selected, powerpc
>> module_alloc() allocates PAGE_KERNEL_TEXT memory, that is RWX memory,
>> and expects to be able to always write into it.
> 
> Ah, I see. x86_64 requires CONFIG_STRICT_MODULE_RWX, so this hasn't
> been a problem yet.
> 

In fact it shouldn't be a problem for BPF on powerpc either. Because 
powerpc BPF expects RO at all time and today uses bpf_jit_binary_lock_ro().

It just means that we can't use patch_instruction() for that. Anyway, 
using patch_instruction() was sub-optimal.

All we have to do I think is set a mirror of the page using vmap() then 
perform a memcpy() of the code then vunmap() it. Maybe a call to 
flush_tlb_kernel_range() will be also needed, unless BPF already does it.

Christophe