
[offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

Message ID fc5ab18a-922c-ea53-f9f3-fd5073c43248@iogearbox.net (mailing list archive)
State New, archived

Commit Message

Daniel Borkmann July 4, 2018, 11:10 p.m. UTC
On 07/04/2018 09:33 AM, Peter Robinson wrote:
> On Tue, Jun 26, 2018 at 1:52 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 06/26/2018 02:23 PM, Peter Robinson wrote:
>>>>>> On 06/24/2018 11:24 AM, Peter Robinson wrote:
>>>>>>>>> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
>>>>>>>>> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
>>>>>>>>> (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
>>>>>>>>> others, both LPAE/normal kernels.
>>>>>>
>>>>>> So this is arm32 right?
>>>>>
>>>>> Correct.
>>>>>
>>>>>>>>> I'm a bit out of my depth in this part of the kernel but I'm wondering
>>>>>>>>> if it's known, I couldn't find anything that looked obvious on a few
>>>>>>>>> mailing lists.
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>
>>>>>>>> Hi Peter
>>>>>>>>
>>>>>>>> Could you provide symbolic information ?
>>>>>>>
>>>>>>> I passed it through scripts/decode_stacktrace.sh; is that what you were after?
>>>>>>>
>>>>>>> [    8.673880] Internal error: Oops: a06 [#10] SMP ARM
>>>>>>> [    8.673949] ---[ end trace 049df4786ea3140a ]---
>>>>>>> [    8.678754] Modules linked in:
>>>>>>> [    8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G      D
>>>>>>>         4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
>>>>>>> [    8.678769] Hardware name: Allwinner sun8i Family
>>>>>>> [    8.678781] PC is at sk_filter_trim_cap ()
>>>>>>> [    8.678790] LR is at   (null)
>>>>>>> [    8.709463] pc : lr : psr: 60000013 ()
>>>>>>> [    8.715722] sp : c996bd60  ip : 00000000  fp : 00000000
>>>>>>> [    8.720939] r10: ee79dc00  r9 : c12c9f80  r8 : 00000000
>>>>>>> [    8.726157] r7 : 00000000  r6 : 00000001  r5 : f1648000  r4 : 00000000
>>>>>>> [    8.732674] r3 : 00000007  r2 : 00000000  r1 : 00000000  r0 : 00000000
>>>>>>> [    8.739193] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
>>>>>>> [    8.746318] Control: 30c5387d  Table: 6e7bc880  DAC: ffe75ece
>>>>>>> [    8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
>>>>>>> [    8.758574] Stack: (0xc996bd60 to 0xc996c000)
>>>>>>
>>>>>> Do you have BPF JIT enabled or disabled? Does it happen with disabled?
>>>>>
>>>>> Enabled. I can test with it disabled. The BPF config bits are:
>>>>> CONFIG_BPF_EVENTS=y
>>>>> # CONFIG_BPFILTER is not set
>>>>> CONFIG_BPF_JIT_ALWAYS_ON=y
>>>>> CONFIG_BPF_JIT=y
>>>>> CONFIG_BPF_STREAM_PARSER=y
>>>>> CONFIG_BPF_SYSCALL=y
>>>>> CONFIG_BPF=y
>>>>> CONFIG_CGROUP_BPF=y
>>>>> CONFIG_HAVE_EBPF_JIT=y
>>>>> CONFIG_IPV6_SEG6_BPF=y
>>>>> CONFIG_LWTUNNEL_BPF=y
>>>>> # CONFIG_NBPFAXI_DMA is not set
>>>>> CONFIG_NET_ACT_BPF=m
>>>>> CONFIG_NET_CLS_BPF=m
>>>>> CONFIG_NETFILTER_XT_MATCH_BPF=m
>>>>> # CONFIG_TEST_BPF is not set
>>>>>
>>>>>> I can see one bug, but your stack trace seems unrelated.
>>>>>>
>>>>>> Anyway, could you try with this?
>>>>>
>>>>> Build in process.
>>>>>
>>>>>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>>>>>> index 6e8b716..f6a62ae 100644
>>>>>> --- a/arch/arm/net/bpf_jit_32.c
>>>>>> +++ b/arch/arm/net/bpf_jit_32.c
>>>>>> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>>>>>>                 /* there are 2 passes here */
>>>>>>                 bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>>>>>>
>>>>>> -       set_memory_ro((unsigned long)header, header->pages);
>>>>>> +       bpf_jit_binary_lock_ro(header);
>>>>>>         prog->bpf_func = (void *)ctx.target;
>>>>>>         prog->jited = 1;
>>>>>>         prog->jited_len = image_size;
>>>>
>>>> So with that and the other fix there was no improvement; with those
>>>> and the BPF JIT disabled it works. I'm not sure whether the two patches
>>>> have any effect with the JIT disabled, though.
>>>>
>>>> Will look at the other patches shortly, there's been some other issue
>>>> introduced between rc1 and rc2 which I have to work out before I can
>>>> test those though.
>>>
>>> Quick update: with Linus's head as of yesterday (basically rc2 plus
>>> davem's network fixes), it works if the JIT is disabled, i.e.:
>>> # CONFIG_BPF_JIT_ALWAYS_ON is not set
>>> # CONFIG_BPF_JIT is not set
>>>
>>> If I enable it the boot breaks even worse than the errors above in
>>> that I get no console output at all, even with earlycon, so we've gone
>>> backwards since rc1 somehow.
>>>
>>> I'll try the above two reverted unless you have any other suggestions.
>>
>> Ok, thanks, let's do that!
>>
>> I'm still working on fixes meanwhile, should have something by end of day.
> 
> Sorry for the delay on this from my end. I noticed that some bpf
> bits landed in the last net fixes pull request, which went in on Monday,
> so I built a kernel with the JIT re-enabled. It has improved in that the
> completely dead, no-output boot has gone, but the original problem that
> arrived in the merge window still persists:

Okay, thanks a lot! And on top of that tree could you try with the below
applied to check whether it fixes the issue?

Patch

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f6a62ae..45e6b49 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -234,11 +234,11 @@  static void jit_fill_hole(void *area, unsigned int size)
 #define SCRATCH_SIZE 80

 /* total stack size used in JITed code */
-#define _STACK_SIZE	(ctx->prog->aux->stack_depth + SCRATCH_SIZE)
+#define _STACK_SIZE	(ctx->prog->aux->stack_depth + SCRATCH_SIZE + 4)
 #define STACK_SIZE	ALIGN(_STACK_SIZE, STACK_ALIGNMENT)

 /* Get the offset of eBPF REGISTERs stored on scratch space. */
-#define STACK_VAR(off) (STACK_SIZE - off)
+#define STACK_VAR(off) (STACK_SIZE - 4 - off)

 #if __LINUX_ARM_ARCH__ < 7
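
To make the arithmetic in the two changed macros easier to follow, here is a minimal standalone sketch that only evaluates them before and after the patch. It is not kernel code: the function names, the ALIGN_UP helper and the STACK_ALIGNMENT value of 8 are assumptions made for illustration, and the thread does not state whether this offset change is the root cause of the reported oops.

```c
#include <assert.h>

/* Round x up to a multiple of a (a must be a power of two).
 * Hypothetical stand-in for the kernel's ALIGN() macro. */
#define ALIGN_UP(x, a)   (((x) + (a) - 1) & ~((a) - 1))

#define SCRATCH_SIZE     80u  /* taken from the hunk above */
#define STACK_ALIGNMENT  8u   /* assumption, not shown in the hunk */

/* STACK_SIZE before the patch: stack_depth + SCRATCH_SIZE, aligned. */
static unsigned int stack_size_old(unsigned int stack_depth)
{
	return ALIGN_UP(stack_depth + SCRATCH_SIZE, STACK_ALIGNMENT);
}

/* STACK_SIZE after the patch: four extra bytes are reserved. */
static unsigned int stack_size_new(unsigned int stack_depth)
{
	return ALIGN_UP(stack_depth + SCRATCH_SIZE + 4u, STACK_ALIGNMENT);
}

/* STACK_VAR(off) before the patch. */
static unsigned int stack_var_old(unsigned int stack_depth, unsigned int off)
{
	return stack_size_old(stack_depth) - off;
}

/* STACK_VAR(off) after the patch. */
static unsigned int stack_var_new(unsigned int stack_depth, unsigned int off)
{
	return stack_size_new(stack_depth) - 4u - off;
}

/* Self-check of the one property the sketch is meant to show. */
static void check_offsets(unsigned int stack_depth)
{
	/* Old macro: off == 0 gives an offset equal to the full frame size. */
	assert(stack_var_old(stack_depth, 0) == stack_size_old(stack_depth));
	/* New macro: off == 0 gives the last 4-byte word inside the frame. */
	assert(stack_var_new(stack_depth, 0) + 4u == stack_size_new(stack_depth));
}
```

Calling check_offsets() with any stack_depth shows the difference: with off == 0 the old STACK_VAR evaluates to STACK_SIZE itself, while the patched one evaluates to STACK_SIZE - 4.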