diff mbox series

riscv: Flush current cpu icache before other cpus

Message ID 20210918160221.111902-1-alex@ghiti.fr (mailing list archive)
State New, archived
Headers show
Series riscv: Flush current cpu icache before other cpus | expand

Commit Message

Alexandre Ghiti Sept. 18, 2021, 4:02 p.m. UTC
On SiFive Unmatched, I recently fell onto the following BUG when booting:

[    0.000000] ftrace: allocating 36610 entries in 144 pages
[    0.000000] Oops - illegal instruction [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.1+ #5
[    0.000000] Hardware name: SiFive HiFive Unmatched A00 (DT)
[    0.000000] epc : riscv_cpuid_to_hartid_mask+0x6/0xae
[    0.000000]  ra : __sbi_rfence_v02+0xc8/0x10a
[    0.000000] epc : ffffffff80007240 ra : ffffffff80009964 sp : ffffffff81803e10
[    0.000000]  gp : ffffffff81a1ea70 tp : ffffffff8180f500 t0 : ffffffe07fe30000
[    0.000000]  t1 : 0000000000000004 t2 : 0000000000000000 s0 : ffffffff81803e60
[    0.000000]  s1 : 0000000000000000 a0 : ffffffff81a22238 a1 : ffffffff81803e10
[    0.000000]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[    0.000000]  a5 : 0000000000000000 a6 : ffffffff8000989c a7 : 0000000052464e43
[    0.000000]  s2 : ffffffff81a220c8 s3 : 0000000000000000 s4 : 0000000000000000
[    0.000000]  s5 : 0000000000000000 s6 : 0000000200000100 s7 : 0000000000000001
[    0.000000]  s8 : ffffffe07fe04040 s9 : ffffffff81a22c80 s10: 0000000000001000
[    0.000000]  s11: 0000000000000004 t3 : 0000000000000001 t4 : 0000000000000008
[    0.000000]  t5 : ffffffcf04000808 t6 : ffffffe3ffddf188
[    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000002
[    0.000000] [<ffffffff80007240>] riscv_cpuid_to_hartid_mask+0x6/0xae
[    0.000000] [<ffffffff80009474>] sbi_remote_fence_i+0x1e/0x26
[    0.000000] [<ffffffff8000b8f4>] flush_icache_all+0x12/0x1a
[    0.000000] [<ffffffff8000666c>] patch_text_nosync+0x26/0x32
[    0.000000] [<ffffffff8000884e>] ftrace_init_nop+0x52/0x8c
[    0.000000] [<ffffffff800f051e>] ftrace_process_locs.isra.0+0x29c/0x360
[    0.000000] [<ffffffff80a0e3c6>] ftrace_init+0x80/0x130
[    0.000000] [<ffffffff80a00f8c>] start_kernel+0x5c4/0x8f6
[    0.000000] ---[ end trace f67eb9af4d8d492b ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

While ftrace is looping over a list of addresses to patch, it always failed
when patching the same function: riscv_cpuid_to_hartid_mask. Looking at the
backtrace, the illegal instruction is encountered in this same function.
However, patch_text_nosync, after patching the instructions, calls
flush_icache_range. But looking at what happens in this function:

flush_icache_range -> flush_icache_all
                   -> sbi_remote_fence_i
                   -> __sbi_rfence_v02
                   -> riscv_cpuid_to_hartid_mask

The icache and dcache of the current cpu are never synchronized between the
patching of riscv_cpuid_to_hartid_mask and calling this same function.

So fix this by flushing the current cpu's icache before asking for the other
cpus to do the same.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
 arch/riscv/mm/cacheflush.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Alexandre Ghiti Sept. 18, 2021, 4:05 p.m. UTC | #1
(+cc stable)

Le 18/09/2021 à 18:02, Alexandre Ghiti a écrit :
> On SiFive Unmatched, I recently fell onto the following BUG when booting:
> 
> [    0.000000] ftrace: allocating 36610 entries in 144 pages
> [    0.000000] Oops - illegal instruction [#1]
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.1+ #5
> [    0.000000] Hardware name: SiFive HiFive Unmatched A00 (DT)
> [    0.000000] epc : riscv_cpuid_to_hartid_mask+0x6/0xae
> [    0.000000]  ra : __sbi_rfence_v02+0xc8/0x10a
> [    0.000000] epc : ffffffff80007240 ra : ffffffff80009964 sp : ffffffff81803e10
> [    0.000000]  gp : ffffffff81a1ea70 tp : ffffffff8180f500 t0 : ffffffe07fe30000
> [    0.000000]  t1 : 0000000000000004 t2 : 0000000000000000 s0 : ffffffff81803e60
> [    0.000000]  s1 : 0000000000000000 a0 : ffffffff81a22238 a1 : ffffffff81803e10
> [    0.000000]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> [    0.000000]  a5 : 0000000000000000 a6 : ffffffff8000989c a7 : 0000000052464e43
> [    0.000000]  s2 : ffffffff81a220c8 s3 : 0000000000000000 s4 : 0000000000000000
> [    0.000000]  s5 : 0000000000000000 s6 : 0000000200000100 s7 : 0000000000000001
> [    0.000000]  s8 : ffffffe07fe04040 s9 : ffffffff81a22c80 s10: 0000000000001000
> [    0.000000]  s11: 0000000000000004 t3 : 0000000000000001 t4 : 0000000000000008
> [    0.000000]  t5 : ffffffcf04000808 t6 : ffffffe3ffddf188
> [    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000002
> [    0.000000] [<ffffffff80007240>] riscv_cpuid_to_hartid_mask+0x6/0xae
> [    0.000000] [<ffffffff80009474>] sbi_remote_fence_i+0x1e/0x26
> [    0.000000] [<ffffffff8000b8f4>] flush_icache_all+0x12/0x1a
> [    0.000000] [<ffffffff8000666c>] patch_text_nosync+0x26/0x32
> [    0.000000] [<ffffffff8000884e>] ftrace_init_nop+0x52/0x8c
> [    0.000000] [<ffffffff800f051e>] ftrace_process_locs.isra.0+0x29c/0x360
> [    0.000000] [<ffffffff80a0e3c6>] ftrace_init+0x80/0x130
> [    0.000000] [<ffffffff80a00f8c>] start_kernel+0x5c4/0x8f6
> [    0.000000] ---[ end trace f67eb9af4d8d492b ]---
> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> 
> While ftrace is looping over a list of addresses to patch, it always failed
> when patching the same function: riscv_cpuid_to_hartid_mask. Looking at the
> backtrace, the illegal instruction is encountered in this same function.
> However, patch_text_nosync, after patching the instructions, calls
> flush_icache_range. But looking at what happens in this function:
> 
> flush_icache_range -> flush_icache_all
>                     -> sbi_remote_fence_i
>                     -> __sbi_rfence_v02
>                     -> riscv_cpuid_to_hartid_mask
> 
> The icache and dcache of the current cpu are never synchronized between the
> patching of riscv_cpuid_to_hartid_mask and calling this same function.
> 
> So fix this by flushing the current cpu's icache before asking for the other
> cpus to do the same.
> 
> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> ---
>   arch/riscv/mm/cacheflush.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 094118663285..89f81067e09e 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -16,6 +16,8 @@ static void ipi_remote_fence_i(void *info)
>   
>   void flush_icache_all(void)
>   {
> +	local_flush_icache_all();
> +
>   	if (IS_ENABLED(CONFIG_RISCV_SBI))
>   		sbi_remote_fence_i(NULL);
>   	else
>
Greg KH Sept. 18, 2021, 4:17 p.m. UTC | #2
On Sat, Sep 18, 2021 at 06:05:09PM +0200, Alex Ghiti wrote:
> (+cc stable)
> 

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>
Atish Patra Sept. 19, 2021, 2:35 a.m. UTC | #3
On Sat, Sep 18, 2021 at 9:03 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> On SiFive Unmatched, I recently fell onto the following BUG when booting:
>
> [    0.000000] ftrace: allocating 36610 entries in 144 pages
> [    0.000000] Oops - illegal instruction [#1]
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.1+ #5
> [    0.000000] Hardware name: SiFive HiFive Unmatched A00 (DT)
> [    0.000000] epc : riscv_cpuid_to_hartid_mask+0x6/0xae
> [    0.000000]  ra : __sbi_rfence_v02+0xc8/0x10a
> [    0.000000] epc : ffffffff80007240 ra : ffffffff80009964 sp : ffffffff81803e10
> [    0.000000]  gp : ffffffff81a1ea70 tp : ffffffff8180f500 t0 : ffffffe07fe30000
> [    0.000000]  t1 : 0000000000000004 t2 : 0000000000000000 s0 : ffffffff81803e60
> [    0.000000]  s1 : 0000000000000000 a0 : ffffffff81a22238 a1 : ffffffff81803e10
> [    0.000000]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> [    0.000000]  a5 : 0000000000000000 a6 : ffffffff8000989c a7 : 0000000052464e43
> [    0.000000]  s2 : ffffffff81a220c8 s3 : 0000000000000000 s4 : 0000000000000000
> [    0.000000]  s5 : 0000000000000000 s6 : 0000000200000100 s7 : 0000000000000001
> [    0.000000]  s8 : ffffffe07fe04040 s9 : ffffffff81a22c80 s10: 0000000000001000
> [    0.000000]  s11: 0000000000000004 t3 : 0000000000000001 t4 : 0000000000000008
> [    0.000000]  t5 : ffffffcf04000808 t6 : ffffffe3ffddf188
> [    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000002
> [    0.000000] [<ffffffff80007240>] riscv_cpuid_to_hartid_mask+0x6/0xae
> [    0.000000] [<ffffffff80009474>] sbi_remote_fence_i+0x1e/0x26
> [    0.000000] [<ffffffff8000b8f4>] flush_icache_all+0x12/0x1a
> [    0.000000] [<ffffffff8000666c>] patch_text_nosync+0x26/0x32
> [    0.000000] [<ffffffff8000884e>] ftrace_init_nop+0x52/0x8c
> [    0.000000] [<ffffffff800f051e>] ftrace_process_locs.isra.0+0x29c/0x360
> [    0.000000] [<ffffffff80a0e3c6>] ftrace_init+0x80/0x130
> [    0.000000] [<ffffffff80a00f8c>] start_kernel+0x5c4/0x8f6
> [    0.000000] ---[ end trace f67eb9af4d8d492b ]---
> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
> While ftrace is looping over a list of addresses to patch, it always failed
> when patching the same function: riscv_cpuid_to_hartid_mask. Looking at the
> backtrace, the illegal instruction is encountered in this same function.
> However, patch_text_nosync, after patching the instructions, calls
> flush_icache_range. But looking at what happens in this function:
>
> flush_icache_range -> flush_icache_all
>                    -> sbi_remote_fence_i
>                    -> __sbi_rfence_v02
>                    -> riscv_cpuid_to_hartid_mask
>
> The icache and dcache of the current cpu are never synchronized between the
> patching of riscv_cpuid_to_hartid_mask and calling this same function.
>
> So fix this by flushing the current cpu's icache before asking for the other
> cpus to do the same.
>

Is this the same bugs described in this thread ?

http://lkml.iu.edu/hypermail/linux/kernel/2108.1/05465.html


> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> ---
>  arch/riscv/mm/cacheflush.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 094118663285..89f81067e09e 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -16,6 +16,8 @@ static void ipi_remote_fence_i(void *info)
>
>  void flush_icache_all(void)
>  {
> +       local_flush_icache_all();
> +
>         if (IS_ENABLED(CONFIG_RISCV_SBI))
>                 sbi_remote_fence_i(NULL);
>         else
> --
> 2.30.2
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Alexandre Ghiti Sept. 20, 2021, 5:03 a.m. UTC | #4
Hi Atish,

On 9/19/21 4:35 AM, Atish Patra wrote:
> On Sat, Sep 18, 2021 at 9:03 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
>> On SiFive Unmatched, I recently fell onto the following BUG when booting:
>>
>> [    0.000000] ftrace: allocating 36610 entries in 144 pages
>> [    0.000000] Oops - illegal instruction [#1]
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.1+ #5
>> [    0.000000] Hardware name: SiFive HiFive Unmatched A00 (DT)
>> [    0.000000] epc : riscv_cpuid_to_hartid_mask+0x6/0xae
>> [    0.000000]  ra : __sbi_rfence_v02+0xc8/0x10a
>> [    0.000000] epc : ffffffff80007240 ra : ffffffff80009964 sp : ffffffff81803e10
>> [    0.000000]  gp : ffffffff81a1ea70 tp : ffffffff8180f500 t0 : ffffffe07fe30000
>> [    0.000000]  t1 : 0000000000000004 t2 : 0000000000000000 s0 : ffffffff81803e60
>> [    0.000000]  s1 : 0000000000000000 a0 : ffffffff81a22238 a1 : ffffffff81803e10
>> [    0.000000]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
>> [    0.000000]  a5 : 0000000000000000 a6 : ffffffff8000989c a7 : 0000000052464e43
>> [    0.000000]  s2 : ffffffff81a220c8 s3 : 0000000000000000 s4 : 0000000000000000
>> [    0.000000]  s5 : 0000000000000000 s6 : 0000000200000100 s7 : 0000000000000001
>> [    0.000000]  s8 : ffffffe07fe04040 s9 : ffffffff81a22c80 s10: 0000000000001000
>> [    0.000000]  s11: 0000000000000004 t3 : 0000000000000001 t4 : 0000000000000008
>> [    0.000000]  t5 : ffffffcf04000808 t6 : ffffffe3ffddf188
>> [    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000002
>> [    0.000000] [<ffffffff80007240>] riscv_cpuid_to_hartid_mask+0x6/0xae
>> [    0.000000] [<ffffffff80009474>] sbi_remote_fence_i+0x1e/0x26
>> [    0.000000] [<ffffffff8000b8f4>] flush_icache_all+0x12/0x1a
>> [    0.000000] [<ffffffff8000666c>] patch_text_nosync+0x26/0x32
>> [    0.000000] [<ffffffff8000884e>] ftrace_init_nop+0x52/0x8c
>> [    0.000000] [<ffffffff800f051e>] ftrace_process_locs.isra.0+0x29c/0x360
>> [    0.000000] [<ffffffff80a0e3c6>] ftrace_init+0x80/0x130
>> [    0.000000] [<ffffffff80a00f8c>] start_kernel+0x5c4/0x8f6
>> [    0.000000] ---[ end trace f67eb9af4d8d492b ]---
>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>> [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>>
>> While ftrace is looping over a list of addresses to patch, it always failed
>> when patching the same function: riscv_cpuid_to_hartid_mask. Looking at the
>> backtrace, the illegal instruction is encountered in this same function.
>> However, patch_text_nosync, after patching the instructions, calls
>> flush_icache_range. But looking at what happens in this function:
>>
>> flush_icache_range -> flush_icache_all
>>                    -> sbi_remote_fence_i
>>                    -> __sbi_rfence_v02
>>                    -> riscv_cpuid_to_hartid_mask
>>
>> The icache and dcache of the current cpu are never synchronized between the
>> patching of riscv_cpuid_to_hartid_mask and calling this same function.
>>
>> So fix this by flushing the current cpu's icache before asking for the other
>> cpus to do the same.
>>
> Is this the same bugs described in this thread ?
>
> http://lkml.iu.edu/hypermail/linux/kernel/2108.1/05465.html

My issue can be explained this way:

This is what objdump output looks like in my case:

ffffffff8000723a
<riscv_cpuid_to_hartid_mask>:                                   
ffffffff8000723a:       0001                   
nop                              
ffffffff8000723c:       0001                   
nop                              
ffffffff8000723e:       0001                   
nop                              
ffffffff80007240:       0001                    nop            
ffffffff80007242:       0001                   
nop                              
ffffffff80007244:       0001                   
nop                              
ffffffff80007246:       0001                   
nop                              
ffffffff80007248:       0001                    nop

This will get patched to:

ffffffff8000723a
<riscv_cpuid_to_hartid_mask>:                                   
ffffffff8000723a:      
0013                                                  
ffffffff8000723c:       0000                   
nop                              
ffffffff8000723e:      
0013                                                  
ffffffff80007240:       0000                    nop           
ffffffff80007242:      
0013                                                  
ffffffff80007244:       0000                   
nop                              
ffffffff80007246:      
0013                                                  
ffffffff80007248:       0000                    nop

But 0xffffffff80007240 is the beginning of a new cache line, so actually
for some reason, the following will be in icache:

ffffffff8000723a
<riscv_cpuid_to_hartid_mask>:                                   
ffffffff8000723a:       0001                   
nop                              
ffffffff8000723c:       0001                   
nop                              
ffffffff8000723e:       0001                   
nop                              
ffffffff80007240:       0000                    unimp <- new cache
line            
ffffffff80007242:      
0013                                                  
ffffffff80007244:       0000                   
nop                              
ffffffff80007246:      
0013                                                  
ffffffff80007248:       0000                    nop

0x0001 at 0xffffffff8000723e is executed as a nop but then 0x0000 is an
invalid instruction, hence the trap.

Regarding Dimitri issue, given the address where it traps, the code
objdumped would look like:

ffffffff80007200:       0001                    nop
ffffffff80007202:       0001                    nop
                             
ffffffff80007204:       0001                   
nop                              
ffffffff80007206:       0001                    nop
                             
ffffffff80007208:       0001                    nop
ffffffff8000720a:       0001                    nop
                             
ffffffff8000720c:       0001                    nop       
ffffffff8000720e:       0001                    nop

Once patched it looks like:

ffffffff80007200:       0013                    nop
ffffffff80007202:       0000
                                                 
ffffffff80007204:       0013                   
nop                              
ffffffff80007206:       0000
                                                 
ffffffff80007208:       0013                    nop
ffffffff8000720a:       0000                    
                             
ffffffff8000720c:       0013                    nop       
ffffffff8000720e:       0000                   

So if we imagine that until 0xffffffff8000720c, the code that is
executed is the first unpatched version and then at 0xffffffff8000720e,
the cache line gets reloaded, that can explain the illegal instruction.
But it's a bit far-fetched.

Actually I don't understand how the whole thing can work, because
between the moment the code is patched and other harts are asked to sync
their caches, there's a large window where what I explained above can
happen right?

Alex


>
>> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
>> ---
>>  arch/riscv/mm/cacheflush.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>> index 094118663285..89f81067e09e 100644
>> --- a/arch/riscv/mm/cacheflush.c
>> +++ b/arch/riscv/mm/cacheflush.c
>> @@ -16,6 +16,8 @@ static void ipi_remote_fence_i(void *info)
>>
>>  void flush_icache_all(void)
>>  {
>> +       local_flush_icache_all();
>> +
>>         if (IS_ENABLED(CONFIG_RISCV_SBI))
>>                 sbi_remote_fence_i(NULL);
>>         else
>> --
>> 2.30.2
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
>
Palmer Dabbelt Oct. 8, 2021, 12:40 a.m. UTC | #5
On Sat, 18 Sep 2021 09:02:21 PDT (-0700), alex@ghiti.fr wrote:
> On SiFive Unmatched, I recently fell onto the following BUG when booting:
>
> [    0.000000] ftrace: allocating 36610 entries in 144 pages
> [    0.000000] Oops - illegal instruction [#1]
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.1+ #5
> [    0.000000] Hardware name: SiFive HiFive Unmatched A00 (DT)
> [    0.000000] epc : riscv_cpuid_to_hartid_mask+0x6/0xae
> [    0.000000]  ra : __sbi_rfence_v02+0xc8/0x10a
> [    0.000000] epc : ffffffff80007240 ra : ffffffff80009964 sp : ffffffff81803e10
> [    0.000000]  gp : ffffffff81a1ea70 tp : ffffffff8180f500 t0 : ffffffe07fe30000
> [    0.000000]  t1 : 0000000000000004 t2 : 0000000000000000 s0 : ffffffff81803e60
> [    0.000000]  s1 : 0000000000000000 a0 : ffffffff81a22238 a1 : ffffffff81803e10
> [    0.000000]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> [    0.000000]  a5 : 0000000000000000 a6 : ffffffff8000989c a7 : 0000000052464e43
> [    0.000000]  s2 : ffffffff81a220c8 s3 : 0000000000000000 s4 : 0000000000000000
> [    0.000000]  s5 : 0000000000000000 s6 : 0000000200000100 s7 : 0000000000000001
> [    0.000000]  s8 : ffffffe07fe04040 s9 : ffffffff81a22c80 s10: 0000000000001000
> [    0.000000]  s11: 0000000000000004 t3 : 0000000000000001 t4 : 0000000000000008
> [    0.000000]  t5 : ffffffcf04000808 t6 : ffffffe3ffddf188
> [    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000002
> [    0.000000] [<ffffffff80007240>] riscv_cpuid_to_hartid_mask+0x6/0xae
> [    0.000000] [<ffffffff80009474>] sbi_remote_fence_i+0x1e/0x26
> [    0.000000] [<ffffffff8000b8f4>] flush_icache_all+0x12/0x1a
> [    0.000000] [<ffffffff8000666c>] patch_text_nosync+0x26/0x32
> [    0.000000] [<ffffffff8000884e>] ftrace_init_nop+0x52/0x8c
> [    0.000000] [<ffffffff800f051e>] ftrace_process_locs.isra.0+0x29c/0x360
> [    0.000000] [<ffffffff80a0e3c6>] ftrace_init+0x80/0x130
> [    0.000000] [<ffffffff80a00f8c>] start_kernel+0x5c4/0x8f6
> [    0.000000] ---[ end trace f67eb9af4d8d492b ]---
> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
> While ftrace is looping over a list of addresses to patch, it always failed
> when patching the same function: riscv_cpuid_to_hartid_mask. Looking at the
> backtrace, the illegal instruction is encountered in this same function.
> However, patch_text_nosync, after patching the instructions, calls
> flush_icache_range. But looking at what happens in this function:
>
> flush_icache_range -> flush_icache_all
>                    -> sbi_remote_fence_i
>                    -> __sbi_rfence_v02
>                    -> riscv_cpuid_to_hartid_mask
>
> The icache and dcache of the current cpu are never synchronized between the
> patching of riscv_cpuid_to_hartid_mask and calling this same function.
>
> So fix this by flushing the current cpu's icache before asking for the other
> cpus to do the same.
>
> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> ---
>  arch/riscv/mm/cacheflush.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 094118663285..89f81067e09e 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -16,6 +16,8 @@ static void ipi_remote_fence_i(void *info)
>
>  void flush_icache_all(void)
>  {
> +	local_flush_icache_all();
> +
>  	if (IS_ENABLED(CONFIG_RISCV_SBI))
>  		sbi_remote_fence_i(NULL);
>  	else

Thanks, this is on fixes (with the tags for stable)
diff mbox series

Patch

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 094118663285..89f81067e09e 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -16,6 +16,8 @@  static void ipi_remote_fence_i(void *info)
 
 void flush_icache_all(void)
 {
+	local_flush_icache_all();
+
 	if (IS_ENABLED(CONFIG_RISCV_SBI))
 		sbi_remote_fence_i(NULL);
 	else