KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.

Message ID 20180622111614.GA1150@arm.com (mailing list archive)
State New, archived

Commit Message

Will Deacon June 22, 2018, 11:16 a.m. UTC
Hi Wei,

Thanks for giving that a spin.

On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> On 2018/6/22 17:23, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
> >>On 2018/6/21 11:54, Will Deacon wrote:
> >>>On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
> >>>>On 2018/6/21 10:18, Will Deacon wrote:
> >>>>>Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> >>>>>otherwise your kernel will take an age to boot.
> >>>>Yes, amazing! This patch resolved the issue.
> >>>Great...
> >>>
> >>>>I have tested 50 times and can not reproduce the issue any more.
> >>>>Could you please tell more why this patch works?
> >>>You might need to ask your CPU design team ;)
> >>>
> >>>Without this patch, the code in idmap_kpti_install_ng_mappings() sets
> >>>bit 11 in table descriptors so that we can keep track of which parts of
> >>>the page table we've visited. With this patch, we don't bother tracking
> >>>and potentially rewalk parts of the page table (which takes a very long
> >>>time if KASAN is enabled).
> >>Got it. Thanks!
> >>
> >>>The architecture documents I've looked at are clear that bit 11 is IGNORED
> >>>by the CPU, which:
> >>>
> >>>   "Indicates that the architecture guarantees that the bit or field is not
> >>>    interpreted or modified by hardware."
> >>>
> >>>Please can you double-check that your CPU is indeed ignoring bit 11 in
> >>>non-leaf (table) descriptors?
> >>Do the non-leaf(table) descriptors mean the table descriptors
> >>of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
> >>in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
> >>
> >>If yes, our hardware does ignore it(not interpret or modify).
> >Ok, thanks for checking.
> >
> >>Is there any other possible reason cause this?
> >Perhaps just writing back the table entries is enough to cause the issue,
> >although I really can't understand why that would be the case. Can you try
> >the diff below (without my previous change), please?
> 
> Thanks!
> But it does not resolve the issue(only apply this patch based on 4.17.0).

Thanks, that's a useful data point. It means that it still crashes even if
we write back the same table entries, so it's the fact that we're writing
them at all which causes the problem, not the value that we write.

Whilst looking at the code, we noticed a missing DMB. On the off-chance
that it helps, can you try this instead please?

Will

--->8

Comments

Wei Xu June 22, 2018, 1:18 p.m. UTC | #1
Hi Will,

On 2018/6/22 19:16, Will Deacon wrote:
> Hi Wei,
>
> Thanks for giving that a spin.
>
> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>> On 2018/6/22 17:23, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
>>>> On 2018/6/21 11:54, Will Deacon wrote:
>>>>> On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
>>>>>> On 2018/6/21 10:18, Will Deacon wrote:
>>>>>>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
>>>>>>> otherwise your kernel will take an age to boot.
>>>>>> Yes, amazing! This patch resolved the issue.
>>>>> Great...
>>>>>
>>>>>> I have tested 50 times and can not reproduce the issue any more.
>>>>>> Could you please tell more why this patch works?
>>>>> You might need to ask your CPU design team ;)
>>>>>
>>>>> Without this patch, the code in idmap_kpti_install_ng_mappings() sets
>>>>> bit 11 in table descriptors so that we can keep track of which parts of
>>>>> the page table we've visited. With this patch, we don't bother tracking
>>>>> and potentially rewalk parts of the page table (which takes a very long
>>>>> time if KASAN is enabled).
>>>> Got it. Thanks!
>>>>
>>>>> The architecture documents I've looked at are clear that bit 11 is IGNORED
>>>>> by the CPU, which:
>>>>>
>>>>>    "Indicates that the architecture guarantees that the bit or field is not
>>>>>     interpreted or modified by hardware."
>>>>>
>>>>> Please can you double-check that your CPU is indeed ignoring bit 11 in
>>>>> non-leaf (table) descriptors?
>>>> Do the non-leaf(table) descriptors mean the table descriptors
>>>> of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
>>>> in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
>>>>
>>>> If yes, our hardware does ignore it(not interpret or modify).
>>> Ok, thanks for checking.
>>>
>>>> Is there any other possible reason cause this?
>>> Perhaps just writing back the table entries is enough to cause the issue,
>>> although I really can't understand why that would be the case. Can you try
>>> the diff below (without my previous change), please?
>> Thanks!
>> But it does not resolve the issue(only apply this patch based on 4.17.0).
> Thanks, that's a useful data point. It means that it still crashes even if
> we write back the same table entries, so it's the fact that we're writing
> them at all which causes the problem, not the value that we write.
>
> Whilst looking at the code, we noticed a missing DMB. On the off-chance
> that it helps, can you try this instead please?
Thanks!
With only the patch below applied on top of 4.17.0, we still get the crash.
The log is below; it is nearly the same as before.

     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #16 SMP PREEMPT Fri Jun 22 21:05:10 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) s57984 r8192 d32128 u98304
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
     [    0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007d850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
     [    0.000000] GIC: PPI11 is secure or misconfigured
     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
     [    0.000000] arch_timer: WARNING: Please fix your firmware
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
     [    0.000849] Console: colour dummy device 80x25
     [    0.001427] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002485] pid_max: default: 32768 minimum: 301
     [    0.002966] Security Framework initialized
     [    0.003549] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
     [    0.004353] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
     [    0.005068] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.005858] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.025962] ASID allocator initialised with 32768 entries
     [    0.029972] Hierarchical SRCU implementation.
     [    0.034341] Platform MSI: its domain created
     [    0.034793] PCI/MSI: /intc/its domain created
     [    0.035360] EFI services will not be available.
     [    0.038002] smp: Bringing up secondary CPUs ...
     [    0.038472] smp: Brought up 1 node, 1 CPU
     [    0.038878] SMP: Total of 1 processors activated.
     [    0.039354] CPU features: detected: GIC system register CPU interface
     [    0.040004] CPU features: detected: Privileged Access Never
     [    0.040566] CPU features: detected: User Access Override
     [    0.042462] Insufficient stack space to handle exception!
     [    0.042464] ESR: 0x96000046 -- DABT (current EL)
     [    0.043781] FAR: 0xffff0000093a80e0
     [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
     [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
     [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
     [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
     [    0.067946] Hardware name: linux,dummy-virt (DT)
     [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
     [    0.077480] pc : el1_sync+0x0/0xb0
     [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
     [    0.086143] sp : ffff0000093a80e0
     [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
     [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
     [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
     [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
     [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
     [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
     [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
     [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
     [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
     [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
     [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
     [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
     [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
     [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
     [    0.170179] Kernel panic - not syncing: kernel stack overflow
     [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
     [    0.184152] Hardware name: linux,dummy-virt (DT)
     [    0.188851] Call trace:
     [    0.191380]  dump_backtrace+0x0/0x180
     [    0.195113]  show_stack+0x14/0x1c
     [    0.198488]  dump_stack+0x90/0xb0
     [    0.201862]  panic+0x138/0x2a0
     [    0.204989]  __stack_chk_fail+0x0/0x18
     [    0.208836]  handle_bad_stack+0x118/0x124
     [    0.212927]  __bad_stack+0x88/0x8c
     [    0.216414]  el1_sync+0x0/0xb0
     [    0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0
     [    0.227507] Mem abort info:
     [    0.230390]   ESR = 0x96000006
     [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
     [    0.239428]   SET = 0, FnV = 0
     [    0.242555]   EA = 0, S1PTW = 0
     [    0.245797] Data abort info:
     [    0.248795]   ISV = 0, ISS = 0x00000006
     [    0.252652]   CM = 0, WnR = 0
     [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
     [    0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
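
The descriptor values in the page-table walk line at the end of the log can be decoded by hand to see which bits are set. The following is a minimal sketch, assuming the 4K-granule, 48-bit VA layout that the log reports ("4k pages, 48-bit VAs"). The helper `decode_table_descriptor` and the constant names are made up for illustration and are not kernel identifiers.

```python
# Decode the level 0-2 descriptor values printed in the walk line of the log.
# Assumption: 4K granule, 48-bit addresses, so the next-level table address
# occupies bits [47:12]. All names below are hypothetical, not kernel APIs.

VALID_BIT = 1 << 0                     # descriptor is valid
TABLE_BIT = 1 << 1                     # levels 0-2: 1 = table, 0 = block
BIT_11 = 1 << 11                       # bit discussed in the thread
ADDR_MASK = ((1 << 48) - 1) & ~0xFFF   # bits [47:12]


def decode_table_descriptor(desc):
    """Split a descriptor into the fields discussed in the thread."""
    valid = bool(desc & VALID_BIT)
    return {
        "valid": valid,
        "table": valid and bool(desc & TABLE_BIT),
        "bit11": bool(desc & BIT_11),
        "next_table": desc & ADDR_MASK,
    }


for name, value in (("pgd", 0x00000000411F8803),
                    ("pud", 0x00000000411F9803),
                    ("pmd", 0x0000000000000000)):
    f = decode_table_descriptor(value)
    print(f"{name}: valid={f['valid']} table={f['table']} "
          f"bit11={f['bit11']} next_table={f['next_table']:#x}")
```

Under these assumptions, `pgd` and `pud` both decode as valid table descriptors with bit 11 set, which is the bit Will describes as being used to track visited tables, while `pmd` is zero and therefore invalid.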

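The two ESR values in the log above (0x96000046 and 0x96000006) can be split into their fields to compare the faults. This is a rough sketch rather than a complete decoder: the field positions follow the ESR_EL1 layout for data aborts, the lookup tables only cover the values seen in this log, and `decode_esr` is a hypothetical helper name.

```python
# Minimal decoder for the ESR_EL1 fields relevant to the two data aborts in
# the log. The tables are deliberately incomplete (assumption: only the
# exception class and fault codes seen in this thread need names).

EXCEPTION_CLASSES = {0x25: "DABT (current EL)"}
FAULT_STATUS = {0x04 + lvl: f"translation fault, level {lvl}" for lvl in range(4)}


def decode_esr(esr):
    """Return the exception class, IL bit, write flag and fault status."""
    ec = (esr >> 26) & 0x3F     # EC, bits [31:26]
    il = (esr >> 25) & 0x1      # IL, bit 25 (1 = 32-bit instruction)
    iss = esr & 0x1FFFFFF       # ISS, bits [24:0]
    wnr = (iss >> 6) & 0x1      # WnR, bit 6 (1 = write, 0 = read)
    dfsc = iss & 0x3F           # DFSC, bits [5:0]
    return {
        "ec": EXCEPTION_CLASSES.get(ec, hex(ec)),
        "il": il,
        "write": bool(wnr),
        "fault": FAULT_STATUS.get(dfsc, hex(dfsc)),
    }


for esr in (0x96000046, 0x96000006):
    print(f"{esr:#010x}: {decode_esr(esr)}")
```

Both values decode to a data abort taken from the current EL with a level 2 translation fault; the first is a write and the second a read, which agrees with the `WnR = 0` line the kernel prints for the second one. With 4K pages and 48-bit VAs, level 2 is the pmd level, which matches the zero `pmd` in the walk line.
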
Best Regards,
Wei

> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..03646e6a2ef4 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -217,8 +217,9 @@ ENDPROC(idmap_cpu_replace_ttbr1)
>   
>   	.macro __idmap_kpti_put_pgtable_ent_ng, type
>   	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
> -	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
> -	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
> +	str	\type, [cur_\()\type\()p]	// Update the entry and ensure
> +	dmb	sy				// that it is visible to all
> +	dc	civac, cur_\()\type\()p		// CPUs.
>   	.endm
>   
>   /*
>
> .
>
Will Deacon June 22, 2018, 1:31 p.m. UTC | #2
Hi again, Wei,

On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> On 2018/6/22 19:16, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> >>On 2018/6/22 17:23, Will Deacon wrote:
> >>>Perhaps just writing back the table entries is enough to cause the issue,
> >>>although I really can't understand why that would be the case. Can you try
> >>>the diff below (without my previous change), please?
> >>Thanks!
> >>But it does not resolve the issue(only apply this patch based on 4.17.0).
> >Thanks, that's a useful data point. It means that it still crashes even if
> >we write back the same table entries, so it's the fact that we're writing
> >them at all which causes the problem, not the value that we write.
> >
> >Whilst looking at the code, we noticed a missing DMB. On the off-chance
> >that it helps, can you try this instead please?
> Thanks!
> Only apply below patch based on 4.17.0, we still got the crash.

Oh well, it was worth a shot (and that's still a fix worth having). Please
can you provide the complete disassembly for kpti_install_ng_mappings()
(I'm referring to the C function in cpufeature.c) along with a corresponding
crash log so that we can correlate the instruction stream with the crash?

Thanks,

Will
Wei Xu June 22, 2018, 1:46 p.m. UTC | #3
Hi Will,

On 2018/6/22 21:31, Will Deacon wrote:
> Hi again, Wei,
>
> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>> On 2018/6/22 19:16, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>>>> On 2018/6/22 17:23, Will Deacon wrote:
>>>>> Perhaps just writing back the table entries is enough to cause the issue,
>>>>> although I really can't understand why that would be the case. Can you try
>>>>> the diff below (without my previous change), please?
>>>> Thanks!
>>>> But it does not resolve the issue(only apply this patch based on 4.17.0).
>>> Thanks, that's a useful data point. It means that it still crashes even if
>>> we write back the same table entries, so it's the fact that we're writing
>>> them at all which causes the problem, not the value that we write.
>>>
>>> Whilst looking at the code, we noticed a missing DMB. On the off-chance
>>> that it helps, can you try this instead please?
>> Thanks!
>> Only apply below patch based on 4.17.0, we still got the crash.
> Oh well, it was worth a shot (and that's still a fix worth having). Please
> can you provide the complete disassembly for kpti_install_ng_mappings()
> (I'm referring to the C function in cpufeature.c) along with a corresponding
> crash log so that we can correlate the instruction stream with the crash?
Just let me know if you need more information.
Thanks!

The disassembly is below:
     Dump of assembler code for function kpti_install_ng_mappings:
        0xffff000008091d68 <+0>:     stp     x29, x30, [sp,#-112]!
        0xffff000008091d6c <+4>:     adrp    x0, 0xffff000009022000 <bp_hardening_data>
        0xffff000008091d70 <+8>:     mov     x29, sp
        0xffff000008091d74 <+12>:    stp     x23, x24, [sp,#48]
        0xffff000008091d78 <+16>:    adrp    x24, 0xffff000009191000 <reset_devices>
        0xffff000008091d7c <+20>:    add     x0, x0, #0x10
        0xffff000008091d80 <+24>:    add     x1, x24, #0x550
        0xffff000008091d84 <+28>:    stp     x19, x20, [sp,#16]
        0xffff000008091d88 <+32>:    stp     x21, x22, [sp,#32]
        0xffff000008091d8c <+36>:    stp     x25, x26, [sp,#64]
        0xffff000008091d90 <+40>:    stp     x27, x28, [sp,#80]
        0xffff000008091d94 <+44>:    mrs     x2, tpidr_el1
        0xffff000008091d98 <+48>:    ldrb    w1, [x1,#8]
        0xffff000008091d9c <+52>:    ldr     w20, [x2,x0]
        0xffff000008091da0 <+56>:    cbnz    w1, 0xffff000008091f18 <kpti_install_ng_mappings+432>
        0xffff000008091da4 <+60>:    adrp    x27, 0xffff000008ea9000 <cpu_ops+384>
        0xffff000008091da8 <+64>:    adrp    x19, 0xffff000009190000 <empty_zero_page>
        0xffff000008091dac <+68>:    add     x19, x19, #0x0
        0xffff000008091db0 <+72>:    adrp    x1, 0xffff000008a44000 <kimage_vaddr>
        0xffff000008091db4 <+76>:    mov     x0, x19
        0xffff000008091db8 <+80>:    add     x1, x1, #0x3d8
        0xffff000008091dbc <+84>:    ldr     x2, [x27,#672]
        0xffff000008091dc0 <+88>:    sub     x4, x1, x2
        0xffff000008091dc4 <+92>:    sub     x0, x0, x2
        0xffff000008091dc8 <+96>:    msr     ttbr0_el1, x0
        0xffff000008091dcc <+100>:   isb
        0xffff000008091dd0 <+104>:   dsb     nshst
        0xffff000008091dd4 <+108>:   tlbi    vmalle1
        0xffff000008091dd8 <+112>:   nop
        0xffff000008091ddc <+116>:   nop
        0xffff000008091de0 <+120>:   dsb     nsh
        0xffff000008091de4 <+124>:   isb
        0xffff000008091de8 <+128>:   adrp    x3, 0xffff000009056000 <armv8_event_attr_sw_incr+8>
        0xffff000008091dec <+132>:   ldr     x0, [x3,#2248]
        0xffff000008091df0 <+136>:   cmp     x0, #0x10
        0xffff000008091df4 <+140>:   b.ne    0xffff000008091f64 <kpti_install_ng_mappings+508>
        0xffff000008091df8 <+144>:   adrp    x28, 0xffff000008ea9000 <cpu_ops+384>
        0xffff000008091dfc <+148>:   ldr     x2, [x27,#672]
        0xffff000008091e00 <+152>:   adrp    x1, 0xffff0000091f3000
        0xffff000008091e04 <+156>:   adrp    x26, 0xffff0000091f7000
        0xffff000008091e08 <+160>:   add     x1, x1, #0x0
        0xffff000008091e0c <+164>:   add     x21, x26, #0x0
        0xffff000008091e10 <+168>:   ldr     x0, [x28,#656]
        0xffff000008091e14 <+172>:   adrp    x23, 0xffff000008ea9000 <cpu_ops+384>
        0xffff000008091e18 <+176>:   sub     x1, x1, x2
        0xffff000008091e1c <+180>:   sub     x1, x1, x0
        0xffff000008091e20 <+184>:   orr     x0, x1, #0xffff800000000000
        0xffff000008091e24 <+188>:   cmp     x0, x21
        0xffff000008091e28 <+192>:   b.eq    0xffff000008091f60 <kpti_install_ng_mappings+504>
        0xffff000008091e2c <+196>:   mov     x22, x19
        0xffff000008091e30 <+200>:   str     x3, [x29,#96]
        0xffff000008091e34 <+204>:   str     x4, [x29,#104]
        0xffff000008091e38 <+208>:   sub     x2, x22, x2
        0xffff000008091e3c <+212>:   msr     ttbr0_el1, x2
        0xffff000008091e40 <+216>:   isb
        0xffff000008091e44 <+220>:   ldr     x0, [x28,#656]
        0xffff000008091e48 <+224>:   and     x1, x1, #0x7fffffffffff
        0xffff000008091e4c <+228>:   adrp    x25, 0xffff00000906d000 <shmem_swaplist_mutex+16>
        0xffff000008091e50 <+232>:   add     x0, x1, x0
        0xffff000008091e54 <+236>:   add     x1, x25, #0x7b0
        0xffff000008091e58 <+240>:   bl      0xffff0000080a021c <cpu_do_switch_mm>
        0xffff000008091e5c <+244>:   adrp    x0, 0xffff00000904a000 <__cpu_online_mask>
        0xffff000008091e60 <+248>:   mov     w1, #0x80                       // #128
        0xffff000008091e64 <+252>:   add     x0, x0, #0x0
        0xffff000008091e68 <+256>:   bl      0xffff0000083e22f0 <__bitmap_weight>
        0xffff000008091e6c <+260>:   mov     w1, w0
        0xffff000008091e70 <+264>:   ldr     x5, [x23,#672]
        0xffff000008091e74 <+268>:   mov     w0, w20
        0xffff000008091e78 <+272>:   ldr     x4, [x29,#104]
        0xffff000008091e7c <+276>:   mov     x2, x21
        0xffff000008091e80 <+280>:   sub     x2, x2, x5
        0xffff000008091e84 <+284>:   blr     x4
        0xffff000008091e88 <+288>:   ldr     x1, [x23,#672]
        0xffff000008091e8c <+292>:   mrs     x0, sp_el0
        0xffff000008091e90 <+296>:   sub     x22, x22, x1
        0xffff000008091e94 <+300>:   ldr     x1, [x0,#1128]
        0xffff000008091e98 <+304>:   msr     ttbr0_el1, x22
        0xffff000008091e9c <+308>:   isb
        0xffff000008091ea0 <+312>:   dsb     nshst
        0xffff000008091ea4 <+316>:   tlbi    vmalle1
        0xffff000008091ea8 <+320>:   nop
        0xffff000008091eac <+324>:   nop
        0xffff000008091eb0 <+328>:   dsb     nsh
        0xffff000008091eb4 <+332>:   isb
        0xffff000008091eb8 <+336>:   ldr     x3, [x29,#96]
        0xffff000008091ebc <+340>:   ldr     x0, [x3,#2248]
        0xffff000008091ec0 <+344>:   cmp     x0, #0x10
        0xffff000008091ec4 <+348>:   b.ne    0xffff000008091f48 <kpti_install_ng_mappings+480>
        0xffff000008091ec8 <+352>:   add     x25, x25, #0x7b0
        0xffff000008091ecc <+356>:   cmp     x1, x25
        0xffff000008091ed0 <+360>:   b.eq    0xffff000008091f08 <kpti_install_ng_mappings+416>
        0xffff000008091ed4 <+364>:   ldr     x2, [x1,#64]
        0xffff000008091ed8 <+368>:   add     x26, x26, #0x0
        0xffff000008091edc <+372>:   cmp     x2, x26
        0xffff000008091ee0 <+376>:   b.eq    0xffff000008091f60 <kpti_install_ng_mappings+504>
        0xffff000008091ee4 <+380>:   ldr     x0, [x27,#672]
        0xffff000008091ee8 <+384>:   sub     x19, x19, x0
        0xffff000008091eec <+388>:   msr     ttbr0_el1, x19
        0xffff000008091ef0 <+392>:   isb
        0xffff000008091ef4 <+396>:   tbz     x2, #47, 0xffff000008091f34 <kpti_install_ng_mappings+460>
        0xffff000008091ef8 <+400>:   ldr     x0, [x28,#656]
        0xffff000008091efc <+404>:   and     x2, x2, #0x7fffffffffff
        0xffff000008091f00 <+408>:   add     x0, x2, x0
        0xffff000008091f04 <+412>:   bl      0xffff0000080a021c <cpu_do_switch_mm>
        0xffff000008091f08 <+416>:   cbnz    w20, 0xffff000008091f18 <kpti_install_ng_mappings+432>
        0xffff000008091f0c <+420>:   add     x24, x24, #0x550
        0xffff000008091f10 <+424>:   mov     w0, #0x1                        // #1
        0xffff000008091f14 <+428>:   strb    w0, [x24,#8]
        0xffff000008091f18 <+432>:   ldp     x19, x20, [sp,#16]
        0xffff000008091f1c <+436>:   ldp     x21, x22, [sp,#32]
        0xffff000008091f20 <+440>:   ldp     x23, x24, [sp,#48]
        0xffff000008091f24 <+444>:   ldp     x25, x26, [sp,#64]
        0xffff000008091f28 <+448>:   ldp     x27, x28, [sp,#80]
        0xffff000008091f2c <+452>:   ldp     x29, x30, [sp],#112
        0xffff000008091f30 <+456>:   ret
        0xffff000008091f34 <+460>:   adrp    x0, 0xffff000008ea9000 <cpu_ops+384>
        0xffff000008091f38 <+464>:   ldr     x0, [x0,#672]
        0xffff000008091f3c <+468>:   sub     x0, x2, x0
        0xffff000008091f40 <+472>:   bl      0xffff0000080a021c <cpu_do_switch_mm>
        0xffff000008091f44 <+476>:   b       0xffff000008091f08 <kpti_install_ng_mappings+416>
        0xffff000008091f48 <+480>:   mrs     x0, tcr_el1
        0xffff000008091f4c <+484>:   and     x0, x0, #0xffffffffffffffc0
        0xffff000008091f50 <+488>:   orr     x0, x0, #0x10
        0xffff000008091f54 <+492>:   msr     tcr_el1, x0
        0xffff000008091f58 <+496>:   isb
        0xffff000008091f5c <+500>:   b       0xffff000008091ec8 <kpti_install_ng_mappings+352>
        0xffff000008091f60 <+504>:   brk     #0x800
        0xffff000008091f64 <+508>:   mrs     x1, tcr_el1
        0xffff000008091f68 <+512>:   and     x1, x1, #0xffffffffffffffc0
        0xffff000008091f6c <+516>:   orr     x0, x1, x0
        0xffff000008091f70 <+520>:   msr     tcr_el1, x0
        0xffff000008091f74 <+524>:   isb
        0xffff000008091f78 <+528>:   b       0xffff000008091df8 <kpti_install_ng_mappings+144>
     End of assembler dump.
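
To line the crash log up with this listing, the `lr` value reported in the log (`kpti_install_ng_mappings+0x120/0x214`) can be converted to an address and an offset. The sketch below is illustrative only; `parse_symbol_offset` is a hypothetical helper, and the start address is taken from the `<+0>` line of the dump above.

```python
# Map a "symbol+offset/size" string from the crash log onto the gdb listing.
# parse_symbol_offset is a hypothetical helper, not part of any tool.

def parse_symbol_offset(text):
    """Split 'name+0xOFF/0xSIZE' into (name, offset, size)."""
    name, rest = text.split("+", 1)
    offset, size = (int(part, 16) for part in rest.split("/", 1))
    return name, offset, size


FUNC_START = 0xFFFF000008091D68   # address of <+0> in the listing
INSN_SIZE = 4                     # AArch64 instructions are 4 bytes wide

name, offset, size = parse_symbol_offset("kpti_install_ng_mappings+0x120/0x214")
return_addr = FUNC_START + offset        # where lr points
call_addr = return_addr - INSN_SIZE      # instruction just before it

print(f"{name}: lr={return_addr:#x} (<+{offset}>), "
      f"previous instruction={call_addr:#x} (<+{offset - INSN_SIZE}>)")
```

The result places `lr` at `<+288>`, the instruction after `blr x4` at `<+284>`. Assuming `lr` was not overwritten afterwards, this suggests the fault was taken in the code reached through that indirect call rather than in `kpti_install_ng_mappings()` itself.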


The corresponding crash log is:
     estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.17-joyx -initrd ../mini-rootfs-arm64.cpio.gz -nographic -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
         [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
         [    0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #16 SMP PREEMPT Fri Jun 22 21:05:10 CST 2018
         [    0.000000] Machine model: linux,dummy-virt
         [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
         [    0.000000] bootconsole [pl11] enabled
         [    0.000000] efi: Getting EFI parameters from FDT:
         [    0.000000] efi: UEFI not found.
         [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
         [    0.000000] NUMA: No NUMA configuration found
         [    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
         [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
         [    0.000000] Zone ranges:
         [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
         [    0.000000]   Normal   empty
         [    0.000000] Movable zone start for each node
         [    0.000000] Early memory node ranges
         [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
         [    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
         [    0.000000] psci: probing for conduit method from DT.
         [    0.000000] psci: PSCIv1.0 detected in firmware.
         [    0.000000] psci: Using standard PSCI v0.2 function IDs
         [    0.000000] psci: Trusted OS migration not required
         [    0.000000] psci: SMC Calling Convention v1.1
         [    0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
         [    0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
         [    0.000000] Detected VIPT I-cache on CPU0
         [    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
         [    0.000000] CPU features: detected: Hardware dirty bit management
         [    0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
         [    0.000000] Policy zone: DMA32
         [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
         [    0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
         [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
         [    0.000000] Preemptible hierarchical RCU implementation.
         [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
         [    0.000000]     Tasks RCU enabled.
         [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
         [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
         [    0.000000] GICv3: Distributor has no Range Selector support
         [    0.000000] GICv3: no VLPI support, no direct LPI support
         [    0.000000] ITS [mem 0x08080000-0x0809ffff]
         [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
         [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
         [    0.000000] GIC: using LPI property table @0x000000007d850000
         [    0.000000] ITS: Allocated 1792 chunks for LPIs
         [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
         [    0.000000] CPU0: using LPI pending table @0x000000007d860000
         [    0.000000] GIC: PPI11 is secure or misconfigured
         [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
         [    0.000000] arch_timer: WARNING: Please fix your firmware
         [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
         [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
         [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
         [    0.000849] Console: colour dummy device 80x25
         [    0.001427] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
         [    0.002485] pid_max: default: 32768 minimum: 301
         [    0.002966] Security Framework initialized
         [    0.003549] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
         [    0.004353] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
         [    0.005068] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
         [    0.005858] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
         [    0.025962] ASID allocator initialised with 32768 entries
         [    0.029972] Hierarchical SRCU implementation.
         [    0.034341] Platform MSI: its domain created
         [    0.034793] PCI/MSI: /intc/its domain created
         [    0.035360] EFI services will not be available.
         [    0.038002] smp: Bringing up secondary CPUs ...
         [    0.038472] smp: Brought up 1 node, 1 CPU
         [    0.038878] SMP: Total of 1 processors activated.
         [    0.039354] CPU features: detected: GIC system register CPU interface
         [    0.040004] CPU features: detected: Privileged Access Never
         [    0.040566] CPU features: detected: User Access Override
         [    0.042462] Insufficient stack space to handle exception!
         [    0.042464] ESR: 0x96000046 -- DABT (current EL)
         [    0.043781] FAR: 0xffff0000093a80e0
         [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
         [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
         [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
         [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
         [    0.067946] Hardware name: linux,dummy-virt (DT)
         [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
         [    0.077480] pc : el1_sync+0x0/0xb0
         [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
         [    0.086143] sp : ffff0000093a80e0
         [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
         [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
         [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
         [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
         [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
         [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
         [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
         [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
         [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
         [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
         [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
         [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
         [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
         [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
         [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
         [    0.170179] Kernel panic - not syncing: kernel stack overflow
         [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
         [    0.184152] Hardware name: linux,dummy-virt (DT)
         [    0.188851] Call trace:
         [    0.191380]  dump_backtrace+0x0/0x180
         [    0.195113]  show_stack+0x14/0x1c
         [    0.198488]  dump_stack+0x90/0xb0
         [    0.201862]  panic+0x138/0x2a0
         [    0.204989]  __stack_chk_fail+0x0/0x18
         [    0.208836]  handle_bad_stack+0x118/0x124
         [    0.212927]  __bad_stack+0x88/0x8c
         [    0.216414]  el1_sync+0x0/0xb0
         [    0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0
         [    0.227507] Mem abort info:
         [    0.230390]   ESR = 0x96000006
         [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
         [    0.239428]   SET = 0, FnV = 0
         [    0.242555]   EA = 0, S1PTW = 0
         [    0.245797] Data abort info:
         [    0.248795]   ISV = 0, ISS = 0x00000006
         [    0.252652]   CM = 0, WnR = 0
         [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
         [    0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
         [    0.271438] Internal error: Oops: 96000006 [#1] PREEMPT SMP
         [    0.277098] Modules linked in:
         [    0.280227] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
         [    0.288310] Hardware name: linux,dummy-virt (DT)
         [    0.293004] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
         [    0.297931] pc : unwind_frame+0x28/0xc8
         [    0.301792] lr : dump_backtrace+0x12c/0x180
         [    0.306114] sp : ffff80003efcf000
         [    0.309483] x29: ffff80003efcf000 x28: ffff80003da61c00
         [    0.314798] x27: ffff000008ea9000 x26: ffff0000091f7000
         [    0.320216] x25: ffff00000906d000 x24: ffff0000093a80e0
         [    0.325527] x23: 0000000000000000 x22: ffff000008dbada8
         [    0.330941] x21: 0000000000000000 x20: ffff000009049000
         [    0.336355] x19: ffff80003da61c00 x18: 000000003455d99d
         [    0.341770] x17: 0000000000000001 x16: 00f8000040ffff13
         [    0.347078] x15: 000000007eff6000 x14: 642d386165636439
         [    0.352491] x13: 0000000000000000 x12: cc26f77952f87e00
         [    0.357905] x11: ffffffffffffffff x10: 0000000000000075
         [    0.363214] x9 : ffff0000085ae9e8 x8 : ffff80003efcec90
         [    0.368628] x7 : 0000000000000000 x6 : ffff0000091befe1
         [    0.374053] x5 : 0000000000000000 x4 : ffff0000093ac000
         [    0.379363] x3 : ffff0000093a8000 x2 : ffff0000093abce0
         [    0.384779] x1 : ffff80003efcf048 x0 : ffff80003da61c00
         [    0.390196] Process migration/0 (pid: 12, stack limit = 0x        (ptrval))
         [    0.397188] Call trace:
         [    0.399712]  unwind_frame+0x28/0xc8
         [    0.403316]  show_stack+0x14/0x1c
         [    0.406689]  dump_stack+0x90/0xb0
         [    0.410065]  panic+0x138/0x2a0
         [    0.413193]  __stack_chk_fail+0x0/0x18
         [    0.416934]  handle_bad_stack+0x118/0x124
         [    0.421025]  __bad_stack+0x88/0x8c
         [    0.424513]  el1_sync+0x0/0xb0
         [    0.427643] Unable to handle kernel paging request at virtual address ffff0000093abce0
         [    0.435604] Mem abort info:
         [    0.438488]   ESR = 0x96000006
         [    0.441615]   Exception class = DABT (current EL), IL = 32 bits
         [    0.447635]   SET = 0, FnV = 0
         [    0.450759]   EA = 0, S1PTW = 0
         [    0.454002] Data abort info:
         [    0.456896]   ISV = 0, ISS = 0x00000006
         [    0.460863]   CM = 0, WnR = 0
         [    0.463874] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
         [    0.470750] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000

Best Regards,
Wei

> Thanks,
>
> Will
Mark Rutland June 22, 2018, 2:28 p.m. UTC | #4
On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>     [    0.042462] Insufficient stack space to handle exception!
>     [    0.042464] ESR: 0x96000046 -- DABT (current EL)
>     [    0.043781] FAR: 0xffff0000093a80e0
>     [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]

Here, the FAR points somewhere in the task stack, so we're evidently
faulting on that...
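
The range check and the ESR value can be reproduced offline. Below is a minimal Python sketch, assuming only the numbers printed in the quoted log; the helper names are made up for illustration and do not correspond to anything in the kernel. The field positions follow the ESR_ELx layout for data aborts (EC in bits [31:26], IL in bit 25, WnR in bit 6, DFSC in bits [5:0]).

```python
# Constants copied from the quoted log. The helper names are illustrative
# and do not correspond to anything in the kernel.
FAR = 0xffff0000093a80e0
STACKS = {
    "task": (0xffff0000093a8000, 0xffff0000093ac000),
    "irq": (0xffff000008000000, 0xffff000008004000),
    "overflow": (0xffff80003efce2f0, 0xffff80003efcf2f0),
}


def stack_containing(addr):
    """Return the name of the stack whose [low, high) range holds addr."""
    for name, (low, high) in STACKS.items():
        if low <= addr < high:
            return name
    return None


def decode_data_abort_esr(esr):
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    return {
        "ec": (esr >> 26) & 0x3f,   # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,    # instruction length, bit 25
        "wnr": (esr >> 6) & 0x1,    # write-not-read, bit 6
        "dfsc": esr & 0x3f,         # data fault status code, bits [5:0]
    }


print(stack_containing(FAR))
print(decode_data_abort_esr(0x96000046))
```

With these inputs the FAR lands 0xe0 bytes above the bottom of the task stack, and the ESR decodes to EC 0x25 (data abort at the current EL), WnR = 1 (a write) and DFSC 0x06 (translation fault, level 2).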

>     [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
>     [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>     [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
>     [    0.067946] Hardware name: linux,dummy-virt (DT)
>     [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>     [    0.077480] pc : el1_sync+0x0/0xb0
>     [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
>     [    0.086143] sp : ffff0000093a80e0
>     [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
>     [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
>     [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
>     [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
>     [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
>     [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
>     [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
>     [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
>     [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
>     [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
>     [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
>     [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
>     [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
>     [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
>     [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
>     [    0.170179] Kernel panic - not syncing: kernel stack overflow
>     [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
>     [    0.184152] Hardware name: linux,dummy-virt (DT)
>     [    0.188851] Call trace:
>     [    0.191380]  dump_backtrace+0x0/0x180
>     [    0.195113]  show_stack+0x14/0x1c
>     [    0.198488]  dump_stack+0x90/0xb0
>     [    0.201862]  panic+0x138/0x2a0
>     [    0.204989]  __stack_chk_fail+0x0/0x18
>     [    0.208836]  handle_bad_stack+0x118/0x124
>     [    0.212927]  __bad_stack+0x88/0x8c
>     [    0.216414]  el1_sync+0x0/0xb0
>     [    0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0

Likewise, here we're faulting on an address within the task stack,
presumably as part of the unwinding process...
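
One observation that fits the unwinding theory: the second faulting address is the same value as x29 (the frame pointer) in the first register dump. A small Python sketch of that comparison, using only values from the log, with names chosen purely for illustration:

```python
# Values copied from the quoted log; the names are illustrative only.
TASK_STACK = (0xffff0000093a8000, 0xffff0000093ac000)
X29_AT_FIRST_FAULT = 0xffff0000093abce0   # "x29:" in the first register dump
SECOND_FAULT_ADDR = 0xffff0000093abce0    # address in "Unable to handle kernel paging request"


def in_task_stack(addr):
    """True if addr lies inside the reported task stack range."""
    low, high = TASK_STACK
    return low <= addr < high


same_as_frame_pointer = SECOND_FAULT_ADDR == X29_AT_FIRST_FAULT
bytes_below_stack_top = TASK_STACK[1] - SECOND_FAULT_ADDR
print(in_task_stack(SECOND_FAULT_ADDR), same_as_frame_pointer, hex(bytes_below_stack_top))
```

The address sits 0x320 bytes below the top of the task stack, which is where a frame record from early in the call chain would plausibly live. This is a consistency check, not proof of which instruction faulted.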

>     [    0.227507] Mem abort info:
>     [    0.230390]   ESR = 0x96000006
>     [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
>     [    0.239428]   SET = 0, FnV = 0
>     [    0.242555]   EA = 0, S1PTW = 0
>     [    0.245797] Data abort info:
>     [    0.248795]   ISV = 0, ISS = 0x00000006
>     [    0.252652]   CM = 0, WnR = 0
>     [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
>     [    0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000

... and here the PMD for the task stack is all zeroes, so evidently
that's getting corrupted somehow.
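
For anyone following along, the three entries can be decoded by hand. Below is a minimal Python sketch, assuming the configuration the log prints (4k pages, 48-bit VAs, so nine index bits per level) and the usual VMSAv8-64 encoding in which bits [1:0] = 0b11 mark a valid table descriptor at levels 0 to 2. The function names are illustrative, not kernel APIs.

```python
# Assumes the configuration printed in the log: 4k pages, 48-bit VAs, so
# each level indexes 9 bits of the address. Names are illustrative only.
VA = 0xffff0000093abce0
ENTRIES = {"pgd": 0x00000000411f8803, "pud": 0x00000000411f9803, "pmd": 0x0}


def table_indices(va):
    """Per-level table index for a 4k-granule, 48-bit virtual address."""
    return {
        "pgd": (va >> 39) & 0x1ff,
        "pud": (va >> 30) & 0x1ff,
        "pmd": (va >> 21) & 0x1ff,
        "pte": (va >> 12) & 0x1ff,
    }


def decode_descriptor(desc):
    """Decode the low bits of a level 0-2 descriptor."""
    valid = bool(desc & 0x1)
    table = valid and bool(desc & 0x2)
    return {
        "valid": valid,
        "table": table,
        "bit11": bool(desc & (1 << 11)),
        # Next-level table address, bits [47:12], only meaningful for tables.
        "next_table": desc & 0x0000fffffffff000 if table else None,
    }


for level, desc in ENTRIES.items():
    print(level, decode_descriptor(desc))
print(table_indices(VA))
```

The pgd and pud entries decode as valid table descriptors with bit 11 set, the bit that the earlier part of this thread describes as being used to mark visited tables. The pmd entry (index 73 for this address) is simply invalid, which matches the level 2 translation fault reported in the ESR.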

It appears that the overflow stack (which IIRC is embedded within the
kernel's data segment, as part of the image mapping), is fine.

I wonder if there's some existing weirdness in the page tables for the
vmalloc area that causes things to go wrong. Can you please:

* enable ARM64_PTDUMP_DEBUGFS

* boot with kpti=off (with Will's patch to make this work)

* as root, cat /sys/kernel/debug/kernel_page_tables

... and dump the result here?
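
Once the dump is available, finding the row that covers a given address (for instance one of the stack ranges above) is easy to script. Below is a minimal Python sketch; the row layout is inferred from sample rows of the kind the ptdump debugfs file prints, not from the kernel source, and the function names are hypothetical.

```python
import re

# Matches one mapping row; section markers such as "---[ Modules start ]---"
# do not match. The field layout is inferred from sample rows, not from
# the kernel source.
ROW = re.compile(r"^\s*0x([0-9a-f]+)-0x([0-9a-f]+)\s+(\d+[KMGTPE])\s+(\S+)\s+(.*)$")


def parse_row(line):
    """Parse one row into a dict, or return None for non-mapping lines."""
    match = ROW.match(line)
    if match is None:
        return None
    start, end, size, level, attrs = match.groups()
    return {
        "start": int(start, 16),
        "end": int(end, 16),
        "size": size,
        "level": level,
        "attrs": attrs.split(),
    }


def row_covering(lines, addr):
    """Return the first parsed row whose [start, end) range holds addr."""
    for line in lines:
        row = parse_row(line)
        if row is not None and row["start"] <= addr < row["end"]:
            return row
    return None


SAMPLE = [
    "---[ vmalloc() Area ]---",
    "0xffff000008000000-0xffff000008004000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL",
    "0xffff000008005000-0xffff000008009000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL",
]

print(row_covering(SAMPLE, 0xffff000008000000))
print(row_covering(SAMPLE, 0xffff000008004000))
```

In this sample the first row spans the same range the crash log reports for the IRQ stack, and the address immediately after it falls in an unmapped gap, so the lookup returns None there.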

Thanks,
Mark.
Wei Xu June 22, 2018, 3:28 p.m. UTC | #5
Hi Mark,

On 2018/6/22 22:28, Mark Rutland wrote:
> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>>      [    0.042462] Insufficient stack space to handle exception!
>>      [    0.042464] ESR: 0x96000046 -- DABT (current EL)
>>      [    0.043781] FAR: 0xffff0000093a80e0
>>      [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
> Here, the FAR points somewhere in the task stack, so we're evidently
> faulting on that...
>
>>      [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
>>      [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>>      [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>> 4.17.0-45864-g29dcea8-dirty #16
>>      [    0.067946] Hardware name: linux,dummy-virt (DT)
>>      [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>>      [    0.077480] pc : el1_sync+0x0/0xb0
>>      [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
>>      [    0.086143] sp : ffff0000093a80e0
>>      [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
>>      [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
>>      [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
>>      [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
>>      [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
>>      [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
>>      [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
>>      [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
>>      [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
>>      [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
>>      [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
>>      [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
>>      [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
>>      [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
>>      [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
>>      [    0.170179] Kernel panic - not syncing: kernel stack overflow
>>      [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>> 4.17.0-45864-g29dcea8-dirty #16
>>      [    0.184152] Hardware name: linux,dummy-virt (DT)
>>      [    0.188851] Call trace:
>>      [    0.191380]  dump_backtrace+0x0/0x180
>>      [    0.195113]  show_stack+0x14/0x1c
>>      [    0.198488]  dump_stack+0x90/0xb0
>>      [    0.201862]  panic+0x138/0x2a0
>>      [    0.204989]  __stack_chk_fail+0x0/0x18
>>      [    0.208836]  handle_bad_stack+0x118/0x124
>>      [    0.212927]  __bad_stack+0x88/0x8c
>>      [    0.216414]  el1_sync+0x0/0xb0
>>      [    0.219544] Unable to handle kernel paging request at virtual address
>> ffff0000093abce0
> Likewise, here we're faulting on an address within the task stack,
> presumably as part of the unwinding process...
>
>>      [    0.227507] Mem abort info:
>>      [    0.230390]   ESR = 0x96000006
>>      [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
>>      [    0.239428]   SET = 0, FnV = 0
>>      [    0.242555]   EA = 0, S1PTW = 0
>>      [    0.245797] Data abort info:
>>      [    0.248795]   ISV = 0, ISS = 0x00000006
>>      [    0.252652]   CM = 0, WnR = 0
>>      [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =
>> (ptrval)
>>      [    0.262645] [ffff0000093abce0] pgd=00000000411f8803,
>> pud=00000000411f9803, pmd=0000000000000000
> ... and here the PMD for the task stack is all zeroes, so evidently
> that's getting corrupted somehow.
>
> It appears that the overflow stack (which IIRC is embedded within the
> kernel's data segment, as part of the image mapping), is fine.
>
> I wonder if there's some existing weirdness in the page tables for the
> vmalloc area that causes things to go wrong. Can you please:
>
> * enable ARM64_PTDUMP_DEBUGFS
>
> * boot with kpti=off (with Will's patch to make this work)
>
> * as root, cat /sys/kernel/debug/kernel_page_tables
>
> ... and dump the result here?
Thanks!
Can I do this later since Will's new patch works?

Best Regards,
Wei

> Thanks,
> Mark.
Will Deacon June 22, 2018, 3:41 p.m. UTC | #6
On Fri, Jun 22, 2018 at 11:28:21PM +0800, Wei Xu wrote:
> On 2018/6/22 22:28, Mark Rutland wrote:
> >On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> >>     [    0.227507] Mem abort info:
> >>     [    0.230390]   ESR = 0x96000006
> >>     [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
> >>     [    0.239428]   SET = 0, FnV = 0
> >>     [    0.242555]   EA = 0, S1PTW = 0
> >>     [    0.245797] Data abort info:
> >>     [    0.248795]   ISV = 0, ISS = 0x00000006
> >>     [    0.252652]   CM = 0, WnR = 0
> >>     [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =
> >>(ptrval)
> >>     [    0.262645] [ffff0000093abce0] pgd=00000000411f8803,
> >>pud=00000000411f9803, pmd=0000000000000000
> >... and here the PMD for the task stack is all zeroes, so evidently
> >that's getting corrupted somehow.
> >
> >It appears that the overflow stack (which IIRC is embedded within the
> >kernel's data segment, as part of the image mapping), is fine.
> >
> >I wonder if there's some existing weirdness in the page tables for the
> >vmalloc area that causes things to go wrong. Can you please:
> >
> >* enable ARM64_PTDUMP_DEBUGFS
> >
> >* boot with kpti=off (with Will's patch to make this work)
> >
> >* as root, cat /sys/kernel/debug/kernel_page_tables
> >
> >... and dump the result here?
> Thanks!
> Can I do this later since Will's new patch works?

Yes, you should probably go to bed now! Please note that my patch still
isn't the right thing for mainline, since it avoids setting PTE_NG for
tables and therefore won't solve the boot-time issue with KASAN enabled.

We also don't understand why clean+invalidate is causing the issue on your
CPU, whereas clean does not. It looks like clean+invalidate somehow results
in page table entries being zeroed.

Have a good weekend,

Will
Wei Xu June 22, 2018, 4:02 p.m. UTC | #7
Hi Will, Mark,

On 2018/6/22 23:41, Will Deacon wrote:
> On Fri, Jun 22, 2018 at 11:28:21PM +0800, Wei Xu wrote:
>> On 2018/6/22 22:28, Mark Rutland wrote:
>>> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>>>>      [    0.227507] Mem abort info:
>>>>      [    0.230390]   ESR = 0x96000006
>>>>      [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
>>>>      [    0.239428]   SET = 0, FnV = 0
>>>>      [    0.242555]   EA = 0, S1PTW = 0
>>>>      [    0.245797] Data abort info:
>>>>      [    0.248795]   ISV = 0, ISS = 0x00000006
>>>>      [    0.252652]   CM = 0, WnR = 0
>>>>      [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =
>>>> (ptrval)
>>>>      [    0.262645] [ffff0000093abce0] pgd=00000000411f8803,
>>>> pud=00000000411f9803, pmd=0000000000000000
>>> ... and here the PMD for the task stack is all zeroes, so evidently
>>> that's getting corrupted somehow.
>>>
>>> It appears that the overflow stack (which IIRC is embedded within the
>>> kernel's data segment, as part of the image mapping), is fine.
>>>
>>> I wonder if there's some existing weirdness in the page tables for the
>>> vmalloc area that causes things to go wrong. Can you please:
>>>
>>> * enable ARM64_PTDUMP_DEBUGFS
>>>
>>> * boot with kpti=off (with Will's patch to make this work)
>>>
>>> * as root, cat /sys/kernel/debug/kernel_page_tables
>>>
>>> ... and dump the result here?
>> Thanks!
>> Can I do this later since Will's new patch works?
> Yes, you should probably go to bed now! Please note that my patch still
> isn't the right thing for mainline, since it avoids setting PTE_NG for
> tables and therefore won't solve the boot-time issue with KASAN enabled.
>
> We also don't understand why clean+invalidate is causing the issue on your
> CPU, whereas clean does not. It looks like clean+invalidate somehow results
> in page table entries being zeroed.
>
> Have a good weekend,

Got it. Thanks, and enjoy the FIFA World Cup :)
Below is the log with ARM64_PTDUMP_DEBUGFS enabled.
It was taken with only Will's kpti early_param patch applied on top of 4.17.0.
Hope it helps.

     ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
         -cpu host -enable-kvm -smp 1 -m 1024 \
         -kernel ./Image-4.17-joyx -initrd ../mini-rootfs-arm64.cpio.gz \
         -nographic \
         -append "kpti=off rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.17.0-45865-ga3d6816 (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #19 SMP PREEMPT Fri Jun 22 23:47:07 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 
0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 
0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from 
start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) 
s57984 r8192 d32128 u98304
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: kernel page table isolation forced OFF by command line option
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total 
pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: kpti=off rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
     [    0.000000] Memory: 968436K/1048576K available (10044K kernel 
code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 
16384K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, 
Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to 
nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, 
nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices 
@7d830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt 
Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007d850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 
0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
     [    0.000000] GIC: PPI11 is secure or misconfigured
     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, 
assuming level low
     [    0.000000] arch_timer: WARNING: Please fix your firmware
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 
0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, 
wraps every 4398046511100ns
     [    0.000859] Console: colour dummy device 80x25
     [    0.001459] Calibrating delay loop (skipped), value calculated 
using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002537] pid_max: default: 32768 minimum: 301
     [    0.003028] Security Framework initialized
     [    0.003606] Dentry cache hash table entries: 131072 (order: 8, 
1048576 bytes)
     [    0.004418] Inode-cache hash table entries: 65536 (order: 7, 
524288 bytes)
     [    0.005129] Mount-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.005938] Mountpoint-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.026041] ASID allocator initialised with 32768 entries
     [    0.030055] Hierarchical SRCU implementation.
     [    0.034426] Platform MSI: its domain created
     [    0.034885] PCI/MSI: /intc/its domain created
     [    0.035457] EFI services will not be available.
     [    0.038086] smp: Bringing up secondary CPUs ...
     [    0.038557] smp: Brought up 1 node, 1 CPU
     [    0.038966] SMP: Total of 1 processors activated.
     [    0.039447] CPU features: detected: GIC system register CPU 
interface
     [    0.040101] CPU features: detected: Privileged Access Never
     [    0.040667] CPU features: detected: User Access Override
     [    0.041988] CPU: All CPU(s) started at EL1
     [    0.042536] alternatives: patching kernel code
     [    0.044809] devtmpfs: initialized
     [    0.046662] clocksource: jiffies: mask: 0xffffffff max_cycles: 
0xffffffff, max_idle_ns: 7645041785100000 ns
     [    0.049470] futex hash table entries: 256 (order: 3, 32768 bytes)
     [    0.055780] pinctrl core: initialized pinctrl subsystem
     [    0.061504] DMI not present or invalid.
     [    0.065230] NET: Registered protocol family 16
     [    0.069514] audit: initializing netlink subsys (disabled)
     [    0.075351] cpuidle: using governor menu
     [    0.078855] audit: type=2000 audit(0.068:1): state=initialized 
audit_enabled=0 res=1
     [    0.086714] vdso: 2 pages (1 code @         (ptrval), 1 data 
@         (ptrval))
     [    0.094456] hw-breakpoint: found 6 breakpoint and 4 watchpoint 
registers.
     [    0.101869] DMA: preallocated 256 KiB pool for atomic allocations
     [    0.107408] Serial: AMBA PL011 UART driver
     [    0.114802] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 39, 
base_baud = 0) is a PL011 rev1
     [    0.120256] console [ttyAMA0] enabled
     [    0.120256] console [ttyAMA0] enabled
     [    0.127525] bootconsole [pl11] disabled
     [    0.127525] bootconsole [pl11] disabled
     [    0.135667] irq: type mismatch, failed to map hwirq-27 for intc!
     [    0.153827] HugeTLB registered 2.00 MiB page size, pre-allocated 
0 pages
     [    0.157547] cryptd: max_cpu_qlen set to 1000
     [    0.165692] ACPI: Interpreter disabled.
     [    0.166341] vgaarb: loaded
     [    0.166629] SCSI subsystem initialized
     [    0.169664] usbcore: registered new interface driver usbfs
     [    0.170139] usbcore: registered new interface driver hub
     [    0.174110] usbcore: registered new device driver usb
     [    0.179293] pps_core: LinuxPPS API ver. 1 registered
     [    0.184239] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 
Rodolfo Giometti <giometti@linux.it>
     [    0.193320] PTP clock support registered
     [    0.197360] EDAC MC: Ver: 3.0.0
     [    0.201468] Advanced Linux Sound Architecture Driver Initialized.
     [    0.207035] clocksource: Switched to clocksource arch_sys_counter
     [    0.212870] VFS: Disk quotas dquot_6.6.0
     [    0.216844] VFS: Dquot-cache hash table entries: 512 (order 0, 
4096 bytes)
     [    0.223782] pnp: PnP ACPI: disabled
     [    0.229309] NET: Registered protocol family 2
     [    0.232711] tcp_listen_portaddr_hash hash table entries: 512 
(order: 1, 8192 bytes)
     [    0.239478] TCP established hash table entries: 8192 (order: 4, 
65536 bytes)
     [    0.246564] TCP bind hash table entries: 8192 (order: 5, 131072 
bytes)
     [    0.253246] TCP: Hash tables configured (established 8192 bind 8192)
     [    0.259572] UDP hash table entries: 512 (order: 2, 16384 bytes)
     [    0.265610] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
     [    0.272044] NET: Registered protocol family 1
     [    0.288576] RPC: Registered named UNIX socket transport module.
     [    0.289058] RPC: Registered udp transport module.
     [    0.289434] RPC: Registered tcp transport module.
     [    0.291949] RPC: Registered tcp NFSv4.1 backchannel transport 
module.
     [    0.298471] Unpacking initramfs...
     [    0.835705] Freeing initrd memory: 29212K
     [    0.836273] hw perfevents: enabled with armv8_pmuv3 PMU driver, 
13 counters available
     [    0.837026] kvm [1]: HYP mode not available
     [    0.838111] Initialise system trusted keyrings
     [    0.838710] workingset: timestamp_bits=44 max_order=18 
bucket_order=0
     [    0.840716] squashfs: version 4.0 (2009/01/31) Phillip Lougher
     [    0.846449] NFS: Registering the id_resolver key type
     [    0.846892] Key type id_resolver registered
     [    0.847453] Key type id_legacy registered
     [    0.847789] nfs4filelayout_init: NFSv4 File Layout Driver 
Registering...
     [    0.848383] 9p: Installing v9fs 9p2000 file system support
     [    0.848878] pstore: using deflate compression
     [    0.849942] Key type asymmetric registered
     [    0.850303] Asymmetric key parser 'x509' registered
     [    0.850729] Block layer SCSI generic (bsg) driver version 0.4 
loaded (major 245)
     [    0.851480] io scheduler noop registered
     [    0.851801] io scheduler deadline registered
     [    0.852215] io scheduler cfq registered (default)
     [    0.852595] io scheduler mq-deadline registered
     [    0.852955] io scheduler kyber registered
     [    0.855192] pl061_gpio 9030000.pl061: PL061 GPIO chip 
@0x0000000009030000 registered
     [    0.857039] PCI: OF: host bridge /pcie@10000000 ranges:
     [    0.857481] PCI: OF:    IO 0x3eff0000..0x3effffff -> 0x00000000
     [    0.857953] PCI: OF:   MEM 0x10000000..0x3efeffff -> 0x10000000
     [    0.858435] PCI: OF:   MEM 0x8000000000..0xffffffffff -> 
0x8000000000
     [    0.858956] pci-host-generic 3f000000.pcie: ECAM at [mem 
0x3f000000-0x3fffffff] for [bus 00-0f]
     [    0.860042] pci-host-generic 3f000000.pcie: PCI host bridge to 
bus 0000:00
     [    0.860598] pci_bus 0000:00: root bus resource [bus 00-0f]
     [    0.861034] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
     [    0.861524] pci_bus 0000:00: root bus resource [mem 
0x10000000-0x3efeffff]
     [    0.862074] pci_bus 0000:00: root bus resource [mem 
0x8000000000-0xffffffffff]
     [    0.863568] pci 0000:00:01.0: BAR 6: assigned [mem 
0x10000000-0x1003ffff pref]
     [    0.864147] pci 0000:00:01.0: BAR 4: assigned [mem 
0x8000000000-0x8000003fff 64bit pref]
     [    0.864803] pci 0000:00:01.0: BAR 1: assigned [mem 
0x10040000-0x10040fff]
     [    0.865342] pci 0000:00:01.0: BAR 0: assigned [io 0x1000-0x101f]
     [    0.866470] EINJ: ACPI disabled.
     [    0.868836] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
     [    0.874100] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
     [    0.875395] SuperH (H)SCI(F) driver initialized
     [    0.876757] msm_serial: driver initialized
     [    0.877328] cacheinfo: Unable to detect cache hierarchy for CPU 0
     [    0.880330] loop: module loaded
     [    0.881885] libphy: Fixed MDIO Bus: probed
     [    0.882499] tun: Universal TUN/TAP device driver, 1.6
     [    0.884820] thunder_xcv, ver 1.0
     [    0.885126] thunder_bgx, ver 1.0
     [    0.885415] nicpf, ver 1.0
     [    0.885764] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
     [    0.886246] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
     [    0.886927] igb: Intel(R) Gigabit Ethernet Network Driver - 
version 5.4.0-k
     [    0.887687] igb: Copyright (c) 2007-2014 Intel Corporation.
     [    0.888159] igbvf: Intel(R) Gigabit Virtual Function Network 
Driver - version 2.4.0-k
     [    0.888782] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
     [    0.889388] sky2: driver version 1.30
     [    0.889931] VFIO - User Level meta-driver version: 0.3
     [    0.890861] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) 
Driver
     [    0.891644] ehci-pci: EHCI PCI platform driver
     [    0.892043] ehci-platform: EHCI generic platform driver
     [    0.892515] ehci-orion: EHCI orion driver
     [    0.892880] ehci-exynos: EHCI EXYNOS driver
     [    0.893414] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
     [    0.893914] ohci-pci: OHCI PCI platform driver
     [    0.894308] ohci-platform: OHCI generic platform driver
     [    0.894765] ohci-exynos: OHCI EXYNOS driver
     [    0.895357] usbcore: registered new interface driver usb-storage
     [    0.896739] rtc-pl031 9010000.pl031: rtc core: registered pl031 
as rtc0
     [    0.897504] i2c /dev entries driver
     [    0.899576] sdhci: Secure Digital Host Controller Interface driver
     [    0.900086] sdhci: Copyright(c) Pierre Ossman
     [    0.900551] Synopsys Designware Multimedia Card Interface Driver
     [    0.901791] sdhci-pltfm: SDHCI platform and OF driver helper
     [    0.902636] ledtrig-cpu: registered to indicate activity on CPUs
     [    0.903644] usbcore: registered new interface driver usbhid
     [    0.904106] usbhid: USB HID core driver
     [    0.905520] NET: Registered protocol family 17
     [    0.905917] 9pnet: Installing 9P2000 support
     [    0.906304] Key type dns_resolver registered
     [    0.906814] registered taskstats version 1
     [    0.907542] Loading compiled-in X.509 certificates
     [    0.908155] input: gpio-keys as 
/devices/platform/gpio-keys/input/input0
     [    0.909760] rtc-pl031 9010000.pl031: setting system clock to 
2015-01-30 02:38:42 UTC (1422585522)
     [    0.918889] ALSA device list:
     [    0.921687]   No soundcards found.
     [    0.925317] uart-pl011 9000000.pl011: no DMA platform data
     [    0.930981] Freeing unused kernel memory: 1216K
     Starting rcS...
     ++ Mounting filesystem
     ifdown: interface lo not configured
     ifdown: interface eth0 not configured
     ++ Starting ssh daemon
     [    0.950291] random: sshd: uninitialized urandom read (32 bytes read)
     ip: RTNETLINK answers: File exists
     rcS Complete
     Welcome to Mini Linux
     GNU/Linux 4.17.0-45865-ga3d6816 aarch64
     Version: 1.1.6
             .--.
            |o_o |
            |:_/ |
           //   \ \
          (|     | )
         /'\_   _/`\
         \___)=(___/
     udhcpc: started, v1.29.0.git
     Setting IP address 0.0.0.0 on eth0
     Documentation: http://open-estuary.org
     E-mail: Chinafengliang@163.com
     estuary:/$ udhcpc: sending discover
     udhcpc: sending select for 10.0.2.15
     udhcpc: lease of 10.0.2.15 obtained, lease time 86400
     Setting IP address 10.0.2.15 on eth0
     Deleting routers
     route: SIOCDELRT: No such process
     Adding router 10.0.2.2
     Recreating /etc/resolv.conf
      Adding DNS server 10.0.2.3

     estuary:/$
     estuary:/$ cat /sys/kernel/debug/kernel_page_tables
     ---[ Modules start ]---
     ---[ Modules end ]---
     ---[ vmalloc() Area ]---
     0xffff000008000000-0xffff000008004000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000008005000-0xffff000008009000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000800a000-0xffff00000800e000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000008010000-0xffff000008020000          64K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff000008021000-0xffff000008022000           4K PTE       ro NX SHD AF            UXN MEM/NORMAL
     0xffff000008028000-0xffff00000802c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000008030000-0xffff000008034000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000008035000-0xffff000008036000           4K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff000008038000-0xffff00000803c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000803d000-0xffff00000803f000           8K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000008040000-0xffff000008060000         128K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff000008061000-0xffff000008065000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000008066000-0xffff000008067000           4K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff000008068000-0xffff00000806c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000008070000-0xffff000008074000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000008078000-0xffff00000807c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000008080000-0xffff000008200000        1536K PTE       ro x  SHD AF    CON     UXN MEM/NORMAL
     0xffff000008200000-0xffff000008a00000           8M PMD       ro x  SHD AF        BLK UXN MEM/NORMAL
     0xffff000008a00000-0xffff000008a50000         320K PTE       ro x  SHD AF    CON     UXN MEM/NORMAL
     0xffff000008a50000-0xffff000008c00000        1728K PTE       ro NX SHD AF            UXN MEM/NORMAL
     0xffff000008c00000-0xffff000008e00000           2M PMD       ro NX SHD AF        BLK UXN MEM/NORMAL
     0xffff000008e00000-0xffff000008f10000        1088K PTE       ro NX SHD AF            UXN MEM/NORMAL
     0xffff000009040000-0xffff0000091f0000        1728K PTE       RW NX SHD AF    CON     UXN MEM/NORMAL
     0xffff0000091f0000-0xffff0000091fa000          40K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000091fb000-0xffff0000092fb000           1M PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000092fc000-0xffff00000937c000         512K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000009380000-0xffff000009384000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000009388000-0xffff00000938c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000009390000-0xffff000009394000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000009398000-0xffff00000939c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000093a0000-0xffff0000093a4000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000093a8000-0xffff0000093ac000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000093b0000-0xffff0000093b4000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000093b8000-0xffff0000093bc000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000093c0000-0xffff0000093c4000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000093c8000-0xffff0000093cc000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000093d0000-0xffff0000093d4000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff0000093d5000-0xffff0000093dd000          32K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000009408000-0xffff00000940c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000009410000-0xffff000009414000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000946d000-0xffff00000946e000           4K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff000009475000-0xffff000009476000           4K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff00000947d000-0xffff00000947e000           4K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff000009485000-0xffff000009486000           4K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff00000948d000-0xffff00000948e000           4K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff000009495000-0xffff000009496000           4K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff000009595000-0xffff0000095d5000         256K PTE       RW NX SHD AF            UXN MEM/NORMAL-NC
     0xffff000009740000-0xffff000009744000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000009c60000-0xffff000009c64000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff000009c70000-0xffff000009c74000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000a000000-0xffff00000af60000       15744K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff00000af61000-0xffff00000af65000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b020000-0xffff00000b024000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b028000-0xffff00000b02c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b030000-0xffff00000b034000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b038000-0xffff00000b03c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b048000-0xffff00000b04c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b0f8000-0xffff00000b0fc000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b170000-0xffff00000b174000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b208000-0xffff00000b20c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b230000-0xffff00000b234000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b238000-0xffff00000b23c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b48d000-0xffff00000b49d000          64K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b49e000-0xffff00000b4be000         128K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b4c0000-0xffff00000b4c4000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b538000-0xffff00000b53c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000b7e8000-0xffff00000b7ec000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000c000000-0xffff00000d000000          16M PMD       RW NX SHD AF        BLK UXN DEVICE/nGnRnE
     0xffff00000d001000-0xffff00000d004000          12K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000d260000-0xffff00000d264000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000d760000-0xffff00000d764000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000d770000-0xffff00000d774000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000d778000-0xffff00000d77c000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000d7b0000-0xffff00000d7b4000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000d7d8000-0xffff00000d7dc000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff00000d7e0000-0xffff00000d7e4000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
     0xffff7dffbffd8000-0xffff7dffbffdb000          12K PTE       RW NX SHD AF            UXN MEM/NORMAL
     ---[ vmalloc() End ]---
     ---[ Fixmap start ]---
     0xffff7dfffe7fa000-0xffff7dfffe7fb000           4K PTE       ro x  SHD AF            UXN MEM/NORMAL
     0xffff7dfffe7ff000-0xffff7dfffe800000           4K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     0xffff7dfffe800000-0xffff7dfffea00000           2M PMD       ro NX SHD AF        BLK UXN MEM/NORMAL
     ---[ Fixmap end ]---
     ---[ PCI I/O start ]---
     0xffff7dfffee00000-0xffff7dfffee10000          64K PTE       RW NX SHD AF            UXN DEVICE/nGnRE
     ---[ PCI I/O end ]---
     ---[ vmemmap start ]---
     0xffff7e0000000000-0xffff7e0001000000          16M PMD       RW NX SHD AF        BLK UXN MEM/NORMAL
     ---[ vmemmap end ]---
     ---[ Linear Mapping ]---
     0xffff800000000000-0xffff800000080000         512K PTE       RW NX SHD AF    CON     UXN MEM/NORMAL
     0xffff800000080000-0xffff800000200000        1536K PTE       ro NX SHD AF            UXN MEM/NORMAL
     0xffff800000200000-0xffff800000e00000          12M PMD       ro NX SHD AF        BLK UXN MEM/NORMAL
     0xffff800000e00000-0xffff800000f10000        1088K PTE       ro NX SHD AF            UXN MEM/NORMAL
     0xffff800000f10000-0xffff800001000000         960K PTE       RW NX SHD AF    CON     UXN MEM/NORMAL
     0xffff800001000000-0xffff800002000000          16M PMD       RW NX SHD AF        BLK UXN MEM/NORMAL
     0xffff800002000000-0xffff800040000000         992M PMD       RW NX SHD AF    CON BLK UXN MEM/NORMAL
     estuary:/$
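
A dump like the one above is easier to check by script than by eye. The
Python sketch below is illustrative only and is not part of the kernel
tree or of this report: the helper names (parse_entries, size_mismatches,
non_global) are made up here. It assumes each entry sits on a single line
(mail clients may wrap them) and that a set nG bit is printed as an "NG"
token in the attribute columns, as the arm64 page-table dumper does.

```python
import re

# One dump entry: "0x<start>-0x<end>  <size><unit> <level>  <attributes...>"
ENTRY_RE = re.compile(
    r"0x([0-9a-f]{16})-0x([0-9a-f]{16})\s+(\d+)([KMGT])\s+(\w+)\s+(.*)"
)
UNIT = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_entries(text):
    """Return (start, end, size_bytes, level, attrs) for each dump entry."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.search(line)
        if m is None:
            continue  # section markers such as ---[ Fixmap start ]---
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        size = int(m.group(3)) * UNIT[m.group(4)]
        entries.append((start, end, size, m.group(5), m.group(6).split()))
    return entries


def size_mismatches(entries):
    """Entries whose printed size differs from end - start."""
    return [e for e in entries if e[1] - e[0] != e[2]]


def non_global(entries):
    """Entries whose attribute columns carry the NG token."""
    return [e for e in entries if "NG" in e[4]]


# Two entries copied from the Fixmap section of the dump above.
SAMPLE = """\
---[ Fixmap start ]---
0xffff7dfffe7fa000-0xffff7dfffe7fb000           4K PTE       ro x  SHD AF            UXN MEM/NORMAL
0xffff7dfffe800000-0xffff7dfffea00000           2M PMD       ro NX SHD AF        BLK UXN MEM/NORMAL
---[ Fixmap end ]---
"""

entries = parse_entries(SAMPLE)
print(len(entries), "entries,",
      len(size_mismatches(entries)), "size mismatches,",
      len(non_global(entries)), "marked NG")
```

To check a whole dump, pass the saved output of
/sys/kernel/debug/kernel_page_tables to parse_entries() in place of SAMPLE.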

Thanks!

Best Regards,
Wei

> Will
>
> .
>

Patch

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..03646e6a2ef4 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -217,8 +217,9 @@  ENDPROC(idmap_cpu_replace_ttbr1)
 
 	.macro __idmap_kpti_put_pgtable_ent_ng, type
 	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
-	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
-	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
+	str	\type, [cur_\()\type\()p]	// Update the entry and ensure
+	dmb	sy				// that it is visible to all
+	dc	civac, cur_\()\type\()p		// CPUs.
 	.endm
 
 /*