Message ID | 20220616161135.3997786-1-qperret@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: arm64: Prevent kmemleak from accessing pKVM memory |
On Thu, Jun 16, 2022 at 04:11:34PM +0000, Quentin Perret wrote:
> Commit a7259df76702 ("memblock: make memblock_find_in_range method
> private") changed the API using which memory is reserved for the pKVM
> hypervisor. However, it seems that memblock_phys_alloc() differs
> from the original API in terms of kmemleak semantics -- the old one
> excluded the reserved regions from kmemleak scans when the new one
> doesn't seem to. Unfortunately, when protected KVM is enabled, all
> kernel accesses to pKVM-private memory result in a fatal exception,
> which can now happen because of kmemleak scans:
>
>   $ echo scan > /sys/kernel/debug/kmemleak
>   [ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
>   [...]
>
> Fix this by explicitly excluding the hypervisor's memory pool from
> kmemleak like we already do for the hyp BSS.
>
> Cc: Mike Rapoport <rppt@kernel.org>
> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
> An alternative could be to actually exclude memory allocated using
> memblock_phys_alloc_range() from kmemleak scans to revert back to the
> old behaviour. But nobody else has complained about this AFAIK, so I'd
> be inclined to keep this local to pKVM. No strong opinion.

This works for me, I haven't heard anyone else complaining.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

On Thu, Jun 16, 2022 at 04:11:34PM +0000, Quentin Perret wrote:
> Commit a7259df76702 ("memblock: make memblock_find_in_range method
> private") changed the API using which memory is reserved for the pKVM
> hypervisor. However, it seems that memblock_phys_alloc() differs
> from the original API in terms of kmemleak semantics -- the old one
> excluded the reserved regions from kmemleak scans when the new one
> doesn't seem to. Unfortunately, when protected KVM is enabled, all

I'd rather say that memblock_find_in_range() didn't inform kmemleak about
the reserved regions, while memblock_phys_alloc() does.

> kernel accesses to pKVM-private memory result in a fatal exception,
> which can now happen because of kmemleak scans:
>
>   $ echo scan > /sys/kernel/debug/kmemleak
>   [ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
>   [...]
>
> Fix this by explicitly excluding the hypervisor's memory pool from
> kmemleak like we already do for the hyp BSS.
>
> Cc: Mike Rapoport <rppt@kernel.org>
> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
> An alternative could be to actually exclude memory allocated using
> memblock_phys_alloc_range() from kmemleak scans to revert back to the
> old behaviour.

This would be wrong because memblock_phys_alloc() does allocate memory, and
unless there is a good reason to exclude it, kmemleak should track it.

> But nobody else has complained about this AFAIK, so I'd be inclined to
> keep this local to pKVM. No strong opinion.

Yes, please :)

An alternative to excluding this memory from kmemleak is to allocate it
using

	memblock_phys_alloc_range(size, align, 0, MEMBLOCK_ALLOC_NOLEAKTRACE)

then it won't be added to kmemleak in the first place.

> ---
>  arch/arm64/kvm/arm.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 400bb0fe2745..28765bd22efb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2110,11 +2110,11 @@ static int finalize_hyp_mode(void)
>  		return 0;
>
>  	/*
> -	 * Exclude HYP BSS from kmemleak so that it doesn't get peeked
> -	 * at, which would end badly once the section is inaccessible.
> -	 * None of other sections should ever be introspected.
> +	 * Exclude HYP sections from kmemleak so that they don't get peeked
> +	 * at, which would end badly once inaccessible.
>  	 */
>  	kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
> +	kmemleak_free_part(__va(hyp_mem_base), hyp_mem_size);
>  	return pkvm_drop_host_privileges();
>  }
>
> --
> 2.36.1.476.g0c4daa206d-goog
>

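The alternative suggested above works because the `end` argument of the allocation call doubles as a flag: passing a special sentinel asks the allocator not to report the block to the leak tracker. The userspace sketch below is only a toy model of that idea, not kernel code. Every name in it (`toy_pool`, `toy_alloc_range`, `TOY_ALLOC_ANYWHERE`, `TOY_ALLOC_NOLEAKTRACE`) is hypothetical, and the bump allocator merely stands in for memblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinels for the 'end' argument; not the kernel's values. */
#define TOY_ALLOC_ANYWHERE	((uintptr_t)0)
#define TOY_ALLOC_NOLEAKTRACE	((uintptr_t)1)

/* Toy bump allocator standing in for an early-boot allocator. */
struct toy_pool {
	uintptr_t next;		/* first free address */
	uintptr_t limit;	/* one past the last usable address */
	size_t tracked;		/* allocations reported to the leak tracker */
};

/*
 * Allocate 'size' bytes aligned to 'align' (a power of two) below 'end'.
 * Returns 0 on failure. When 'end' is TOY_ALLOC_NOLEAKTRACE the block is
 * handed out but never reported to the (imaginary) leak tracker.
 */
static uintptr_t toy_alloc_range(struct toy_pool *p, size_t size,
				 size_t align, uintptr_t end)
{
	uintptr_t limit = p->limit;
	uintptr_t base;

	assert(align && (align & (align - 1)) == 0);

	/* Sentinel values mean "no explicit upper bound". */
	if (end != TOY_ALLOC_ANYWHERE && end != TOY_ALLOC_NOLEAKTRACE &&
	    end < limit)
		limit = end;

	base = (p->next + align - 1) & ~(uintptr_t)(align - 1);
	if (size == 0 || base > limit || size > limit - base)
		return 0;

	p->next = base + size;
	if (end != TOY_ALLOC_NOLEAKTRACE)
		p->tracked++;	/* only ordinary allocations get tracked */
	return base;
}
```

In this model a block obtained with `TOY_ALLOC_NOLEAKTRACE` never enters the tracker, so there is nothing to exclude afterwards; the applied patch takes the other route and removes the range after the fact.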
On Thu, 16 Jun 2022 16:11:34 +0000, Quentin Perret wrote:
> Commit a7259df76702 ("memblock: make memblock_find_in_range method
> private") changed the API using which memory is reserved for the pKVM
> hypervisor. However, it seems that memblock_phys_alloc() differs
> from the original API in terms of kmemleak semantics -- the old one
> excluded the reserved regions from kmemleak scans when the new one
> doesn't seem to. Unfortunately, when protected KVM is enabled, all
> kernel accesses to pKVM-private memory result in a fatal exception,
> which can now happen because of kmemleak scans:
>
> [...]

Applied to fixes, thanks!

[1/1] KVM: arm64: Prevent kmemleak from accessing pKVM memory
      commit: 9e5afa8a537f742bccc2cd91bc0bef4b6483ee98

Cheers,

	M.

On Fri, Jun 17, 2022 at 09:21:31AM +0100, Marc Zyngier wrote:
> On Thu, 16 Jun 2022 16:11:34 +0000, Quentin Perret wrote:
> > Commit a7259df76702 ("memblock: make memblock_find_in_range method
> > private") changed the API using which memory is reserved for the pKVM
> > hypervisor. However, it seems that memblock_phys_alloc() differs
> > from the original API in terms of kmemleak semantics -- the old one
> > excluded the reserved regions from kmemleak scans when the new one
> > doesn't seem to. Unfortunately, when protected KVM is enabled, all
> > kernel accesses to pKVM-private memory result in a fatal exception,
> > which can now happen because of kmemleak scans:
> >
> > [...]
>
> Applied to fixes, thanks!
>
> [1/1] KVM: arm64: Prevent kmemleak from accessing pKVM memory
>       commit: 9e5afa8a537f742bccc2cd91bc0bef4b6483ee98

I'd really like to update the changelog to this:

Commit a7259df76702 ("memblock: make memblock_find_in_range method
private") changed the API using which memory is reserved for the pKVM
hypervisor. However, memblock_phys_alloc() differs from the original API in
terms of kmemleak semantics -- the old one didn't report the reserved
regions to kmemleak while the new one does. Unfortunately, when protected
KVM is enabled, all kernel accesses to pKVM-private memory result in a
fatal exception, which can now happen because of kmemleak scans:

$ echo scan > /sys/kernel/debug/kmemleak
[ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
...

Fix this by explicitly excluding the hypervisor's memory pool from
kmemleak like we already do for the hyp BSS.

> Cheers,
>
> 	M.
> --
> Marc Zyngier <maz@kernel.org>
>

On Friday 17 Jun 2022 at 11:38:14 (+0300), Mike Rapoport wrote:
> On Fri, Jun 17, 2022 at 09:21:31AM +0100, Marc Zyngier wrote:
> > On Thu, 16 Jun 2022 16:11:34 +0000, Quentin Perret wrote:
> > > Commit a7259df76702 ("memblock: make memblock_find_in_range method
> > > private") changed the API using which memory is reserved for the pKVM
> > > hypervisor. However, it seems that memblock_phys_alloc() differs
> > > from the original API in terms of kmemleak semantics -- the old one
> > > excluded the reserved regions from kmemleak scans when the new one
> > > doesn't seem to. Unfortunately, when protected KVM is enabled, all
> > > kernel accesses to pKVM-private memory result in a fatal exception,
> > > which can now happen because of kmemleak scans:
> > >
> > > [...]
> >
> > Applied to fixes, thanks!
> >
> > [1/1] KVM: arm64: Prevent kmemleak from accessing pKVM memory
> >       commit: 9e5afa8a537f742bccc2cd91bc0bef4b6483ee98
>
> I'd really like to update the changelog to this:
>
> Commit a7259df76702 ("memblock: make memblock_find_in_range method
> private") changed the API using which memory is reserved for the pKVM
> hypervisor. However, memblock_phys_alloc() differs from the original API in
> terms of kmemleak semantics -- the old one didn't report the reserved
> regions to kmemleak while the new one does. Unfortunately, when protected
> KVM is enabled, all kernel accesses to pKVM-private memory result in a
> fatal exception, which can now happen because of kmemleak scans:
>
> $ echo scan > /sys/kernel/debug/kmemleak
> [ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
> ...
>
> Fix this by explicitly excluding the hypervisor's memory pool from
> kmemleak like we already do for the hyp BSS.

Looks good to me, thanks.

Quentin

On 2022-06-17 09:45, Quentin Perret wrote:
> On Friday 17 Jun 2022 at 11:38:14 (+0300), Mike Rapoport wrote:
>> On Fri, Jun 17, 2022 at 09:21:31AM +0100, Marc Zyngier wrote:
>> > On Thu, 16 Jun 2022 16:11:34 +0000, Quentin Perret wrote:
>> > > Commit a7259df76702 ("memblock: make memblock_find_in_range method
>> > > private") changed the API using which memory is reserved for the pKVM
>> > > hypervisor. However, it seems that memblock_phys_alloc() differs
>> > > from the original API in terms of kmemleak semantics -- the old one
>> > > excluded the reserved regions from kmemleak scans when the new one
>> > > doesn't seem to. Unfortunately, when protected KVM is enabled, all
>> > > kernel accesses to pKVM-private memory result in a fatal exception,
>> > > which can now happen because of kmemleak scans:
>> > >
>> > > [...]
>> >
>> > Applied to fixes, thanks!
>> >
>> > [1/1] KVM: arm64: Prevent kmemleak from accessing pKVM memory
>> >       commit: 9e5afa8a537f742bccc2cd91bc0bef4b6483ee98
>>
>> I'd really like to update the changelog to this:
>>
>> Commit a7259df76702 ("memblock: make memblock_find_in_range method
>> private") changed the API using which memory is reserved for the pKVM
>> hypervisor. However, memblock_phys_alloc() differs from the original
>> API in
>> terms of kmemleak semantics -- the old one didn't report the reserved
>> regions to kmemleak while the new one does. Unfortunately, when
>> protected
>> KVM is enabled, all kernel accesses to pKVM-private memory result in a
>> fatal exception, which can now happen because of kmemleak scans:
>>
>> $ echo scan > /sys/kernel/debug/kmemleak
>> [ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>]
>> __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
>> ...
>>
>> Fix this by explicitly excluding the hypervisor's memory pool from
>> kmemleak like we already do for the hyp BSS.
>
> Looks good to me, thanks.

Now updated.

Thanks,

	M.

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 400bb0fe2745..28765bd22efb 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2110,11 +2110,11 @@ static int finalize_hyp_mode(void)
 		return 0;
 
 	/*
-	 * Exclude HYP BSS from kmemleak so that it doesn't get peeked
-	 * at, which would end badly once the section is inaccessible.
-	 * None of other sections should ever be introspected.
+	 * Exclude HYP sections from kmemleak so that they don't get peeked
+	 * at, which would end badly once inaccessible.
 	 */
 	kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
+	kmemleak_free_part(__va(hyp_mem_base), hyp_mem_size);
 	return pkvm_drop_host_privileges();
 }

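The call added by the diff relies on kmemleak being able to drop a sub-range of a block it tracks, so that later scans never read from it. The userspace sketch below is a toy model of that carve-out, written only to illustrate the idea; it is not how kmemleak is implemented. All names (`toy_tracker`, `toy_track`, `toy_free_part`, `toy_would_scan`) are hypothetical, and a fixed-size array stands in for kmemleak's object metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names are kernel APIs. */
#define TOY_MAX_RANGES 16

struct toy_range {
	uintptr_t start;	/* inclusive */
	uintptr_t end;		/* exclusive */
};

struct toy_tracker {
	struct toy_range r[TOY_MAX_RANGES];
	size_t n;
};

/* Start tracking [start, start + size). Returns 0 on success. */
static int toy_track(struct toy_tracker *t, uintptr_t start, size_t size)
{
	if (size == 0 || t->n == TOY_MAX_RANGES)
		return -1;
	t->r[t->n].start = start;
	t->r[t->n].end = start + size;
	t->n++;
	return 0;
}

/*
 * Stop tracking [start, start + size). The range must lie entirely inside
 * one tracked range; whatever remains on either side stays tracked.
 */
static int toy_free_part(struct toy_tracker *t, uintptr_t start, size_t size)
{
	uintptr_t end = start + size;

	if (size == 0)
		return -1;
	for (size_t i = 0; i < t->n; i++) {
		struct toy_range old = t->r[i];
		size_t pieces;

		if (start < old.start || end > old.end)
			continue;
		/* Make sure the leftover pieces fit before changing anything. */
		pieces = (old.start < start) + (end < old.end);
		if (t->n - 1 + pieces > TOY_MAX_RANGES)
			return -1;
		t->r[i] = t->r[--t->n];	/* drop the old range */
		if (old.start < start)
			toy_track(t, old.start, start - old.start);
		if (end < old.end)
			toy_track(t, end, old.end - end);
		return 0;
	}
	return -1;
}

/* Would a scan read from 'addr'? */
static int toy_would_scan(const struct toy_tracker *t, uintptr_t addr)
{
	for (size_t i = 0; i < t->n; i++)
		if (addr >= t->r[i].start && addr < t->r[i].end)
			return 1;
	return 0;
}
```

In the model, once the carved-out range has been removed, `toy_would_scan()` returns 0 for every address inside it, which is the property the patch needs for the hypervisor's memory pool before host access is dropped.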
Commit a7259df76702 ("memblock: make memblock_find_in_range method
private") changed the API using which memory is reserved for the pKVM
hypervisor. However, it seems that memblock_phys_alloc() differs
from the original API in terms of kmemleak semantics -- the old one
excluded the reserved regions from kmemleak scans when the new one
doesn't seem to. Unfortunately, when protected KVM is enabled, all
kernel accesses to pKVM-private memory result in a fatal exception,
which can now happen because of kmemleak scans:

  $ echo scan > /sys/kernel/debug/kmemleak
  [ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
  [ 34.991580] kvm [304]: Hyp Offset: 0xfffe8be807e00000
  [ 34.991813] Kernel panic - not syncing: HYP panic:
  [ 34.991813] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800
  [ 34.991813] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000
  [ 34.991813] VCPU:0000000000000000
  [ 34.993660] CPU: 0 PID: 304 Comm: bash Not tainted 5.19.0-rc2 #102
  [ 34.994059] Hardware name: linux,dummy-virt (DT)
  [ 34.994452] Call trace:
  [ 34.994641] dump_backtrace.part.0+0xcc/0xe0
  [ 34.994932] show_stack+0x18/0x6c
  [ 34.995094] dump_stack_lvl+0x68/0x84
  [ 34.995276] dump_stack+0x18/0x34
  [ 34.995484] panic+0x16c/0x354
  [ 34.995673] __hyp_pgtable_total_pages+0x0/0x60
  [ 34.995933] scan_block+0x74/0x12c
  [ 34.996129] scan_gray_list+0xd8/0x19c
  [ 34.996332] kmemleak_scan+0x2c8/0x580
  [ 34.996535] kmemleak_write+0x340/0x4a0
  [ 34.996744] full_proxy_write+0x60/0xbc
  [ 34.996967] vfs_write+0xc4/0x2b0
  [ 34.997136] ksys_write+0x68/0xf4
  [ 34.997311] __arm64_sys_write+0x20/0x2c
  [ 34.997532] invoke_syscall+0x48/0x114
  [ 34.997779] el0_svc_common.constprop.0+0x44/0xec
  [ 34.998029] do_el0_svc+0x2c/0xc0
  [ 34.998205] el0_svc+0x2c/0x84
  [ 34.998421] el0t_64_sync_handler+0xf4/0x100
  [ 34.998653] el0t_64_sync+0x18c/0x190
  [ 34.999252] SMP: stopping secondary CPUs
  [ 35.000034] Kernel Offset: disabled
  [ 35.000261] CPU features: 0x800,00007831,00001086
  [ 35.000642] Memory Limit: none
  [ 35.001329] ---[ end Kernel panic - not syncing: HYP panic:
  [ 35.001329] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800
  [ 35.001329] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000
  [ 35.001329] VCPU:0000000000000000 ]---

Fix this by explicitly excluding the hypervisor's memory pool from
kmemleak like we already do for the hyp BSS.

Cc: Mike Rapoport <rppt@kernel.org>
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Quentin Perret <qperret@google.com>
---
An alternative could be to actually exclude memory allocated using
memblock_phys_alloc_range() from kmemleak scans to revert back to the
old behaviour. But nobody else has complained about this AFAIK, so I'd
be inclined to keep this local to pKVM. No strong opinion.
---
 arch/arm64/kvm/arm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)