diff mbox series

KVM: arm64: Prevent kmemleak from accessing pKVM memory

Message ID 20220616161135.3997786-1-qperret@google.com (mailing list archive)
State New, archived
Series KVM: arm64: Prevent kmemleak from accessing pKVM memory

Commit Message

Quentin Perret June 16, 2022, 4:11 p.m. UTC
Commit a7259df76702 ("memblock: make memblock_find_in_range method
private") changed the API using which memory is reserved for the pKVM
hypervisor. However, it seems that memblock_phys_alloc() differs
from the original API in terms of kmemleak semantics -- the old one
excluded the reserved regions from kmemleak scans when the new one
doesn't seem to. Unfortunately, when protected KVM is enabled, all
kernel accesses to pKVM-private memory result in a fatal exception,
which can now happen because of kmemleak scans:

$ echo scan > /sys/kernel/debug/kmemleak
[   34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
[   34.991580] kvm [304]: Hyp Offset: 0xfffe8be807e00000
[   34.991813] Kernel panic - not syncing: HYP panic:
[   34.991813] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800
[   34.991813] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000
[   34.991813] VCPU:0000000000000000
[   34.993660] CPU: 0 PID: 304 Comm: bash Not tainted 5.19.0-rc2 #102
[   34.994059] Hardware name: linux,dummy-virt (DT)
[   34.994452] Call trace:
[   34.994641]  dump_backtrace.part.0+0xcc/0xe0
[   34.994932]  show_stack+0x18/0x6c
[   34.995094]  dump_stack_lvl+0x68/0x84
[   34.995276]  dump_stack+0x18/0x34
[   34.995484]  panic+0x16c/0x354
[   34.995673]  __hyp_pgtable_total_pages+0x0/0x60
[   34.995933]  scan_block+0x74/0x12c
[   34.996129]  scan_gray_list+0xd8/0x19c
[   34.996332]  kmemleak_scan+0x2c8/0x580
[   34.996535]  kmemleak_write+0x340/0x4a0
[   34.996744]  full_proxy_write+0x60/0xbc
[   34.996967]  vfs_write+0xc4/0x2b0
[   34.997136]  ksys_write+0x68/0xf4
[   34.997311]  __arm64_sys_write+0x20/0x2c
[   34.997532]  invoke_syscall+0x48/0x114
[   34.997779]  el0_svc_common.constprop.0+0x44/0xec
[   34.998029]  do_el0_svc+0x2c/0xc0
[   34.998205]  el0_svc+0x2c/0x84
[   34.998421]  el0t_64_sync_handler+0xf4/0x100
[   34.998653]  el0t_64_sync+0x18c/0x190
[   34.999252] SMP: stopping secondary CPUs
[   35.000034] Kernel Offset: disabled
[   35.000261] CPU features: 0x800,00007831,00001086
[   35.000642] Memory Limit: none
[   35.001329] ---[ end Kernel panic - not syncing: HYP panic:
[   35.001329] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800
[   35.001329] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000
[   35.001329] VCPU:0000000000000000 ]---

Fix this by explicitly excluding the hypervisor's memory pool from
kmemleak like we already do for the hyp BSS.

Cc: Mike Rapoport <rppt@kernel.org>
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Quentin Perret <qperret@google.com>
---
An alternative could be to actually exclude memory allocated using
memblock_phys_alloc_range() from kmemleak scans to revert back to the
old behaviour. But nobody else has complained about this AFAIK, so I'd
be inclined to keep this local to pKVM. No strong opinion.
---
 arch/arm64/kvm/arm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Catalin Marinas June 16, 2022, 5:51 p.m. UTC | #1
On Thu, Jun 16, 2022 at 04:11:34PM +0000, Quentin Perret wrote:
> Commit a7259df76702 ("memblock: make memblock_find_in_range method
> private") changed the API using which memory is reserved for the pKVM
> hypervisor. However, it seems that memblock_phys_alloc() differs
> from the original API in terms of kmemleak semantics -- the old one
> excluded the reserved regions from kmemleak scans when the new one
> doesn't seem to. Unfortunately, when protected KVM is enabled, all
> kernel accesses to pKVM-private memory result in a fatal exception,
> which can now happen because of kmemleak scans:
> 
> $ echo scan > /sys/kernel/debug/kmemleak
> [...]
> 
> Fix this by explicitly excluding the hypervisor's memory pool from
> kmemleak like we already do for the hyp BSS.
> 
> Cc: Mike Rapoport <rppt@kernel.org>
> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
> An alternative could be to actually exclude memory allocated using
> memblock_phys_alloc_range() from kmemleak scans to revert back to the
> old behaviour. But nobody else has complained about this AFAIK, so I'd
> be inclined to keep this local to pKVM. No strong opinion.

This works for me, I haven't heard anyone else complaining.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Mike Rapoport June 17, 2022, 8:19 a.m. UTC | #2
On Thu, Jun 16, 2022 at 04:11:34PM +0000, Quentin Perret wrote:
> Commit a7259df76702 ("memblock: make memblock_find_in_range method
> private") changed the API using which memory is reserved for the pKVM
> hypervisor. However, it seems that memblock_phys_alloc() differs
> from the original API in terms of kmemleak semantics -- the old one
> excluded the reserved regions from kmemleak scans when the new one
> doesn't seem to. Unfortunately, when protected KVM is enabled, all

I'd rather say that memblock_find_in_range() didn't inform kmemleak about
the reserved regions, while memblock_phys_alloc() does.

> kernel accesses to pKVM-private memory result in a fatal exception,
> which can now happen because of kmemleak scans:
> 
> $ echo scan > /sys/kernel/debug/kmemleak
> [...]
> 
> Fix this by explicitly excluding the hypervisor's memory pool from
> kmemleak like we already do for the hyp BSS.
> 
> Cc: Mike Rapoport <rppt@kernel.org>
> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
> An alternative could be to actually exclude memory allocated using
> memblock_phys_alloc_range() from kmemleak scans to revert back to the
> old behaviour.

This would be wrong because memblock_phys_alloc() does allocate memory, and
unless there is a good reason to do so, it should not be excluded from kmemleak.

> But nobody else has complained about this AFAIK, so I'd be inclined to
> keep this local to pKVM. No strong opinion.

Yes, please :)
An alternative to excluding this memory from kmemleak is to allocate it
using 

	memblock_phys_alloc_range(size, align, 0, MEMBLOCK_ALLOC_NOLEAKTRACE)

then it won't be added to kmemleak in the first place.
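
The idea behind this alternative can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: toy_phys_alloc_range(),
TOY_ALLOC_NOLEAKTRACE, TOY_ALLOC_ANYWHERE and the bump allocator are
hypothetical stand-ins invented for illustration, not the memblock
implementation. It shows an allocator that skips leak-tracker registration
when the `end` argument carries a sentinel value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel values for the `end` argument; not the kernel's. */
#define TOY_ALLOC_ANYWHERE	((uintptr_t)0)
#define TOY_ALLOC_NOLEAKTRACE	((uintptr_t)1)

static uintptr_t toy_next_free = 0x100000;	/* bump-allocator cursor */
static size_t toy_nr_registered;		/* blocks the toy scanner knows about */

/*
 * Allocate `size` bytes aligned to `align` (a power of two). The block is
 * reported to the toy leak tracker unless the caller passed the
 * TOY_ALLOC_NOLEAKTRACE sentinel as `end`.
 */
static uintptr_t toy_phys_alloc_range(size_t size, size_t align,
				      uintptr_t start, uintptr_t end)
{
	uintptr_t base;

	(void)start;	/* placement constraints are ignored in this model */
	assert(align != 0 && (align & (align - 1)) == 0);

	base = (toy_next_free + align - 1) & ~((uintptr_t)align - 1);
	toy_next_free = base + size;

	if (end != TOY_ALLOC_NOLEAKTRACE)
		toy_nr_registered++;	/* the scanner will later read this block */

	return base;
}
```

In this model, a caller that must never have its memory read by the scanner
passes TOY_ALLOC_NOLEAKTRACE as the last argument, so there is nothing to
exclude afterwards.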

> ---
>  arch/arm64/kvm/arm.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 400bb0fe2745..28765bd22efb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2110,11 +2110,11 @@ static int finalize_hyp_mode(void)
>  		return 0;
>  
>  	/*
> -	 * Exclude HYP BSS from kmemleak so that it doesn't get peeked
> -	 * at, which would end badly once the section is inaccessible.
> -	 * None of other sections should ever be introspected.
> +	 * Exclude HYP sections from kmemleak so that they don't get peeked
> +	 * at, which would end badly once inaccessible.
>  	 */
>  	kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
> +	kmemleak_free_part(__va(hyp_mem_base), hyp_mem_size);
>  	return pkvm_drop_host_privileges();
>  }
>  
> -- 
> 2.36.1.476.g0c4daa206d-goog
>
Marc Zyngier June 17, 2022, 8:21 a.m. UTC | #3
On Thu, 16 Jun 2022 16:11:34 +0000, Quentin Perret wrote:
> Commit a7259df76702 ("memblock: make memblock_find_in_range method
> private") changed the API using which memory is reserved for the pKVM
> hypervisor. However, it seems that memblock_phys_alloc() differs
> from the original API in terms of kmemleak semantics -- the old one
> excluded the reserved regions from kmemleak scans when the new one
> doesn't seem to. Unfortunately, when protected KVM is enabled, all
> kernel accesses to pKVM-private memory result in a fatal exception,
> which can now happen because of kmemleak scans:
> 
> [...]

Applied to fixes, thanks!

[1/1] KVM: arm64: Prevent kmemleak from accessing pKVM memory
      commit: 9e5afa8a537f742bccc2cd91bc0bef4b6483ee98

Cheers,

	M.
Mike Rapoport June 17, 2022, 8:38 a.m. UTC | #4
On Fri, Jun 17, 2022 at 09:21:31AM +0100, Marc Zyngier wrote:
> On Thu, 16 Jun 2022 16:11:34 +0000, Quentin Perret wrote:
> > Commit a7259df76702 ("memblock: make memblock_find_in_range method
> > private") changed the API using which memory is reserved for the pKVM
> > hypervisor. However, it seems that memblock_phys_alloc() differs
> > from the original API in terms of kmemleak semantics -- the old one
> > excluded the reserved regions from kmemleak scans when the new one
> > doesn't seem to. Unfortunately, when protected KVM is enabled, all
> > kernel accesses to pKVM-private memory result in a fatal exception,
> > which can now happen because of kmemleak scans:
> > 
> > [...]
> 
> Applied to fixes, thanks!
> 
> [1/1] KVM: arm64: Prevent kmemleak from accessing pKVM memory
>       commit: 9e5afa8a537f742bccc2cd91bc0bef4b6483ee98

I'd really like to update the changelog to this:

Commit a7259df76702 ("memblock: make memblock_find_in_range method
private") changed the API using which memory is reserved for the pKVM
hypervisor. However, memblock_phys_alloc() differs from the original API in
terms of kmemleak semantics -- the old one didn't report the reserved
regions to kmemleak while the new one does. Unfortunately, when protected
KVM is enabled, all kernel accesses to pKVM-private memory result in a
fatal exception, which can now happen because of kmemleak scans:

$ echo scan > /sys/kernel/debug/kmemleak
[   34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
...

Fix this by explicitly excluding the hypervisor's memory pool from
kmemleak like we already do for the hyp BSS.


> Cheers,
> 
> 	M.
> -- 
> Marc Zyngier <maz@kernel.org>
>
Quentin Perret June 17, 2022, 8:45 a.m. UTC | #5
On Friday 17 Jun 2022 at 11:38:14 (+0300), Mike Rapoport wrote:
> On Fri, Jun 17, 2022 at 09:21:31AM +0100, Marc Zyngier wrote:
> > On Thu, 16 Jun 2022 16:11:34 +0000, Quentin Perret wrote:
> > > Commit a7259df76702 ("memblock: make memblock_find_in_range method
> > > private") changed the API using which memory is reserved for the pKVM
> > > hypervisor. However, it seems that memblock_phys_alloc() differs
> > > from the original API in terms of kmemleak semantics -- the old one
> > > excluded the reserved regions from kmemleak scans when the new one
> > > doesn't seem to. Unfortunately, when protected KVM is enabled, all
> > > kernel accesses to pKVM-private memory result in a fatal exception,
> > > which can now happen because of kmemleak scans:
> > > 
> > > [...]
> > 
> > Applied to fixes, thanks!
> > 
> > [1/1] KVM: arm64: Prevent kmemleak from accessing pKVM memory
> >       commit: 9e5afa8a537f742bccc2cd91bc0bef4b6483ee98
> 
> I'd really like to update the changelog to this:
> 
> Commit a7259df76702 ("memblock: make memblock_find_in_range method
> private") changed the API using which memory is reserved for the pKVM
> hypervisor. However, memblock_phys_alloc() differs from the original API in
> terms of kmemleak semantics -- the old one didn't report the reserved
> regions to kmemleak while the new one does. Unfortunately, when protected
> KVM is enabled, all kernel accesses to pKVM-private memory result in a
> fatal exception, which can now happen because of kmemleak scans:
> 
> $ echo scan > /sys/kernel/debug/kmemleak
> [   34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
> ...
> 
> Fix this by explicitly excluding the hypervisor's memory pool from
> kmemleak like we already do for the hyp BSS.

Looks good to me, thanks.

Quentin
Marc Zyngier June 17, 2022, 8:50 a.m. UTC | #6
On 2022-06-17 09:45, Quentin Perret wrote:
> On Friday 17 Jun 2022 at 11:38:14 (+0300), Mike Rapoport wrote:
>> On Fri, Jun 17, 2022 at 09:21:31AM +0100, Marc Zyngier wrote:
>> > On Thu, 16 Jun 2022 16:11:34 +0000, Quentin Perret wrote:
>> > > Commit a7259df76702 ("memblock: make memblock_find_in_range method
>> > > private") changed the API using which memory is reserved for the pKVM
>> > > hypervisor. However, it seems that memblock_phys_alloc() differs
>> > > from the original API in terms of kmemleak semantics -- the old one
>> > > excluded the reserved regions from kmemleak scans when the new one
>> > > doesn't seem to. Unfortunately, when protected KVM is enabled, all
>> > > kernel accesses to pKVM-private memory result in a fatal exception,
>> > > which can now happen because of kmemleak scans:
>> > >
>> > > [...]
>> >
>> > Applied to fixes, thanks!
>> >
>> > [1/1] KVM: arm64: Prevent kmemleak from accessing pKVM memory
>> >       commit: 9e5afa8a537f742bccc2cd91bc0bef4b6483ee98
>> 
>> I'd really like to update the changelog to this:
>> 
>> Commit a7259df76702 ("memblock: make memblock_find_in_range method
>> private") changed the API using which memory is reserved for the pKVM
>> hypervisor. However, memblock_phys_alloc() differs from the original API in
>> terms of kmemleak semantics -- the old one didn't report the reserved
>> regions to kmemleak while the new one does. Unfortunately, when protected
>> KVM is enabled, all kernel accesses to pKVM-private memory result in a
>> fatal exception, which can now happen because of kmemleak scans:
>> 
>> $ echo scan > /sys/kernel/debug/kmemleak
>> [   34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
>> ...
>> 
>> Fix this by explicitly excluding the hypervisor's memory pool from
>> kmemleak like we already do for the hyp BSS.
> 
> Looks good to me, thanks.

Now updated. Thanks,

         M.

Patch

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 400bb0fe2745..28765bd22efb 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2110,11 +2110,11 @@  static int finalize_hyp_mode(void)
 		return 0;
 
 	/*
-	 * Exclude HYP BSS from kmemleak so that it doesn't get peeked
-	 * at, which would end badly once the section is inaccessible.
-	 * None of other sections should ever be introspected.
+	 * Exclude HYP sections from kmemleak so that they don't get peeked
+	 * at, which would end badly once inaccessible.
 	 */
 	kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
+	kmemleak_free_part(__va(hyp_mem_base), hyp_mem_size);
 	return pkvm_drop_host_privileges();
 }
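
The effect of the added kmemleak_free_part() call can be pictured with a
user-space toy model. This is a rough sketch, not the kmemleak implementation:
toy_track(), toy_free_part() and toy_scan_touches() are hypothetical names
invented here, and the model assumes that tracked blocks are plain address
intervals and that carving a hole out of one leaves the surrounding pieces
tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 16

/* One tracked block, as a half-open interval [start, end). */
struct range {
	uintptr_t start;
	uintptr_t end;
};

static struct range tracked[MAX_RANGES];
static size_t nr_tracked;

/* Register a block with the toy scanner. */
static void toy_track(uintptr_t start, size_t size)
{
	assert(nr_tracked < MAX_RANGES);
	tracked[nr_tracked].start = start;
	tracked[nr_tracked].end = start + size;
	nr_tracked++;
}

/* Stop tracking [start, start + size), keeping any remainder on either side. */
static void toy_free_part(uintptr_t start, size_t size)
{
	uintptr_t end = start + size;
	size_t i;

	for (i = 0; i < nr_tracked; i++) {
		struct range r = tracked[i];

		if (start < r.start || end > r.end)
			continue;	/* hole is not fully inside this block */

		/* Drop the block, then re-add whatever surrounds the hole. */
		tracked[i] = tracked[--nr_tracked];
		if (r.start < start)
			toy_track(r.start, start - r.start);
		if (end < r.end)
			toy_track(end, r.end - end);
		return;
	}
}

/* Return 1 if a scan would read at least one byte of [start, start + size). */
static int toy_scan_touches(uintptr_t start, size_t size)
{
	uintptr_t end = start + size;
	size_t i;

	for (i = 0; i < nr_tracked; i++)
		if (tracked[i].start < end && start < tracked[i].end)
			return 1;
	return 0;
}
```

In the model, tracking a large block and then calling toy_free_part() on a
sub-range makes toy_scan_touches() report 0 for that sub-range while the rest
of the block stays visible. That is the property the patch relies on for the
hypervisor memory pool: once the range is excluded, a scan no longer reads it.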