Message ID | 20240815170907.2792229-1-apatel@ventanamicro.com (mailing list archive) |
---|---|
State | New, archived |
Series | RISC-V: KVM: Don't zero-out PMU snapshot area before freeing data |
On Thu, Aug 15, 2024 at 10:39 PM Anup Patel <apatel@ventanamicro.com> wrote:
>
> With the latest Linux-6.11-rc3, the below NULL pointer crash is observed
> when SBI PMU snapshot is enabled for the guest and the guest is forcefully
> powered-off.
>
> [...]
>
> Fixes: c2f41ddbcdd7 ("RISC-V: KVM: Implement SBI PMU Snapshot feature")
> Signed-off-by: Anup Patel <apatel@ventanamicro.com>

Queued this patch for Linux-6.11 fixes.

Regards,
Anup
Hello:

This patch was applied to riscv/linux.git (fixes)
by Anup Patel <anup@brainfault.org>:

On Thu, 15 Aug 2024 22:39:07 +0530 you wrote:
> With the latest Linux-6.11-rc3, the below NULL pointer crash is observed
> when SBI PMU snapshot is enabled for the guest and the guest is forcefully
> powered-off.
>
> [...]

Here is the summary with links:
  - RISC-V: KVM: Don't zero-out PMU snapshot area before freeing data
    https://git.kernel.org/riscv/c/47d40d93292d

You are awesome, thank you!
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index bcf41d6e0df0..2707a51b082c 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -391,19 +391,9 @@ int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
 static void kvm_pmu_clear_snapshot_area(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
-	int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
 
-	if (kvpmu->sdata) {
-		if (kvpmu->snapshot_addr != INVALID_GPA) {
-			memset(kvpmu->sdata, 0, snapshot_area_size);
-			kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr,
-					     kvpmu->sdata, snapshot_area_size);
-		} else {
-			pr_warn("snapshot address invalid\n");
-		}
-		kfree(kvpmu->sdata);
-		kvpmu->sdata = NULL;
-	}
+	kfree(kvpmu->sdata);
+	kvpmu->sdata = NULL;
 	kvpmu->snapshot_addr = INVALID_GPA;
 }
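The hunk above also drops the `if (kvpmu->sdata)` guard. That is safe because `kfree(NULL)` is a no-op, just like `free(NULL)` in userspace. The following is a minimal userspace sketch of the post-patch shape, not the kernel code itself: `struct pmu_model`, `MODEL_INVALID_GPA`, and both function names are hypothetical stand-ins, and `free()` stands in for `kfree()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's INVALID_GPA; an assumption of this sketch. */
#define MODEL_INVALID_GPA (~(uint64_t)0)

/* Hypothetical model of the two struct kvm_pmu fields the patch touches. */
struct pmu_model {
	void *sdata;            /* host-side copy of the snapshot data */
	uint64_t snapshot_addr; /* guest physical address of the snapshot area */
};

/* Same three steps as the patched function: free, forget, invalidate. */
static void pmu_model_clear_snapshot_area(struct pmu_model *pmu)
{
	free(pmu->sdata);       /* free(NULL) is a no-op, so no guard is needed */
	pmu->sdata = NULL;
	pmu->snapshot_addr = MODEL_INVALID_GPA;
}

/* Returns 1 if clearing twice in a row leaves the model in the reset state. */
static int pmu_model_clear_is_idempotent(void)
{
	struct pmu_model pmu = { .sdata = malloc(64), .snapshot_addr = 0x80001000 };

	assert(pmu.sdata != NULL);
	pmu_model_clear_snapshot_area(&pmu); /* frees the buffer */
	pmu_model_clear_snapshot_area(&pmu); /* sdata is NULL now: still safe */
	return pmu.sdata == NULL && pmu.snapshot_addr == MODEL_INVALID_GPA;
}
```

In this model the clear routine never touches guest memory, so it cannot fault however far guest teardown has progressed by the time it runs.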
With the latest Linux-6.11-rc3, the below NULL pointer crash is observed
when SBI PMU snapshot is enabled for the guest and the guest is forcefully
powered-off.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000508
Oops [#1]
Modules linked in: kvm
CPU: 0 UID: 0 PID: 61 Comm: term-poll Not tainted 6.11.0-rc3-00018-g44d7178dd77a #3
Hardware name: riscv-virtio,qemu (DT)
epc : __kvm_write_guest_page+0x94/0xa6 [kvm]
ra : __kvm_write_guest_page+0x54/0xa6 [kvm]
epc : ffffffff01590e98 ra : ffffffff01590e58 sp : ffff8f80001f39b0
gp : ffffffff81512a60 tp : ffffaf80024872c0 t0 : ffffaf800247e000
t1 : 00000000000007e0 t2 : 0000000000000000 s0 : ffff8f80001f39f0
s1 : 00007fff89ac4000 a0 : ffffffff015dd7e8 a1 : 0000000000000086
a2 : 0000000000000000 a3 : ffffaf8000000000 a4 : ffffaf80024882c0
a5 : 0000000000000000 a6 : ffffaf800328d780 a7 : 00000000000001cc
s2 : ffffaf800197bd00 s3 : 00000000000828c4 s4 : ffffaf800248c000
s5 : ffffaf800247d000 s6 : 0000000000001000 s7 : 0000000000001000
s8 : 0000000000000000 s9 : 00007fff861fd500 s10: 0000000000000001
s11: 0000000000800000 t3 : 00000000000004d3 t4 : 00000000000004d3
t5 : ffffffff814126e0 t6 : ffffffff81412700
status: 0000000200000120 badaddr: 0000000000000508 cause: 000000000000000d
[<ffffffff01590e98>] __kvm_write_guest_page+0x94/0xa6 [kvm]
[<ffffffff015943a6>] kvm_vcpu_write_guest+0x56/0x90 [kvm]
[<ffffffff015a175c>] kvm_pmu_clear_snapshot_area+0x42/0x7e [kvm]
[<ffffffff015a1972>] kvm_riscv_vcpu_pmu_deinit.part.0+0xe0/0x14e [kvm]
[<ffffffff015a2ad0>] kvm_riscv_vcpu_pmu_deinit+0x1a/0x24 [kvm]
[<ffffffff0159b344>] kvm_arch_vcpu_destroy+0x28/0x4c [kvm]
[<ffffffff0158e420>] kvm_destroy_vcpus+0x5a/0xda [kvm]
[<ffffffff0159930c>] kvm_arch_destroy_vm+0x14/0x28 [kvm]
[<ffffffff01593260>] kvm_destroy_vm+0x168/0x2a0 [kvm]
[<ffffffff015933d4>] kvm_put_kvm+0x3c/0x58 [kvm]
[<ffffffff01593412>] kvm_vm_release+0x22/0x2e [kvm]

Clearly, the kvm_vcpu_write_guest() function is crashing because it is
being called from kvm_pmu_clear_snapshot_area() upon guest tear down.

To address the above issue, simplify kvm_pmu_clear_snapshot_area() so
that it no longer zeroes out the PMU snapshot area, because the guest is
being torn down anyway.

The kvm_pmu_clear_snapshot_area() is also called when the guest changes
the PMU snapshot area of a VCPU, but even in this case the previous PMU
snapshot area must not be zeroed-out because the guest might have
reclaimed the previous PMU snapshot area for some other purpose.

Fixes: c2f41ddbcdd7 ("RISC-V: KVM: Implement SBI PMU Snapshot feature")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
---
 arch/riscv/kvm/vcpu_pmu.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)
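
The commit message's last argument, that the previous snapshot area must not be zeroed when the guest moves it, can be illustrated with a toy model. This is only a sketch under stated assumptions: guest memory is a flat byte array, and `struct toy_guest`, the sizes, and every function name are hypothetical, not KVM interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GUEST_MEM_SIZE 256 /* arbitrary sizes chosen for the sketch */
#define SNAPSHOT_SIZE  32

/* Hypothetical toy guest: a flat byte array indexed by "guest address". */
struct toy_guest {
	uint8_t mem[GUEST_MEM_SIZE];
	uint32_t snapshot_addr; /* where the host believes the snapshot lives */
};

/* Pre-patch behaviour, modelled: zero the old area, then switch address. */
static void set_snapshot_zeroing_old(struct toy_guest *g, uint32_t new_addr)
{
	memset(&g->mem[g->snapshot_addr], 0, SNAPSHOT_SIZE);
	g->snapshot_addr = new_addr;
}

/* Post-patch behaviour, modelled: only switch address. */
static void set_snapshot_keep_old(struct toy_guest *g, uint32_t new_addr)
{
	g->snapshot_addr = new_addr;
}

/*
 * The guest reuses its old snapshot area (address 0) for its own data and
 * then asks for the snapshot to move to address 128.
 * Returns 1 if the guest's data survived the move, 0 if it was clobbered.
 */
static int guest_data_survives(void (*set_snapshot)(struct toy_guest *, uint32_t))
{
	struct toy_guest g = { .snapshot_addr = 0 };

	assert(set_snapshot != NULL);
	memset(&g.mem[0], 0xAB, SNAPSHOT_SIZE); /* guest reclaimed the old area */
	set_snapshot(&g, 128);
	return g.mem[0] == 0xAB;
}
```

Calling `guest_data_survives(set_snapshot_zeroing_old)` shows the guest's data being overwritten in this model, while `guest_data_survives(set_snapshot_keep_old)` leaves it intact.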