Message ID | 20241028234533.942542-1-rananta@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] KVM: arm64: Get rid of userspace_irqchip_in_use |

On Mon, 28 Oct 2024 23:45:33 +0000,
Raghavendra Rao Ananta <rananta@google.com> wrote:
>
> Improper use of userspace_irqchip_in_use led to syzbot hitting the
> following WARN_ON() in kvm_timer_update_irq():
>
> WARNING: CPU: 0 PID: 3281 at arch/arm64/kvm/arch_timer.c:459
> kvm_timer_update_irq+0x21c/0x394
> Call trace:
>   kvm_timer_update_irq+0x21c/0x394 arch/arm64/kvm/arch_timer.c:459
>   kvm_timer_vcpu_reset+0x158/0x684 arch/arm64/kvm/arch_timer.c:968
>   kvm_reset_vcpu+0x3b4/0x560 arch/arm64/kvm/reset.c:264
>   kvm_vcpu_set_target arch/arm64/kvm/arm.c:1553 [inline]
>   kvm_arch_vcpu_ioctl_vcpu_init arch/arm64/kvm/arm.c:1573 [inline]
>   kvm_arch_vcpu_ioctl+0x112c/0x1b3c arch/arm64/kvm/arm.c:1695
>   kvm_vcpu_ioctl+0x4ec/0xf74 virt/kvm/kvm_main.c:4658
>   vfs_ioctl fs/ioctl.c:51 [inline]
>   __do_sys_ioctl fs/ioctl.c:907 [inline]
>   __se_sys_ioctl fs/ioctl.c:893 [inline]
>   __arm64_sys_ioctl+0x108/0x184 fs/ioctl.c:893
>   __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>   invoke_syscall+0x78/0x1b8 arch/arm64/kernel/syscall.c:49
>   el0_svc_common+0xe8/0x1b0 arch/arm64/kernel/syscall.c:132
>   do_el0_svc+0x40/0x50 arch/arm64/kernel/syscall.c:151
>   el0_svc+0x54/0x14c arch/arm64/kernel/entry-common.c:712
>   el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
>   el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
>
> The following sequence led to the scenario:
>  - Userspace creates a VM and a vCPU.
>  - The vCPU is initialized with KVM_ARM_VCPU_PMU_V3 during
>    KVM_ARM_VCPU_INIT.
>  - Without any other setup, such as vGIC or vPMU, userspace issues
>    KVM_RUN on the vCPU. Since the vPMU is requested, but not setup,
>    kvm_arm_pmu_v3_enable() fails in kvm_arch_vcpu_run_pid_change().
>    As a result, KVM_RUN returns after enabling the timer, but before
>    incrementing 'userspace_irqchip_in_use':
>    kvm_arch_vcpu_run_pid_change()
>        ret = kvm_arm_pmu_v3_enable()
>            if (!vcpu->arch.pmu.created)
>                return -EINVAL;
>        if (ret)
>            return ret;
>        [...]
>        if (!irqchip_in_kernel(kvm))
>            static_branch_inc(&userspace_irqchip_in_use);
>  - Userspace ignores the error and issues KVM_ARM_VCPU_INIT again.
>    Since the timer is already enabled, control moves through the
>    following flow, ultimately hitting the WARN_ON():
>    kvm_timer_vcpu_reset()
>        if (timer->enabled)
>           kvm_timer_update_irq()
>               if (!userspace_irqchip())
>                   ret = kvm_vgic_inject_irq()
>                       ret = vgic_lazy_init()
>                           if (unlikely(!vgic_initialized(kvm)))
>                               if (kvm->arch.vgic.vgic_model !=
>                                   KVM_DEV_TYPE_ARM_VGIC_V2)
>                                       return -EBUSY;
>                   WARN_ON(ret);
>
> Theoretically, since userspace_irqchip_in_use's functionality can be

nit: this isn't theoretical at all.

> simply replaced by '!irqchip_in_kernel()', get rid of the static key
> to avoid the mismanagement, which also helps with the syzbot issue.

Did you have a chance to check whether this had any negative impact on
actual workloads? Since the entry/exit code is a bit of a hot spot,
I'd like to make sure we're not penalising the common case (I only
wrote this patch while waiting in an airport, and didn't test it at
all).

Any such data about it would be very welcome in the commit message.

Thanks,

	M.

On Tue, Oct 29, 2024 at 9:27 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Mon, 28 Oct 2024 23:45:33 +0000,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> Did you have a chance to check whether this had any negative impact on
> actual workloads? Since the entry/exit code is a bit of a hot spot,
> I'd like to make sure we're not penalising the common case (I only
> wrote this patch while waiting in an airport, and didn't test it at
> all).
>
I ran the kvm selftests, kvm-unit-tests and booted a linux guest to
test the change and noticed no failures.
Any specific test you want to try out?

> Any such data about it would be very welcome in the commit message.
>
Sure, I'll include it if we have a v3.

Thank you.
Raghavendra

On Tue, 29 Oct 2024 17:06:09 +0000,
Raghavendra Rao Ananta <rananta@google.com> wrote:
>
> On Tue, Oct 29, 2024 at 9:27 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Mon, 28 Oct 2024 23:45:33 +0000,
> > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > >
> > Did you have a chance to check whether this had any negative impact on
> > actual workloads? Since the entry/exit code is a bit of a hot spot,
> > I'd like to make sure we're not penalising the common case (I only
> > wrote this patch while waiting in an airport, and didn't test it at
> > all).
> >
> I ran the kvm selftests, kvm-unit-tests and booted a linux guest to
> test the change and noticed no failures.
> Any specific test you want to try out?

My question is not about failures (I didn't expect any), but
specifically about *performance*, and whether checking the flag
without a static key can lead to any performance drop on the hot path.

Can you please run an exit-heavy workload (such as hackbench, for
example), and report any significant delta you could measure?

Thanks,

	M.

On Tue, Oct 29, 2024 at 11:47 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 29 Oct 2024 17:06:09 +0000,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> > On Tue, Oct 29, 2024 at 9:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Mon, 28 Oct 2024 23:45:33 +0000,
> > > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > > >
> > > Did you have a chance to check whether this had any negative impact on
> > > actual workloads? Since the entry/exit code is a bit of a hot spot,
> > > I'd like to make sure we're not penalising the common case (I only
> > > wrote this patch while waiting in an airport, and didn't test it at
> > > all).
> > >
> > I ran the kvm selftests, kvm-unit-tests and booted a linux guest to
> > test the change and noticed no failures.
> > Any specific test you want to try out?
>
> My question is not about failures (I didn't expect any), but
> specifically about *performance*, and whether checking the flag
> without a static key can lead to any performance drop on the hot path.
>
> Can you please run an exit-heavy workload (such as hackbench, for
> example), and report any significant delta you could measure?

Oh, I see. I ran hackbench and micro-bench from kvm-unit-tests (which
also causes a lot of entry/exits), on Ampere Altra with kernel at
v6.12-rc1, and see no significant difference in perf.

hackbench:
=========
Ran on a guest with 64 vCPUs and backed by 8G of memory.
The results are an average of 3 runs:

Task groups | Baseline | Patch | Approx. entry/exits
------------|----------|-------|--------------------
100         | 0.154    | 0.164 | 150k
250         | 0.456    | 0.458 | 500k
500         | 0.851    | 0.826 | 920k

(Total tasks for each row == task groups * 40)

kvm-unit-tests micro-bench
==========================
The test causes ~530k entry/exits.

Baseline:
name                  total ns     avg ns
-----------------------------------------
hvc                 20095360.0      306.0
mmio_read_user     110350040.0     1683.0
mmio_read_vgic      29572840.0      451.0
eoi                   964080.0       14.0
ipi                126236640.0     1926.0
lpi                142848920.0     2179.0
timer_10ms            231040.0      902.0

Patch:
name                  total ns     avg ns
-----------------------------------------
hvc                 20067680.0      306.0
mmio_read_user     109513800.0     1671.0
mmio_read_vgic      29190080.0      445.0
eoi                   963400.0       14.0
ipi                116481640.0     1777.0
lpi                136556000.0     2083.0
timer_10ms            234120.0      914.0

Thank you.
Raghavendra

On Wed, 30 Oct 2024 00:16:48 +0000,
Raghavendra Rao Ananta <rananta@google.com> wrote:
>
> On Tue, Oct 29, 2024 at 11:47 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Tue, 29 Oct 2024 17:06:09 +0000,
> > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > >
> > > On Tue, Oct 29, 2024 at 9:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Mon, 28 Oct 2024 23:45:33 +0000,
> > > > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > > > >
> > > > Did you have a chance to check whether this had any negative impact on
> > > > actual workloads? Since the entry/exit code is a bit of a hot spot,
> > > > I'd like to make sure we're not penalising the common case (I only
> > > > wrote this patch while waiting in an airport, and didn't test it at
> > > > all).
> > > >
> > > I ran the kvm selftests, kvm-unit-tests and booted a linux guest to
> > > test the change and noticed no failures.
> > > Any specific test you want to try out?
> >
> > My question is not about failures (I didn't expect any), but
> > specifically about *performance*, and whether checking the flag
> > without a static key can lead to any performance drop on the hot path.
> >
> > Can you please run an exit-heavy workload (such as hackbench, for
> > example), and report any significant delta you could measure?
>
> Oh, I see. I ran hackbench and micro-bench from kvm-unit-tests (which
> also causes a lot of entry/exits), on Ampere Altra with kernel at
> v6.12-rc1, and see no significant difference in perf.

Thanks for running this stuff.

> timer_10ms            231040.0      902.0
> timer_10ms            234120.0      914.0

This seems to be the only case where we are adversely affected by this
change. In the grand scheme of things, that's noise. But this gives us
a clear line of sight for the removal of the in-kernel interrupts back
to userspace.

Thanks,

	M.

On Wed, Oct 30, 2024 at 1:22 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 30 Oct 2024 00:16:48 +0000,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> > On Tue, Oct 29, 2024 at 11:47 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Tue, 29 Oct 2024 17:06:09 +0000,
> > > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > > >
> > > > On Tue, Oct 29, 2024 at 9:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > On Mon, 28 Oct 2024 23:45:33 +0000,
> > > > > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > > > > >
> > > > > Did you have a chance to check whether this had any negative impact on
> > > > > actual workloads? Since the entry/exit code is a bit of a hot spot,
> > > > > I'd like to make sure we're not penalising the common case (I only
> > > > > wrote this patch while waiting in an airport, and didn't test it at
> > > > > all).
> > > > >
> > > > I ran the kvm selftests, kvm-unit-tests and booted a linux guest to
> > > > test the change and noticed no failures.
> > > > Any specific test you want to try out?
> > >
> > > My question is not about failures (I didn't expect any), but
> > > specifically about *performance*, and whether checking the flag
> > > without a static key can lead to any performance drop on the hot path.
> > >
> > > Can you please run an exit-heavy workload (such as hackbench, for
> > > example), and report any significant delta you could measure?
> >
> > Oh, I see. I ran hackbench and micro-bench from kvm-unit-tests (which
> > also causes a lot of entry/exits), on Ampere Altra with kernel at
> > v6.12-rc1, and see no significant difference in perf.
>
> Thanks for running this stuff.
>
> > timer_10ms            231040.0      902.0
> > timer_10ms            234120.0      914.0
>
> This seems to be the only case where we are adversely affected by this
> change.

Hmm, I'm not sure how much we want to trust this comparison. For
instance, I just ran micro-bench again a few more times and here are
the outcomes of timer_10ms for each try with the patch:

Tries                 total ns     avg ns
-----------------------------------------
1_timer_10ms          231840.0      905.0
2_timer_10ms          234560.0      916.0
3_timer_10ms          227440.0      888.0
4_timer_10ms          236640.0      924.0
5_timer_10ms          231200.0      903.0

Here's a few on the baseline:

Tries                 total ns     avg ns
-----------------------------------------
1_timer_10ms          231080.0      902.0
2_timer_10ms          238040.0      929.0
3_timer_10ms          231680.0      905.0
4_timer_10ms          229280.0      895.0
5_timer_10ms          228520.0      892.0

> In the grand scheme of things, that's noise. But this gives us
> a clear line of sight for the removal of the in-kernel interrupts back
> to userspace.

Sorry, I didn't follow you completely on this part.

Thank you.
Raghavendra

On Wed, 30 Oct 2024 16:36:19 +0000,
Raghavendra Rao Ananta <rananta@google.com> wrote:
>
> On Wed, Oct 30, 2024 at 1:22 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 30 Oct 2024 00:16:48 +0000,
> > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > >
> > > On Tue, Oct 29, 2024 at 11:47 AM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Tue, 29 Oct 2024 17:06:09 +0000,
> > > > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > > > >
> > > > > On Tue, Oct 29, 2024 at 9:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > >
> > > > > > On Mon, 28 Oct 2024 23:45:33 +0000,
> > > > > > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > > > > > >
> > > > > > Did you have a chance to check whether this had any negative impact on
> > > > > > actual workloads? Since the entry/exit code is a bit of a hot spot,
> > > > > > I'd like to make sure we're not penalising the common case (I only
> > > > > > wrote this patch while waiting in an airport, and didn't test it at
> > > > > > all).
> > > > > >
> > > > > I ran the kvm selftests, kvm-unit-tests and booted a linux guest to
> > > > > test the change and noticed no failures.
> > > > > Any specific test you want to try out?
> > > >
> > > > My question is not about failures (I didn't expect any), but
> > > > specifically about *performance*, and whether checking the flag
> > > > without a static key can lead to any performance drop on the hot path.
> > > >
> > > > Can you please run an exit-heavy workload (such as hackbench, for
> > > > example), and report any significant delta you could measure?
> > >
> > > Oh, I see. I ran hackbench and micro-bench from kvm-unit-tests (which
> > > also causes a lot of entry/exits), on Ampere Altra with kernel at
> > > v6.12-rc1, and see no significant difference in perf.
> >
> > Thanks for running this stuff.
> >
> > > timer_10ms            231040.0      902.0
> > > timer_10ms            234120.0      914.0
> >
> > This seems to be the only case where we are adversely affected by this
> > change.
> Hmm, I'm not sure how much we want to trust this comparison. For
> instance, I just ran micro-bench again a few more times and here are
> the outcomes of timer_10ms for each try with the patch:
>
> Tries                 total ns     avg ns
> -----------------------------------------
> 1_timer_10ms          231840.0      905.0
> 2_timer_10ms          234560.0      916.0
> 3_timer_10ms          227440.0      888.0
> 4_timer_10ms          236640.0      924.0
> 5_timer_10ms          231200.0      903.0
>
> Here's a few on the baseline:
>
> Tries                 total ns     avg ns
> -----------------------------------------
> 1_timer_10ms          231080.0      902.0
> 2_timer_10ms          238040.0      929.0
> 3_timer_10ms          231680.0      905.0
> 4_timer_10ms          229280.0      895.0
> 5_timer_10ms          228520.0      892.0

OK, so this benchmark is all over the place, and we can't derive much
from it.

> > In the grand scheme of things, that's noise. But this gives us
> > a clear line of sight for the removal of the in-kernel interrupts back
> > to userspace.
> Sorry, I didn't follow you completely on this part.

Just me moaning. The code that was gated by the static key that you
just removed is used to signal interrupts from the kernel back to
userspace, and I'm resisting the urge to remove it altogether now.

	M.

On Mon, 28 Oct 2024 23:45:33 +0000, Raghavendra Rao Ananta wrote:
> Improper use of userspace_irqchip_in_use led to syzbot hitting the
> following WARN_ON() in kvm_timer_update_irq():
>
> WARNING: CPU: 0 PID: 3281 at arch/arm64/kvm/arch_timer.c:459
> kvm_timer_update_irq+0x21c/0x394
> Call trace:
>   kvm_timer_update_irq+0x21c/0x394 arch/arm64/kvm/arch_timer.c:459
>   kvm_timer_vcpu_reset+0x158/0x684 arch/arm64/kvm/arch_timer.c:968
>   kvm_reset_vcpu+0x3b4/0x560 arch/arm64/kvm/reset.c:264
>   kvm_vcpu_set_target arch/arm64/kvm/arm.c:1553 [inline]
>   kvm_arch_vcpu_ioctl_vcpu_init arch/arm64/kvm/arm.c:1573 [inline]
>   kvm_arch_vcpu_ioctl+0x112c/0x1b3c arch/arm64/kvm/arm.c:1695
>   kvm_vcpu_ioctl+0x4ec/0xf74 virt/kvm/kvm_main.c:4658
>   vfs_ioctl fs/ioctl.c:51 [inline]
>   __do_sys_ioctl fs/ioctl.c:907 [inline]
>   __se_sys_ioctl fs/ioctl.c:893 [inline]
>   __arm64_sys_ioctl+0x108/0x184 fs/ioctl.c:893
>   __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>   invoke_syscall+0x78/0x1b8 arch/arm64/kernel/syscall.c:49
>   el0_svc_common+0xe8/0x1b0 arch/arm64/kernel/syscall.c:132
>   do_el0_svc+0x40/0x50 arch/arm64/kernel/syscall.c:151
>   el0_svc+0x54/0x14c arch/arm64/kernel/entry-common.c:712
>   el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
>   el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
>
> [...]

Applied to kvmarm/next, thanks!

[1/1] KVM: arm64: Get rid of userspace_irqchip_in_use
      https://git.kernel.org/kvmarm/kvmarm/c/e571ebcff926

--
Best,
Oliver

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 329619c6fa961..9f96594a0e05d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -73,8 +73,6 @@ enum kvm_mode kvm_get_mode(void);
 static inline enum kvm_mode kvm_get_mode(void) { return KVM_MODE_NONE; };
 #endif
 
-DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
-
 extern unsigned int __ro_after_init kvm_sve_max_vl;
 extern unsigned int __ro_after_init kvm_host_sve_max_vl;
 int __init kvm_arm_init_sve(void);
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 879982b1cc739..1215df5904185 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -206,8 +206,7 @@ void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
 
 static inline bool userspace_irqchip(struct kvm *kvm)
 {
-	return static_branch_unlikely(&userspace_irqchip_in_use) &&
-		unlikely(!irqchip_in_kernel(kvm));
+	return unlikely(!irqchip_in_kernel(kvm));
 }
 
 static void soft_timer_start(struct hrtimer *hrt, u64 ns)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a0d01c46e4084..63f5c05e9dec6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -69,7 +69,6 @@ DECLARE_KVM_NVHE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
 static bool vgic_present, kvm_arm_initialised;
 
 static DEFINE_PER_CPU(unsigned char, kvm_hyp_initialized);
-DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
 bool is_kvm_arm_initialised(void)
 {
@@ -503,9 +502,6 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
-	if (vcpu_has_run_once(vcpu) && unlikely(!irqchip_in_kernel(vcpu->kvm)))
-		static_branch_dec(&userspace_irqchip_in_use);
-
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
 	kvm_timer_vcpu_terminate(vcpu);
 	kvm_pmu_vcpu_destroy(vcpu);
@@ -848,14 +844,6 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 			return ret;
 	}
 
-	if (!irqchip_in_kernel(kvm)) {
-		/*
-		 * Tell the rest of the code that there are userspace irqchip
-		 * VMs in the wild.
-		 */
-		static_branch_inc(&userspace_irqchip_in_use);
-	}
-
 	/*
 	 * Initialize traps for protected VMs.
 	 * NOTE: Move to run in EL2 directly, rather than via a hypercall, once
@@ -1072,7 +1060,7 @@ static bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu, int *ret)
 	 * state gets updated in kvm_timer_update_run and
 	 * kvm_pmu_update_run below).
 	 */
-	if (static_branch_unlikely(&userspace_irqchip_in_use)) {
+	if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
 		if (kvm_timer_should_notify_user(vcpu) ||
 		    kvm_pmu_should_notify_user(vcpu)) {
 			*ret = -EINTR;
@@ -1194,7 +1182,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 			vcpu->mode = OUTSIDE_GUEST_MODE;
 			isb(); /* Ensure work in x_flush_hwstate is committed */
 			kvm_pmu_sync_hwstate(vcpu);
-			if (static_branch_unlikely(&userspace_irqchip_in_use))
+			if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
 				kvm_timer_sync_user(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
 			local_irq_enable();
@@ -1240,7 +1228,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 * we don't want vtimer interrupts to race with syncing the
 		 * timer virtual interrupt state.
 		 */
-		if (static_branch_unlikely(&userspace_irqchip_in_use))
+		if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
 			kvm_timer_sync_user(vcpu);
 
 		kvm_arch_vcpu_ctxsync_fp(vcpu);

Improper use of userspace_irqchip_in_use led to syzbot hitting the
following WARN_ON() in kvm_timer_update_irq():

WARNING: CPU: 0 PID: 3281 at arch/arm64/kvm/arch_timer.c:459
kvm_timer_update_irq+0x21c/0x394
Call trace:
  kvm_timer_update_irq+0x21c/0x394 arch/arm64/kvm/arch_timer.c:459
  kvm_timer_vcpu_reset+0x158/0x684 arch/arm64/kvm/arch_timer.c:968
  kvm_reset_vcpu+0x3b4/0x560 arch/arm64/kvm/reset.c:264
  kvm_vcpu_set_target arch/arm64/kvm/arm.c:1553 [inline]
  kvm_arch_vcpu_ioctl_vcpu_init arch/arm64/kvm/arm.c:1573 [inline]
  kvm_arch_vcpu_ioctl+0x112c/0x1b3c arch/arm64/kvm/arm.c:1695
  kvm_vcpu_ioctl+0x4ec/0xf74 virt/kvm/kvm_main.c:4658
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:907 [inline]
  __se_sys_ioctl fs/ioctl.c:893 [inline]
  __arm64_sys_ioctl+0x108/0x184 fs/ioctl.c:893
  __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
  invoke_syscall+0x78/0x1b8 arch/arm64/kernel/syscall.c:49
  el0_svc_common+0xe8/0x1b0 arch/arm64/kernel/syscall.c:132
  do_el0_svc+0x40/0x50 arch/arm64/kernel/syscall.c:151
  el0_svc+0x54/0x14c arch/arm64/kernel/entry-common.c:712
  el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
  el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

The following sequence led to the scenario:
 - Userspace creates a VM and a vCPU.
 - The vCPU is initialized with KVM_ARM_VCPU_PMU_V3 during
   KVM_ARM_VCPU_INIT.
 - Without any other setup, such as vGIC or vPMU, userspace issues
   KVM_RUN on the vCPU. Since the vPMU is requested, but not setup,
   kvm_arm_pmu_v3_enable() fails in kvm_arch_vcpu_run_pid_change().
   As a result, KVM_RUN returns after enabling the timer, but before
   incrementing 'userspace_irqchip_in_use':
   kvm_arch_vcpu_run_pid_change()
       ret = kvm_arm_pmu_v3_enable()
           if (!vcpu->arch.pmu.created)
               return -EINVAL;
       if (ret)
           return ret;
       [...]
       if (!irqchip_in_kernel(kvm))
           static_branch_inc(&userspace_irqchip_in_use);
 - Userspace ignores the error and issues KVM_ARM_VCPU_INIT again.
   Since the timer is already enabled, control moves through the
   following flow, ultimately hitting the WARN_ON():
   kvm_timer_vcpu_reset()
       if (timer->enabled)
          kvm_timer_update_irq()
              if (!userspace_irqchip())
                  ret = kvm_vgic_inject_irq()
                      ret = vgic_lazy_init()
                          if (unlikely(!vgic_initialized(kvm)))
                              if (kvm->arch.vgic.vgic_model !=
                                  KVM_DEV_TYPE_ARM_VGIC_V2)
                                      return -EBUSY;
                  WARN_ON(ret);

Theoretically, since userspace_irqchip_in_use's functionality can be
simply replaced by '!irqchip_in_kernel()', get rid of the static key
to avoid the mismanagement, which also helps with the syzbot issue.

Cc: <stable@vger.kernel.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
v2:
 - Picked the diff shared by Marc to get rid of
   'userspace_irqchip_in_use' (thanks).
 - Adjusted the commit message accordingly.

v1: https://lore.kernel.org/all/20241025221220.2985227-1-rananta@google.com/

 arch/arm64/include/asm/kvm_host.h |  2 --
 arch/arm64/kvm/arch_timer.c       |  3 +--
 arch/arm64/kvm/arm.c              | 18 +++---------------
 3 files changed, 4 insertions(+), 19 deletions(-)

base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc