Message ID | 20200924110706.254996-6-alexandru.elisei@arm.com
---|---
State | New, archived
Series | arm_pmu: Use NMI for perf interrupt
On Thu, Sep 24, 2020 at 12:07:04PM +0100, Alexandru Elisei wrote:
> From: Julien Thierry <julien.thierry@arm.com>
>
> kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from
> NMI context, defer waking the vcpu to an irq_work queue.
>
> A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent
> running the irq_work for a non-existent vcpu by calling irq_work_sync() on
> the PMU destroy path.
>
> Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
> Cc: kvm@vger.kernel.org
> Cc: kvmarm@lists.cs.columbia.edu
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
> [Alexandru E.: Added irq_work_sync()]
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
> I suggested in v6 that I will add an irq_work_sync() to
> kvm_pmu_vcpu_reset(). It turns out it's not necessary: a vcpu reset is done
> by the vcpu being reset with interrupts enabled, which means all the work
> has had a chance to run before the reset takes place.

I don't understand this ^^

But the patch itself looks good, so I'm going to queue this lot anyway!

Will
On 2020-09-24 12:07, Alexandru Elisei wrote:
> From: Julien Thierry <julien.thierry@arm.com>
>
> kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from
> NMI context, defer waking the vcpu to an irq_work queue.
>
> A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent
> running the irq_work for a non-existent vcpu by calling irq_work_sync() on
> the PMU destroy path.
>
> Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
> Cc: kvm@vger.kernel.org
> Cc: kvmarm@lists.cs.columbia.edu
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
> [Alexandru E.: Added irq_work_sync()]
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
> I suggested in v6 that I will add an irq_work_sync() to
> kvm_pmu_vcpu_reset(). It turns out it's not necessary: a vcpu reset is done
> by the vcpu being reset with interrupts enabled, which means all the work
> has had a chance to run before the reset takes place.

I don't understand your argument about interrupts being enabled. The real
reason for not needing any synchronization is that all that the queued work
does is to kick the vcpu. Given that the vcpu is resetting, no amount of
kicking is going to change anything (it is already outside of the guest).

Things are obviously different on destroy, where the vcpu is actively going
away and we need to make sure we don't use stale data.

>
>  arch/arm64/kvm/pmu-emul.c | 26 +++++++++++++++++++++++++-
>  include/kvm/arm_pmu.h     |  1 +
>  2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index f0d0312c0a55..81916e360b1e 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -269,6 +269,7 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
>
>  	for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
>  		kvm_pmu_release_perf_event(&pmu->pmc[i]);
> +	irq_work_sync(&vcpu->arch.pmu.overflow_work);
>  }
>
>  u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
> @@ -433,6 +434,22 @@ void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
>  	kvm_pmu_update_state(vcpu);
>  }
>
> +/**
> + * When perf interrupt is an NMI, we cannot safely notify the vcpu corresponding
> + * to the event.
> + * This is why we need a callback to do it once outside of the NMI context.
> + */
> +static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work)
> +{
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_pmu *pmu;
> +
> +	pmu = container_of(work, struct kvm_pmu, overflow_work);
> +	vcpu = kvm_pmc_to_vcpu(pmu->pmc);
> +
> +	kvm_vcpu_kick(vcpu);
> +}
> +
>  /**
>   * When the perf event overflows, set the overflow status and inform the vcpu.
>   */
> @@ -465,7 +482,11 @@ static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
>
>  	if (kvm_pmu_overflow_status(vcpu)) {
>  		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> -		kvm_vcpu_kick(vcpu);
> +
> +		if (!in_nmi())
> +			kvm_vcpu_kick(vcpu);
> +		else
> +			irq_work_queue(&vcpu->arch.pmu.overflow_work);
>  	}
>
>  	cpu_pmu->pmu.start(perf_event, PERF_EF_RELOAD);
> @@ -764,6 +785,9 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
>  		return ret;
>  	}
>
> +	init_irq_work(&vcpu->arch.pmu.overflow_work,
> +		      kvm_pmu_perf_overflow_notify_vcpu);
> +
>  	vcpu->arch.pmu.created = true;
>  	return 0;
>  }
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 6db030439e29..dbf4f08d42e5 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -27,6 +27,7 @@ struct kvm_pmu {
>  	bool ready;
>  	bool created;
>  	bool irq_level;
> +	struct irq_work overflow_work;

Nit: placing this new field right after the pmc array would avoid creating
an unnecessary padding in the structure. Not a big deal, and definitely
something we can sort out when applying the patch.

>  };
>
>  #define kvm_arm_pmu_v3_ready(v)	((v)->arch.pmu.ready)

Reviewed-by: Marc Zyngier <maz@kernel.org>

M.
Hello,

On 9/29/20 9:11 AM, Marc Zyngier wrote:
> On 2020-09-24 12:07, Alexandru Elisei wrote:
>> From: Julien Thierry <julien.thierry@arm.com>
>>
>> kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from
>> NMI context, defer waking the vcpu to an irq_work queue.
>>
>> A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent
>> running the irq_work for a non-existent vcpu by calling irq_work_sync() on
>> the PMU destroy path.
>>
>> Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: James Morse <james.morse@arm.com>
>> Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
>> Cc: kvm@vger.kernel.org
>> Cc: kvmarm@lists.cs.columbia.edu
>> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
>> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
>> [Alexandru E.: Added irq_work_sync()]
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>> I suggested in v6 that I will add an irq_work_sync() to
>> kvm_pmu_vcpu_reset(). It turns out it's not necessary: a vcpu reset is done
>> by the vcpu being reset with interrupts enabled, which means all the work
>> has had a chance to run before the reset takes place.
>
> I don't understand your argument about interrupts being enabled. The real
> reason for not needing any synchronization is that all that the queued work
> does is to kick the vcpu. Given that the vcpu is resetting, no amount of
> kicking is going to change anything (it is already outside of the guest).
>
> Things are obviously different on destroy, where the vcpu is actively going
> away and we need to make sure we don't use stale data.

Like you and Will noticed, the above really doesn't make much sense. The
reason we don't need to wait for the irq_work to be finished on reset is
indeed that the vcpu isn't freed, so we will never trigger a use-after-free
bug.

>>
>>  arch/arm64/kvm/pmu-emul.c | 26 +++++++++++++++++++++++++-
>>  include/kvm/arm_pmu.h     |  1 +
>>  2 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
>> index f0d0312c0a55..81916e360b1e 100644
>> --- a/arch/arm64/kvm/pmu-emul.c
>> +++ b/arch/arm64/kvm/pmu-emul.c
>> @@ -269,6 +269,7 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
>>
>>  	for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
>>  		kvm_pmu_release_perf_event(&pmu->pmc[i]);
>> +	irq_work_sync(&vcpu->arch.pmu.overflow_work);
>>  }
>>
>>  u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
>> @@ -433,6 +434,22 @@ void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
>>  	kvm_pmu_update_state(vcpu);
>>  }
>>
>> +/**
>> + * When perf interrupt is an NMI, we cannot safely notify the vcpu corresponding
>> + * to the event.
>> + * This is why we need a callback to do it once outside of the NMI context.
>> + */
>> +static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work)
>> +{
>> +	struct kvm_vcpu *vcpu;
>> +	struct kvm_pmu *pmu;
>> +
>> +	pmu = container_of(work, struct kvm_pmu, overflow_work);
>> +	vcpu = kvm_pmc_to_vcpu(pmu->pmc);
>> +
>> +	kvm_vcpu_kick(vcpu);
>> +}
>> +
>>  /**
>>   * When the perf event overflows, set the overflow status and inform the vcpu.
>>   */
>> @@ -465,7 +482,11 @@ static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
>>
>>  	if (kvm_pmu_overflow_status(vcpu)) {
>>  		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>> -		kvm_vcpu_kick(vcpu);
>> +
>> +		if (!in_nmi())
>> +			kvm_vcpu_kick(vcpu);
>> +		else
>> +			irq_work_queue(&vcpu->arch.pmu.overflow_work);
>>  	}
>>
>>  	cpu_pmu->pmu.start(perf_event, PERF_EF_RELOAD);
>> @@ -764,6 +785,9 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
>>  		return ret;
>>  	}
>>
>> +	init_irq_work(&vcpu->arch.pmu.overflow_work,
>> +		      kvm_pmu_perf_overflow_notify_vcpu);
>> +
>>  	vcpu->arch.pmu.created = true;
>>  	return 0;
>>  }
>> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
>> index 6db030439e29..dbf4f08d42e5 100644
>> --- a/include/kvm/arm_pmu.h
>> +++ b/include/kvm/arm_pmu.h
>> @@ -27,6 +27,7 @@ struct kvm_pmu {
>>  	bool ready;
>>  	bool created;
>>  	bool irq_level;
>> +	struct irq_work overflow_work;
>
> Nit: placing this new field right after the pmc array would avoid creating
> an unnecessary padding in the structure. Not a big deal, and definitely
> something we can sort out when applying the patch.

That makes sense, overflow_work must be aligned to 8 bytes, and there are 16
elements in the pmc array, which means no padding is required for the
overflow_work field.

Thanks,
Alex
Hi Will,

On 9/28/20 6:57 PM, Will Deacon wrote:
> On Thu, Sep 24, 2020 at 12:07:04PM +0100, Alexandru Elisei wrote:
>> From: Julien Thierry <julien.thierry@arm.com>
>>
>> kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from
>> NMI context, defer waking the vcpu to an irq_work queue.
>>
>> A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent
>> running the irq_work for a non-existent vcpu by calling irq_work_sync() on
>> the PMU destroy path.
>>
>> Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: James Morse <james.morse@arm.com>
>> Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
>> Cc: kvm@vger.kernel.org
>> Cc: kvmarm@lists.cs.columbia.edu
>> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
>> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
>> [Alexandru E.: Added irq_work_sync()]
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>> I suggested in v6 that I will add an irq_work_sync() to
>> kvm_pmu_vcpu_reset(). It turns out it's not necessary: a vcpu reset is done
>> by the vcpu being reset with interrupts enabled, which means all the work
>> has had a chance to run before the reset takes place.
>
> I don't understand this ^^

Marc had the same comment, I replied in his email. I thought about it and
you're right, it doesn't make much sense.

> But the patch itself looks good, so I'm going to queue this lot anyway!

Thank you for picking up the series!

Thanks,
Alex
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index f0d0312c0a55..81916e360b1e 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -269,6 +269,7 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 	for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
 		kvm_pmu_release_perf_event(&pmu->pmc[i]);
+	irq_work_sync(&vcpu->arch.pmu.overflow_work);
 }
 
 u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
@@ -433,6 +434,22 @@ void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
 	kvm_pmu_update_state(vcpu);
 }
 
+/**
+ * When perf interrupt is an NMI, we cannot safely notify the vcpu corresponding
+ * to the event.
+ * This is why we need a callback to do it once outside of the NMI context.
+ */
+static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_pmu *pmu;
+
+	pmu = container_of(work, struct kvm_pmu, overflow_work);
+	vcpu = kvm_pmc_to_vcpu(pmu->pmc);
+
+	kvm_vcpu_kick(vcpu);
+}
+
 /**
  * When the perf event overflows, set the overflow status and inform the vcpu.
  */
@@ -465,7 +482,11 @@ static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
 
 	if (kvm_pmu_overflow_status(vcpu)) {
 		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
-		kvm_vcpu_kick(vcpu);
+
+		if (!in_nmi())
+			kvm_vcpu_kick(vcpu);
+		else
+			irq_work_queue(&vcpu->arch.pmu.overflow_work);
 	}
 
 	cpu_pmu->pmu.start(perf_event, PERF_EF_RELOAD);
@@ -764,6 +785,9 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
 		return ret;
 	}
 
+	init_irq_work(&vcpu->arch.pmu.overflow_work,
+		      kvm_pmu_perf_overflow_notify_vcpu);
+
 	vcpu->arch.pmu.created = true;
 	return 0;
 }
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 6db030439e29..dbf4f08d42e5 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -27,6 +27,7 @@ struct kvm_pmu {
 	bool ready;
 	bool created;
 	bool irq_level;
+	struct irq_work overflow_work;
 };
 
 #define kvm_arm_pmu_v3_ready(v)	((v)->arch.pmu.ready)