Message ID | 20220111153539.2532246-2-mark.rutland@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | kvm: fix latent guest entry/exit bugs | expand |
Hi Mark, On Tue, 11 Jan 2022 15:35:35 +0000, Mark Rutland <mark.rutland@arm.com> wrote: > > When transitioning to/from guest mode, it is necessary to inform > lockdep, tracing, and RCU in a specific order, similar to the > requirements for transitions to/from user mode. Additionally, it is > necessary to perform vtime accounting for a window around running the > guest, with RCU enabled, such that timer interrupts taken from the guest > can be accounted as guest time. > > Most architectures don't handle all the necessary pieces, and a have a > number of common bugs, including unsafe usage of RCU during the window > between guest_enter() and guest_exit(). > > On x86, this was dealt with across commits: > > 87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs") > 0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit") > 9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit") > 3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text") > 135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text") > 160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling") > bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers") > > ... but those fixes are specific to x86, and as the resulting logic > (while correct) is split across generic helper functions and > x86-specific helper functions, it is difficult to see that the > entry/exit accounting is balanced. > > This patch adds generic helpers which architectures can use to handle > guest entry/exit consistently and correctly. The guest_{enter,exit}() > helpers are split into guest_timing_{enter,exit}() to perform vtime > accounting, and guest_context_{enter,exit}() to perform the necessary > context tracking and RCU management. The existing guest_{enter,exit}() > heleprs are left as wrappers of these. > > Atop this, new exit_to_guest_mode() and enter_from_guest_mode() helpers > are added to handle the ordering of lockdep, tracing, and RCU manageent. > These are named to align with exit_to_user_mode() and > enter_from_user_mode(). > > Subsequent patches will migrate architectures over to the new helpers, > following a sequence: > > guest_timing_enter_irqoff(); > > exit_to_guest_mode(); > < run the vcpu > > enter_from_guest_mode(); > > < take any pending IRQs > > > guest_timing_exit_irqoff(); > > This sequences handles all of the above correctly, and more clearly > balances the entry and exit portions, making it easier to understand. > > The existing helpers are marked as deprecated, and will be removed once > all architectures have been converted. > > There should be no functional change as a result of this patch. > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> Thanks a lot for looking into this and writing this up. I have a couple of comments below, but that's pretty much cosmetic and is only there to ensure that I actually understand this stuff. FWIW: Reviewed-by: Marc Zyngier <maz@kernel.org> > --- > include/linux/kvm_host.h | 108 +++++++++++++++++++++++++++++++++++++-- > 1 file changed, 105 insertions(+), 3 deletions(-) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index c310648cc8f1..13fcf7979880 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -29,6 +29,8 @@ > #include <linux/refcount.h> > #include <linux/nospec.h> > #include <linux/notifier.h> > +#include <linux/ftrace.h> > +#include <linux/instrumentation.h> > #include <asm/signal.h> > > #include <linux/kvm.h> > @@ -362,8 +364,11 @@ struct kvm_vcpu { > int last_used_slot; > }; > > -/* must be called with irqs disabled */ > -static __always_inline void guest_enter_irqoff(void) > +/* > + * Start accounting time towards a guest. > + * Must be called before entering guest context. > + */ > +static __always_inline void guest_timing_enter_irqoff(void) > { > /* > * This is running in ioctl context so its safe to assume that it's the > @@ -372,7 +377,17 @@ static __always_inline void guest_enter_irqoff(void) > instrumentation_begin(); > vtime_account_guest_enter(); > instrumentation_end(); > +} > > +/* > + * Enter guest context and enter an RCU extended quiescent state. > + * > + * This should be the last thing called before entering the guest, and must be > + * called after any potential use of RCU (including any potentially > + * instrumented code). nit: "the last thing called" is terribly ambiguous. Any architecture obviously calls a ****load of stuff after this point. Should this be 'the last thing involving RCU' instead? > + */ > +static __always_inline void guest_context_enter_irqoff(void) > +{ > /* > * KVM does not hold any references to rcu protected data when it > * switches CPU into a guest mode. In fact switching to a guest mode > @@ -388,16 +403,77 @@ static __always_inline void guest_enter_irqoff(void) > } > } > > -static __always_inline void guest_exit_irqoff(void) > +/* > + * Deprecated. Architectures should move to guest_timing_enter_irqoff() and > + * exit_to_guest_mode(). > + */ > +static __always_inline void guest_enter_irqoff(void) > +{ > + guest_timing_enter_irqoff(); > + guest_context_enter_irqoff(); > +} > + > +/** > + * exit_to_guest_mode - Fixup state when exiting to guest mode > + * > + * This is analagous to exit_to_user_mode(), and ensures we perform the > + * following in order: > + * > + * 1) Trace interrupts on state > + * 2) Invoke context tracking if enabled to adjust RCU state > + * 3) Tell lockdep that interrupts are enabled nit: or rather, are about to be enabled? Certainly on arm64, the enable happens much later, right at the point where we enter the guest for real. Thanks, M.
On Tue, Jan 11, 2022 at 05:54:59PM +0000, Marc Zyngier wrote: > Hi Mark, > > On Tue, 11 Jan 2022 15:35:35 +0000, > Mark Rutland <mark.rutland@arm.com> wrote: > > > > When transitioning to/from guest mode, it is necessary to inform > > lockdep, tracing, and RCU in a specific order, similar to the > > requirements for transitions to/from user mode. Additionally, it is > > necessary to perform vtime accounting for a window around running the > > guest, with RCU enabled, such that timer interrupts taken from the guest > > can be accounted as guest time. > > > > Most architectures don't handle all the necessary pieces, and a have a > > number of common bugs, including unsafe usage of RCU during the window > > between guest_enter() and guest_exit(). > > > > On x86, this was dealt with across commits: > > > > 87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs") > > 0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit") > > 9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit") > > 3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text") > > 135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text") > > 160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling") > > bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers") > > > > ... but those fixes are specific to x86, and as the resulting logic > > (while correct) is split across generic helper functions and > > x86-specific helper functions, it is difficult to see that the > > entry/exit accounting is balanced. > > > > This patch adds generic helpers which architectures can use to handle > > guest entry/exit consistently and correctly. The guest_{enter,exit}() > > helpers are split into guest_timing_{enter,exit}() to perform vtime > > accounting, and guest_context_{enter,exit}() to perform the necessary > > context tracking and RCU management. The existing guest_{enter,exit}() > > heleprs are left as wrappers of these. > > > > Atop this, new exit_to_guest_mode() and enter_from_guest_mode() helpers > > are added to handle the ordering of lockdep, tracing, and RCU manageent. > > These are named to align with exit_to_user_mode() and > > enter_from_user_mode(). > > > > Subsequent patches will migrate architectures over to the new helpers, > > following a sequence: > > > > guest_timing_enter_irqoff(); > > > > exit_to_guest_mode(); > > < run the vcpu > > > enter_from_guest_mode(); > > > > < take any pending IRQs > > > > > guest_timing_exit_irqoff(); > > > > This sequences handles all of the above correctly, and more clearly > > balances the entry and exit portions, making it easier to understand. > > > > The existing helpers are marked as deprecated, and will be removed once > > all architectures have been converted. > > > > There should be no functional change as a result of this patch. > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > Thanks a lot for looking into this and writing this up. I have a > couple of comments below, but that's pretty much cosmetic and is only > there to ensure that I actually understand this stuff. FWIW: > > Reviewed-by: Marc Zyngier <maz@kernel.org> Thanks! > > --- > > include/linux/kvm_host.h | 108 +++++++++++++++++++++++++++++++++++++-- > > 1 file changed, 105 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index c310648cc8f1..13fcf7979880 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -29,6 +29,8 @@ > > #include <linux/refcount.h> > > #include <linux/nospec.h> > > #include <linux/notifier.h> > > +#include <linux/ftrace.h> > > +#include <linux/instrumentation.h> > > #include <asm/signal.h> > > > > #include <linux/kvm.h> > > @@ -362,8 +364,11 @@ struct kvm_vcpu { > > int last_used_slot; > > }; > > > > -/* must be called with irqs disabled */ > > -static __always_inline void guest_enter_irqoff(void) > > +/* > > + * Start accounting time towards a guest. > > + * Must be called before entering guest context. > > + */ > > +static __always_inline void guest_timing_enter_irqoff(void) > > { > > /* > > * This is running in ioctl context so its safe to assume that it's the > > @@ -372,7 +377,17 @@ static __always_inline void guest_enter_irqoff(void) > > instrumentation_begin(); > > vtime_account_guest_enter(); > > instrumentation_end(); > > +} > > > > +/* > > + * Enter guest context and enter an RCU extended quiescent state. > > + * > > + * This should be the last thing called before entering the guest, and must be > > + * called after any potential use of RCU (including any potentially > > + * instrumented code). > > nit: "the last thing called" is terribly ambiguous. Any architecture > obviously calls a ****load of stuff after this point. Should this be > 'the last thing involving RCU' instead? I agree this is unclear and I struggled to fing good wording for this. Is the following any better? /* * Enter guest context and enter an RCU extended quiescent state. * * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is * unsafe to use any code which may directly or indirectly use RCU, tracing * (including IRQ flag tracing), or lockdep. All code in this period must be * non-instrumentable. */ If that's good I can add similar to guest_context_exit_irqoff(). [...] > > +/** > > + * exit_to_guest_mode - Fixup state when exiting to guest mode > > + * > > + * This is analagous to exit_to_user_mode(), and ensures we perform the > > + * following in order: > > + * > > + * 1) Trace interrupts on state > > + * 2) Invoke context tracking if enabled to adjust RCU state > > + * 3) Tell lockdep that interrupts are enabled > > nit: or rather, are about to be enabled? Certainly on arm64, the > enable happens much later, right at the point where we enter the guest > for real. True; I'd cribbed the wording from the comment block above exit_to_user_mode(), but I stripped the context that made that clear. I'll make that: /** * exit_to_guest_mode - Fixup state when exiting to guest mode * * Entry to a guest will enable interrupts, but the kernel state is * interrupts disabled when this is invoked. Also tell RCU about it. * * 1) Trace interrupts on state * 2) Invoke context tracking if enabled to adjust RCU state * 3) Tell lockdep that interrupts are enabled * * Invoked from architecture specific code before entering a guest. * Must be called with interrupts disabled and the caller must be * non-instrumentable. * The caller has to invoke guest_timing_enter_irqoff() before this. * * Note: this is analagous to exit_to_user_mode(). */ ... with likewise for enter_from_guest_mode(), if that's clear enough? FWIW, the comment blcok for exit_to_user_mode() in include/linux/entry-common.h says: /** * exit_to_user_mode - Fixup state when exiting to user mode * * Syscall/interrupt exit enables interrupts, but the kernel state is * interrupts disabled when this is invoked. Also tell RCU about it. * * 1) Trace interrupts on state * 2) Invoke context tracking if enabled to adjust RCU state * 3) Invoke architecture specific last minute exit code, e.g. speculation * mitigations, etc.: arch_exit_to_user_mode() * 4) Tell lockdep that interrupts are enabled * * Invoked from architecture specific code when syscall_exit_to_user_mode() * is not suitable as the last step before returning to userspace. Must be * invoked with interrupts disabled and the caller must be * non-instrumentable. * The caller has to invoke syscall_exit_to_user_mode_work() before this. */ Thanks, Mark.
On Thu, 13 Jan 2022 11:01:30 +0000, Mark Rutland <mark.rutland@arm.com> wrote: > > On Tue, Jan 11, 2022 at 05:54:59PM +0000, Marc Zyngier wrote: > > Hi Mark, > > > > On Tue, 11 Jan 2022 15:35:35 +0000, > > Mark Rutland <mark.rutland@arm.com> wrote: [...] > > > +/* > > > + * Enter guest context and enter an RCU extended quiescent state. > > > + * > > > + * This should be the last thing called before entering the guest, and must be > > > + * called after any potential use of RCU (including any potentially > > > + * instrumented code). > > > > nit: "the last thing called" is terribly ambiguous. Any architecture > > obviously calls a ****load of stuff after this point. Should this be > > 'the last thing involving RCU' instead? > > I agree this is unclear and I struggled to fing good wording for this. Is the > following any better? > > /* > * Enter guest context and enter an RCU extended quiescent state. > * > * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is > * unsafe to use any code which may directly or indirectly use RCU, tracing > * (including IRQ flag tracing), or lockdep. All code in this period must be > * non-instrumentable. > */ > > If that's good I can add similar to guest_context_exit_irqoff(). Yes, that's much clearer, thanks. > > [...] > > > > +/** > > > + * exit_to_guest_mode - Fixup state when exiting to guest mode > > > + * > > > + * This is analagous to exit_to_user_mode(), and ensures we perform the > > > + * following in order: > > > + * > > > + * 1) Trace interrupts on state > > > + * 2) Invoke context tracking if enabled to adjust RCU state > > > + * 3) Tell lockdep that interrupts are enabled > > > > nit: or rather, are about to be enabled? Certainly on arm64, the > > enable happens much later, right at the point where we enter the guest > > for real. > > True; I'd cribbed the wording from the comment block above exit_to_user_mode(), > but I stripped the context that made that clear. I'll make that: > > /** > * exit_to_guest_mode - Fixup state when exiting to guest mode > * > * Entry to a guest will enable interrupts, but the kernel state is > * interrupts disabled when this is invoked. Also tell RCU about it. > * > * 1) Trace interrupts on state > * 2) Invoke context tracking if enabled to adjust RCU state > * 3) Tell lockdep that interrupts are enabled > * > * Invoked from architecture specific code before entering a guest. > * Must be called with interrupts disabled and the caller must be > * non-instrumentable. > * The caller has to invoke guest_timing_enter_irqoff() before this. > * > * Note: this is analagous to exit_to_user_mode(). nit: analogous > */ > > ... with likewise for enter_from_guest_mode(), if that's clear enough? Yes, that's great. Thanks again, M.
On Thu, Jan 13, 2022 at 11:55:11AM +0000, Marc Zyngier wrote: > On Thu, 13 Jan 2022 11:01:30 +0000, > Mark Rutland <mark.rutland@arm.com> wrote: > > > > On Tue, Jan 11, 2022 at 05:54:59PM +0000, Marc Zyngier wrote: > > > Hi Mark, > > > > > > On Tue, 11 Jan 2022 15:35:35 +0000, > > > Mark Rutland <mark.rutland@arm.com> wrote: > > [...] > > > > > +/* > > > > + * Enter guest context and enter an RCU extended quiescent state. > > > > + * > > > > + * This should be the last thing called before entering the guest, and must be > > > > + * called after any potential use of RCU (including any potentially > > > > + * instrumented code). > > > > > > nit: "the last thing called" is terribly ambiguous. Any architecture > > > obviously calls a ****load of stuff after this point. Should this be > > > 'the last thing involving RCU' instead? > > > > I agree this is unclear and I struggled to fing good wording for this. Is the > > following any better? > > > > /* > > * Enter guest context and enter an RCU extended quiescent state. > > * > > * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is > > * unsafe to use any code which may directly or indirectly use RCU, tracing > > * (including IRQ flag tracing), or lockdep. All code in this period must be > > * non-instrumentable. > > */ > > > > If that's good I can add similar to guest_context_exit_irqoff(). > > Yes, that's much clearer, thanks. > > > > > [...] > > > > > > +/** > > > > + * exit_to_guest_mode - Fixup state when exiting to guest mode > > > > + * > > > > + * This is analagous to exit_to_user_mode(), and ensures we perform the > > > > + * following in order: > > > > + * > > > > + * 1) Trace interrupts on state > > > > + * 2) Invoke context tracking if enabled to adjust RCU state > > > > + * 3) Tell lockdep that interrupts are enabled > > > > > > nit: or rather, are about to be enabled? Certainly on arm64, the > > > enable happens much later, right at the point where we enter the guest > > > for real. > > > > True; I'd cribbed the wording from the comment block above exit_to_user_mode(), > > but I stripped the context that made that clear. I'll make that: > > > > /** > > * exit_to_guest_mode - Fixup state when exiting to guest mode > > * > > * Entry to a guest will enable interrupts, but the kernel state is > > * interrupts disabled when this is invoked. Also tell RCU about it. > > * > > * 1) Trace interrupts on state > > * 2) Invoke context tracking if enabled to adjust RCU state > > * 3) Tell lockdep that interrupts are enabled > > * > > * Invoked from architecture specific code before entering a guest. > > * Must be called with interrupts disabled and the caller must be > > * non-instrumentable. > > * The caller has to invoke guest_timing_enter_irqoff() before this. > > * > > * Note: this is analagous to exit_to_user_mode(). > > nit: analogous > > > */ > > > > ... with likewise for enter_from_guest_mode(), if that's clear enough? > > Yes, that's great. Thanks; I've pushed out an updated branch with those changes (including the typo fixes). I'll wait until next week before sending out a v2 since I don't think that meaningfully affects the arch bits for other architectures. Mark.
On Tue, Jan 11, 2022, Mark Rutland wrote: > Atop this, new exit_to_guest_mode() and enter_from_guest_mode() helpers > are added to handle the ordering of lockdep, tracing, and RCU manageent. > These are named to align with exit_to_user_mode() and > enter_from_user_mode(). > > Subsequent patches will migrate architectures over to the new helpers, > following a sequence: > > guest_timing_enter_irqoff(); > > exit_to_guest_mode(); I'm not a fan of this nomenclature. First and foremost, virtualization refers to transfers to guest mode as VM-Enter, and transfers from guest mode as VM-Exit. It's really, really confusing to read this code from a virtualization perspective. The functions themselves are contradictory as the "enter" helper calls functions with "exit" in their name, and vice versa. We settled on xfer_to_guest_mode_work() for a similar conundrum in the past, though I don't love using xfer_to/from_guest_mode() as that makes it sound like those helpers handle the actual transition into guest mode, i.e. runs the vCPU. To avoid too much bikeshedding, what about reusing the names we all compromised on when we did this for x86 and call them kvm_guest_enter/exit_irqoff()? If x86 is converted in the first patch then we could even avoid temporary #ifdefs. > < run the vcpu > > enter_from_guest_mode(); > < take any pending IRQs > > > guest_timing_exit_irqoff();
On Thu, Jan 13, 2022 at 08:32:20PM +0000, Sean Christopherson wrote: > On Tue, Jan 11, 2022, Mark Rutland wrote: > > Atop this, new exit_to_guest_mode() and enter_from_guest_mode() helpers > > are added to handle the ordering of lockdep, tracing, and RCU manageent. > > These are named to align with exit_to_user_mode() and > > enter_from_user_mode(). > > > > Subsequent patches will migrate architectures over to the new helpers, > > following a sequence: > > > > guest_timing_enter_irqoff(); > > > > exit_to_guest_mode(); > > I'm not a fan of this nomenclature. First and foremost, virtualization refers to > transfers to guest mode as VM-Enter, and transfers from guest mode as VM-Exit. > It's really, really confusing to read this code from a virtualization perspective. > The functions themselves are contradictory as the "enter" helper calls functions > with "exit" in their name, and vice versa. Sure; FWIW I wasn't happy with the naming either, but I couldn't find anything that was entirely clear, because it depends on whether you consider this an entry..exit of guest context or an exit..entry of regular kernel context. I went with exit_to_guest_mode() and enter_from_guest_mode() because that clearly corresponded to exit_to_user_mode() and enter_from_user_mode(), and the convention in the common entry code is to talk in terms of the regular kernel context. While I was working on this, I had guest_context_enter_irqoff() for kernel->guest and guest_context_exit_irqoff() for guest->kernel, which also matched the style of guest_timing_{enter,exit}_irqoff(). I'm happy to change to that, if that works for you? > We settled on xfer_to_guest_mode_work() for a similar conundrum in the past, though > I don't love using xfer_to/from_guest_mode() as that makes it sound like those > helpers handle the actual transition into guest mode, i.e. runs the vCPU. > > To avoid too much bikeshedding, what about reusing the names we all compromised > on when we did this for x86 and call them kvm_guest_enter/exit_irqoff()? If x86 > is converted in the first patch then we could even avoid temporary #ifdefs. I'd like to keep this somewhat orthogonal to the x86 changes (e.g. as other architectures will need backports to stable at least for the RCU bug fix), so I'd rather use a name that isn't immediately coupled with x86 changes. Does the guest_context_{enter,exit}_irqoff() naming above work for you? Thanks, Mark.
On Fri, Jan 14, 2022, Mark Rutland wrote: > I'd like to keep this somewhat orthogonal to the x86 changes (e.g. as other > architectures will need backports to stable at least for the RCU bug fix), so > I'd rather use a name that isn't immediately coupled with x86 changes. Ah, gotcha. > Does the guest_context_{enter,exit}_irqoff() naming above work for you? Yep, thanks!
On Fri, Jan 14, 2022 at 04:11:06PM +0000, Sean Christopherson wrote: > On Fri, Jan 14, 2022, Mark Rutland wrote: > > I'd like to keep this somewhat orthogonal to the x86 changes (e.g. as other > > architectures will need backports to stable at least for the RCU bug fix), so > > I'd rather use a name that isn't immediately coupled with x86 changes. > > Ah, gotcha. > > > Does the guest_context_{enter,exit}_irqoff() naming above work for you? > > Yep, thanks! I just realised that I already have guest_context_{enter,exit}_irqoff() for the context-tracking bits alone, and so I'll need to use a different name. For bisectability I can't use guest_{enter,exit}_irqoff() immediately, so for now I'll go with guest_state_{enter,exit}_irqoff(). Once the conversion is complete and the deprecated bits are removed we can rename those to guest_{enter,exit}_irqoff(). Thanks, Mark.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c310648cc8f1..13fcf7979880 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -29,6 +29,8 @@ #include <linux/refcount.h> #include <linux/nospec.h> #include <linux/notifier.h> +#include <linux/ftrace.h> +#include <linux/instrumentation.h> #include <asm/signal.h> #include <linux/kvm.h> @@ -362,8 +364,11 @@ struct kvm_vcpu { int last_used_slot; }; -/* must be called with irqs disabled */ -static __always_inline void guest_enter_irqoff(void) +/* + * Start accounting time towards a guest. + * Must be called before entering guest context. + */ +static __always_inline void guest_timing_enter_irqoff(void) { /* * This is running in ioctl context so its safe to assume that it's the @@ -372,7 +377,17 @@ static __always_inline void guest_enter_irqoff(void) instrumentation_begin(); vtime_account_guest_enter(); instrumentation_end(); +} +/* + * Enter guest context and enter an RCU extended quiescent state. + * + * This should be the last thing called before entering the guest, and must be + * called after any potential use of RCU (including any potentially + * instrumented code). + */ +static __always_inline void guest_context_enter_irqoff(void) +{ /* * KVM does not hold any references to rcu protected data when it * switches CPU into a guest mode. In fact switching to a guest mode @@ -388,16 +403,77 @@ static __always_inline void guest_enter_irqoff(void) } } -static __always_inline void guest_exit_irqoff(void) +/* + * Deprecated. Architectures should move to guest_timing_enter_irqoff() and + * exit_to_guest_mode(). + */ +static __always_inline void guest_enter_irqoff(void) +{ + guest_timing_enter_irqoff(); + guest_context_enter_irqoff(); +} + +/** + * exit_to_guest_mode - Fixup state when exiting to guest mode + * + * This is analagous to exit_to_user_mode(), and ensures we perform the + * following in order: + * + * 1) Trace interrupts on state + * 2) Invoke context tracking if enabled to adjust RCU state + * 3) Tell lockdep that interrupts are enabled + * + * Invoked from architecture specific code as the last step before entering a + * guest. Must be invoked with interrupts disabled and the caller must be + * non-instrumentable. + * + * This must be called after guest_timing_enter_irqoff(). + */ +static __always_inline void exit_to_guest_mode(void) +{ + instrumentation_begin(); + trace_hardirqs_on_prepare(); + lockdep_hardirqs_on_prepare(CALLER_ADDR0); + instrumentation_end(); + + guest_context_enter_irqoff(); + lockdep_hardirqs_on(CALLER_ADDR0); +} + +/* + * Exit guest context and exit an RCU extended quiescent state. + * + * This should be the first thing called after exiting the guest, and must be + * called before any potential use of RCU (including any potentially + * instrumented code). + */ +static __always_inline void guest_context_exit_irqoff(void) { context_tracking_guest_exit(); +} +/* + * Stop accounting time towards a guest. + * Must be called after exiting guest context. + */ +static __always_inline void guest_timing_exit_irqoff(void) +{ instrumentation_begin(); /* Flush the guest cputime we spent on the guest */ vtime_account_guest_exit(); instrumentation_end(); } +/* + * Deprecated. Architectures should move to enter_from_guest_mode() and + * guest_timing_enter_irqoff(). + */ +static __always_inline void guest_exit_irqoff(void) +{ + guest_context_exit_irqoff(); + guest_timing_exit_irqoff(); +} + static inline void guest_exit(void) { unsigned long flags; @@ -407,6 +483,32 @@ static inline void guest_exit(void) local_irq_restore(flags); } +/** + * enter_from_guest_mode - Establish state when returning from guest mode + * + * This is analagous to enter_from_user_mode(), and ensures we perform the + * following in order: + * + * 1) Tell lockdep that interrupts are disabled + * 2) Invoke context tracking if enabled to reactivate RCU + * 3) Trace interrupts off state + * + * Invoked from architecture specific code as the first step after exiting a + * guest. Must be invoked with interrupts disabled and the caller must be + * non-instrumentable. + * + * This must be called before guest_timing_exit_irqoff(). + */ +static __always_inline void enter_from_guest_mode(void) +{ + lockdep_hardirqs_off(CALLER_ADDR0); + guest_context_exit_irqoff(); + + instrumentation_begin(); + trace_hardirqs_off_finish(); + instrumentation_end(); +} + static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu) { /*
When transitioning to/from guest mode, it is necessary to inform lockdep, tracing, and RCU in a specific order, similar to the requirements for transitions to/from user mode. Additionally, it is necessary to perform vtime accounting for a window around running the guest, with RCU enabled, such that timer interrupts taken from the guest can be accounted as guest time. Most architectures don't handle all the necessary pieces, and a have a number of common bugs, including unsafe usage of RCU during the window between guest_enter() and guest_exit(). On x86, this was dealt with across commits: 87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs") 0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit") 9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit") 3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text") 135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text") 160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling") bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers") ... but those fixes are specific to x86, and as the resulting logic (while correct) is split across generic helper functions and x86-specific helper functions, it is difficult to see that the entry/exit accounting is balanced. This patch adds generic helpers which architectures can use to handle guest entry/exit consistently and correctly. The guest_{enter,exit}() helpers are split into guest_timing_{enter,exit}() to perform vtime accounting, and guest_context_{enter,exit}() to perform the necessary context tracking and RCU management. The existing guest_{enter,exit}() heleprs are left as wrappers of these. Atop this, new exit_to_guest_mode() and enter_from_guest_mode() helpers are added to handle the ordering of lockdep, tracing, and RCU manageent. These are named to align with exit_to_user_mode() and enter_from_user_mode(). Subsequent patches will migrate architectures over to the new helpers, following a sequence: guest_timing_enter_irqoff(); exit_to_guest_mode(); < run the vcpu > enter_from_guest_mode(); < take any pending IRQs > guest_timing_exit_irqoff(); This sequences handles all of the above correctly, and more clearly balances the entry and exit portions, making it easier to understand. The existing helpers are marked as deprecated, and will be removed once all architectures have been converted. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> --- include/linux/kvm_host.h | 108 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 105 insertions(+), 3 deletions(-)