diff mbox series

[1/5] KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded

Message ID 20190125094656.5026-2-christoffer.dall@arm.com (mailing list archive)
State New, archived
Headers show
Series KVM: arm/arm64: Fix VCPU power management problems | expand

Commit Message

Christoffer Dall Jan. 25, 2019, 9:46 a.m. UTC
Resetting the VCPU state modifies the system register state in memory,
but this may interact with vcpu_load/vcpu_put if running with preemption
disabled, which in turn may lead to corrupted system register state.

Address this by disabling preemption and doing put/load if required
around the reset logic.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Comments

Andrew Jones Jan. 29, 2019, 3:48 p.m. UTC | #1
Hi Christoffer,

On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> Resetting the VCPU state modifies the system register state in memory,
> but this may interact with vcpu_load/vcpu_put if running with preemption
> disabled, which in turn may lead to corrupted system register state.
  ^ enabled
> 
> Address this by disabling preemption and doing put/load if required
> around the reset logic.

I'm having trouble understanding how disabling preemption helps here.
There shouldn't be an issue with the KVM_ARM_VCPU_INIT case, since the
target vcpu is guaranteed not to be loaded and therefore it doesn't
have preempt notifiers registered either. Also, KVM_ARM_VCPU_INIT holds
the vcpu mutex, so there's no chance for a load to occur until it's
complete.

For the PSCI case it makes sense to force a vcpu load after the reset,
otherwise the sleeping target vcpu won't have the correct state loaded.
The initial put also makes sense in case we're not resetting everything.
I don't understand how we're ensuring the target vcpu thread's preemption
is disabled though. This modified kvm_reset_vcpu would need to be run
from the target vcpu thread to work, but that's not how the PSCI code
currently does it.

> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index b72a3dd56204..f21a2a575939 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -105,16 +105,33 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   * This function finds the right table above and sets the registers on
>   * the virtual CPU struct to their architecturally defined reset
>   * values.
> + *
> + * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
> + * ioctl or as part of handling a request issued by another VCPU in the PSCI
> + * handling code.  In the first case, the VCPU will not be loaded, and in the
> + * second case the VCPU will be loaded.  Because this function operates purely
> + * on the memory-backed valus of system registers, we want to do a full put if
                           ^ values
> + * we were loaded (handling a request) and load the values back at the end of
> + * the function.  Otherwise we leave the state alone.  In both cases, we
> + * disable preemption around the vcpu reset as we would otherwise race with
> + * preempt notifiers which also call put/load.
>   */
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	const struct kvm_regs *cpu_reset;
> +	int ret = -EINVAL;
> +	bool loaded;
> +
> +	preempt_disable();
> +	loaded = (vcpu->cpu != -1);
> +	if (loaded)
> +		kvm_arch_vcpu_put(vcpu);
>  
>  	switch (vcpu->arch.target) {
>  	default:
>  		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
>  			if (!cpu_has_32bit_el1())
> -				return -EINVAL;
> +				goto out;
>  			cpu_reset = &default_regs_reset32;
>  		} else {
>  			cpu_reset = &default_regs_reset;
> @@ -137,7 +154,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
>  
>  	/* Reset timer */
> -	return kvm_timer_vcpu_reset(vcpu);
> +	ret = kvm_timer_vcpu_reset(vcpu);
> +out:
> +	if (loaded)
> +		kvm_arch_vcpu_load(vcpu, smp_processor_id());
> +	preempt_enable();
> +	return ret;
>  }
>  
>  void kvm_set_ipa_limit(void)
> -- 
> 2.18.0
>

Thanks,
drew
Marc Zyngier Jan. 29, 2019, 4:05 p.m. UTC | #2
Hi Drew,

On Tue, 29 Jan 2019 15:48:58 +0000,
Andrew Jones <drjones@redhat.com> wrote:
> 
> Hi Christoffer,
> 
> On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> > Resetting the VCPU state modifies the system register state in memory,
> > but this may interact with vcpu_load/vcpu_put if running with preemption
> > disabled, which in turn may lead to corrupted system register state.
>   ^ enabled
> > 
> > Address this by disabling preemption and doing put/load if required
> > around the reset logic.
> 
> I'm having trouble understanding how disabling preemption helps here.
> There shouldn't be an issue with the KVM_ARM_VCPU_INIT case, since the
> target vcpu is guaranteed not to be loaded and therefore it doesn't
> have preempt notifiers registered either. Also, KVM_ARM_VCPU_INIT holds
> the vcpu mutex, so there's no chance for a load to occur until it's
> complete.
> 
> For the PSCI case it makes sense to force a vcpu load after the reset,
> otherwise the sleeping target vcpu won't have the correct state loaded.
> The initial put also makes sense in case we're not resetting everything.
> I don't understand how we're ensuring the target vcpu thread's preemption
> is disabled though. This modified kvm_reset_vcpu would need to be run
> from the target vcpu thread to work, but that's not how the PSCI code
> currently does it.

And that's exactly where we're going with the following patch in the
series. Ultimately, we need a vcpu to reset itself, as we otherwise
have a window where a vcpu can be spuriously loaded whilst being
reset.

Thanks,

	M.
Christoffer Dall Jan. 30, 2019, 9:22 a.m. UTC | #3
On Tue, Jan 29, 2019 at 04:05:25PM +0000, Marc Zyngier wrote:
> Hi Drew,
> 
> On Tue, 29 Jan 2019 15:48:58 +0000,
> Andrew Jones <drjones@redhat.com> wrote:
> > 
> > Hi Christoffer,
> > 
> > On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> > > Resetting the VCPU state modifies the system register state in memory,
> > > but this may interact with vcpu_load/vcpu_put if running with preemption
> > > disabled, which in turn may lead to corrupted system register state.
> >   ^ enabled
> > > 
> > > Address this by disabling preemption and doing put/load if required
> > > around the reset logic.
> > 
> > I'm having trouble understanding how disabling preemption helps here.
> > There shouldn't be an issue with the KVM_ARM_VCPU_INIT case, since the
> > target vcpu is guaranteed not to be loaded and therefore it doesn't
> > have preempt notifiers registered either. Also, KVM_ARM_VCPU_INIT holds
> > the vcpu mutex, so there's no chance for a load to occur until it's
> > complete.
> > 
> > For the PSCI case it makes sense to force a vcpu load after the reset,
> > otherwise the sleeping target vcpu won't have the correct state loaded.
> > The initial put also makes sense in case we're not resetting everything.
> > I don't understand how we're ensuring the target vcpu thread's preemption
> > is disabled though. This modified kvm_reset_vcpu would need to be run
> > from the target vcpu thread to work, but that's not how the PSCI code
> > currently does it.
> 
> And that's exactly where we're going with the following patch in the
> series. Ultimately, we need a vcpu to reset itself, as we otherwise
> have a window where a vcpu can be spuriously loaded whilst being
> reset.
> 
FWIW, I think the confusion here comes from having re-ordered the
patches compared to how the commit text was originally written.  We
should probably explain in the commit message that this is in
preparation for doing the reset from the VCPU itself.

Thanks,

    Christoffer
Andrew Jones Feb. 4, 2019, 3:15 p.m. UTC | #4
On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> Resetting the VCPU state modifies the system register state in memory,
> but this may interact with vcpu_load/vcpu_put if running with preemption
> disabled, which in turn may lead to corrupted system register state.
> 
> Address this by disabling preemption and doing put/load if required
> around the reset logic.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>

I believe this should come after the next patch, but anyway

Reviewed-by: Andrew Jones <drjones@redhat.com>
Dave Martin Feb. 20, 2019, 7:14 p.m. UTC | #5
On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> Resetting the VCPU state modifies the system register state in memory,
> but this may interact with vcpu_load/vcpu_put if running with preemption
> disabled, which in turn may lead to corrupted system register state.

Should this be "enabled"?

Too late now, but I want to make sure I understand this right for
patches that will go on top.

> Address this by disabling preemption and doing put/load if required
> around the reset logic.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index b72a3dd56204..f21a2a575939 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -105,16 +105,33 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   * This function finds the right table above and sets the registers on
>   * the virtual CPU struct to their architecturally defined reset
>   * values.
> + *
> + * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
> + * ioctl or as part of handling a request issued by another VCPU in the PSCI
> + * handling code.  In the first case, the VCPU will not be loaded, and in the
> + * second case the VCPU will be loaded.  Because this function operates purely
> + * on the memory-backed valus of system registers, we want to do a full put if
> + * we were loaded (handling a request) and load the values back at the end of
> + * the function.  Otherwise we leave the state alone.  In both cases, we
> + * disable preemption around the vcpu reset as we would otherwise race with
> + * preempt notifiers which also call put/load.
>   */
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	const struct kvm_regs *cpu_reset;
> +	int ret = -EINVAL;
> +	bool loaded;
> +
> +	preempt_disable();
> +	loaded = (vcpu->cpu != -1);
> +	if (loaded)
> +		kvm_arch_vcpu_put(vcpu);
>  
>  	switch (vcpu->arch.target) {
>  	default:
>  		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
>  			if (!cpu_has_32bit_el1())
> -				return -EINVAL;
> +				goto out;
>  			cpu_reset = &default_regs_reset32;
>  		} else {
>  			cpu_reset = &default_regs_reset;
> @@ -137,7 +154,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
>  
>  	/* Reset timer */
> -	return kvm_timer_vcpu_reset(vcpu);
> +	ret = kvm_timer_vcpu_reset(vcpu);
> +out:
> +	if (loaded)
> +		kvm_arch_vcpu_load(vcpu, smp_processor_id());
> +	preempt_enable();
> +	return ret;
>  }
>  
>  void kvm_set_ipa_limit(void)

I was really confused by this: as far as I can see, we don't really need
to disable preemption here once kvm_arch_vcpu_put() is complete -- at
least not for the purpose of avoiding corruption of the reg state.  But
we _do_ need to disable the preempt notifier so that it doesn't fire
before we are ready.

It actually seems a bit surprising for a powered-off CPU to sit with the
VM regs live and preempt notifier armed, when the vcpu thread is
heading to interruptible sleep anyway until someone turns it on.
Perhaps an alternative approach would be to nobble the preempt notifier
and stick an explicit vcpu_put()...vcpu_load() around the
swait_event_interruptible_exclusive() call in vcpu_req_sleep().  This
is not fast path.


Any, with the code as-is, it looks like the SVE regs resetting should
go in the preempt_disable() region, after the kvm_arch_vcpu_put() call.

Does it sound like I've understood that right?

Cheers
---Dave
Marc Zyngier Feb. 20, 2019, 7:41 p.m. UTC | #6
On Wed, 20 Feb 2019 19:14:53 +0000
Dave Martin <Dave.Martin@arm.com> wrote:

> On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> > Resetting the VCPU state modifies the system register state in memory,
> > but this may interact with vcpu_load/vcpu_put if running with preemption
> > disabled, which in turn may lead to corrupted system register state.  
> 
> Should this be "enabled"?

Yup. Never mind. The patches are firmly in mainline now.

> 
> Too late now, but I want to make sure I understand this right for
> patches that will go on top.
> 
> > Address this by disabling preemption and doing put/load if required
> > around the reset logic.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > index b72a3dd56204..f21a2a575939 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -105,16 +105,33 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >   * This function finds the right table above and sets the registers on
> >   * the virtual CPU struct to their architecturally defined reset
> >   * values.
> > + *
> > + * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
> > + * ioctl or as part of handling a request issued by another VCPU in the PSCI
> > + * handling code.  In the first case, the VCPU will not be loaded, and in the
> > + * second case the VCPU will be loaded.  Because this function operates purely
> > + * on the memory-backed valus of system registers, we want to do a full put if
> > + * we were loaded (handling a request) and load the values back at the end of
> > + * the function.  Otherwise we leave the state alone.  In both cases, we
> > + * disable preemption around the vcpu reset as we would otherwise race with
> > + * preempt notifiers which also call put/load.
> >   */
> >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >  {
> >  	const struct kvm_regs *cpu_reset;
> > +	int ret = -EINVAL;
> > +	bool loaded;
> > +
> > +	preempt_disable();
> > +	loaded = (vcpu->cpu != -1);
> > +	if (loaded)
> > +		kvm_arch_vcpu_put(vcpu);
> >  
> >  	switch (vcpu->arch.target) {
> >  	default:
> >  		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
> >  			if (!cpu_has_32bit_el1())
> > -				return -EINVAL;
> > +				goto out;
> >  			cpu_reset = &default_regs_reset32;
> >  		} else {
> >  			cpu_reset = &default_regs_reset;
> > @@ -137,7 +154,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >  		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
> >  
> >  	/* Reset timer */
> > -	return kvm_timer_vcpu_reset(vcpu);
> > +	ret = kvm_timer_vcpu_reset(vcpu);
> > +out:
> > +	if (loaded)
> > +		kvm_arch_vcpu_load(vcpu, smp_processor_id());
> > +	preempt_enable();
> > +	return ret;
> >  }
> >  
> >  void kvm_set_ipa_limit(void)  
> 
> I was really confused by this: as far as I can see, we don't really need
> to disable preemption here once kvm_arch_vcpu_put() is complete -- at
> least not for the purpose of avoiding corruption of the reg state.  But
> we _do_ need to disable the preempt notifier so that it doesn't fire
> before we are ready.

Just reading this patch alone won't help. You need to read it in
conjunction with the following patch, which resets the vcpu from a
preemptible section.

> It actually seems a bit surprising for a powered-off CPU to sit with the
> VM regs live and preempt notifier armed, when the vcpu thread is
> heading to interruptible sleep anyway until someone turns it on.

All it takes is a signal for that vcpu to wake up, power-off or not.

> Perhaps an alternative approach would be to nobble the preempt notifier
> and stick an explicit vcpu_put()...vcpu_load() around the
> swait_event_interruptible_exclusive() call in vcpu_req_sleep().  This
> is not fast path.

What does it buy us? The problem we're solving here is a powered-off,
spuriously woken-up vcpu racing against the reset performed from
another vcpu. I don't see what adding more put/load would solve.

> Any, with the code as-is, it looks like the SVE regs resetting should
> go in the preempt_disable() region, after the kvm_arch_vcpu_put() call.

All resets must go there. This is the only safe location.

> Does it sound like I've understood that right?

I'm not sure. Your analysis of what we're trying to achieve with this
series seems a bit off. Or I'm completely misunderstanding what you're
suggesting, which is the most likely explanation.

	M.
Christoffer Dall Feb. 26, 2019, 12:34 p.m. UTC | #7
On Wed, Feb 20, 2019 at 07:14:53PM +0000, Dave Martin wrote:
> On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> > Resetting the VCPU state modifies the system register state in memory,
> > but this may interact with vcpu_load/vcpu_put if running with preemption
> > disabled, which in turn may lead to corrupted system register state.
> 
> Should this be "enabled"?
> 
> Too late now, but I want to make sure I understand this right for
> patches that will go on top.
> 
> > Address this by disabling preemption and doing put/load if required
> > around the reset logic.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > index b72a3dd56204..f21a2a575939 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -105,16 +105,33 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >   * This function finds the right table above and sets the registers on
> >   * the virtual CPU struct to their architecturally defined reset
> >   * values.
> > + *
> > + * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
> > + * ioctl or as part of handling a request issued by another VCPU in the PSCI
> > + * handling code.  In the first case, the VCPU will not be loaded, and in the
> > + * second case the VCPU will be loaded.  Because this function operates purely
> > + * on the memory-backed valus of system registers, we want to do a full put if
> > + * we were loaded (handling a request) and load the values back at the end of
> > + * the function.  Otherwise we leave the state alone.  In both cases, we
> > + * disable preemption around the vcpu reset as we would otherwise race with
> > + * preempt notifiers which also call put/load.
> >   */
> >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >  {
> >  	const struct kvm_regs *cpu_reset;
> > +	int ret = -EINVAL;
> > +	bool loaded;
> > +
> > +	preempt_disable();
> > +	loaded = (vcpu->cpu != -1);
> > +	if (loaded)
> > +		kvm_arch_vcpu_put(vcpu);
> >  
> >  	switch (vcpu->arch.target) {
> >  	default:
> >  		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
> >  			if (!cpu_has_32bit_el1())
> > -				return -EINVAL;
> > +				goto out;
> >  			cpu_reset = &default_regs_reset32;
> >  		} else {
> >  			cpu_reset = &default_regs_reset;
> > @@ -137,7 +154,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >  		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
> >  
> >  	/* Reset timer */
> > -	return kvm_timer_vcpu_reset(vcpu);
> > +	ret = kvm_timer_vcpu_reset(vcpu);
> > +out:
> > +	if (loaded)
> > +		kvm_arch_vcpu_load(vcpu, smp_processor_id());
> > +	preempt_enable();
> > +	return ret;
> >  }
> >  
> >  void kvm_set_ipa_limit(void)
> 
> I was really confused by this: as far as I can see, we don't really need
> to disable preemption here once kvm_arch_vcpu_put() is complete -- at
> least not for the purpose of avoiding corruption of the reg state.  But
> we _do_ need to disable the preempt notifier so that it doesn't fire
> before we are ready.
> 
> It actually seems a bit surprising for a powered-off CPU to sit with the
> VM regs live and preempt notifier armed, when the vcpu thread is
> heading to interruptible sleep anyway until someone turns it on.
> Perhaps an alternative approach would be to nobble the preempt notifier
> and stick an explicit vcpu_put()...vcpu_load() around the
> swait_event_interruptible_exclusive() call in vcpu_req_sleep().  This
> is not fast path.
> 
> 

I think you've understood the problem correctly, and the thing here is
that we (sort-of) "abuse" disabling preemption as a way to disable
preempt notifiers, which I don't think we have.  So we could add that,
and do something like:

  preempt_disable();
  vcpu_put(vcpu);
  disable_preempt_notifiers(vcpu);
  preempt_disable();
  funky_stuff();
  vcpu_load();
  preempt_enable();

But I think that's additional complexity to get a slightly shorter
section with disabled preemption.

We could also re-architect a lot of the vcpu_load/vpcu_put functionality
more drastically, but that is difficult and requires understanding of
how the other architectures work, so at the end of the day we just use
this pattern in multiple places, which is:

  preempt_disable();
  vcpu_put();
  modify_vcpu_state_in_memory();
  vcpu_load();
  preempt_enable();

Does that help?


Thanks,

    Christoffer
Dave Martin Feb. 26, 2019, 1:53 p.m. UTC | #8
On Wed, Feb 20, 2019 at 07:41:43PM +0000, Marc Zyngier wrote:
> On Wed, 20 Feb 2019 19:14:53 +0000
> Dave Martin <Dave.Martin@arm.com> wrote:
> 
> > On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> > > Resetting the VCPU state modifies the system register state in memory,
> > > but this may interact with vcpu_load/vcpu_put if running with preemption
> > > disabled, which in turn may lead to corrupted system register state.  
> > 
> > Should this be "enabled"?
> 
> Yup. Never mind. The patches are firmly in mainline now.

Understood.  Just wanted to check my intuition.

> > 
> > Too late now, but I want to make sure I understand this right for
> > patches that will go on top.
> > 
> > > Address this by disabling preemption and doing put/load if required
> > > around the reset logic.
> > > 
> > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > ---
> > >  arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
> > >  1 file changed, 24 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > > index b72a3dd56204..f21a2a575939 100644
> > > --- a/arch/arm64/kvm/reset.c
> > > +++ b/arch/arm64/kvm/reset.c
> > > @@ -105,16 +105,33 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > >   * This function finds the right table above and sets the registers on
> > >   * the virtual CPU struct to their architecturally defined reset
> > >   * values.
> > > + *
> > > + * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
> > > + * ioctl or as part of handling a request issued by another VCPU in the PSCI
> > > + * handling code.  In the first case, the VCPU will not be loaded, and in the
> > > + * second case the VCPU will be loaded.  Because this function operates purely
> > > + * on the memory-backed valus of system registers, we want to do a full put if
> > > + * we were loaded (handling a request) and load the values back at the end of
> > > + * the function.  Otherwise we leave the state alone.  In both cases, we
> > > + * disable preemption around the vcpu reset as we would otherwise race with
> > > + * preempt notifiers which also call put/load.
> > >   */
> > >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> > >  {
> > >  	const struct kvm_regs *cpu_reset;
> > > +	int ret = -EINVAL;
> > > +	bool loaded;
> > > +
> > > +	preempt_disable();
> > > +	loaded = (vcpu->cpu != -1);
> > > +	if (loaded)
> > > +		kvm_arch_vcpu_put(vcpu);
> > >  
> > >  	switch (vcpu->arch.target) {
> > >  	default:
> > >  		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
> > >  			if (!cpu_has_32bit_el1())
> > > -				return -EINVAL;
> > > +				goto out;
> > >  			cpu_reset = &default_regs_reset32;
> > >  		} else {
> > >  			cpu_reset = &default_regs_reset;
> > > @@ -137,7 +154,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> > >  		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
> > >  
> > >  	/* Reset timer */
> > > -	return kvm_timer_vcpu_reset(vcpu);
> > > +	ret = kvm_timer_vcpu_reset(vcpu);
> > > +out:
> > > +	if (loaded)
> > > +		kvm_arch_vcpu_load(vcpu, smp_processor_id());
> > > +	preempt_enable();
> > > +	return ret;
> > >  }
> > >  
> > >  void kvm_set_ipa_limit(void)  
> > 
> > I was really confused by this: as far as I can see, we don't really need
> > to disable preemption here once kvm_arch_vcpu_put() is complete -- at
> > least not for the purpose of avoiding corruption of the reg state.  But
> > we _do_ need to disable the preempt notifier so that it doesn't fire
> > before we are ready.
> 
> Just reading this patch alone won't help. You need to read it in
> conjunction with the following patch, which resets the vcpu from a
> preemptible section.
> 
> > It actually seems a bit surprising for a powered-off CPU to sit with the
> > VM regs live and preempt notifier armed, when the vcpu thread is
> > heading to interruptible sleep anyway until someone turns it on.
> 
> All it takes is a signal for that vcpu to wake up, power-off or not.

Sure, but the vcpu was scheduled out in the meantim, so we may end up
having to do all the loading twice:  not a problem, because this is a
rare, slow path.  But I wanted to make sure I wasn't more confused than
necessary.

> > Perhaps an alternative approach would be to nobble the preempt notifier
> > and stick an explicit vcpu_put()...vcpu_load() around the
> > swait_event_interruptible_exclusive() call in vcpu_req_sleep().  This
> > is not fast path.
> 
> What does it buy us? The problem we're solving here is a powered-off,
> spuriously woken-up vcpu racing against the reset performed from
> another vcpu. I don't see what adding more put/load would solve.

Not a lot.  Changes in this area would allow some code to move outside
preempt_disable(), but only at the expense of extra cost elsewhere...

> > Any, with the code as-is, it looks like the SVE regs resetting should
> > go in the preempt_disable() region, after the kvm_arch_vcpu_put() call.
> 
> All resets must go there. This is the only safe location.

OK, that's what I wanted to be sure of.

> > Does it sound like I've understood that right?
> 
> I'm not sure. Your analysis of what we're trying to achieve with this
> series seems a bit off. Or I'm completely misunderstanding what you're
> suggesting, which is the most likely explanation.

I'm not trying to achieve anything except rebase the SVE series
correctly on top of this patch (which I've now done -- thanks for
your confirmation above of where the extra resetting should go).

My suggestions above were only intended as thought experiments to check
my understanding of what's going on, *not* proposals for changing the
code.  I guess I didn't make that very clear.

[...]

Cheers
---Dave
Dave Martin Feb. 26, 2019, 2 p.m. UTC | #9
On Tue, Feb 26, 2019 at 01:34:49PM +0100, Christoffer Dall wrote:
> On Wed, Feb 20, 2019 at 07:14:53PM +0000, Dave Martin wrote:
> > On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> > > Resetting the VCPU state modifies the system register state in memory,
> > > but this may interact with vcpu_load/vcpu_put if running with preemption
> > > disabled, which in turn may lead to corrupted system register state.
> > 
> > Should this be "enabled"?
> > 
> > Too late now, but I want to make sure I understand this right for
> > patches that will go on top.
> > 
> > > Address this by disabling preemption and doing put/load if required
> > > around the reset logic.
> > > 
> > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > ---
> > >  arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
> > >  1 file changed, 24 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > > index b72a3dd56204..f21a2a575939 100644
> > > --- a/arch/arm64/kvm/reset.c
> > > +++ b/arch/arm64/kvm/reset.c
> > > @@ -105,16 +105,33 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > >   * This function finds the right table above and sets the registers on
> > >   * the virtual CPU struct to their architecturally defined reset
> > >   * values.
> > > + *
> > > + * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
> > > + * ioctl or as part of handling a request issued by another VCPU in the PSCI
> > > + * handling code.  In the first case, the VCPU will not be loaded, and in the
> > > + * second case the VCPU will be loaded.  Because this function operates purely
> > > + * on the memory-backed valus of system registers, we want to do a full put if
> > > + * we were loaded (handling a request) and load the values back at the end of
> > > + * the function.  Otherwise we leave the state alone.  In both cases, we
> > > + * disable preemption around the vcpu reset as we would otherwise race with
> > > + * preempt notifiers which also call put/load.
> > >   */
> > >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> > >  {
> > >  	const struct kvm_regs *cpu_reset;
> > > +	int ret = -EINVAL;
> > > +	bool loaded;
> > > +
> > > +	preempt_disable();
> > > +	loaded = (vcpu->cpu != -1);
> > > +	if (loaded)
> > > +		kvm_arch_vcpu_put(vcpu);
> > >  
> > >  	switch (vcpu->arch.target) {
> > >  	default:
> > >  		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
> > >  			if (!cpu_has_32bit_el1())
> > > -				return -EINVAL;
> > > +				goto out;
> > >  			cpu_reset = &default_regs_reset32;
> > >  		} else {
> > >  			cpu_reset = &default_regs_reset;
> > > @@ -137,7 +154,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> > >  		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
> > >  
> > >  	/* Reset timer */
> > > -	return kvm_timer_vcpu_reset(vcpu);
> > > +	ret = kvm_timer_vcpu_reset(vcpu);
> > > +out:
> > > +	if (loaded)
> > > +		kvm_arch_vcpu_load(vcpu, smp_processor_id());
> > > +	preempt_enable();
> > > +	return ret;
> > >  }
> > >  
> > >  void kvm_set_ipa_limit(void)
> > 
> > I was really confused by this: as far as I can see, we don't really need
> > to disable preemption here once kvm_arch_vcpu_put() is complete -- at
> > least not for the purpose of avoiding corruption of the reg state.  But
> > we _do_ need to disable the preempt notifier so that it doesn't fire
> > before we are ready.
> > 
> > It actually seems a bit surprising for a powered-off CPU to sit with the
> > VM regs live and preempt notifier armed, when the vcpu thread is
> > heading to interruptible sleep anyway until someone turns it on.
> > Perhaps an alternative approach would be to nobble the preempt notifier
> > and stick an explicit vcpu_put()...vcpu_load() around the
> > swait_event_interruptible_exclusive() call in vcpu_req_sleep().  This
> > is not fast path.
> > 
> > 
> 
> I think you've understood the problem correctly, and the thing here is
> that we (sort-of) "abuse" disabling preemption as a way to disable
> preempt notifiers, which I don't think we have.  So we could add that,
> and do something like:
> 
>   preempt_disable();
>   vcpu_put(vcpu);
>   disable_preempt_notifiers(vcpu);
>   preempt_disable();
>   funky_stuff();
>   vcpu_load();
>   preempt_enable();

Did you mean

	preempt_disable();
	disable_preempt_notifiers(vcpu);
	vcpu_put(vcpu);
	preempt_enable();

	funky_stuff();

	preempt_disable();
	vcpu_load(vcpu);
	enable_preempt_notifiers(vcpu);
	preempt_enable();

> But I think that's additional complexity to get a slightly shorter
> section with disabled preemption.

Agreed.

disable/enable_preempt_notifiers() may do little more than toggle a flag
that the preempt notifiers check, but I totally agree that it's unlikely
to be worth it just to be preemptible in this small region.

> We could also re-architect a lot of the vcpu_load/vpcu_put functionality
> more drastically, but that is difficult and requires understanding of
> how the other architectures work, so at the end of the day we just use
> this pattern in multiple places, which is:
> 
>   preempt_disable();
>   vcpu_put();

We still have the anomaly that we save all the state we're about to
reset here.

(i.e., we have no pure "invalidate" operation for the loaded CPU regs;
vcpu_put() is more "clean and invalidate", but we may not care much
about the "clean" part in this particular sequence).

Again, this only makes a difference to a rare slow path, so I don't
see it as a problem.

>   modify_vcpu_state_in_memory();
>   vcpu_load();
>   preempt_enable();
> 
> Does that help?

Yes, I think I've understood correctly what's going on here now.

The code is fine as it is.

Cheers
---Dave
Dave Martin Feb. 27, 2019, 12:05 p.m. UTC | #10
On Tue, Feb 26, 2019 at 01:53:52PM +0000, Dave Martin wrote:
> On Wed, Feb 20, 2019 at 07:41:43PM +0000, Marc Zyngier wrote:
> > On Wed, 20 Feb 2019 19:14:53 +0000
> > Dave Martin <Dave.Martin@arm.com> wrote:

[...]

> > > Perhaps an alternative approach would be to nobble the preempt notifier
> > > and stick an explicit vcpu_put()...vcpu_load() around the
> > > swait_event_interruptible_exclusive() call in vcpu_req_sleep().  This
> > > is not fast path.
> > 
> > What does it buy us? The problem we're solving here is a powered-off,
> > spuriously woken-up vcpu racing against the reset performed from
> > another vcpu. I don't see what adding more put/load would solve.
> 
> Not a lot.  Changes in this area would allow some code to move outside
> preempt_disable(), but only at the expense of extra cost elsewhere...
> 
> > > Any, with the code as-is, it looks like the SVE regs resetting should
> > > go in the preempt_disable() region, after the kvm_arch_vcpu_put() call.
> > 
> > All resets must go there. This is the only safe location.
> 
> OK, that's what I wanted to be sure of.

Hmmm, it turns out this is no good because we may need to kzalloc()
the vcpu->arch.sve_state storage here in the new-vcpu case.

Rather than create additional mess in this code, I will move that
allocation to be done more eagerly rather than deferred until the last
possible moment.

With the current API flow, this may mean krealloc()'ing the buffer if
the set of vector lengths for the vcpu is modified from the default
before the vcpu is run, via a user write to KVM_REG_ARM64_SVE_VLS --
though we would anyway get away with not doing it, since the default
buffer size will be the maximum that the hardware could possibly
require anyway.  We could waste a few K per vcpu, but that's almost not
worth caring about.

I'd rather not churn the API at this point, but I'll experiment to see
which approach under the hood looks cleanest.

Thoughts?

Cheers
---Dave
diff mbox series

Patch

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index b72a3dd56204..f21a2a575939 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -105,16 +105,33 @@  int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
  * This function finds the right table above and sets the registers on
  * the virtual CPU struct to their architecturally defined reset
  * values.
+ *
+ * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
+ * ioctl or as part of handling a request issued by another VCPU in the PSCI
+ * handling code.  In the first case, the VCPU will not be loaded, and in the
+ * second case the VCPU will be loaded.  Because this function operates purely
+ * on the memory-backed valus of system registers, we want to do a full put if
+ * we were loaded (handling a request) and load the values back at the end of
+ * the function.  Otherwise we leave the state alone.  In both cases, we
+ * disable preemption around the vcpu reset as we would otherwise race with
+ * preempt notifiers which also call put/load.
  */
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	const struct kvm_regs *cpu_reset;
+	int ret = -EINVAL;
+	bool loaded;
+
+	preempt_disable();
+	loaded = (vcpu->cpu != -1);
+	if (loaded)
+		kvm_arch_vcpu_put(vcpu);
 
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
 			if (!cpu_has_32bit_el1())
-				return -EINVAL;
+				goto out;
 			cpu_reset = &default_regs_reset32;
 		} else {
 			cpu_reset = &default_regs_reset;
@@ -137,7 +154,12 @@  int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
 
 	/* Reset timer */
-	return kvm_timer_vcpu_reset(vcpu);
+	ret = kvm_timer_vcpu_reset(vcpu);
+out:
+	if (loaded)
+		kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+	return ret;
 }
 
 void kvm_set_ipa_limit(void)