Message ID | 1384973499-30659-1-git-send-email-christoffer.dall@linaro.org (mailing list archive) |
State | New, archived |
On 20 November 2013 18:51, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> Therefore, set the pause flag on the vcpu at VCPU init time (which can
> reasonably be expected to be completed for all CPUs by user space before
> running any VCPUs) and clear both this flag and the feature (in case the
> feature can somehow get set again in the future) and ping the waitqueue
> on turning on a VCPU using PSCI.

Tangential, but your phrasing prompted me to ask: how does the "start in
PSCI power-off" boot protocol work for system reset? Since the kernel
doesn't currently provide a "reset this vCPU" ioctl, userspace has to do
the reset manually[*]; how do we say "take this vCPU which has started up
and run once, and put it back into PSCI power-off"?

[*] This is pretty tedious, since it involves reading every CPU register
on the vCPU before first run in order to feed the kernel back a bunch of
info it already knows about the reset state of a vCPU.

thanks
-- PMM
On Wed, Nov 20, 2013 at 07:12:42PM +0000, Peter Maydell wrote:
> On 20 November 2013 18:51, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> > Therefore, set the pause flag on the vcpu at VCPU init time (which can
> > reasonably be expected to be completed for all CPUs by user space before
> > running any VCPUs) and clear both this flag and the feature (in case the
> > feature can somehow get set again in the future) and ping the waitqueue
> > on turning on a VCPU using PSCI.
>
> Tangential, but your phrasing prompted me to ask: how does the "start in
> PSCI power-off" boot protocol work for system reset? Since the kernel
> doesn't currently provide a "reset this vCPU" ioctl, userspace has to do
> the reset manually[*]; how do we say "take this vCPU which has started up
> and run once, and put it back into PSCI power-off"?
>
> [*] This is pretty tedious, since it involves reading every CPU register
> on the vCPU before first run in order to feed the kernel back a bunch of
> info it already knows about the reset state of a vCPU.

So, from looking at the code and the API specification, calling
KVM_ARM_VCPU_INIT does exactly this and also resets all core and cp15
registers for you - you would here be able to set the power-off flag and
pause those CPUs so PSCI can wake them up again.

Am I missing something here?

This makes me wonder if it's worth adding to
Documentation/virtual/kvm/api.txt that KVM_ARM_VCPU_INIT should be
called on all VCPUs before running any of the VCPUs...

-Christoffer
On 20 November 2013 19:21, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Wed, Nov 20, 2013 at 07:12:42PM +0000, Peter Maydell wrote:
>> Tangential, but your phrasing prompted me to ask: how does the "start in
>> PSCI power-off" boot protocol work for system reset? Since the kernel
>> doesn't currently provide a "reset this vCPU" ioctl, userspace has to do
>> the reset manually[*]; how do we say "take this vCPU which has started up
>> and run once, and put it back into PSCI power-off"?
>>
>> [*] This is pretty tedious, since it involves reading every CPU register
>> on the vCPU before first run in order to feed the kernel back a bunch of
>> info it already knows about the reset state of a vCPU.
>>
> So, from looking at the code and the API specification, calling
> KVM_ARM_VCPU_INIT does exactly this and also resets all core and cp15
> registers for you - you would here be able to set the power-off flag and
> pause those CPUs so PSCI can wake them up again.
>
> Am I missing something here?

Well, we call KVM_ARM_VCPU_INIT as part of our CPU init, but I didn't
think you were allowed to call it a second time later on to do a CPU
reset (i.e. as part of a system reset, when the guest asks for one via
'shutdown -r now'). Is that valid? If so, it would probably be good to
specifically document it...

-- PMM
On Wed, Nov 20, 2013 at 07:27:52PM +0000, Peter Maydell wrote:
> On 20 November 2013 19:21, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> > On Wed, Nov 20, 2013 at 07:12:42PM +0000, Peter Maydell wrote:
> >> Tangential, but your phrasing prompted me to ask: how does the "start
> >> in PSCI power-off" boot protocol work for system reset? Since the
> >> kernel doesn't currently provide a "reset this vCPU" ioctl, userspace
> >> has to do the reset manually[*]; how do we say "take this vCPU which
> >> has started up and run once, and put it back into PSCI power-off"?
> >>
> >> [*] This is pretty tedious, since it involves reading every CPU
> >> register on the vCPU before first run in order to feed the kernel back
> >> a bunch of info it already knows about the reset state of a vCPU.
> >>
> > So, from looking at the code and the API specification, calling
> > KVM_ARM_VCPU_INIT does exactly this and also resets all core and cp15
> > registers for you - you would here be able to set the power-off flag
> > and pause those CPUs so PSCI can wake them up again.
> >
> > Am I missing something here?
>
> Well, we call KVM_ARM_VCPU_INIT as part of our CPU init, but I didn't
> think you were allowed to call it a second time later on to do a CPU
> reset (i.e. as part of a system reset, when the guest asks for one via
> 'shutdown -r now'). Is that valid? If so, it would probably be good to
> specifically document it...

I don't see anything technically preventing you from doing so, nor
anything in the documentation telling you that you shouldn't be doing
it. However, I agree that we haven't thought about this ioctl in that
way so far, but we can start doing that.

If nobody objects I can add a patch that clarifies this in the api.txt
documentation; I don't believe this would break or change the API, it's
merely clarifying the recommendations on how user space can leverage
the functionality.

-Christoffer
On Thu, Dec 12, 2013 at 09:23:00AM +0000, Marc Zyngier wrote:
> Hi Christoffer,
>
> On 11/12/13 20:53, Christoffer Dall wrote:
> > On Wed, Nov 20, 2013 at 10:51:39AM -0800, Christoffer Dall wrote:
> >> The current KVM implementation of PSCI returns INVALID_PARAMETERS if the
> >> waitqueue for the corresponding CPU is not active. This does not seem
> >> correct, since KVM should not care what the specific thread is doing;
> >> for example, user space may not have called KVM_RUN on this VCPU yet, or
> >> the thread may be busy looping to user space because it received a
> >> signal; this is really up to the user space implementation. We should
> >> simply clear the pause flag on the CPU and wake up the thread if it
> >> happens to be blocked for us.
>
> Coming back to the PSCI spec (v0.2), it is clearly said that an error
> should be returned if the CPU is already ON (the error code is
> different, as KVM only implements v0.1 so far, but still...).

ok, so we should check if the pause flag is set and report an error if
it's not, but not check the waitqueue.

> >> Further, the implementation seems to be racy when executing multiple
> >> VCPU threads. There really isn't a reasonable user space programming
> >> scheme to ensure all secondary CPUs have reached kvm_vcpu_first_run_init
> >> before turning on the boot CPU.
>
> Agreed.
>
> >> It also does not make much sense to
> >> call into the PSCI code for a CPU that is turned off - after all, it
> >> cannot do anything if it is turned off and PSCI code could reasonably be
> >> written with the assumption that the VCPU is actually up, in some shape
> >> or form.
>
> I find this to be debatable. While I agree with you that doing a CPU_OFF
> on something that has never run is not the most gracious thing ever, it
> helps (or helped) keep the code relatively minimal, and without too many
> actors messing with the pause flag.

It's certainly debatable. I personally found that it was hiding
information that you had to look at anyhow to understand how things
worked, and that it wasn't really beneficial, but I may have been a
little overzealous about my general statement about calling into PSCI
code.

> >> Therefore, set the pause flag on the vcpu at VCPU init time (which can
> >> reasonably be expected to be completed for all CPUs by user space before
> >> running any VCPUs) and clear both this flag and the feature (in case the
> >> feature can somehow get set again in the future) and ping the waitqueue
> >> on turning on a VCPU using PSCI.
>
> Fair enough. See my comments below:
>
> >> Cc: Marc Zyngier <marc.zyngier@arm.com>
> >> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> >> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >> ---
> >>  arch/arm/kvm/arm.c  | 30 +++++++++++++++++++-----------
> >>  arch/arm/kvm/psci.c |  6 ++----
> >>  2 files changed, 21 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index e312e4a..1140e0e 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -478,15 +478,6 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
> >>  		return ret;
> >>  	}
> >>
> >> -	/*
> >> -	 * Handle the "start in power-off" case by calling into the
> >> -	 * PSCI code.
> >> -	 */
> >> -	if (test_and_clear_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features)) {
> >> -		*vcpu_reg(vcpu, 0) = KVM_PSCI_FN_CPU_OFF;
> >> -		kvm_psci_call(vcpu);
> >> -	}
> >> -
> >>  	return 0;
> >>  }
> >>
> >> @@ -700,6 +691,24 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
> >>  	return -EINVAL;
> >>  }
> >>
> >> +static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
> >> +					 struct kvm_vcpu_init *init)
> >> +{
> >> +	int ret;
> >> +
> >> +	ret = kvm_vcpu_set_target(vcpu, init);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	/*
> >> +	 * Handle the "start in power-off" case by marking the VCPU as paused.
> >> +	 */
> >> +	if (test_and_clear_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
> >> +		vcpu->arch.pause = true;
>
> Do we really need a test_and_clear? I know the original code used it,
> but I now fail to see a reason why.

You mean why the atomic version? I think it reads nicely, we need to
test a bit and we need to clear it, but we can use __test_and_clear_bit
instead if you prefer.

> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  long kvm_arch_vcpu_ioctl(struct file *filp,
> >>  			 unsigned int ioctl, unsigned long arg)
> >>  {
> >> @@ -713,8 +722,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> >>  		if (copy_from_user(&init, argp, sizeof(init)))
> >>  			return -EFAULT;
> >>
> >> -		return kvm_vcpu_set_target(vcpu, &init);
> >> -
> >> +		return kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
> >>  	}
> >>  	case KVM_SET_ONE_REG:
> >>  	case KVM_GET_ONE_REG: {
> >> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> >> index 0881bf1..2e72ef5 100644
> >> --- a/arch/arm/kvm/psci.c
> >> +++ b/arch/arm/kvm/psci.c
> >> @@ -59,10 +59,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >>
> >>  	target_pc = *vcpu_reg(source_vcpu, 2);
> >>
> >> -	wq = kvm_arch_vcpu_wq(vcpu);
> >> -	if (!waitqueue_active(wq))
> >> -		return KVM_PSCI_RET_INVAL;
> >> -
>
> That I object to. Calling CPU_ON on a CPU that is already ON is an
> error, and should be reported as such.

Good point, but that's not why I removed the check - see above. The
check should be against the pause flag, not the state of the waitqueue.
I'll adjust the patch.

> >>  	kvm_reset_vcpu(vcpu);
> >>
> >>  	/* Gracefully handle Thumb2 entry point */
> >> @@ -76,9 +72,11 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >>  		kvm_vcpu_set_be(vcpu);
> >>
> >>  	*vcpu_pc(vcpu) = target_pc;
> >> +	clear_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features);
>
> Why do you clear that bit here? Given the above test_and_clear, it
> shouldn't be set anymore.

No reason, it's unnecessary. I'll remove it.

> >> +	vcpu->arch.pause = false;
> >>  	smp_mb();	/* Make sure the above is visible */
> >>
> >> +	wq = kvm_arch_vcpu_wq(vcpu);
> >>  	wake_up_interruptible(wq);
> >>
> >>  	return KVM_PSCI_RET_SUCCESS;
> >> --
> >> 1.8.4.3
> >>
> >
> > Otherwise, I think this is a valuable cleanup.

Thanks!
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e312e4a..1140e0e 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -478,15 +478,6 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 		return ret;
 	}
 
-	/*
-	 * Handle the "start in power-off" case by calling into the
-	 * PSCI code.
-	 */
-	if (test_and_clear_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features)) {
-		*vcpu_reg(vcpu, 0) = KVM_PSCI_FN_CPU_OFF;
-		kvm_psci_call(vcpu);
-	}
-
 	return 0;
 }
 
@@ -700,6 +691,24 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 	return -EINVAL;
 }
 
+static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
+					 struct kvm_vcpu_init *init)
+{
+	int ret;
+
+	ret = kvm_vcpu_set_target(vcpu, init);
+	if (ret)
+		return ret;
+
+	/*
+	 * Handle the "start in power-off" case by marking the VCPU as paused.
+	 */
+	if (test_and_clear_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
+		vcpu->arch.pause = true;
+
+	return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -713,8 +722,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		if (copy_from_user(&init, argp, sizeof(init)))
 			return -EFAULT;
 
-		return kvm_vcpu_set_target(vcpu, &init);
-
+		return kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
 	}
 	case KVM_SET_ONE_REG:
 	case KVM_GET_ONE_REG: {
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 0881bf1..2e72ef5 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -59,10 +59,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 
 	target_pc = *vcpu_reg(source_vcpu, 2);
 
-	wq = kvm_arch_vcpu_wq(vcpu);
-	if (!waitqueue_active(wq))
-		return KVM_PSCI_RET_INVAL;
-
 	kvm_reset_vcpu(vcpu);
 
 	/* Gracefully handle Thumb2 entry point */
@@ -76,9 +72,11 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 		kvm_vcpu_set_be(vcpu);
 
 	*vcpu_pc(vcpu) = target_pc;
+	clear_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features);
 	vcpu->arch.pause = false;
 	smp_mb();	/* Make sure the above is visible */
 
+	wq = kvm_arch_vcpu_wq(vcpu);
 	wake_up_interruptible(wq);
 
 	return KVM_PSCI_RET_SUCCESS;
--
1.8.4.3
The current KVM implementation of PSCI returns INVALID_PARAMETERS if the
waitqueue for the corresponding CPU is not active. This does not seem
correct, since KVM should not care what the specific thread is doing;
for example, user space may not have called KVM_RUN on this VCPU yet, or
the thread may be busy looping to user space because it received a
signal; this is really up to the user space implementation. We should
simply clear the pause flag on the CPU and wake up the thread if it
happens to be blocked for us.

Further, the implementation seems to be racy when executing multiple
VCPU threads. There really isn't a reasonable user space programming
scheme to ensure all secondary CPUs have reached kvm_vcpu_first_run_init
before turning on the boot CPU.

It also does not make much sense to call into the PSCI code for a CPU
that is turned off - after all, it cannot do anything if it is turned
off, and PSCI code could reasonably be written with the assumption that
the VCPU is actually up, in some shape or form.

Therefore, set the pause flag on the vcpu at VCPU init time (which can
reasonably be expected to be completed for all CPUs by user space before
running any VCPUs) and clear both this flag and the feature (in case the
feature can somehow get set again in the future) and ping the waitqueue
on turning on a VCPU using PSCI.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/kvm/arm.c  | 30 +++++++++++++++++++-----------
 arch/arm/kvm/psci.c |  6 ++----
 2 files changed, 21 insertions(+), 15 deletions(-)