diff mbox

[v2,2/2] ARM: KVM: Power State Coordination Interface implementation

Message ID 1357834005-23843-3-git-send-email-marc.zyngier@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Marc Zyngier Jan. 10, 2013, 4:06 p.m. UTC
Implement the PSCI specification (ARM DEN 0022A) to control
virtual CPUs being "powered" on or off.

PSCI/KVM is detected using the KVM_CAP_ARM_PSCI capability.

A virtual CPU can now be initialized in a "powered off" state,
using the KVM_ARM_VCPU_POWER_OFF feature flag.

The guest can use either SMC or HVC to execute a PSCI function.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 Documentation/virtual/kvm/api.txt  |  4 ++
 arch/arm/include/asm/kvm_emulate.h |  5 +++
 arch/arm/include/asm/kvm_host.h    |  2 +-
 arch/arm/include/asm/kvm_psci.h    | 23 ++++++++++
 arch/arm/include/uapi/asm/kvm.h    | 16 +++++++
 arch/arm/kvm/Makefile              |  2 +-
 arch/arm/kvm/arm.c                 | 26 +++++++----
 arch/arm/kvm/psci.c                | 90 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h           |  1 +
 9 files changed, 158 insertions(+), 11 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_psci.h
 create mode 100644 arch/arm/kvm/psci.c

Comments

Russell King - ARM Linux Jan. 11, 2013, 5:12 p.m. UTC | #1
On Thu, Jan 10, 2013 at 04:06:45PM +0000, Marc Zyngier wrote:
> +int kvm_psci_call(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
> +	unsigned long val;
> +
> +	switch (psci_fn) {
> +	case KVM_PSCI_FN_CPU_OFF:
> +		kvm_psci_vcpu_off(vcpu);
> +		val = KVM_PSCI_RET_SUCCESS;
> +		break;
> +	case KVM_PSCI_FN_CPU_ON:
> +		val = kvm_psci_vcpu_on(vcpu);
> +		break;
> +	case KVM_PSCI_FN_CPU_SUSPEND:
> +	case KVM_PSCI_FN_MIGRATE:
> +		val = KVM_PSCI_RET_NI;
> +		break;
> +
> +	default:
> +		return -1;
> +	}
> +
> +	*vcpu_reg(vcpu, 0) = val;
> +	return 0;
> +}

We were discussing recently on #kernel about kernel APIs and the way that
our integer-returning functions pretty much use 0 for success, and -errno
for failures, whereas our pointer-returning functions are a mess.

And above we have something returning -1 to some other chunk of code outside
this compilation unit.  That doesn't sound particularly clever to me.
Marc Zyngier Jan. 11, 2013, 5:24 p.m. UTC | #2
On 11/01/13 17:12, Russell King - ARM Linux wrote:
> On Thu, Jan 10, 2013 at 04:06:45PM +0000, Marc Zyngier wrote:
>> +int kvm_psci_call(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
>> +	unsigned long val;
>> +
>> +	switch (psci_fn) {
>> +	case KVM_PSCI_FN_CPU_OFF:
>> +		kvm_psci_vcpu_off(vcpu);
>> +		val = KVM_PSCI_RET_SUCCESS;
>> +		break;
>> +	case KVM_PSCI_FN_CPU_ON:
>> +		val = kvm_psci_vcpu_on(vcpu);
>> +		break;
>> +	case KVM_PSCI_FN_CPU_SUSPEND:
>> +	case KVM_PSCI_FN_MIGRATE:
>> +		val = KVM_PSCI_RET_NI;
>> +		break;
>> +
>> +	default:
>> +		return -1;
>> +	}
>> +
>> +	*vcpu_reg(vcpu, 0) = val;
>> +	return 0;
>> +}
> 
> We were discussing recently on #kernel about kernel APIs and the way that
> our integer-returning functions pretty much use 0 for success, and -errno
> for failures, whereas our pointer-returning functions are a mess.
> 
> And above we have something returning -1 to some other chunk of code outside
> this compilation unit.  That doesn't sound particularly clever to me.

The original code used to return -EINVAL, see:
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004509.html

Christoffer (Cc-ed) didn't like this, hence the -1. I'm happy to revert
the code to its original state though.

	M.
Christoffer Dall Jan. 11, 2013, 5:33 p.m. UTC | #3
On Fri, Jan 11, 2013 at 12:24 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 11/01/13 17:12, Russell King - ARM Linux wrote:
>> On Thu, Jan 10, 2013 at 04:06:45PM +0000, Marc Zyngier wrote:
>>> +int kvm_psci_call(struct kvm_vcpu *vcpu)
>>> +{
>>> +    unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
>>> +    unsigned long val;
>>> +
>>> +    switch (psci_fn) {
>>> +    case KVM_PSCI_FN_CPU_OFF:
>>> +            kvm_psci_vcpu_off(vcpu);
>>> +            val = KVM_PSCI_RET_SUCCESS;
>>> +            break;
>>> +    case KVM_PSCI_FN_CPU_ON:
>>> +            val = kvm_psci_vcpu_on(vcpu);
>>> +            break;
>>> +    case KVM_PSCI_FN_CPU_SUSPEND:
>>> +    case KVM_PSCI_FN_MIGRATE:
>>> +            val = KVM_PSCI_RET_NI;
>>> +            break;
>>> +
>>> +    default:
>>> +            return -1;
>>> +    }
>>> +
>>> +    *vcpu_reg(vcpu, 0) = val;
>>> +    return 0;
>>> +}
>>
>> We were discussing recently on #kernel about kernel APIs and the way that
>> our integer-returning functions pretty much use 0 for success, and -errno
>> for failures, whereas our pointer-returning functions are a mess.
>>
>> And above we have something returning -1 to some other chunk of code outside
>> this compilation unit.  That doesn't sound particularly clever to me.
>
> The original code used to return -EINVAL, see:
> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004509.html
>
> Christoffer (Cc-ed) didn't like this, hence the -1. I'm happy to revert
> the code to its original state though.
>
I don't want to return -EINVAL, because for the rest of the KVM code
this would mean kill the guest.

The convention used in other archs of KVM as well as for ARM is that
the handle_exit functions return:

-ERRNO: Error, report this error to user space
0: Everything is fine, but return to user space to let it do I/O
emulation and whatever it wants to do
1: Everything is fine, return directly to the guest without going to user space

And then you do:
if (handle_something() == 0)
    return 1;

which I thought was confusing, so I said make the function a bool, to
avoid the confusion, like Rusty did for all the coprocessor emulation
functions.

There are obviously other ways to handle the "return 1" case, like
having an extra state that you carry around, and we can change all the
code to do that, but I just don't think it's worth it, as we are in
fact quite close to the existing kernel API.

-Christoffer
Russell King - ARM Linux Jan. 11, 2013, 5:40 p.m. UTC | #4
On Fri, Jan 11, 2013 at 12:33:15PM -0500, Christoffer Dall wrote:
> On Fri, Jan 11, 2013 at 12:24 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > On 11/01/13 17:12, Russell King - ARM Linux wrote:
> >> On Thu, Jan 10, 2013 at 04:06:45PM +0000, Marc Zyngier wrote:
> >>> +int kvm_psci_call(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +    unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
> >>> +    unsigned long val;
> >>> +
> >>> +    switch (psci_fn) {
> >>> +    case KVM_PSCI_FN_CPU_OFF:
> >>> +            kvm_psci_vcpu_off(vcpu);
> >>> +            val = KVM_PSCI_RET_SUCCESS;
> >>> +            break;
> >>> +    case KVM_PSCI_FN_CPU_ON:
> >>> +            val = kvm_psci_vcpu_on(vcpu);
> >>> +            break;
> >>> +    case KVM_PSCI_FN_CPU_SUSPEND:
> >>> +    case KVM_PSCI_FN_MIGRATE:
> >>> +            val = KVM_PSCI_RET_NI;
> >>> +            break;
> >>> +
> >>> +    default:
> >>> +            return -1;
> >>> +    }
> >>> +
> >>> +    *vcpu_reg(vcpu, 0) = val;
> >>> +    return 0;
> >>> +}
> >>
> >> We were discussing recently on #kernel about kernel APIs and the way that
> >> our integer-returning functions pretty much use 0 for success, and -errno
> >> for failures, whereas our pointer-returning functions are a mess.
> >>
> >> And above we have something returning -1 to some other chunk of code outside
> >> this compilation unit.  That doesn't sound particularly clever to me.
> >
> > The original code used to return -EINVAL, see:
> > https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004509.html
> >
> > Christoffer (Cc-ed) didn't like this, hence the -1. I'm happy to revert
> > the code to its original state though.
> >
> I don't want to return -EINVAL, because for the rest of the KVM code
> this would mean kill the guest.
> 
> The convention used in other archs of KVM as well as for ARM is that
> the handle_exit functions return:
> 
> -ERRNO: Error, report this error to user space
> 0: Everything is fine, but return to user space to let it do I/O
> emulation and whatever it wants to do
> 1: Everything is fine, return directly to the guest without going to user space

Right, so the above "return -1" _is_ doing the thing that I really hate,
which is it's actually doing a "return -EPERM" without anyone realising
that's what it's doing.

This is precisely why I hate (and pick up on, and have my mail reader to
highlight) any "return -1".  It's mostly always a bug.
Marc Zyngier Jan. 11, 2013, 5:43 p.m. UTC | #5
On 11/01/13 17:33, Christoffer Dall wrote:
> On Fri, Jan 11, 2013 at 12:24 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 11/01/13 17:12, Russell King - ARM Linux wrote:
>>> On Thu, Jan 10, 2013 at 04:06:45PM +0000, Marc Zyngier wrote:
>>>> +int kvm_psci_call(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +    unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
>>>> +    unsigned long val;
>>>> +
>>>> +    switch (psci_fn) {
>>>> +    case KVM_PSCI_FN_CPU_OFF:
>>>> +            kvm_psci_vcpu_off(vcpu);
>>>> +            val = KVM_PSCI_RET_SUCCESS;
>>>> +            break;
>>>> +    case KVM_PSCI_FN_CPU_ON:
>>>> +            val = kvm_psci_vcpu_on(vcpu);
>>>> +            break;
>>>> +    case KVM_PSCI_FN_CPU_SUSPEND:
>>>> +    case KVM_PSCI_FN_MIGRATE:
>>>> +            val = KVM_PSCI_RET_NI;
>>>> +            break;
>>>> +
>>>> +    default:
>>>> +            return -1;
>>>> +    }
>>>> +
>>>> +    *vcpu_reg(vcpu, 0) = val;
>>>> +    return 0;
>>>> +}
>>>
>>> We were discussing recently on #kernel about kernel APIs and the way that
>>> our integer-returning functions pretty much use 0 for success, and -errno
>>> for failures, whereas our pointer-returning functions are a mess.
>>>
>>> And above we have something returning -1 to some other chunk of code outside
>>> this compilation unit.  That doesn't sound particularly clever to me.
>>
>> The original code used to return -EINVAL, see:
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004509.html
>>
>> Christoffer (Cc-ed) didn't like this, hence the -1. I'm happy to revert
>> the code to its original state though.
>>
> I don't want to return -EINVAL, because for the rest of the KVM code
> this would mean kill the guest.
> 
> The convention used in other archs of KVM as well as for ARM is that
> the handle_exit functions return:
> 
> -ERRNO: Error, report this error to user space
> 0: Everything is fine, but return to user space to let it do I/O
> emulation and whatever it wants to do
> 1: Everything is fine, return directly to the guest without going to user space

That is assuming we propagate the handle_exit convention down to the
leaf calls, and I object to that. The 3 possible values only apply to
handle_exit, and we should keep that convention as local as possible,
because this is the odd case.

> And then you do:
> if (handle_something() == 0)
>     return 1;
> 
> which I thought was confusing, so I said make the function a bool, to
> avoid the confusion, like Rusty did for all the coprocessor emulation
> functions.

I don't see a compelling reason to propagate this convention to areas
that do not require it. In the PSCI case, we have a basic
handled/not-handled state, the later indicating the reason. The exit
handling functions can convert the error codes to whatever the run loop
requires.

> There are obviously other ways to handle the "return 1" case, like
> having an extra state that you carry around, and we can change all the
> code to do that, but I just don't think it's worth it, as we are in
> fact quite close to the existing kernel API.
Christoffer Dall Jan. 11, 2013, 5:48 p.m. UTC | #6
On Fri, Jan 11, 2013 at 12:43 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 11/01/13 17:33, Christoffer Dall wrote:
>> On Fri, Jan 11, 2013 at 12:24 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> On 11/01/13 17:12, Russell King - ARM Linux wrote:
>>>> On Thu, Jan 10, 2013 at 04:06:45PM +0000, Marc Zyngier wrote:
>>>>> +int kvm_psci_call(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +    unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
>>>>> +    unsigned long val;
>>>>> +
>>>>> +    switch (psci_fn) {
>>>>> +    case KVM_PSCI_FN_CPU_OFF:
>>>>> +            kvm_psci_vcpu_off(vcpu);
>>>>> +            val = KVM_PSCI_RET_SUCCESS;
>>>>> +            break;
>>>>> +    case KVM_PSCI_FN_CPU_ON:
>>>>> +            val = kvm_psci_vcpu_on(vcpu);
>>>>> +            break;
>>>>> +    case KVM_PSCI_FN_CPU_SUSPEND:
>>>>> +    case KVM_PSCI_FN_MIGRATE:
>>>>> +            val = KVM_PSCI_RET_NI;
>>>>> +            break;
>>>>> +
>>>>> +    default:
>>>>> +            return -1;
>>>>> +    }
>>>>> +
>>>>> +    *vcpu_reg(vcpu, 0) = val;
>>>>> +    return 0;
>>>>> +}
>>>>
>>>> We were discussing recently on #kernel about kernel APIs and the way that
>>>> our integer-returning functions pretty much use 0 for success, and -errno
>>>> for failures, whereas our pointer-returning functions are a mess.
>>>>
>>>> And above we have something returning -1 to some other chunk of code outside
>>>> this compilation unit.  That doesn't sound particularly clever to me.
>>>
>>> The original code used to return -EINVAL, see:
>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004509.html
>>>
>>> Christoffer (Cc-ed) didn't like this, hence the -1. I'm happy to revert
>>> the code to its original state though.
>>>
>> I don't want to return -EINVAL, because for the rest of the KVM code
>> this would mean kill the guest.
>>
>> The convention used in other archs of KVM as well as for ARM is that
>> the handle_exit functions return:
>>
>> -ERRNO: Error, report this error to user space
>> 0: Everything is fine, but return to user space to let it do I/O
>> emulation and whatever it wants to do
>> 1: Everything is fine, return directly to the guest without going to user space
>
> That is assuming we propagate the handle_exit convention down to the
> leaf calls, and I object to that. The 3 possible values only apply to
> handle_exit, and we should keep that convention as local as possible,
> because this is the odd case.

I don't agree - it requires you to carefully follow a potentially deep
call trace to make sense of what it does, and worse, it's directly
misleading in the case of KVM/ARM. So either change it everywhere and
have a consistent calling convention or adhere to what we do elsewhere
and use the bool.

>
>> And then you do:
>> if (handle_something() == 0)
>>     return 1;
>>
>> which I thought was confusing, so I said make the function a bool, to
>> avoid the confusion, like Rusty did for all the coprocessor emulation
>> functions.
>
> I don't see a compelling reason to propagate this convention to areas
> that do not require it. In the PSCI case, we have a basic
> handled/not-handled state, the later indicating the reason. The exit
> handling functions can convert the error codes to whatever the run loop
> requires.
>

again, that's why I suggest returning a bool instead. You just said
it: it's a basic handled/not-handled state. Why do you want to return
-EINVAL if that's not propogated anywhere?

If the return codes are local and do not map reasonably to error codes
and more complicated than a bool, I would vote for a #define
HVC_HANDLE_SUCCESS, HVC_HANDLE_ERROR, HVC_HANDLE_XXXXXX, ...



>> There are obviously other ways to handle the "return 1" case, like
>> having an extra state that you carry around, and we can change all the
>> code to do that, but I just don't think it's worth it, as we are in
>> fact quite close to the existing kernel API.
>
Russell King - ARM Linux Jan. 11, 2013, 5:57 p.m. UTC | #7
On Fri, Jan 11, 2013 at 12:48:45PM -0500, Christoffer Dall wrote:
> again, that's why I suggest returning a bool instead. You just said
> it: it's a basic handled/not-handled state. Why do you want to return
> -EINVAL if that's not propogated anywhere?

We have a well established principle throughout the kernel interfaces that
functions will return positive values for success and an appropriate -ve
errno for failure.

We *certainly* hate functions which return 0 for failure and non-zero
for success - it makes review a real pain because you start seeing code
doing this:

	if (!function()) {
		deal_with_failure();
	}

and you have to then start looking at the function to properly understand
what it's return semantics are.

We have more than enough proof already that this doesn't work: people
don't care to understand what the return values from functions mean.
You only have to do an audit of a few of the IS_ERR_OR_NULL() uses to
find that out, and you quickly realise that people just use what they
_think_ is the right test and which happens to pass their very simple
testing at the time.

We've avoided major problems so far in the kernel by having most of the
integer-returning functions following the established principle, and
that's good.  I really really think that there must be a _very_ good
reason, and overwhelming reason to deviate from the established
principle in any large project.
Christoffer Dall Jan. 11, 2013, 6:07 p.m. UTC | #8
On Fri, Jan 11, 2013 at 12:57 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Jan 11, 2013 at 12:48:45PM -0500, Christoffer Dall wrote:
>> again, that's why I suggest returning a bool instead. You just said
>> it: it's a basic handled/not-handled state. Why do you want to return
>> -EINVAL if that's not propogated anywhere?
>
> We have a well established principle throughout the kernel interfaces that
> functions will return positive values for success and an appropriate -ve
> errno for failure.
>
> We *certainly* hate functions which return 0 for failure and non-zero
> for success - it makes review a real pain because you start seeing code
> doing this:
>
>         if (!function()) {
>                 deal_with_failure();
>         }
>

and you'd certainly not find that in any of the KVM/ARM - that would
be despicable.

> and you have to then start looking at the function to properly understand
> what it's return semantics are.
>
> We have more than enough proof already that this doesn't work: people
> don't care to understand what the return values from functions mean.
> You only have to do an audit of a few of the IS_ERR_OR_NULL() uses to
> find that out, and you quickly realise that people just use what they
> _think_ is the right test and which happens to pass their very simple
> testing at the time.
>
> We've avoided major problems so far in the kernel by having most of the
> integer-returning functions following the established principle, and
> that's good.  I really really think that there must be a _very_ good
> reason, and overwhelming reason to deviate from the established
> principle in any large project.

The _very_ good reason here, is that we have two success cases: return
to guest and return to user space. As I said, we can save this state
in another bit somewhere and change all the KVM/ARM code to do so, but
the KVM guys back then would like to use the same convention as other
KVM archs.

I would prefer not having to change all that KVM/ARM code at this
point, but of course, if you insists, I will have to do that.

BUT, returning -EINVAL should indicate an actual error. This is not
the case for the concrete psci example, the case is simply that the
number in r0 does not fall within the psci range, and therefore could
potentially be handled by something else, and using -EINVAL as a
fall-through to the next value is equally weird and misleading.

An alternative is to do something like foo(..., &handled), but I think
using a bool as the function type is perfect here: what is the problem
with that again? Certainly it wouldn't be the only boolean function in
the kernel.

-Christoffer
Marc Zyngier Jan. 11, 2013, 6:09 p.m. UTC | #9
On 11/01/13 17:48, Christoffer Dall wrote:
> On Fri, Jan 11, 2013 at 12:43 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 11/01/13 17:33, Christoffer Dall wrote:
>>> On Fri, Jan 11, 2013 at 12:24 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>> On 11/01/13 17:12, Russell King - ARM Linux wrote:
>>>>> On Thu, Jan 10, 2013 at 04:06:45PM +0000, Marc Zyngier wrote:
>>>>>> +int kvm_psci_call(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
>>>>>> +    unsigned long val;
>>>>>> +
>>>>>> +    switch (psci_fn) {
>>>>>> +    case KVM_PSCI_FN_CPU_OFF:
>>>>>> +            kvm_psci_vcpu_off(vcpu);
>>>>>> +            val = KVM_PSCI_RET_SUCCESS;
>>>>>> +            break;
>>>>>> +    case KVM_PSCI_FN_CPU_ON:
>>>>>> +            val = kvm_psci_vcpu_on(vcpu);
>>>>>> +            break;
>>>>>> +    case KVM_PSCI_FN_CPU_SUSPEND:
>>>>>> +    case KVM_PSCI_FN_MIGRATE:
>>>>>> +            val = KVM_PSCI_RET_NI;
>>>>>> +            break;
>>>>>> +
>>>>>> +    default:
>>>>>> +            return -1;
>>>>>> +    }
>>>>>> +
>>>>>> +    *vcpu_reg(vcpu, 0) = val;
>>>>>> +    return 0;
>>>>>> +}
>>>>>
>>>>> We were discussing recently on #kernel about kernel APIs and the way that
>>>>> our integer-returning functions pretty much use 0 for success, and -errno
>>>>> for failures, whereas our pointer-returning functions are a mess.
>>>>>
>>>>> And above we have something returning -1 to some other chunk of code outside
>>>>> this compilation unit.  That doesn't sound particularly clever to me.
>>>>
>>>> The original code used to return -EINVAL, see:
>>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004509.html
>>>>
>>>> Christoffer (Cc-ed) didn't like this, hence the -1. I'm happy to revert
>>>> the code to its original state though.
>>>>
>>> I don't want to return -EINVAL, because for the rest of the KVM code
>>> this would mean kill the guest.
>>>
>>> The convention used in other archs of KVM as well as for ARM is that
>>> the handle_exit functions return:
>>>
>>> -ERRNO: Error, report this error to user space
>>> 0: Everything is fine, but return to user space to let it do I/O
>>> emulation and whatever it wants to do
>>> 1: Everything is fine, return directly to the guest without going to user space
>>
>> That is assuming we propagate the handle_exit convention down to the
>> leaf calls, and I object to that. The 3 possible values only apply to
>> handle_exit, and we should keep that convention as local as possible,
>> because this is the odd case.
> 
> I don't agree - it requires you to carefully follow a potentially deep
> call trace to make sense of what it does, and worse, it's directly
> misleading in the case of KVM/ARM. So either change it everywhere and
> have a consistent calling convention or adhere to what we do elsewhere
> and use the bool.

Sorry, but I do not buy this.

In this particular case, the meaning of the value returned has nothing
to do with the handle_exit convention. We never return to user space,
let alone signaling an error.

Trying to force all the code in this convention makes it actually harder
to review, because this is the exception in the kernel. This is why I
suggest keeping the handle_exit return convention at this exact point,
and not impose it on anything else.

>>
>>> And then you do:
>>> if (handle_something() == 0)
>>>     return 1;
>>>
>>> which I thought was confusing, so I said make the function a bool, to
>>> avoid the confusion, like Rusty did for all the coprocessor emulation
>>> functions.
>>
>> I don't see a compelling reason to propagate this convention to areas
>> that do not require it. In the PSCI case, we have a basic
>> handled/not-handled state, the later indicating the reason. The exit
>> handling functions can convert the error codes to whatever the run loop
>> requires.
>>
> 
> again, that's why I suggest returning a bool instead. You just said
> it: it's a basic handled/not-handled state. Why do you want to return
> -EINVAL if that's not propogated anywhere?
> 
> If the return codes are local and do not map reasonably to error codes
> and more complicated than a bool, I would vote for a #define
> HVC_HANDLE_SUCCESS, HVC_HANDLE_ERROR, HVC_HANDLE_XXXXXX, ...
> 
> 
> 
>>> There are obviously other ways to handle the "return 1" case, like
>>> having an extra state that you carry around, and we can change all the
>>> code to do that, but I just don't think it's worth it, as we are in
>>> fact quite close to the existing kernel API.
>>
>
Russell King - ARM Linux Jan. 11, 2013, 6:14 p.m. UTC | #10
On Fri, Jan 11, 2013 at 01:07:31PM -0500, Christoffer Dall wrote:
> The _very_ good reason here, is that we have two success cases: return
> to guest and return to user space. As I said, we can save this state
> in another bit somewhere and change all the KVM/ARM code to do so, but
> the KVM guys back then would like to use the same convention as other
> KVM archs.

Can you please credit me for not objecting to returning 0/1 to have
different success meanings.  What I'm merely objecting to is that
"return -1" statement in the code (notice the negative sign.)
Christoffer Dall Jan. 11, 2013, 6:15 p.m. UTC | #11
On Fri, Jan 11, 2013 at 1:14 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Jan 11, 2013 at 01:07:31PM -0500, Christoffer Dall wrote:
>> The _very_ good reason here, is that we have two success cases: return
>> to guest and return to user space. As I said, we can save this state
>> in another bit somewhere and change all the KVM/ARM code to do so, but
>> the KVM guys back then would like to use the same convention as other
>> KVM archs.
>
> Can you please credit me for not objecting to returning 0/1 to have
> different success meanings.  What I'm merely objecting to is that
> "return -1" statement in the code (notice the negative sign.)

Sorry if I misunderstood you. Yes, the return -1 has to be changed.

-Christoffer
Christoffer Dall Jan. 11, 2013, 6:18 p.m. UTC | #12
On Fri, Jan 11, 2013 at 1:09 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 11/01/13 17:48, Christoffer Dall wrote:
>> On Fri, Jan 11, 2013 at 12:43 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> On 11/01/13 17:33, Christoffer Dall wrote:
>>>> On Fri, Jan 11, 2013 at 12:24 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>>> On 11/01/13 17:12, Russell King - ARM Linux wrote:
>>>>>> On Thu, Jan 10, 2013 at 04:06:45PM +0000, Marc Zyngier wrote:
>>>>>>> +int kvm_psci_call(struct kvm_vcpu *vcpu)
>>>>>>> +{
>>>>>>> +    unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
>>>>>>> +    unsigned long val;
>>>>>>> +
>>>>>>> +    switch (psci_fn) {
>>>>>>> +    case KVM_PSCI_FN_CPU_OFF:
>>>>>>> +            kvm_psci_vcpu_off(vcpu);
>>>>>>> +            val = KVM_PSCI_RET_SUCCESS;
>>>>>>> +            break;
>>>>>>> +    case KVM_PSCI_FN_CPU_ON:
>>>>>>> +            val = kvm_psci_vcpu_on(vcpu);
>>>>>>> +            break;
>>>>>>> +    case KVM_PSCI_FN_CPU_SUSPEND:
>>>>>>> +    case KVM_PSCI_FN_MIGRATE:
>>>>>>> +            val = KVM_PSCI_RET_NI;
>>>>>>> +            break;
>>>>>>> +
>>>>>>> +    default:
>>>>>>> +            return -1;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    *vcpu_reg(vcpu, 0) = val;
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>
>>>>>> We were discussing recently on #kernel about kernel APIs and the way that
>>>>>> our integer-returning functions pretty much use 0 for success, and -errno
>>>>>> for failures, whereas our pointer-returning functions are a mess.
>>>>>>
>>>>>> And above we have something returning -1 to some other chunk of code outside
>>>>>> this compilation unit.  That doesn't sound particularly clever to me.
>>>>>
>>>>> The original code used to return -EINVAL, see:
>>>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004509.html
>>>>>
>>>>> Christoffer (Cc-ed) didn't like this, hence the -1. I'm happy to revert
>>>>> the code to its original state though.
>>>>>
>>>> I don't want to return -EINVAL, because for the rest of the KVM code
>>>> this would mean kill the guest.
>>>>
>>>> The convention used in other archs of KVM as well as for ARM is that
>>>> the handle_exit functions return:
>>>>
>>>> -ERRNO: Error, report this error to user space
>>>> 0: Everything is fine, but return to user space to let it do I/O
>>>> emulation and whatever it wants to do
>>>> 1: Everything is fine, return directly to the guest without going to user space
>>>
>>> That is assuming we propagate the handle_exit convention down to the
>>> leaf calls, and I object to that. The 3 possible values only apply to
>>> handle_exit, and we should keep that convention as local as possible,
>>> because this is the odd case.
>>
>> I don't agree - it requires you to carefully follow a potentially deep
>> call trace to make sense of what it does, and worse, it's directly
>> misleading in the case of KVM/ARM. So either change it everywhere and
>> have a consistent calling convention or adhere to what we do elsewhere
>> and use the bool.
>
> Sorry, but I do not buy this.
>
> In this particular case, the meaning of the value returned has nothing
> to do with the handle_exit convention. We never return to user space,
> let alone signaling an error.
>

But that is by no means obvious.

> Trying to force all the code in this convention makes it actually harder
> to review, because this is the exception in the kernel. This is why I
> suggest keeping the handle_exit return convention at this exact point,
> and not impose it on anything else.
>

Ok, then come up with a nicer convention, but using -EINVAL to mean I
did not handle this one, maybe someone else will handle it, and it's
not an error is definitely not easy to read.  What then if another
function called from the same caller could actually return an error,
would you have a switch case on the return value checking for EINVAL
vs. EFAULT?

I repeat: what's the problem with a bool for this particular case,
which would make it look like all the other sub-calls of handle exit
functions - could we be consistent here?

-Christoffer
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 497fd48..e0fa0ea 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2185,6 +2185,10 @@  return ENOEXEC for that vcpu.
 Note that because some registers reflect machine topology, all vcpus
 should be created before this ioctl is invoked.
 
+Possible features:
+	- KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
+	  Depends on KVM_CAP_ARM_PSCI.
+
 
 4.78 KVM_GET_REG_LIST
 
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 375795b..eb07342 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -34,6 +34,11 @@  void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
+static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+
 static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
 {
 	return (u32 *)&vcpu->arch.regs.usr_regs.ARM_pc;
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 14a71bb..b8eb3e0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -31,7 +31,7 @@ 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 #define KVM_HAVE_ONE_REG
 
-#define KVM_VCPU_MAX_FEATURES 0
+#define KVM_VCPU_MAX_FEATURES 1
 
 /* We don't currently support large pages. */
 #define KVM_HPAGE_GFN_SHIFT(x)	0
diff --git a/arch/arm/include/asm/kvm_psci.h b/arch/arm/include/asm/kvm_psci.h
new file mode 100644
index 0000000..992d7f1
--- /dev/null
+++ b/arch/arm/include/asm/kvm_psci.h
@@ -0,0 +1,23 @@ 
+/*
+ * Copyright (C) 2012 - ARM Ltd
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __ARM_KVM_PSCI_H__
+#define __ARM_KVM_PSCI_H__
+
+int kvm_psci_call(struct kvm_vcpu *vcpu);
+
+#endif /* __ARM_KVM_PSCI_H__ */
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 94e893b..972b90d 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -81,6 +81,8 @@  struct kvm_regs {
 #define KVM_VGIC_V2_DIST_SIZE		0x1000
 #define KVM_VGIC_V2_CPU_SIZE		0x2000
 
+#define KVM_ARM_VCPU_POWER_OFF		0 /* CPU is started in OFF state */
+
 struct kvm_vcpu_init {
 	__u32 target;
 	__u32 features[7];
@@ -161,4 +163,18 @@  struct kvm_arch_memory_slot {
 /* Highest supported SPI, from VGIC_NR_IRQS */
 #define KVM_ARM_IRQ_GIC_MAX		127
 
+/* PSCI interface */
+#define KVM_PSCI_FN_BASE		0x95c1ba5e
+#define KVM_PSCI_FN(n)			(KVM_PSCI_FN_BASE + (n))
+
+#define KVM_PSCI_FN_CPU_SUSPEND		KVM_PSCI_FN(0)
+#define KVM_PSCI_FN_CPU_OFF		KVM_PSCI_FN(1)
+#define KVM_PSCI_FN_CPU_ON		KVM_PSCI_FN(2)
+#define KVM_PSCI_FN_MIGRATE		KVM_PSCI_FN(3)
+
+#define KVM_PSCI_RET_SUCCESS		0
+#define KVM_PSCI_RET_NI			((unsigned long)-1)
+#define KVM_PSCI_RET_INVAL		((unsigned long)-2)
+#define KVM_PSCI_RET_DENIED		((unsigned long)-3)
+
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 43c4cad..6a2d571 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -18,6 +18,6 @@  kvm-arm-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o mmio.o decode.o
+obj-y += coproc.o coproc_a15.o mmio.o decode.o psci.o
 obj-$(CONFIG_KVM_ARM_VGIC) += vgic.o
 obj-$(CONFIG_KVM_ARM_TIMER) += arch_timer.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a28e49d..4d5ed244 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -43,6 +43,7 @@ 
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_coproc.h>
+#include <asm/kvm_psci.h>
 #include <asm/opcodes.h>
 
 #ifdef REQUIRES_VIRT
@@ -192,6 +193,7 @@  int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_SYNC_MMU:
 	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
 	case KVM_CAP_ONE_REG:
+	case KVM_CAP_ARM_PSCI:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -496,21 +498,18 @@  static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	/*
-	 * Guest called HVC instruction:
-	 * Let it know we don't want that by injecting an undefined exception.
-	 */
-	kvm_debug("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
-		  *vcpu_pc(vcpu));
-	kvm_debug("         HSR: %8x", vcpu->arch.hsr);
+	if (!kvm_psci_call(vcpu))
+		return 1;
+
 	kvm_inject_undefined(vcpu);
 	return 1;
 }
 
 static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	/* We don't support SMC; don't do that. */
-	kvm_debug("smc: at %08x", *vcpu_pc(vcpu));
+	if (!kvm_psci_call(vcpu))
+		return 1;
+
 	kvm_inject_undefined(vcpu);
 	return 1;
 }
@@ -660,6 +659,15 @@  static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 			return ret;
 	}
 
+	/*
+	 * Handle the "start in power-off" case by calling into the
+	 * PSCI code.
+	 */
+	if (test_and_clear_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features)) {
+		*vcpu_reg(vcpu, 0) = KVM_PSCI_FN_CPU_OFF;
+		kvm_psci_call(vcpu);
+	}
+
 	return 0;
 }
 
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
new file mode 100644
index 0000000..4a1c4f0
--- /dev/null
+++ b/arch/arm/kvm/psci.c
@@ -0,0 +1,90 @@ 
+/*
+ * Copyright (C) 2012 - ARM Ltd
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/wait.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_psci.h>
+
+static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
+{
+	wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
+
+	vcpu->arch.pause = true;
+
+	wait_event_interruptible(*wq, !vcpu->arch.pause);
+}
+
+static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
+{
+	struct kvm *kvm = source_vcpu->kvm;
+	struct kvm_vcpu *vcpu;
+	wait_queue_head_t *wq;
+	unsigned long cpu_id;
+	phys_addr_t target_pc;
+
+	cpu_id = *vcpu_reg(source_vcpu, 1);
+	if (vcpu_mode_is_32bit(source_vcpu))
+		cpu_id &= ~((u32) 0);
+
+	if (cpu_id >= atomic_read(&kvm->online_vcpus))
+		return KVM_PSCI_RET_INVAL;
+
+	target_pc = *vcpu_reg(source_vcpu, 2);
+
+	vcpu = kvm_get_vcpu(kvm, cpu_id);
+
+	wq = kvm_arch_vcpu_wq(vcpu);
+	if (!waitqueue_active(wq))
+		return KVM_PSCI_RET_INVAL;
+
+	kvm_reset_vcpu(vcpu);
+	*vcpu_pc(vcpu) = target_pc;
+	vcpu->arch.pause = false;
+	smp_mb();		/* Make sure the above is visible */
+
+	wake_up_interruptible(wq);
+
+	return KVM_PSCI_RET_SUCCESS;
+}
+
+int kvm_psci_call(struct kvm_vcpu *vcpu)
+{
+	unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
+	unsigned long val;
+
+	switch (psci_fn) {
+	case KVM_PSCI_FN_CPU_OFF:
+		kvm_psci_vcpu_off(vcpu);
+		val = KVM_PSCI_RET_SUCCESS;
+		break;
+	case KVM_PSCI_FN_CPU_ON:
+		val = kvm_psci_vcpu_on(vcpu);
+		break;
+	case KVM_PSCI_FN_CPU_SUSPEND:
+	case KVM_PSCI_FN_MIGRATE:
+		val = KVM_PSCI_RET_NI;
+		break;
+
+	default:
+		return -1;
+	}
+
+	*vcpu_reg(vcpu, 0) = val;
+	return 0;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1f68151..9ff7f70 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -637,6 +637,7 @@  struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_BOOKE_WATCHDOG 83
 #define KVM_CAP_PPC_HTAB_FD 84
 #define KVM_CAP_ARM_SET_DEVICE_ADDR 85
+#define KVM_CAP_ARM_PSCI 86
 
 #ifdef KVM_CAP_IRQ_ROUTING