
[3/5] booke: define reset and shutdown hcalls

Message ID 1373886679-19581-4-git-send-email-Bharat.Bhushan@freescale.com (mailing list archive)
State New, archived

Commit Message

Bharat Bhushan July 15, 2013, 11:11 a.m. UTC
KVM_HC_VM_RESET: Requests that the virtual machine be reset.
KVM_HC_VM_SHUTDOWN: Requests that the virtual machine be powered-off/halted.

These hcalls are handled by userspace (e.g. QEMU).

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
 Documentation/virtual/kvm/hypercalls.txt |   16 ++++++++++++++++
 include/uapi/linux/kvm_para.h            |    3 ++-
 2 files changed, 18 insertions(+), 1 deletions(-)
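
As a rough illustration of the intended guest-side usage (this sketch is
not part of the patch; epapr_hypercall0() is assumed here as the guest's
ePAPR wrapper that loads the hcall token into r11, executes the hypercall
sequence advertised via the "epapr,hcall-instructions" device tree
property, and returns the status code from r3):

/* Hypothetical guest-side sketch -- not part of this patch. */
#include <linux/printk.h>
#include <asm/epapr_hcalls.h>	/* EV_SUCCESS, EV_INTERNAL */

#define KVM_HC_VM_SHUTDOWN	6	/* number proposed by this patch */

static void kvm_guest_power_off(void)
{
	/* Assumed wrapper: issues the hcall with no arguments and
	 * returns the ePAPR status code. */
	unsigned long ret = epapr_hypercall0(KVM_HC_VM_SHUTDOWN);

	/* If the hypervisor honored the request, we never get here. */
	pr_err("KVM_HC_VM_SHUTDOWN failed, status %lu\n", ret);
}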

Comments

Gleb Natapov July 15, 2013, 11:30 a.m. UTC | #1
On Mon, Jul 15, 2013 at 04:41:17PM +0530, Bharat Bhushan wrote:
> KVM_HC_VM_RESET: Requests that the virtual machine be reset.
> KVM_HC_VM_SHUTDOWN: Requests that the virtual machine be powered-off/halted.
> 
> These hcalls are handled by userspace (e.g. QEMU).
> 
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
>  Documentation/virtual/kvm/hypercalls.txt |   16 ++++++++++++++++
>  include/uapi/linux/kvm_para.h            |    3 ++-
>  2 files changed, 18 insertions(+), 1 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
> index ea113b5..58acdc1 100644
> --- a/Documentation/virtual/kvm/hypercalls.txt
> +++ b/Documentation/virtual/kvm/hypercalls.txt
> @@ -64,3 +64,19 @@ Purpose: To enable communication between the hypervisor and guest there is a
>  shared page that contains parts of supervisor visible register state.
>  The guest can map this shared page to access its supervisor register through
>  memory using this hypercall.
> +
> +5. KVM_HC_VM_RESET
> +------------------------
> +Architecture: PPC
> +Status: active
> +Purpose:  Requests that the virtual machine be reset.  The hcall takes no
> +arguments. If successful the hcall does not return. If an error occurs it
> +returns EV_INTERNAL.
> +
> +6. KVM_HC_VM_SHUTDOWN
> +------------------------
> +Architecture: PPC
> +Status: active
> +Purpose: Requests that the virtual machine be powered-off/halted.
> +The hcall takes no arguments. If successful the hcall does not return.
> +If an error occurs it returns EV_INTERNAL.
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index cea2c5c..218882d 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -19,7 +19,8 @@
>  #define KVM_HC_MMU_OP			2
>  #define KVM_HC_FEATURES			3
>  #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
> -
> +#define KVM_HC_VM_RESET			5
> +#define KVM_HC_VM_SHUTDOWN		6
There is not much sense in sharing hypercalls between architectures. There
is zero probability that x86 will implement these, for instance (and I am
not sure why PPC wants them either, instead of emulating devices that do
shutdown/reset).  So let's move them to arch headers.
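
(For illustration, the move suggested above might look roughly like this;
the exact header is an assumption, since no such patch exists in this
thread:)

/* Hypothetical sketch, not an actual patch: drop the two defines from
 * include/uapi/linux/kvm_para.h and carry them in the PPC-only uapi
 * header instead, e.g. arch/powerpc/include/uapi/asm/kvm_para.h: */
#define KVM_HC_VM_RESET		5
#define KVM_HC_VM_SHUTDOWN	6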

--
			Gleb.
Alexander Graf July 15, 2013, 11:44 a.m. UTC | #2
On 15.07.2013, at 13:30, Gleb Natapov wrote:

> On Mon, Jul 15, 2013 at 04:41:17PM +0530, Bharat Bhushan wrote:
>> [...]
> There is not much sense in sharing hypercalls between architectures. There
> is zero probability that x86 will implement these, for instance (and I am
> not sure why PPC wants them either, instead of emulating devices that do
> shutdown/reset

Implementing devices gets pretty tricky. Usually all of your devices sit on the SoC with a strictly defined layout. We can randomly shove some device in there, but there's a good chance we're overlapping with another device.

So having a separate namespace with hcalls makes things a lot easier. And the guest needs to learn how to access it either way.

> ).  So let's move them to arch headers.

Do we want to keep the numbering scheme interchangeable? Maybe there will be hcalls that can be shared between archs? If so, leaving them in the same header file might make sense.


Alex

Gleb Natapov July 15, 2013, 12:15 p.m. UTC | #3
On Mon, Jul 15, 2013 at 01:44:46PM +0200, Alexander Graf wrote:
> 
> On 15.07.2013, at 13:30, Gleb Natapov wrote:
> 
> > On Mon, Jul 15, 2013 at 04:41:17PM +0530, Bharat Bhushan wrote:
> >> [...]
> > There is not much sense in sharing hypercalls between architectures. There
> > is zero probability that x86 will implement these, for instance (and I am
> > not sure why PPC wants them either, instead of emulating devices that do
> > shutdown/reset
> 
> Implementing devices gets pretty tricky. Usually all of your devices sit on the SoC with a strictly defined layout. We can randomly shove some device in there, but there's a good chance we're overlapping with another device.
> 
I thought we had device trees to sort these things out.

> So having a separate namespace with hcalls makes things a lot easier. And the guest needs to learn how to access it either way.
> 
> > ).  So let's move them to arch headers.
> 
> Do we want to keep the numbering scheme interchangeable? Maybe there will be hcalls that can be shared between archs? If so, leaving them in the same header file might make sense.
> 
hcalls will not be handled in shared code, so I do not see why we would
want an interchangeable numbering scheme. hcall handlers on different
arches can call common code after intercepting the hcall and retrieving
its arguments from the arch vcpu state.
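
(As a rough sketch of that flow on the booke side; kvmppc_get_gpr(),
kvmppc_set_gpr(), the RESUME_* codes and the EV_* status values exist in
the KVM PPC / ePAPR code today, while the exit reason and kvm_run fields
below are invented for illustration:)

/* Hypothetical booke-side dispatch, NOT code from this series: pull the
 * hcall number and arguments out of the vcpu GPRs per the ePAPR ABI,
 * then either handle the call in the kernel or punt to host userspace. */
static int kvmppc_handle_hcall(struct kvm_vcpu *vcpu)
{
	unsigned long nr   = kvmppc_get_gpr(vcpu, 11);	/* ePAPR token */
	unsigned long arg0 = kvmppc_get_gpr(vcpu, 3);

	switch (nr) {
	case KVM_HC_VM_RESET:
	case KVM_HC_VM_SHUTDOWN:
		/* Let host userspace (e.g. QEMU) do the reset/power-off;
		 * the exit reason and run-struct layout are made up. */
		vcpu->run->exit_reason   = KVM_EXIT_HCALL;
		vcpu->run->hcall.nr      = nr;
		vcpu->run->hcall.args[0] = arg0;
		return RESUME_HOST;
	default:
		/* Unknown hcall: fail it back to the guest. */
		kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED);
		return RESUME_GUEST;
	}
}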

--
			Gleb.
Alexander Graf July 15, 2013, 12:21 p.m. UTC | #4
On 15.07.2013, at 14:15, Gleb Natapov wrote:

> On Mon, Jul 15, 2013 at 01:44:46PM +0200, Alexander Graf wrote:
>> [...]
>> Implementing devices gets pretty tricky. Usually all of your devices
>> sit on the SoC with a strictly defined layout. We can randomly shove
>> some device in there, but there's a good chance we're overlapping with
>> another device.
>> 
> I thought we had device trees to sort these things out.

For Linux guests, yes :). For random proprietary guests, no.

> 
>> So having a separate namespace with hcalls makes things a lot easier. And the guest needs to learn how to access it either way.
>> 
>>> ).  So let's move them to arch headers.
>> 
>> Do we want to keep the numbering scheme interchangeable? Maybe there will be hcalls that can be shared between archs? If so, leaving them in the same header file might make sense.
>> 
> hcalls will not be handled in shared code, so I do not see why we would
> want an interchangeable numbering scheme. hcall handlers on different
> arches can call common code after intercepting the hcall and retrieving
> its arguments from the arch vcpu state.

Works for me, but then we should make hcall numbers 100% arch specific and have no global hc namespace anymore.


Alex

Gleb Natapov July 15, 2013, 12:24 p.m. UTC | #5
On Mon, Jul 15, 2013 at 02:21:51PM +0200, Alexander Graf wrote:
> 
> On 15.07.2013, at 14:15, Gleb Natapov wrote:
> 
> >> [...]
> >> Implementing devices gets pretty tricky. Usually all of your devices
> >> sit on the SoC with a strictly defined layout. [...]
> >> 
> > I thought we had device trees to sort these things out.
> 
> For Linux guests, yes :). For random proprietary guests, no.
> 
But those can't use hcalls either, no?

> [...]
> Works for me, but then we should make hcall numbers 100% arch specific
> and have no global hc namespace anymore.
> 
Yes, of course. Move all of them to arch headers.

--
			Gleb.
Alexander Graf July 15, 2013, 12:26 p.m. UTC | #6
On 15.07.2013, at 14:24, Gleb Natapov wrote:

> On Mon, Jul 15, 2013 at 02:21:51PM +0200, Alexander Graf wrote:
>> [...]
>>> I thought we had device trees to sort these things out.
>> 
>> For Linux guests, yes :). For random proprietary guests, no.
>> 
> But those can't use hcalls either, no?

Why not? There are customers out there who are more than happy to add functionality, but not change functionality.


Alex

Gleb Natapov July 15, 2013, 12:31 p.m. UTC | #7
On Mon, Jul 15, 2013 at 02:26:38PM +0200, Alexander Graf wrote:
> 
> On 15.07.2013, at 14:24, Gleb Natapov wrote:
> 
> > [...]
> > But those can't use hcalls either, no?
> 
> Why not? There are customers out there who are more than happy to add functionality, but not change functionality.
> 
Ah, so you are talking about proprietary guests that are actively
developed, where it is easier to use a separate address space than to make
them parse a device tree or agree on a common device location. Oh well.

--
			Gleb.
Gleb Natapov July 16, 2013, 6:35 a.m. UTC | #8
On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
> >On Mon, Jul 15, 2013 at 04:41:17PM +0530, Bharat Bhushan wrote:
> >> [...]
> >There is not much sense in sharing hypercalls between architectures.
> >There is zero probability that x86 will implement these, for instance
> 
> This is similar to the question of whether to keep device API
> enumerations per-architecture...  It costs very little to keep it in
> a common place, and it's hard to go back in the other direction if
> we later realize there are things that should be shared.
>
This is different from the device API, since with the device API all
arches have to create/destroy devices, so it makes sense to put device
lifecycle management into the common code; and the device API has a
single entry point into the code - the device fd ioctl - where it makes
sense to handle common tasks, if any, and dispatch others to the
specific device implementation.

This is totally unlike hypercalls, which are, by definition, very
architecture specific (the way they are triggered, the way parameters
are passed from guest to host, which hypercalls an arch needs...). The
entry point for hypercalls is in arch-specific code (again unlike the
device API), so the way to reuse code, should the need arise, is
different too and does not require a common namespace - just call a
common function after retrieving the hypercall parameters in an
arch-specific way.

> Keeping it in a common place also makes it more visible to people
> looking to add new hcalls, which could cut down on reinventing the
> wheel.
I do not want other arches to start using hypercalls the way powerpc
started to use them: as a separate device I/O space, so it is better to
hide this as far away from common code as possible :) But on a more
serious note, hypercalls should be a last resort, added only when no
other possibility exists. People should not look at what hcalls others
have implemented so they can add them to their favorite arch; they
should have a problem at hand that they cannot solve without an hcall,
and at that point they will have a pretty good idea of what the hcall
should do.

> 
> >(and I am not sure why PPC wants them either, instead of emulating
> >devices that do shutdown/reset).
> 
> Besides what Alex said, for shutdown we don't have any existing
> device to emulate (our real hardware just doesn't have that
> functionality).  For reset we currently do emulate, but it's awkward
> to describe in the device tree what we actually emulate since the
> reset functionality is part of a kitchen-sink "device" of which we
> emulate virtually nothing other than the reset.  Currently we
> advertise the entire thing and just ignore the rest, but that causes
> problems with the guest seeing the node and trying to use that
> functionality.
> 
What about writing a virtio device for shutdown and adding the missing
emulation to the kitchen-sink device (yeah, I know, easier said than
done), or making the virtio device handle reset too? This of course
raises the question of what address to use for such a device, hence the
idea of using hcalls as a separate address space.
 
--
			Gleb.
Gleb Natapov July 17, 2013, 11 a.m. UTC | #9
On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
> >[...]
> >This is totally unlike hypercalls, which are, by definition, very
> >architecture specific (the way they are triggered, the way parameters
> >are passed from guest to host, which hypercalls an arch needs...).
> 
> The ABI is architecture specific.  The API doesn't need to be, any
> more than it does with syscalls (I consider the
> architecture-specific definition of syscall numbers and similar
> constants in Linux to be unfortunate, especially for tools such as
> strace or QEMU's linux-user emulation).
> 
Unlike syscalls, different arches have very different ideas about which
hypercalls they need to implement, so while I can see how a unified
syscall space may benefit a (very) small number of tools, I do not see
what advantage it would give us. The disadvantage is one more global
namespace to manage.

> >> Keeping it in a common place also makes it more visible to people
> >> looking to add new hcalls, which could cut down on reinventing the
> >> wheel.
> >[...] hypercalls should be a last resort, added only when no other
> >possibility exists; people should have a problem at hand that they
> >cannot solve without an hcall, and at that point they will have a
> >pretty good idea of what the hcall should do.
> 
> Why are hcalls such a bad thing?
> 
Because they are often used to do non-architectural things, making OSes
behave differently from how they run on real HW, and real HW is what
OSes are designed and tested for. Example: there once was a KVM
hypercall (Xen has/had a similar one) to accelerate MMU operations. One
thing it allowed was flushing the TLB without doing an IPI if the vcpu
was not running. Later an optimization was added to the Linux MMU code
that _relies_ on those IPIs for synchronisation. Good thing that by that
point those hypercalls were already deprecated in KVM (IIRC Xen was
broken for some time in that regard). Which brings me to another point:
hypercalls often get obsoleted by code improvements and HW advancements
(it happened to the aforementioned MMU hypercalls), but they are hard to
deprecate if the hypervisor supports live migration; without live
migration it is less of a problem. The next point is that people often
try to use them instead of emulating a PV or real device just because
they think it is easier, but often it is not so. Example: the pvpanic
device was initially proposed as a hypercall, so let's say we had
implemented it as such. It would have been KVM specific, the
implementation would have touched core guest KVM code, and it would have
been Linux-guest specific. Instead it was implemented as a platform
device with a very small platform driver confined to the drivers/
directory, immediately usable by Xen and QEMU TCG in addition to KVM,
and it will likely gain a Windows driver. No downsides, only upsides.

So given all that, hypercalls are considered more of a necessary evil in
KVM land :)
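
(For reference, the guest side of that pvpanic device really is tiny;
the sketch below follows the real driver, with the I/O port 0x505 and
event bit 0 coming from QEMU's pvpanic spec, probing and cleanup
omitted:)

/* Sketch of a pvpanic-style guest driver: a panic notifier that pokes a
 * single I/O port to tell the hypervisor the guest has panicked. */
#include <linux/kernel.h>
#include <linux/notifier.h>
#include <asm/io.h>

#define PVPANIC_PORT		0x505
#define PVPANIC_PANICKED	(1 << 0)

static int pvpanic_notify(struct notifier_block *nb, unsigned long code,
			  void *unused)
{
	outb(PVPANIC_PANICKED, PVPANIC_PORT);
	return NOTIFY_DONE;
}

static struct notifier_block pvpanic_nb = {
	.notifier_call = pvpanic_notify,
};

/* Registered at probe time with:
 *	atomic_notifier_chain_register(&panic_notifier_list, &pvpanic_nb);
 */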

> Should new Linux syscalls be avoided too, in favor of new emulated
> devices exposed via vfio? :-)
Try adding a new syscall to Linux and see how simple it is.

> 
> >> Besides what Alex said, for shutdown we don't have any existing
> >> device to emulate (our real hardware just doesn't have that
> >> functionality). [...]
> >>
> >What about writing a virtio device for shutdown
> 
> That sounds like quite a bit more work than hcalls.  It also ties up
> a virtual PCI slot -- some machines don't have very many (mpc8544ds
> has 2, though we could and should expand that in the paravirt e500
> machine).
Yes, a virtio device may be more work, but it does not have to be a
complex or high-performance device; having only one outstanding command
will be OK.  The 2-slot limit is too harsh indeed, but since an hcall
implies PV, the device could be available only on the paravirt machine.
And the device's functionality can be expandable, so you will not need
to write another device and take another slot for each little thing you
want to add. It can advertise capabilities in one BAR and take
commands/return values through the virtio ring.
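
(Purely as a thought experiment, a single-command transaction for such a
device could look like the struct below; no such device exists and every
name here is invented:)

/* Invented wire format for a hypothetical virtio "machine control"
 * device: the guest queues one request and kicks; on success the VM is
 * reset or powered off before the completion is ever seen, on failure
 * the host fills in status and signals the used ring. */
#include <stdint.h>

enum vmctl_cmd {
	VMCTL_CMD_SHUTDOWN = 1,
	VMCTL_CMD_RESET    = 2,
};

struct vmctl_req {
	uint32_t cmd;		/* enum vmctl_cmd, guest -> host */
	uint32_t flags;		/* reserved, must be zero */
	uint8_t  status;	/* host -> guest, only valid on failure */
};

/* Which commands the host supports would be advertised as capability
 * bits in the device config space, so new commands do not need a new
 * device or another PCI slot. */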

--
			Gleb.
Alexander Graf July 17, 2013, 12:19 p.m. UTC | #10
On 17.07.2013, at 13:00, Gleb Natapov wrote:

> On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
> [...]
> Example: the pvpanic device was initially proposed as a hypercall, so
> let's say we had implemented it as such. It would have been KVM
> specific, the implementation would have touched core guest KVM code,
> and it would have been Linux-guest specific. Instead it was implemented
> as a platform device with a very small platform driver confined to the
> drivers/ directory, immediately usable by Xen and QEMU TCG in addition

This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely don't want to expose TCG as a KVM hypervisor.


Alex

Yoder Stuart-B08248 July 17, 2013, 3:19 p.m. UTC | #11
> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Wednesday, July 17, 2013 7:19 AM
> To: Gleb Natapov
> Cc: Wood Scott-B07421; Bhushan Bharat-R65777; kvm@vger.kernel.org; kvm-ppc@vger.kernel.org; Yoder
> Stuart-B08248; Bhushan Bharat-R65777
> Subject: Re: [PATCH 3/5] booke: define reset and shutdown hcalls
> 
> 
> [...]
> This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely
> don't want to expose TCG as a KVM hypervisor.

Hmm...so are you proposing that we abandon the current approach,
and switch to a device-based mechanism for reboot/shutdown?

Stuart

Alexander Graf July 17, 2013, 3:21 p.m. UTC | #12
On 17.07.2013, at 17:19, Yoder Stuart-B08248 wrote:

> 
>> [...]
>> This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely
>> don't want to expose TCG as a KVM hypervisor.
> 
> Hmm...so are you proposing that we abandon the current approach,
> and switch to a device-based mechanism for reboot/shutdown?

Reading Gleb's email, it sounds like the more future-proof approach, yes. I'm not quite sure yet where we should plug this in, though.


Alex

Yoder Stuart-B08248 July 17, 2013, 3:36 p.m. UTC | #13
> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Wednesday, July 17, 2013 10:21 AM
> To: Yoder Stuart-B08248
> Cc: Wood Scott-B07421; Bhushan Bharat-R65777; kvm@vger.kernel.org; kvm-ppc@vger.kernel.org; Gleb Natapov
> Subject: Re: [PATCH 3/5] booke: define reset and shutdown hcalls
> 
> 
> On 17.07.2013, at 17:19, Yoder Stuart-B08248 wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Alexander Graf [mailto:agraf@suse.de]
> >> Sent: Wednesday, July 17, 2013 7:19 AM
> >> To: Gleb Natapov
> >> Cc: Wood Scott-B07421; Bhushan Bharat-R65777; kvm@vger.kernel.org; kvm-ppc@vger.kernel.org; Yoder
> >> Stuart-B08248; Bhushan Bharat-R65777
> >> Subject: Re: [PATCH 3/5] booke: define reset and shutdown hcalls
> >>
> >>
> >> On 17.07.2013, at 13:00, Gleb Natapov wrote:
> >>
> >>> On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
> >>>> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
> >>>>> On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
> >>>>>> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
> >>>>>>> There is no much sense to share hypercalls between architectures.
> >>>>>>> There
> >>>>>>> is zero probability x86 will implement those for instance
> >>>>>>
> >>>>>> This is similar to the question of whether to keep device API
> >>>>>> enumerations per-architecture...  It costs very little to keep it in
> >>>>>> a common place, and it's hard to go back in the other direction if
> >>>>>> we later realize there are things that should be shared.
> >>>>>>
> >>>>> This is different from device API since with device API all arches
> >>>>> have
> >>>>> to create/destroy devices, so it make sense to put device lifecycle
> >>>>> management into the common code, and device API has single entry point
> >>>>> to the code - device fd ioctl - where it makes sense to handle common
> >>>>> tasks, if any, and despatch others to specific device implementation.
> >>>>>
> >>>>> This is totally unlike hypercalls which are, by definition, very
> >>>>> architecture specific (the way they are triggered, the way parameter
> >>>>> are passed from guest to host, what hypercalls arch needs...).
> >>>>
> >>>> The ABI is architecture specific.  The API doesn't need to be, any
> >>>> more than it does with syscalls (I consider the
> >>>> architecture-specific definition of syscall numbers and similar
> >>>> constants in Linux to be unfortunate, especially for tools such as
> >>>> strace or QEMU's linux-user emulation).
> >>>>
> >>> Unlike syscalls different arches have very different ideas what
> >>> hypercalls they need to implement, so while with unified syscall space I
> >>> can see how it may benefit (very) small number of tools, I do not see
> >>> what advantage it will give us. The disadvantage is one more global name
> >>> space to manage.
> >>>
> >>>>>> Keeping it in a common place also makes it more visible to people
> >>>>>> looking to add new hcalls, which could cut down on reinventing the
> >>>>>> wheel.
> >>>>> I do not want other arches to start using hypercalls in the way
> >>>>> powerpc
> >>>>> started to use them: separate device io space, so it is better to hide
> >>>>> this as far away from common code as possible :) But on a more serious
> >>>>> note hypercalls should be a last resort and added only when no other
> >>>>> possibility exists, so people should not look what hcalls others
> >>>>> implemented, so they can add them to their favorite arch, but they
> >>>>> should have a problem at hand that they cannot solve without
> >>>>> hcall, but
> >>>>> at this point they will have pretty good idea what this hcall
> >>>>> should do.
> >>>>
> >>>> Why are hcalls such a bad thing?
> >>>>
> >>> Because they often used to do non architectural things making OSes
> >>> behave different from how they runs on real HW and real HW is what
> >>> OSes are designed and tested for. Example: there once was a KVM (XEN
> >>> have/had similar one) hypercall to accelerate MMU operation.  One thing it
> >>> allowed is to to flush tlb without doing IPI if vcpu is not running. Later
> >>> optimization was added to Linux MMU code that _relies_ on those IPIs for
> >>> synchronisation. Good that at that point those hypercalls were already
> >>> deprecated on KVM (IIRC XEN was broke for some time in that regard). Which
> >>> brings me to another point: they often get obsoleted by code improvement
> >>> and HW advancement (happened to aforementioned MMU hypercalls), but they
> >>> hard to deprecate if hypervisor supports live migration, without live
> >>> migration it is less of a problem. Next point is that people often try
> >>> to use them instead of emulate PV or real device just because they
> >>> think it is easier, but it is often not so. Example: pvpanic device was
> >>> initially proposed as hypercall, so lets say we would implement it as
> >>> such. It would have been KVM specific, implementation would touch core
> >>> guest KVM code and would have been Linux guest specific. Instead it was
> >>> implemented as platform device with very small platform driver confined
> >>> in drivers/ directory, immediately usable by XEN and QEMU tcg in addition
> >>
> >> This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely
> >> don't want to expose TCG as a KVM hypervisor.
> >
> > Hmm...so are you proposing that we abandon the current approach,
> > and switch to a device-based mechanism for reboot/shutdown?
> 
> Reading Gleb's email, it sounds like the more future-proof approach, yes.
> I'm not quite sure yet where we should plug this in, though.

What do you mean...where the paravirt device would go in the physical
address map??

Stuart
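
For context, the pvpanic precedent Gleb cites really is tiny on the guest
side. A minimal sketch of such a platform driver -- the "example,pv-event"
compatible string, the byte-wide register and the event value are
hypothetical, only the overall shape mirrors the real pvpanic driver:

#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/io.h>
#include <linux/err.h>
#include <linux/of.h>

static void __iomem *regs;

/* A panic/reset/shutdown hook would report its event with one MMIO write. */
static void pv_event_notify(u8 event)
{
	writeb(event, regs);
}

static int pv_event_probe(struct platform_device *pdev)
{
	struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

	regs = devm_ioremap_resource(&pdev->dev, res);
	return IS_ERR(regs) ? PTR_ERR(regs) : 0;
}

static const struct of_device_id pv_event_of_match[] = {
	{ .compatible = "example,pv-event" },	/* hypothetical binding */
	{ }
};

static struct platform_driver pv_event_driver = {
	.probe = pv_event_probe,
	.driver = {
		.name = "pv-event",
		.of_match_table = pv_event_of_match,
	},
};
module_platform_driver(pv_event_driver);
MODULE_LICENSE("GPL");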

Alexander Graf July 17, 2013, 3:41 p.m. UTC | #14
On 17.07.2013, at 17:36, Yoder Stuart-B08248 wrote:

> [...]
>>>> This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely
>>>> don't want to expose TCG as a KVM hypervisor.
>>> 
>>> Hmm...so are you proposing that we abandon the current approach,
>>> and switch to a device-based mechanism for reboot/shutdown?
>> 
>> Reading Gleb's email, it sounds like the more future-proof approach, yes.
>> I'm not quite sure yet where we should plug this in, though.
> 
> What do you mean...where the paravirt device would go in the physical
> address map??

Right. Either we

  - let the guest decide (PCI)
  - let QEMU decide, but potentially break the SoC layout (SysBus)
  - let QEMU decide, but only for the virt machine so that we don't break anyone (PlatBus)


Alex
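
To make the SysBus/PlatBus option concrete: a rough sketch of the QEMU
side, a single write-only MMIO register that maps guest writes onto reset
and shutdown requests. The type name, register size and command values are
made up, and the exact QOM/MemoryRegion signatures differ between QEMU
versions:

#include "hw/sysbus.h"
#include "sysemu/sysemu.h"

#define TYPE_PV_POWER "pv-power"

typedef struct PVPowerState {
    SysBusDevice parent_obj;
    MemoryRegion iomem;
} PVPowerState;

static void pv_power_write(void *opaque, hwaddr addr,
                           uint64_t val, unsigned size)
{
    switch (val) {
    case 1:                             /* hypothetical "reboot" command */
        qemu_system_reset_request();
        break;
    case 2:                             /* hypothetical "power off" command */
        qemu_system_shutdown_request();
        break;
    }
}

static const MemoryRegionOps pv_power_ops = {
    .write = pv_power_write,
    .endianness = DEVICE_BIG_ENDIAN,
};

static void pv_power_init(Object *obj)
{
    PVPowerState *s = OBJECT_CHECK(PVPowerState, obj, TYPE_PV_POWER);

    memory_region_init_io(&s->iomem, obj, &pv_power_ops, s,
                          TYPE_PV_POWER, 4);
    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->iomem);
}

static const TypeInfo pv_power_info = {
    .name          = TYPE_PV_POWER,
    .parent        = TYPE_SYS_BUS_DEVICE,
    .instance_size = sizeof(PVPowerState),
    .instance_init = pv_power_init,
};

static void pv_power_register_types(void)
{
    type_register_static(&pv_power_info);
}

type_init(pv_power_register_types)

Whether the board maps it at a fixed address (SysBus/PlatBus) or a PCI
variant lets the guest place it is then purely a machine-level decision.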

Bharat Bhushan July 17, 2013, 3:47 p.m. UTC | #15
> [...]
> >>> Hmm...so are you proposing that we abandon the current approach, and
> >>> switch to a device-based mechanism for reboot/shutdown?
> >>
> >> Reading Gleb's email, it sounds like the more future-proof approach,
> >> yes. I'm not quite sure yet where we should plug this in, though.
> >
> > What do you mean...where the paravirt device would go in the physical
> > address map??
> 
> Right. Either we
> 
>   - let the guest decide (PCI)
>   - let QEMU decide, but potentially break the SoC layout (SysBus)
>   - let QEMU decide, but only for the virt machine so that we don't break anyone (PlatBus)

Can you please elaborate on the above two points?

-Bharat

Alexander Graf July 17, 2013, 3:52 p.m. UTC | #16
On 17.07.2013, at 17:47, Bhushan Bharat-R65777 wrote:

>> [...]
>> Right. Either we
>> 
>>  - let the guest decide (PCI)
>>  - let QEMU decide, but potentially break the SoC layout (SysBus)
>>  - let QEMU decide, but only for the virt machine so that we don't break anyone (PlatBus)
> 
> Can you please elaborate on the above two points?

If we emulate an MPC8544DS, we need to emulate an MPC8544DS. Any time we diverge from the layout of the original chip, things can break.

However, for our PV machine (-M ppce500 / e500plat) we don't care about real hardware layouts. We simply emulate a machine that is 100% described through the device tree. So guests that can't deal with the machine looking different from real hardware don't really matter anyway, since they'd already be broken.


Alex

Bharat Bhushan July 17, 2013, 3:59 p.m. UTC | #17
> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Wednesday, July 17, 2013 9:22 PM
> To: Bhushan Bharat-R65777
> Cc: Yoder Stuart-B08248; Wood Scott-B07421; kvm@vger.kernel.org; kvm-
> ppc@vger.kernel.org; Gleb Natapov
> Subject: Re: [PATCH 3/5] booke: define reset and shutdown hcalls
> 
> 
> On 17.07.2013, at 17:47, Bhushan Bharat-R65777 wrote:
> 
> > [...]
> 
> If we emulate an MPC8544DS, we need to emulate an MPC8544DS. Any time we diverge
> from the layout of the original chip, things can break.
> 
> However, for our PV machine (-M ppce500 / e500plat) we don't care about real
> hardware layouts. We simply emulate a machine that is 100% described through the
> device tree. So guests that can't deal with the machine looking different from
> real hardware don't really matter anyway, since they'd already be broken.
> 

Ah, so we can choose any address range in ccsr space of a PV machine (-M ppce500 / e500plat). What about the MPC8544DS machine?

So what is the preferred way, a virtio reset/shutdown device or the above mentioned?
 
Thanks
-Bharat


Alexander Graf July 17, 2013, 4:04 p.m. UTC | #18
On 17.07.2013, at 17:59, Bhushan Bharat-R65777 wrote:

> [...]
> 
> Ah, so we can choose any address range in ccsr space of a PV machine (-M ppce500 / e500plat).

No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.

> What about the MPC8544DS machine?

I guess we'll have to live with GUTS there.

> So what is the preferred way, a virtio reset/shutdown device or the above mentioned?

A virtio device would clutter our PCI space, which we're already pretty tight on. So I'd personally prefer the above mentioned.


Alex
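
For reference, the GUTS path is what Linux already uses for reboot on 85xx:
fsl_rstcr_restart() in arch/powerpc/sysdev/fsl_soc.c writes HRESET_REQ to
the RSTCR register of the global-utilities block, and QEMU's mpc8544_guts
model turns that into a system reset. Roughly, as a sketch:

#include <linux/irqflags.h>
#include <asm/io.h>

#define RSTCR_HRESET_REQ	0x2

static void __iomem *rstcr;	/* global-utilities block + 0xb0, ioremap()ed */

static void guts_restart(void)
{
	local_irq_disable();
	out_be32(rstcr, RSTCR_HRESET_REQ);	/* request a hard reset */
	for (;;)
		;
}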

Bharat Bhushan July 17, 2013, 4:21 p.m. UTC | #19
> > [...]
> > Ah, so we can choose any address range in ccsr space of a PV machine (-M ppce500 / e500plat).
> 
> No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.

All devices are represented in the guest device tree, so how will we represent this device in the guest device tree?

-Bharat

Alexander Graf July 17, 2013, 4:23 p.m. UTC | #20
On 17.07.2013, at 18:21, Bhushan Bharat-R65777 wrote:

>>> [...]
>>> Ah, so we can choose any address range in ccsr space of a PV machine (-M ppce500 / e500plat).
>> 
>> No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.
> 
> All devices are represented in the guest device tree, so how will we represent this device in the guest device tree?

Not inside of the CCSR node :).


Alex

Scott Wood July 17, 2013, 4:59 p.m. UTC | #21
On 07/17/2013 11:04:08 AM, Alexander Graf wrote:
> 
> On 17.07.2013, at 17:59, Bhushan Bharat-R65777 wrote:
> 
> > Ah, so we can choose any address range in ccsr space of a PV machine (-M ppce500 / e500plat).
> 
> No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.

I'd rather we put it in CCSR, especially if/when we implement LAWs and
CCSRBAR, which give the guest control of its address space.

-Scott
Alexander Graf July 17, 2013, 5:05 p.m. UTC | #22
On 17.07.2013, at 18:59, Scott Wood wrote:

> On 07/17/2013 11:04:08 AM, Alexander Graf wrote:
>> On 17.07.2013, at 17:59, Bhushan Bharat-R65777 wrote:
>> > Ah, so we can choose any address range in ccsr space of a PV machine (-M ppce500 / e500plat).
>> No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.
> 
> I'd rather we put it in CCSR, especially if/when we implement LAWs and CCSRBAR, which give the guest control of its address space.

Do we have space in CCSR?


Alex

Scott Wood July 17, 2013, 5:09 p.m. UTC | #23
On 07/17/2013 12:05:41 PM, Alexander Graf wrote:
> 
> On 17.07.2013, at 18:59, Scott Wood wrote:
> 
> > On 07/17/2013 11:04:08 AM, Alexander Graf wrote:
> >> On 17.07.2013, at 17:59, Bhushan Bharat-R65777 wrote:
> >> > Ah, so we can choose any address range in ccsr space of a PV machine (-M ppce500 / e500plat).
> >> No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.
> >
> > I'd rather we put it in CCSR, especially if/when we implement LAWs and CCSRBAR, which give the guest control of its address space.
> 
> Do we have space in CCSR?

Sure.  Even on real hardware there are gaps, and on the paravirt  
platform we have loads of space. :-)

This does raise the question of what compatible string the ccsr node
should have on the paravirt platform.  Currently it's still labelled
"fsl,mpc8544-immr", which is clearly wrong.  If we add CCSRBAR support,
should the paravirt platform have it as well?  If so, what is the size
of CCSR on the paravirt platform?

-Scott
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
index ea113b5..58acdc1 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -64,3 +64,19 @@  Purpose: To enable communication between the hypervisor and guest there is a
 shared page that contains parts of supervisor visible register state.
 The guest can map this shared page to access its supervisor register through
 memory using this hypercall.
+
+5. KVM_HC_VM_RESET
+------------------------
+Architecture: PPC
+Status: active
+Purpose:  Requests that the virtual machine be reset.  The hcall takes no
+arguments. If successful the hcall does not return. If an error occurs it
+returns EV_INTERNAL.
+
+6. KVM_HC_VM_SHUTDOWN
+------------------------
+Architecture: PPC
+Status: active
+Purpose: Requests that the virtual machine be powered-off/halted.
+The hcall takes no arguments. If successful the hcall does not return.
+If an error occurs it returns EV_INTERNAL.
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index cea2c5c..218882d 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -19,7 +19,8 @@ 
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
-
+#define KVM_HC_VM_RESET			5
+#define KVM_HC_VM_SHUTDOWN		6
 /*
  * hypercalls use architecture specific
  */
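
For illustration, a sketch of how a guest could issue these hcalls through
the existing ePAPR hypercall helpers (hcall number in r11, status returned
in r3), assuming they sit in the KVM vendor hcall range via
KVM_HCALL_TOKEN() like the existing KVM PPC hcalls; the wrapper names are
made up:

#include <linux/kvm_para.h>
#include <asm/epapr_hcalls.h>

/* Only returns on failure (EV_INTERNAL, per the documentation above). */
static long kvm_vm_reset(void)
{
	return epapr_hypercall0(KVM_HCALL_TOKEN(KVM_HC_VM_RESET));
}

static long kvm_vm_shutdown(void)
{
	return epapr_hypercall0(KVM_HCALL_TOKEN(KVM_HC_VM_SHUTDOWN));
}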