Message ID | 20160120210014.GF4769@char.us.oracle.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, Jan 20, 2016 at 1:00 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> +static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
>> +{
>> +	if (!fn->supp_hardware_subarch) {
>> +		pr_err("Init sequence fails to declares any supported subarchs: %pF\n", fn->early_init);
>> +		WARN_ON(1);
>> +	}
>> +	if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
>> +		return true;
>> +	return false;
>> +}
>
> So the logic for this working is that boot_params.hdr.hardware_subarch
>
> And the macros define two: BIT(X86_SUBARCH_PC) or BIT(X86_SUBARCH_XEN).
>
> But hardware_subarch by default is set to zero. Which means if GRUB2, PXELinux, Xen multiboot1
> don't set it - then the X86_SUBARCH_PC is choosen right?
>
> 1 << 0 & 1 << X86_SUBARCH_PC (which is zero).
>
> For this to nicely work with Xen it ought to do this:
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 993b7a7..6cf9afd 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1676,6 +1676,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  	boot_params.hdr.ramdisk_image = initrd_start;
>  	boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
>  	boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
> +	boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;
>
>  	if (!xen_initial_domain()) {
>  		add_preferred_console("xenboot", 0, NULL);
>
> ?

That's correct for PV and PVH; likewise, when qemu is required for HVM, qemu could set it. I have the qemu change done, but that should only cover HVM. A common place to set this as well could be the hypervisor, but currently the hypervisor doesn't set any boot_params; instead a generic struct is passed and the kernel code (for any OS) is expected to interpret this and then set the required values for the OS in the init path. Long term, though, if we wanted to merge init further, one way could be to have the hypervisor just set the zero page cleanly for the different modes.
If we needed more data other than the hardware_subarch we also have hardware_subarch_data; that's a u64, and how it is used would be up to the subarch. In Xen's case it could do what it wants with it. That would still mean perhaps defining, as part of a Xen boot protocol, a place where Xen-specific code can count on finding more Xen data passed by the hypervisor, the xen_start_info. That is, if we wanted to merge init paths this is something to consider.

One thing I considered, on the question of who should set the zero page for Xen with the prospect of merging inits (or at least this subarch), for both the short term and the long term, are the obvious implications in terms of hypervisor / kernel / qemu combination requirements if the subarch is needed. Having it set in the kernel is an obvious immediate choice for PV / PVH, but it means we can't merge init paths completely (down to asm inits); we'd still be able to merge some C init paths, though the first entry would still be different. Having the zero page set in the hypervisor would go a long way, but it would mean a hypervisor change is required.

These prospects are worth discussing, especially in light of Boris's hvmlite work.

  Luis
On 01/20/16 13:33, Luis R. Rodriguez wrote: > > That's correct for PV and PVH, likewise when qemu is required for HVM > qemu could set it. I have the qemu change done but that should only > cover HVM. A common place to set this as well could be the hypervisor, > but currently the hypervisor doesn't set any boot_params, instead a > generic struct is passed and the kernel code (for any OS) is expected > to interpret this and then set the required values for the OS in the > init path. Long term though if we wanted to merge init further one way > could be to have the hypervisor just set the zero page cleanly for the > different modes. If we needed more data other than the > hardware_subarch we also have the hardware_subarch_data, that's a u64 > , and how that is used would be up to the subarch. In Xen's case it > could do what it wants with it. That would still mean perhaps defining > as part of a Xen boot protocol a place where xen specific code can > count on finding more Xen data passed by the hypervisor, the > xen_start_info. That is, if we wanted to merge init paths this is > something to consider. > > One thing I considered on the question of who should set the zero page > for Xen with the prospect of merging inits, or at least this subarch > for both short term and long term are the obvious implications in > terms of hypervisor / kernel / qemu combination requirements if the > subarch is needed. Having it set in the kernel is an obvious immediate > choice for PV / PVH but it means we can't merge init paths completely > (down to asm inits), we'd still be able to merge some C init paths > though, the first entry would still be different. Having the zero page > set on the hypervisor would go long ways but it would mean a > hypervisor change required. > > These prospects are worth discussing, specially in light of Boris's > hvmlite work. > The above doesn't make sense to me. hardware_subarch is really used when the boot sequence is somehow nonstandard. 
HVM probably doesn't need that. -hpa
On Wed, Jan 20, 2016 at 1:41 PM, H. Peter Anvin <hpa@zytor.com> wrote: > On 01/20/16 13:33, Luis R. Rodriguez wrote: >> >> That's correct for PV and PVH, likewise when qemu is required for HVM >> qemu could set it. I have the qemu change done but that should only >> cover HVM. A common place to set this as well could be the hypervisor, >> but currently the hypervisor doesn't set any boot_params, instead a >> generic struct is passed and the kernel code (for any OS) is expected >> to interpret this and then set the required values for the OS in the >> init path. Long term though if we wanted to merge init further one way >> could be to have the hypervisor just set the zero page cleanly for the >> different modes. If we needed more data other than the >> hardware_subarch we also have the hardware_subarch_data, that's a u64 >> , and how that is used would be up to the subarch. In Xen's case it >> could do what it wants with it. That would still mean perhaps defining >> as part of a Xen boot protocol a place where xen specific code can >> count on finding more Xen data passed by the hypervisor, the >> xen_start_info. That is, if we wanted to merge init paths this is >> something to consider. >> >> One thing I considered on the question of who should set the zero page >> for Xen with the prospect of merging inits, or at least this subarch >> for both short term and long term are the obvious implications in >> terms of hypervisor / kernel / qemu combination requirements if the >> subarch is needed. Having it set in the kernel is an obvious immediate >> choice for PV / PVH but it means we can't merge init paths completely >> (down to asm inits), we'd still be able to merge some C init paths >> though, the first entry would still be different. Having the zero page >> set on the hypervisor would go long ways but it would mean a >> hypervisor change required. >> >> These prospects are worth discussing, specially in light of Boris's >> hvmlite work. 
>>
>
> The above doesn't make sense to me.  hardware_subarch is really used
> when the boot sequence is somehow nonstandard.

Thanks for the feedback -- as it stands today, hardware_subarch is only used by lguest, Moorestown, and CE4100, even though we had definitions for it for Xen -- those are not used yet. Its documentation does make references to differences for a paravirtualized environment, and uses a few examples, but doesn't go into great depth about restrictions, so its limitations in how we could use it were not clear to me.

> HVM probably doesn't need that.

Today HVM doesn't need it, but perhaps that is because it has not needed changes early in boot. Will it, or could it? I'd even invite us to consider the same for other hypervisors or PV hypervisors. I'll note that things like cpu_has_hypervisor() or derivatives (kvm_para_available(), which is now used in drivers even, see sound/pci/intel8x0.c) require init_hypervisor_platform() to have run; in terms of the x86 init sequence this runs pretty late, at setup_arch(). Should code need to know hypervisor info any time before that, it has no generic option available.

I'm fine if we want to restrict hardware_subarch, but I'll note the semantics were not explicit enough to delineate clear differences, and I just wanted to highlight the current early boot restriction of cpu_has_hypervisor().

  Luis
On January 20, 2016 2:12:49 PM PST, "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote: >On Wed, Jan 20, 2016 at 1:41 PM, H. Peter Anvin <hpa@zytor.com> wrote: >> On 01/20/16 13:33, Luis R. Rodriguez wrote: >>> >>> That's correct for PV and PVH, likewise when qemu is required for >HVM >>> qemu could set it. I have the qemu change done but that should only >>> cover HVM. A common place to set this as well could be the >hypervisor, >>> but currently the hypervisor doesn't set any boot_params, instead a >>> generic struct is passed and the kernel code (for any OS) is >expected >>> to interpret this and then set the required values for the OS in the >>> init path. Long term though if we wanted to merge init further one >way >>> could be to have the hypervisor just set the zero page cleanly for >the >>> different modes. If we needed more data other than the >>> hardware_subarch we also have the hardware_subarch_data, that's a >u64 >>> , and how that is used would be up to the subarch. In Xen's case it >>> could do what it wants with it. That would still mean perhaps >defining >>> as part of a Xen boot protocol a place where xen specific code can >>> count on finding more Xen data passed by the hypervisor, the >>> xen_start_info. That is, if we wanted to merge init paths this is >>> something to consider. >>> >>> One thing I considered on the question of who should set the zero >page >>> for Xen with the prospect of merging inits, or at least this subarch >>> for both short term and long term are the obvious implications in >>> terms of hypervisor / kernel / qemu combination requirements if the >>> subarch is needed. Having it set in the kernel is an obvious >immediate >>> choice for PV / PVH but it means we can't merge init paths >completely >>> (down to asm inits), we'd still be able to merge some C init paths >>> though, the first entry would still be different. 
Having the zero >page >>> set on the hypervisor would go long ways but it would mean a >>> hypervisor change required. >>> >>> These prospects are worth discussing, specially in light of Boris's >>> hvmlite work. >>> >> >> The above doesn't make sense to me. hardware_subarch is really used >> when the boot sequence is somehow nonstandard. > >Thanks for the feedback -- as it stands today hardware_subarch is only >used by lguest, Moorestown, and CE4100 even though we had definitions >for it for Xen -- this is not used yet. Its documentation does make >references to differences for a paravirtualized environment, and uses >a few examples but doesn't go into great depths about restrictions so >its limitations in how we could use it were not clear to me. > >> HVM probably doesn't need that. > >Today HVM doesn't need it, but perhaps that is because it has not >needed changes early on boot. Will it, or could it? I'd even invite us >to consider the same for other hypervisors or PV hypervisors. I'll >note that things like cpu_has_hypervisor() or derivatives >(kvm_para_available() which is now used on drivers even, see >sound/pci/intel8x0.c) requires init_hypervisor_platform() run, in >terms of the x86 init sequence this is run pretty late at >setup_arch(). Should code need to know hypervisor info anytime before >that they have no generic option available. > >I'm fine if we want to restrict hardware_subarch but I'll note the >semantics were not that explicit to delineate clear differences and I >just wanted to highlight the current early boot restriction of >cpu_has_hypervisor(). > > Luis Basically, if the hardware is enumerable using standard PC mechanisms (PCI, ACPI) and doesn't need a special boot flow it should use type 0.
On 20/01/16 at 22:33, Luis R. Rodriguez wrote:
> On Wed, Jan 20, 2016 at 1:00 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>>> +static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
>>> +{
>>> +	if (!fn->supp_hardware_subarch) {
>>> +		pr_err("Init sequence fails to declares any supported subarchs: %pF\n", fn->early_init);
>>> +		WARN_ON(1);
>>> +	}
>>> +	if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
>>> +		return true;
>>> +	return false;
>>> +}
>>
>> So the logic for this working is that boot_params.hdr.hardware_subarch
>>
>> And the macros define two: BIT(X86_SUBARCH_PC) or BIT(X86_SUBARCH_XEN).
>>
>> But hardware_subarch by default is set to zero. Which means if GRUB2, PXELinux, Xen multiboot1
>> don't set it - then the X86_SUBARCH_PC is choosen right?
>>
>> 1 << 0 & 1 << X86_SUBARCH_PC (which is zero).
>>
>> For this to nicely work with Xen it ought to do this:
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 993b7a7..6cf9afd 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1676,6 +1676,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>  	boot_params.hdr.ramdisk_image = initrd_start;
>>  	boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
>>  	boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
>> +	boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;
>>
>>  	if (!xen_initial_domain()) {
>>  		add_preferred_console("xenboot", 0, NULL);
>>
>>
>> ?
>
> That's correct for PV and PVH, likewise when qemu is required for HVM
> qemu could set it. I have the qemu change done but that should only
> cover HVM. A common place to set this as well could be the hypervisor,
> but currently the hypervisor doesn't set any boot_params, instead a
> generic struct is passed and the kernel code (for any OS) is expected
> to interpret this and then set the required values for the OS in the
> init path.
> Long term though if we wanted to merge init further one way
> could be to have the hypervisor just set the zero page cleanly for the
> different modes. If we needed more data other than the
> hardware_subarch we also have the hardware_subarch_data, that's a u64,
> and how that is used would be up to the subarch. In Xen's case it
> could do what it wants with it. That would still mean perhaps defining
> as part of a Xen boot protocol a place where xen specific code can
> count on finding more Xen data passed by the hypervisor, the
> xen_start_info. That is, if we wanted to merge init paths this is
> something to consider.
>
> One thing I considered on the question of who should set the zero page
> for Xen with the prospect of merging inits, or at least this subarch
> for both short term and long term are the obvious implications in
> terms of hypervisor / kernel / qemu combination requirements if the
> subarch is needed. Having it set in the kernel is an obvious immediate
> choice for PV / PVH but it means we can't merge init paths completely
> (down to asm inits), we'd still be able to merge some C init paths
> though, the first entry would still be different. Having the zero page
> set on the hypervisor would go long ways but it would mean a
> hypervisor change required.

I don't think the hypervisor should be setting Linux-specific boot-related parameters; the boot ABI should be OS-agnostic. IMHO, a small shim should be added to Linux in order to set what Linux requires when entering from a Xen entry point.

Roger.
On 01/21/2016 03:38 AM, Roger Pau Monné wrote: > El 20/01/16 a les 22.33, Luis R. Rodriguez ha escrit: >> On Wed, Jan 20, 2016 at 1:00 PM, Konrad Rzeszutek Wilk >> <konrad.wilk@oracle.com> wrote: >>>> +static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn) >>>> +{ >>>> + if (!fn->supp_hardware_subarch) { >>>> + pr_err("Init sequence fails to declares any supported subarchs: %pF\n", fn->early_init); >>>> + WARN_ON(1); >>>> + } >>>> + if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch) >>>> + return true; >>>> + return false; >>>> +} >>> So the logic for this working is that boot_params.hdr.hardware_subarch >>> >>> And the macros define two: BIT(X86_SUBARCH_PC) or BIT(X86_SUBARCH_XEN). >>> >>> But hardware_subarch by default is set to zero. Which means if GRUB2, PXELinux, Xen multiboot1 >>> don't set it - then the X86_SUBARCH_PC is choosen right? >>> >>> 1 << 0 & 1 << X86_SUBARCH_PC (which is zero). >>> >>> For this to nicely work with Xen it ought to do this: >>> >>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c >>> index 993b7a7..6cf9afd 100644 >>> --- a/arch/x86/xen/enlighten.c >>> +++ b/arch/x86/xen/enlighten.c >>> @@ -1676,6 +1676,7 @@ asmlinkage __visible void __init xen_start_kernel(void) >>> boot_params.hdr.ramdisk_image = initrd_start; >>> boot_params.hdr.ramdisk_size = xen_start_info->mod_len; >>> boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line); >>> + boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN; >>> >>> if (!xen_initial_domain()) { >>> add_preferred_console("xenboot", 0, NULL); >>> >>> >>> ? >> That's correct for PV and PVH, likewise when qemu is required for HVM >> qemu could set it. I have the qemu change done but that should only >> cover HVM. 
A common place to set this as well could be the hypervisor, >> but currently the hypervisor doesn't set any boot_params, instead a >> generic struct is passed and the kernel code (for any OS) is expected >> to interpret this and then set the required values for the OS in the >> init path. Long term though if we wanted to merge init further one way >> could be to have the hypervisor just set the zero page cleanly for the >> different modes. If we needed more data other than the >> hardware_subarch we also have the hardware_subarch_data, that's a u64 >> , and how that is used would be up to the subarch. In Xen's case it >> could do what it wants with it. That would still mean perhaps defining >> as part of a Xen boot protocol a place where xen specific code can >> count on finding more Xen data passed by the hypervisor, the >> xen_start_info. That is, if we wanted to merge init paths this is >> something to consider. >> >> One thing I considered on the question of who should set the zero page >> for Xen with the prospect of merging inits, or at least this subarch >> for both short term and long term are the obvious implications in >> terms of hypervisor / kernel / qemu combination requirements if the >> subarch is needed. Having it set in the kernel is an obvious immediate >> choice for PV / PVH but it means we can't merge init paths completely >> (down to asm inits), we'd still be able to merge some C init paths >> though, the first entry would still be different. Having the zero page >> set on the hypervisor would go long ways but it would mean a >> hypervisor change required. > I don't think the hypervisor should be setting Linux specific boot > related parameters, the boot ABI should be OS agnostic. IMHO, a small > shim should be added to Linux in order to set what Linux requires when > entering from a Xen entry point. And that's exactly what HVMlite does. Most of this shim layer is setting up boot_params, after which we jump to standard x86 boot path (i.e. 
startup_{32|64}). With hardware_subarch set to zero. -boris
On 01/21/16 05:45, Boris Ostrovsky wrote: >> I don't think the hypervisor should be setting Linux specific boot >> related parameters, the boot ABI should be OS agnostic. IMHO, a small >> shim should be added to Linux in order to set what Linux requires when >> entering from a Xen entry point. For the record, this is exactly what hardware_subarch is meant to do: revector to a stub to do this kind of setup, for a subarchitecture which doesn't have a natural stub like BIOS or EFI. In the case of Xen PV, or lguest, there are special care that has to be done very early in the path due to the nonstandard handling of page tables, which is another reason for this field. > And that's exactly what HVMlite does. Most of this shim layer is setting > up boot_params, after which we jump to standard x86 boot path (i.e. > startup_{32|64}). With hardware_subarch set to zero. Which is the way to do it as long as the early code can be the same. -hpa
On Thu, Jan 21, 2016 at 11:25 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> And that's exactly what HVMlite does. Most of this shim layer is setting
>> up boot_params, after which we jump to standard x86 boot path (i.e.
>> startup_{32|64}). With hardware_subarch set to zero.
>
> Which is the way to do it as long as the early code can be the same.

To be clear, with the subarch and linker table suggested in my patch series, it should be possible to have the same exact entry point; the Xen PV setup code could run early in the order. For instance, in the linker table we could use the reserved order levels 01-09 for PV hypervisor code:

+/* Init order levels, we can start at 01 but reserve 01-09 for now */
+#define X86_INIT_ORDER_EARLY	10
+#define X86_INIT_ORDER_NORMAL	30
+#define X86_INIT_ORDER_LATE	50

So perhaps X86_INIT_ORDER_PV as 05 later.

The standard x86 init would then just be:

asmlinkage __visible void __init x86_64_start_kernel(char *real_mode_data)
{
	x86_init_fn_init_tables();
	x86_init_fn_early_init();
}

The PV init code would kick in early and could parse the boot_params.hdr.hardware_subarch_data pointer as it sees fit.

  Luis
On 01/21/16 11:46, Luis R. Rodriguez wrote: > On Thu, Jan 21, 2016 at 11:25 AM, H. Peter Anvin <hpa@zytor.com> wrote: >>> And that's exactly what HVMlite does. Most of this shim layer is setting >>> up boot_params, after which we jump to standard x86 boot path (i.e. >>> startup_{32|64}). With hardware_subarch set to zero. >> >> Which is the way to do it as long as the early code can be the same. > > To be clear, with the subarchand linker table suggested in my patch > series, it should be possible to have the same exact entry point, the > Xen PV setup code could run early in the order. For instance in the > linker table we could use the reserved order levels 01-09 for PV > hypervisor code: > > +/* Init order levels, we can start at 01 but reserve 01-09 for now */ > +#define X86_INIT_ORDER_EARLY 10 > +#define X86_INIT_ORDER_NORMAL 30 > +#define X86_INIT_ORDER_LATE 50 > > So perhaps X86_INIT_ORDER_PV as 05 later. > > The standard x86 init would just then be: > > asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data) > { > x86_init_fn_init_tables(); > x86_init_fn_early_init(); > } > > The PV init code would kick in early and could parse the > boot_params.hdr.hardware_subarch_data pointer as it sees fit. > Right... we already do that. I don't even think you need to initialize any tables. At least on i386, we have to do this in assembly code. However, it is just a simple table walk. :) -hpa
On 01/21/16 11:50, H. Peter Anvin wrote: > > Right... we already do that. > > I don't even think you need to initialize any tables. At least on i386, > we have to do this in assembly code. However, it is just a simple table > walk. :) > It might make more sense to make subarch its own table, though, although I haven't looked at your code in enough detail to say. -hpa
On Thu, Jan 21, 2016 at 11:50 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> The PV init code would kick in early and could parse the
>> boot_params.hdr.hardware_subarch_data pointer as it sees fit.
>
> Right... we already do that.
>
> I don't even think you need to initialize any tables.

The init above is for the sorting. It's not needed unless your subsystem uses it; it's the same sort as with the IOMMU stuff, only with stronger semantics. In theory one would *not* need to do this so early; it could wait, provided you are at least confident the linker order suffices for all routines called early.

To be clear, the struct proposed defines a set of callbacks for calling a feature's callbacks at different levels of the x86 init path. As far as this patch series is concerned, it has one for x86_64_start_reservations(). This could easily be extended to also include one to be called, for instance, at setup_arch(). If we were able to solve the ability to make use of the subarch this early, we could then have a callback for x86_64_start_kernel(), then one at x86_64_start_reservations(), and perhaps one at setup_arch() later -- Kasan is an example that could fit this in the future. What I mean is that we could avoid the sort, or defer the runtime sort until x86_64_start_reservations() or even setup_arch(), provided the default linker order suffices for the series of previous callbacks. It's up to us, but I'm taking care to be pedantic about semantics here.

> At least on i386,
> we have to do this in assembly code.

Neat! How is that order kept?

> However, it is just a simple table walk. :)

  Luis
On 01/21/16 12:05, Luis R. Rodriguez wrote: > >> At least on i386, >> we have to do this in assembly code. > > Neat! How is that order kept? > Right now subarch_entries[] is just an array indexed by subarch number hardcoded in head_32.S. However, if you have a list of (id, target) then you could just iterate over it. -hpa
On Thu, Jan 21, 2016 at 11:52 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/21/16 11:50, H. Peter Anvin wrote:
>>
>> Right... we already do that.
>>
>> I don't even think you need to initialize any tables.  At least on i386,
>> we have to do this in assembly code.  However, it is just a simple table
>> walk. :)
>>
>
> It might make more sense to make subarch its own table, though, although
> I haven't looked at your code in enough detail to say.

The gist of it:

void x86_init_fn_early_init(void)
{
	for_each_table_entry(init_fn, X86_INIT_FNS) {
		if (x86_init_fn_supports_subarch(init_fn))
			init_fn->early_init();
	}
}

void x86_init_fn_setup_arch(void)
{
	for_each_table_entry(init_fn, X86_INIT_FNS) {
		if (x86_init_fn_supports_subarch(init_fn))
			init_fn->setup_arch();
	}
}

Right now the code defines just an init routine at the stage for x86_64_start_reservations(); we call that callback early_init(), i.e. setup_arch() doesn't exist yet. Since certain routines can run on either Xen or bare metal, we allow a bitwise OR of subarchs as a bitmask.

I'll think a bit more about how to use a table per subarch, but I can't fit it yet. I'd like to later revise kfree'ing unused sections, and a table per subarch might be nice for that; but otherwise this flow seems easy to follow, so if we can kfree sections within a table we could still accomplish that later.

  Luis
On Wed, Jan 20, 2016 at 2:20 PM, H. Peter Anvin <hpa@zytor.com> wrote: > On January 20, 2016 2:12:49 PM PST, "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote: >>On Wed, Jan 20, 2016 at 1:41 PM, H. Peter Anvin <hpa@zytor.com> wrote: >>> On 01/20/16 13:33, Luis R. Rodriguez wrote: >>>> >>>> That's correct for PV and PVH, likewise when qemu is required for >>HVM >>>> qemu could set it. I have the qemu change done but that should only >>>> cover HVM. A common place to set this as well could be the >>hypervisor, >>>> but currently the hypervisor doesn't set any boot_params, instead a >>>> generic struct is passed and the kernel code (for any OS) is >>expected >>>> to interpret this and then set the required values for the OS in the >>>> init path. Long term though if we wanted to merge init further one >>way >>>> could be to have the hypervisor just set the zero page cleanly for >>the >>>> different modes. If we needed more data other than the >>>> hardware_subarch we also have the hardware_subarch_data, that's a >>u64 >>>> , and how that is used would be up to the subarch. In Xen's case it >>>> could do what it wants with it. That would still mean perhaps >>defining >>>> as part of a Xen boot protocol a place where xen specific code can >>>> count on finding more Xen data passed by the hypervisor, the >>>> xen_start_info. That is, if we wanted to merge init paths this is >>>> something to consider. >>>> >>>> One thing I considered on the question of who should set the zero >>page >>>> for Xen with the prospect of merging inits, or at least this subarch >>>> for both short term and long term are the obvious implications in >>>> terms of hypervisor / kernel / qemu combination requirements if the >>>> subarch is needed. 
Having it set in the kernel is an obvious >>immediate >>>> choice for PV / PVH but it means we can't merge init paths >>completely >>>> (down to asm inits), we'd still be able to merge some C init paths >>>> though, the first entry would still be different. Having the zero >>page >>>> set on the hypervisor would go long ways but it would mean a >>>> hypervisor change required. >>>> >>>> These prospects are worth discussing, specially in light of Boris's >>>> hvmlite work. >>>> >>> >>> The above doesn't make sense to me. hardware_subarch is really used >>> when the boot sequence is somehow nonstandard. >> >>Thanks for the feedback -- as it stands today hardware_subarch is only >>used by lguest, Moorestown, and CE4100 even though we had definitions >>for it for Xen -- this is not used yet. Its documentation does make >>references to differences for a paravirtualized environment, and uses >>a few examples but doesn't go into great depths about restrictions so >>its limitations in how we could use it were not clear to me. >> >>> HVM probably doesn't need that. >> >>Today HVM doesn't need it, but perhaps that is because it has not >>needed changes early on boot. Will it, or could it? I'd even invite us >>to consider the same for other hypervisors or PV hypervisors. I'll >>note that things like cpu_has_hypervisor() or derivatives >>(kvm_para_available() which is now used on drivers even, see >>sound/pci/intel8x0.c) requires init_hypervisor_platform() run, in >>terms of the x86 init sequence this is run pretty late at >>setup_arch(). Should code need to know hypervisor info anytime before >>that they have no generic option available. >> >>I'm fine if we want to restrict hardware_subarch but I'll note the >>semantics were not that explicit to delineate clear differences and I >>just wanted to highlight the current early boot restriction of >>cpu_has_hypervisor(). 
>>
>>  Luis
>
> Basically, if the hardware is enumerable using standard PC mechanisms (PCI, ACPI) and doesn't need a special boot flow it should use type 0.

I can extend the documentation as part of this to be clear.

I will note though that this still leaves a gap for code which might want to ask the question "are we in a virt environment, and if so on which one?" between really early boot and right before init_hypervisor_platform(). Or rather, given the subarch can be used by Xen and lguest but not KVM, it means KVM doesn't get to use it. It may not need it, but it's also rather trivial to set up in qemu, and I have a patch for that if we wanted one for KVM. That would extend the definition of subarch a bit more, but as it stands today its use is rather limited. Heck, hardware_subarch_data is to this day unused.

  Luis
On 01/21/16 16:25, Luis R. Rodriguez wrote: >> >> Basically, if the hardware is enumerable using standard PC mechanisms (PCI, ACPI) and doesn't need a special boot flow it should use type 0. > > I can extend the documentation as part of this to be clear. > > I will note though that this still leaves a gap on code which might > want to access the question "are we in a virt environment, and if so > on which one" in between really early boot and right before > init_hypervisor_platform(). Or rather, given subarch can be used by > Xen and lguest but not KVM it means KVM doesn't get to use it. It may > not need it, but its also rather trivial to set up on qemu, and I have > a patch for that if we wanted one for KVM. That would extend the > definition of subarch a bit more, but as it stands today its use was > rather limited. Heck, subharch_data is to this day unused. > KVM is not a subarch, and Xen HVM isn't either; the subarch was meant to be specifically to handle nonstandard boot entries; the CE4100 extension was itself kind of a hack. If you have a genuine need for a "hypervisor type" then that is a separate thing and should be treated separately from subarch. However, you need to consider that some hypervisors can emulate other hypervisors and you may have more than one hypervisor API available. -hpa
On Thu, Jan 21, 2016 at 4:42 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/21/16 16:25, Luis R. Rodriguez wrote:
>>>
>>> Basically, if the hardware is enumerable using standard PC mechanisms (PCI, ACPI) and doesn't need a special boot flow it should use type 0.
>>
>> I can extend the documentation as part of this to be clear.
>>
>> I will note though that this still leaves a gap on code which might
>> want to access the question "are we in a virt environment, and if so
>> on which one" in between really early boot and right before
>> init_hypervisor_platform(). Or rather, given subarch can be used by
>> Xen and lguest but not KVM it means KVM doesn't get to use it. It may
>> not need it, but its also rather trivial to set up on qemu, and I have
>> a patch for that if we wanted one for KVM. That would extend the
>> definition of subarch a bit more, but as it stands today its use was
>> rather limited. Heck, subharch_data is to this day unused.
>>
>
> KVM is not a subarch, and Xen HVM isn't either; the subarch was meant to
> be specifically to handle nonstandard boot entries; the CE4100 extension
> was itself kind of a hack.
>
> If you have a genuine need for a "hypervisor type" then that is a
> separate thing and should be treated separately from subarch. However,
> you need to consider that some hypervisors can emulate other hypervisors
> and you may have more than one hypervisor API available.

Thanks, I've pegged this onto my series as a documentation extension for boot params.

  Luis
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 993b7a7..6cf9afd 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1676,6 +1676,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
 	boot_params.hdr.ramdisk_image = initrd_start;
 	boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
 	boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
+	boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;
 
 	if (!xen_initial_domain()) {
 		add_preferred_console("xenboot", 0, NULL);