diff mbox series

[4.17] EFI: don't convert memory marked for runtime use to ordinary RAM

Message ID cc0fbcb4-5ea3-178c-e691-9acb7cc9a3a7@suse.com (mailing list archive)
State Superseded
Headers show
Series [4.17] EFI: don't convert memory marked for runtime use to ordinary RAM | expand

Commit Message

Jan Beulich Sept. 30, 2022, 7:50 a.m. UTC
efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
higher priority than the type of the range. To avoid accessing memory at
runtime which was re-used for other purposes, make
efi_arch_process_memory_map() follow suit. While on x86 in theory the
same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
E820_ACPI memory there and hence that type's handling can be left alone.

Fixes: bf6501a62e80 ("x86-64: EFI boot code")
Fixes: facac0af87ef ("x86-64: EFI runtime code")
Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Partly RFC for Arm, for two reasons:

On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
For one like on x86 such ranges would likely better be retained, as Dom0
may (will?) have a need to look at tables placed there. Plus converting
such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
me as well. I'd be inclined to make the latter adjustment right here
(while the other change probably would better be separate, if there
aren't actually reasons for the present behavior).

On Arm efi_init_memory() is compiled out, so adjusting Arm code here is
perhaps more for consistency (not leaving a trap for someone to later
fall into) than a strict requirement. I wonder though how Arm has
managed to get away without at least some parts of efi_init_memory() for
all the years that ACPI support has been present there. I guess this is
connected to most of runtime.c also being compiled out, but that
continuing to be the case is another aspect puzzling me.

Comments

Bertrand Marquis Sept. 30, 2022, 11:55 a.m. UTC | #1
Hi Jan,

We will review and test the arm part (even though it is modifying some unused
 code at the moment) but I wanted to answer you on some questions you have ..

> On 30 Sep 2022, at 09:50, Jan Beulich <jbeulich@suse.com> wrote:
> 
> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> higher priority than the type of the range. To avoid accessing memory at
> runtime which was re-used for other purposes, make
> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> E820_ACPI memory there and hence that type's handling can be left alone.
> 
> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
> Fixes: facac0af87ef ("x86-64: EFI runtime code")
> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Partly RFC for Arm, for two reasons:
> 
> On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
> For one like on x86 such ranges would likely better be retained, as Dom0
> may (will?) have a need to look at tables placed there. Plus converting
> such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
> me as well. I'd be inclined to make the latter adjustment right here
> (while the other change probably would better be separate, if there
> aren't actually reasons for the present behavior).
> 
> On Arm efi_init_memory() is compiled out, so adjusting Arm code here is
> perhaps more for consistency (not leaving a trap for someone to later
> fall into) than a strict requirement. I wonder though how Arm has
> managed to get away without at least some parts of efi_init_memory() for
> all the years that ACPI support has been present there. I guess this is
> connected to most of runtime.c also being compiled out, but that
> continuing to be the case is another aspect puzzling me.

On arm we only use the boot services in Xen and we do not provide
any efi services to dom0. The required info is passed through a simple device
tree.
There was a discussion on that subject some weeks ago and it is still an open
point to be solved.
Also APCI is officially unsupported on arm.

Cheers
Bertrand

> 
> --- a/xen/arch/arm/efi/efi-boot.h
> +++ b/xen/arch/arm/efi/efi-boot.h
> @@ -183,13 +183,15 @@ static EFI_STATUS __init efi_process_mem
> 
>     for ( Index = 0; Index < (mmap_size / desc_size); Index++ )
>     {
> -        if ( desc_ptr->Attribute & EFI_MEMORY_WB &&
> -             (desc_ptr->Type == EfiConventionalMemory ||
> -              desc_ptr->Type == EfiLoaderCode ||
> -              desc_ptr->Type == EfiLoaderData ||
> -              (!map_bs &&
> -               (desc_ptr->Type == EfiBootServicesCode ||
> -                desc_ptr->Type == EfiBootServicesData))) )
> +        if ( desc_ptr->Attribute & EFI_MEMORY_RUNTIME )
> +            /* nothing */;
> +        else if ( (desc_ptr->Attribute & EFI_MEMORY_WB) &&
> +                  (desc_ptr->Type == EfiConventionalMemory ||
> +                   desc_ptr->Type == EfiLoaderCode ||
> +                   desc_ptr->Type == EfiLoaderData ||
> +                   (!map_bs &&
> +                    (desc_ptr->Type == EfiBootServicesCode ||
> +                     desc_ptr->Type == EfiBootServicesData))) )
>         {
>             if ( !meminfo_add_bank(&bootinfo.mem, desc_ptr) )
>             {
> --- a/xen/arch/x86/efi/efi-boot.h
> +++ b/xen/arch/x86/efi/efi-boot.h
> @@ -185,7 +185,9 @@ static void __init efi_arch_process_memo
>             /* fall through */
>         case EfiLoaderCode:
>         case EfiLoaderData:
> -            if ( desc->Attribute & EFI_MEMORY_WB )
> +            if ( desc->Attribute & EFI_MEMORY_RUNTIME )
> +                type = E820_RESERVED;
> +            else if ( desc->Attribute & EFI_MEMORY_WB )
>                 type = E820_RAM;
>             else
>         case EfiUnusableMemory:
Luca Fancellu Sept. 30, 2022, 12:47 p.m. UTC | #2
> On 30 Sep 2022, at 08:50, Jan Beulich <jbeulich@suse.com> wrote:
> 
> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> higher priority than the type of the range. To avoid accessing memory at
> runtime which was re-used for other purposes, make
> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> E820_ACPI memory there and hence that type's handling can be left alone.
> 
> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
> Fixes: facac0af87ef ("x86-64: EFI runtime code")
> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---

Hi Jan,

For the Arm part:

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

I’ve also tested the EFI+ACPI boot on two arm boards

Tested-By: Luca Fancellu <luca.fancellu@arm.com>
Bertrand Marquis Sept. 30, 2022, 12:51 p.m. UTC | #3
Hi Jan,

> On 30 Sep 2022, at 09:50, Jan Beulich <jbeulich@suse.com> wrote:
> 
> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> higher priority than the type of the range. To avoid accessing memory at
> runtime which was re-used for other purposes, make
> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> E820_ACPI memory there and hence that type's handling can be left alone.
> 
> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
> Fixes: facac0af87ef ("x86-64: EFI runtime code")
> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm

Cheers
Bertrand

> ---
> Partly RFC for Arm, for two reasons:
> 
> On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
> For one like on x86 such ranges would likely better be retained, as Dom0
> may (will?) have a need to look at tables placed there. Plus converting
> such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
> me as well. I'd be inclined to make the latter adjustment right here
> (while the other change probably would better be separate, if there
> aren't actually reasons for the present behavior).
> 
> On Arm efi_init_memory() is compiled out, so adjusting Arm code here is
> perhaps more for consistency (not leaving a trap for someone to later
> fall into) than a strict requirement. I wonder though how Arm has
> managed to get away without at least some parts of efi_init_memory() for
> all the years that ACPI support has been present there. I guess this is
> connected to most of runtime.c also being compiled out, but that
> continuing to be the case is another aspect puzzling me.
> 
> --- a/xen/arch/arm/efi/efi-boot.h
> +++ b/xen/arch/arm/efi/efi-boot.h
> @@ -183,13 +183,15 @@ static EFI_STATUS __init efi_process_mem
> 
>     for ( Index = 0; Index < (mmap_size / desc_size); Index++ )
>     {
> -        if ( desc_ptr->Attribute & EFI_MEMORY_WB &&
> -             (desc_ptr->Type == EfiConventionalMemory ||
> -              desc_ptr->Type == EfiLoaderCode ||
> -              desc_ptr->Type == EfiLoaderData ||
> -              (!map_bs &&
> -               (desc_ptr->Type == EfiBootServicesCode ||
> -                desc_ptr->Type == EfiBootServicesData))) )
> +        if ( desc_ptr->Attribute & EFI_MEMORY_RUNTIME )
> +            /* nothing */;
> +        else if ( (desc_ptr->Attribute & EFI_MEMORY_WB) &&
> +                  (desc_ptr->Type == EfiConventionalMemory ||
> +                   desc_ptr->Type == EfiLoaderCode ||
> +                   desc_ptr->Type == EfiLoaderData ||
> +                   (!map_bs &&
> +                    (desc_ptr->Type == EfiBootServicesCode ||
> +                     desc_ptr->Type == EfiBootServicesData))) )
>         {
>             if ( !meminfo_add_bank(&bootinfo.mem, desc_ptr) )
>             {
> --- a/xen/arch/x86/efi/efi-boot.h
> +++ b/xen/arch/x86/efi/efi-boot.h
> @@ -185,7 +185,9 @@ static void __init efi_arch_process_memo
>             /* fall through */
>         case EfiLoaderCode:
>         case EfiLoaderData:
> -            if ( desc->Attribute & EFI_MEMORY_WB )
> +            if ( desc->Attribute & EFI_MEMORY_RUNTIME )
> +                type = E820_RESERVED;
> +            else if ( desc->Attribute & EFI_MEMORY_WB )
>                 type = E820_RAM;
>             else
>         case EfiUnusableMemory:
Andrew Cooper Sept. 30, 2022, 12:53 p.m. UTC | #4
On 30/09/2022 08:50, Jan Beulich wrote:
> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> higher priority than the type of the range. To avoid accessing memory at
> runtime which was re-used for other purposes, make
> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> E820_ACPI memory there and hence that type's handling can be left alone.
>
> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
> Fixes: facac0af87ef ("x86-64: EFI runtime code")
> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Isn't this also liable to be the root cause of the crash reported on
IRC, where a read-only page was being inserted into the heap?

~Andrew
Jan Beulich Sept. 30, 2022, 1:07 p.m. UTC | #5
On 30.09.2022 14:53, Andrew Cooper wrote:
> On 30/09/2022 08:50, Jan Beulich wrote:
>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>> higher priority than the type of the range. To avoid accessing memory at
>> runtime which was re-used for other purposes, make
>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>> E820_ACPI memory there and hence that type's handling can be left alone.
>>
>> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
>> Fixes: facac0af87ef ("x86-64: EFI runtime code")
>> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Isn't this also liable to be the root cause of the crash reported on
> IRC, where a read-only page was being inserted into the heap?

I wouldn't be able to make the connection, I'm afraid. What was asked there
was lacking details, though.

Jan
Bertrand Marquis Sept. 30, 2022, 1:35 p.m. UTC | #6
Hi Andrew,

> On 30 Sep 2022, at 14:53, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
> 
> On 30/09/2022 08:50, Jan Beulich wrote:
>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>> higher priority than the type of the range. To avoid accessing memory at
>> runtime which was re-used for other purposes, make
>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>> E820_ACPI memory there and hence that type's handling can be left alone.
>> 
>> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
>> Fixes: facac0af87ef ("x86-64: EFI runtime code")
>> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Isn't this also liable to be the root cause of the crash reported on
> IRC, where a read-only page was being inserted into the heap?

No this should not be related at all.

Cheers
Bertrand

> 
> ~Andrew
Roger Pau Monné Sept. 30, 2022, 2:28 p.m. UTC | #7
On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> higher priority than the type of the range. To avoid accessing memory at
> runtime which was re-used for other purposes, make
> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> E820_ACPI memory there and hence that type's handling can be left alone.

What about dom0?  Should it be translated to E820_RESERVED so that
dom0 doesn't try to use it either?  I guess using E820_RESERVED
could also confuse dom0 about ACPI data placement.

Thanks, Roger.
Jan Beulich Oct. 4, 2022, 8:06 a.m. UTC | #8
On 30.09.2022 16:28, Roger Pau Monné wrote:
> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>> higher priority than the type of the range. To avoid accessing memory at
>> runtime which was re-used for other purposes, make
>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>> E820_ACPI memory there and hence that type's handling can be left alone.
> 
> What about dom0?  Should it be translated to E820_RESERVED so that
> dom0 doesn't try to use it either?

I'm afraid I don't understand the questions. Not the least because I
think "it" can't really mean "dom0" from the earlier sentence.

Jan

>  I guess using E820_RESERVED
> could also confuse dom0 about ACPI data placement.
> 
> Thanks, Roger.
Roger Pau Monné Oct. 4, 2022, 9:33 a.m. UTC | #9
On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
> On 30.09.2022 16:28, Roger Pau Monné wrote:
> > On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
> >> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> >> higher priority than the type of the range. To avoid accessing memory at
> >> runtime which was re-used for other purposes, make
> >> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> >> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> >> E820_ACPI memory there and hence that type's handling can be left alone.
> > 
> > What about dom0?  Should it be translated to E820_RESERVED so that
> > dom0 doesn't try to use it either?
> 
> I'm afraid I don't understand the questions. Not the least because I
> think "it" can't really mean "dom0" from the earlier sentence.

Sorry, let me try again:

The memory map provided to dom0 will contain E820_ACPI entries for
memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
overwriting the data needed for runtime services?

Thanks, Roger.
Jan Beulich Oct. 4, 2022, 10:23 a.m. UTC | #10
On 04.10.2022 11:33, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
>> On 30.09.2022 16:28, Roger Pau Monné wrote:
>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>> higher priority than the type of the range. To avoid accessing memory at
>>>> runtime which was re-used for other purposes, make
>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>
>>> What about dom0?  Should it be translated to E820_RESERVED so that
>>> dom0 doesn't try to use it either?
>>
>> I'm afraid I don't understand the questions. Not the least because I
>> think "it" can't really mean "dom0" from the earlier sentence.
> 
> Sorry, let me try again:
> 
> The memory map provided to dom0 will contain E820_ACPI entries for
> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
> overwriting the data needed for runtime services?

How would Dom0 go about doing so? It has no control over what we hand
to the page allocator - it can only free pages which were actually
allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
DomIO - Dom0 can map and access them, but it cannot free them.

Jan
Roger Pau Monné Oct. 4, 2022, 10:38 a.m. UTC | #11
On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
> On 04.10.2022 11:33, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
> >> On 30.09.2022 16:28, Roger Pau Monné wrote:
> >>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
> >>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> >>>> higher priority than the type of the range. To avoid accessing memory at
> >>>> runtime which was re-used for other purposes, make
> >>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> >>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> >>>> E820_ACPI memory there and hence that type's handling can be left alone.
> >>>
> >>> What about dom0?  Should it be translated to E820_RESERVED so that
> >>> dom0 doesn't try to use it either?
> >>
> >> I'm afraid I don't understand the questions. Not the least because I
> >> think "it" can't really mean "dom0" from the earlier sentence.
> > 
> > Sorry, let me try again:
> > 
> > The memory map provided to dom0 will contain E820_ACPI entries for
> > memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
> > map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
> > overwriting the data needed for runtime services?
> 
> How would Dom0 go about doing so? It has no control over what we hand
> to the page allocator - it can only free pages which were actually
> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
> DomIO - Dom0 can map and access them, but it cannot free them.

Maybe I'm very confused, but what about dom0 overwriting the data
there, won't it cause issues to runtime services?

If the memory is reported in the memory map provided to dom0 as
E820_ACPI dom0 is free to reclaim the region for it's own usage.

Thanks, Roger.
Jan Beulich Oct. 4, 2022, 10:44 a.m. UTC | #12
On 04.10.2022 12:38, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
>> On 04.10.2022 11:33, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
>>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
>>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>>> runtime which was re-used for other purposes, make
>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>>
>>>>> What about dom0?  Should it be translated to E820_RESERVED so that
>>>>> dom0 doesn't try to use it either?
>>>>
>>>> I'm afraid I don't understand the questions. Not the least because I
>>>> think "it" can't really mean "dom0" from the earlier sentence.
>>>
>>> Sorry, let me try again:
>>>
>>> The memory map provided to dom0 will contain E820_ACPI entries for
>>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
>>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
>>> overwriting the data needed for runtime services?
>>
>> How would Dom0 go about doing so? It has no control over what we hand
>> to the page allocator - it can only free pages which were actually
>> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
>> DomIO - Dom0 can map and access them, but it cannot free them.
> 
> Maybe I'm very confused, but what about dom0 overwriting the data
> there, won't it cause issues to runtime services?

If it overwrites it, of course there are going to be issues. Just like
there are going to be problems from anything else Dom0 does wrong.

> If the memory is reported in the memory map provided to dom0 as
> E820_ACPI dom0 is free to reclaim the region for it's own usage.

Could you outline to me how such a "reclaim" process would look like?
For the range to become ordinary RAM, Xen needs to be involved. But
there's no hypercall allowing Dom0 to free a page which wasn't
allocated to it. And the Dom0 kernel simply re-using the range as if
it was RAM is flawed - it would break the latest once Dom0 would try
to balloon out such a page.

Jan
Andrew Cooper Oct. 4, 2022, 10:49 a.m. UTC | #13
On 04/10/2022 11:23, Jan Beulich wrote:
> On 04.10.2022 11:33, Roger Pau Monné wrote:
>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>> runtime which was re-used for other purposes, make
>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>> What about dom0?  Should it be translated to E820_RESERVED so that
>>>> dom0 doesn't try to use it either?
>>> I'm afraid I don't understand the questions. Not the least because I
>>> think "it" can't really mean "dom0" from the earlier sentence.
>> Sorry, let me try again:
>>
>> The memory map provided to dom0 will contain E820_ACPI entries for
>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
>> overwriting the data needed for runtime services?
> How would Dom0 go about doing so?

Zeroing the memory and putting it into its own heap.

You seem to be presuming that some unwritten and unenforced rules exist.

Once dom0 has booted, Xen cannot safety touch any ACPI reclaimable
range.  It doesn't go wrong in practice because OSes don't actually
reclaim the reclaimable ranges (which is also why Xen HVM guests don't
explode either.)

~Andrew
Roger Pau Monné Oct. 4, 2022, 10:54 a.m. UTC | #14
On Tue, Oct 04, 2022 at 12:44:16PM +0200, Jan Beulich wrote:
> On 04.10.2022 12:38, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
> >> On 04.10.2022 11:33, Roger Pau Monné wrote:
> >>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
> >>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
> >>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
> >>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> >>>>>> higher priority than the type of the range. To avoid accessing memory at
> >>>>>> runtime which was re-used for other purposes, make
> >>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> >>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> >>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
> >>>>>
> >>>>> What about dom0?  Should it be translated to E820_RESERVED so that
> >>>>> dom0 doesn't try to use it either?
> >>>>
> >>>> I'm afraid I don't understand the questions. Not the least because I
> >>>> think "it" can't really mean "dom0" from the earlier sentence.
> >>>
> >>> Sorry, let me try again:
> >>>
> >>> The memory map provided to dom0 will contain E820_ACPI entries for
> >>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
> >>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
> >>> overwriting the data needed for runtime services?
> >>
> >> How would Dom0 go about doing so? It has no control over what we hand
> >> to the page allocator - it can only free pages which were actually
> >> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
> >> DomIO - Dom0 can map and access them, but it cannot free them.
> > 
> > Maybe I'm very confused, but what about dom0 overwriting the data
> > there, won't it cause issues to runtime services?
> 
> If it overwrites it, of course there are going to be issues. Just like
> there are going to be problems from anything else Dom0 does wrong.

But would dom0 know it's doing something wrong?

The region is just marked as E820_ACPI from dom0 PoV, so it doesn't
know it's required by EFI runtime services, and dom0 could
legitimately overwrite the region once it considers all ACPI parsing
done from it's side.

Thanks, Roger.
Jan Beulich Oct. 4, 2022, 11:09 a.m. UTC | #15
On 04.10.2022 12:49, Andrew Cooper wrote:
> On 04/10/2022 11:23, Jan Beulich wrote:
>> On 04.10.2022 11:33, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
>>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
>>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>>> runtime which was re-used for other purposes, make
>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>> What about dom0?  Should it be translated to E820_RESERVED so that
>>>>> dom0 doesn't try to use it either?
>>>> I'm afraid I don't understand the questions. Not the least because I
>>>> think "it" can't really mean "dom0" from the earlier sentence.
>>> Sorry, let me try again:
>>>
>>> The memory map provided to dom0 will contain E820_ACPI entries for
>>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
>>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
>>> overwriting the data needed for runtime services?
>> How would Dom0 go about doing so?
> 
> Zeroing the memory and putting it into its own heap.

This makes no sense.

> You seem to be presuming that some unwritten and unenforced rules exist.

There's the PV interface. All memory management related functionality
has to be based on the E820 map handed to the domain (irrespective of
it being Dom0 or DomU). That map has no E820_ACPI entries for PV (and
PVH Dom0 is as of yet unsupported). Dom0 is also handed the host E820,
but it is supposed to use it only for resource management (e.g. to
determine where BARs may be placed, or how to arrange its PFN space).

> Once dom0 has booted, Xen cannot safety touch any ACPI reclaimable
> range.

I clearly disagree, and I expect existing behavior also disagrees with
you.

Jan

>  It doesn't go wrong in practice because OSes don't actually
> reclaim the reclaimable ranges (which is also why Xen HVM guests don't
> explode either.)
> 
> ~Andrew
Jan Beulich Oct. 4, 2022, 12:18 p.m. UTC | #16
On 04.10.2022 12:54, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 12:44:16PM +0200, Jan Beulich wrote:
>> On 04.10.2022 12:38, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
>>>> On 04.10.2022 11:33, Roger Pau Monné wrote:
>>>>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
>>>>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
>>>>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
>>>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>>>>> runtime which was re-used for other purposes, make
>>>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>>>>
>>>>>>> What about dom0?  Should it be translated to E820_RESERVED so that
>>>>>>> dom0 doesn't try to use it either?
>>>>>>
>>>>>> I'm afraid I don't understand the questions. Not the least because I
>>>>>> think "it" can't really mean "dom0" from the earlier sentence.
>>>>>
>>>>> Sorry, let me try again:
>>>>>
>>>>> The memory map provided to dom0 will contain E820_ACPI entries for
>>>>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
>>>>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
>>>>> overwriting the data needed for runtime services?
>>>>
>>>> How would Dom0 go about doing so? It has no control over what we hand
>>>> to the page allocator - it can only free pages which were actually
>>>> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
>>>> DomIO - Dom0 can map and access them, but it cannot free them.
>>>
>>> Maybe I'm very confused, but what about dom0 overwriting the data
>>> there, won't it cause issues to runtime services?
>>
>> If it overwrites it, of course there are going to be issues. Just like
>> there are going to be problems from anything else Dom0 does wrong.
> 
> But would dom0 know it's doing something wrong?

Yes. Please also see my reply to Andrew.

> The region is just marked as E820_ACPI from dom0 PoV, so it doesn't
> know it's required by EFI runtime services, and dom0 could
> legitimately overwrite the region once it considers all ACPI parsing
> done from it's side.

PV Dom0 won't ever see E820_ACPI in the relevant E820 map; this type can
only appear in the machine E820. In how far PVH Dom0 might need to take
special care I can't tell right now (but at least for kexec purposes I
expect Linux isn't going to recycle E820_ACPI regions even going forward).

Jan
Roger Pau Monné Oct. 4, 2022, 12:52 p.m. UTC | #17
On Tue, Oct 04, 2022 at 02:18:31PM +0200, Jan Beulich wrote:
> On 04.10.2022 12:54, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 12:44:16PM +0200, Jan Beulich wrote:
> >> On 04.10.2022 12:38, Roger Pau Monné wrote:
> >>> On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
> >>>> On 04.10.2022 11:33, Roger Pau Monné wrote:
> >>>>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
> >>>>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
> >>>>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
> >>>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> >>>>>>>> higher priority than the type of the range. To avoid accessing memory at
> >>>>>>>> runtime which was re-used for other purposes, make
> >>>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> >>>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> >>>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
> >>>>>>>
> >>>>>>> What about dom0?  Should it be translated to E820_RESERVED so that
> >>>>>>> dom0 doesn't try to use it either?
> >>>>>>
> >>>>>> I'm afraid I don't understand the questions. Not the least because I
> >>>>>> think "it" can't really mean "dom0" from the earlier sentence.
> >>>>>
> >>>>> Sorry, let me try again:
> >>>>>
> >>>>> The memory map provided to dom0 will contain E820_ACPI entries for
> >>>>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
> >>>>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
> >>>>> overwriting the data needed for runtime services?
> >>>>
> >>>> How would Dom0 go about doing so? It has no control over what we hand
> >>>> to the page allocator - it can only free pages which were actually
> >>>> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
> >>>> DomIO - Dom0 can map and access them, but it cannot free them.
> >>>
> >>> Maybe I'm very confused, but what about dom0 overwriting the data
> >>> there, won't it cause issues to runtime services?
> >>
> >> If it overwrites it, of course there are going to be issues. Just like
> >> there are going to be problems from anything else Dom0 does wrong.
> > 
> > But would dom0 know it's doing something wrong?
> 
> Yes. Please also see my reply to Andrew.
> 
> > The region is just marked as E820_ACPI from dom0 PoV, so it doesn't
> > know it's required by EFI runtime services, and dom0 could
> > legitimately overwrite the region once it considers all ACPI parsing
> > done from it's side.
> 
> PV Dom0 won't ever see E820_ACPI in the relevant E820 map; this type can
> only appear in the machine E820. In how far PVH Dom0 might need to take
> special care I can't tell right now (but at least for kexec purposes I
> expect Linux isn't going to recycle E820_ACPI regions even going forward).

Even if unlikely, couldn't some dom0 OS look at the machine map after
processing ACPI and just decide to overwrite the ACPI regions?

Not that it's useful from an OS PoV, but also we have no statement
saying that E820_ACPI in the machine memory map shouldn't be
overwritten.

Thanks, Roger.
Jan Beulich Oct. 4, 2022, 1:10 p.m. UTC | #18
On 04.10.2022 14:52, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 02:18:31PM +0200, Jan Beulich wrote:
>> On 04.10.2022 12:54, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 12:44:16PM +0200, Jan Beulich wrote:
>>>> On 04.10.2022 12:38, Roger Pau Monné wrote:
>>>>> On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
>>>>>> On 04.10.2022 11:33, Roger Pau Monné wrote:
>>>>>>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
>>>>>>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
>>>>>>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
>>>>>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>>>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>>>>>>> runtime which was re-used for other purposes, make
>>>>>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>>>>>>
>>>>>>>>> What about dom0?  Should it be translated to E820_RESERVED so that
>>>>>>>>> dom0 doesn't try to use it either?
>>>>>>>>
>>>>>>>> I'm afraid I don't understand the questions. Not the least because I
>>>>>>>> think "it" can't really mean "dom0" from the earlier sentence.
>>>>>>>
>>>>>>> Sorry, let me try again:
>>>>>>>
>>>>>>> The memory map provided to dom0 will contain E820_ACPI entries for
>>>>>>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
>>>>>>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
>>>>>>> overwriting the data needed for runtime services?
>>>>>>
>>>>>> How would Dom0 go about doing so? It has no control over what we hand
>>>>>> to the page allocator - it can only free pages which were actually
>>>>>> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
>>>>>> DomIO - Dom0 can map and access them, but it cannot free them.
>>>>>
>>>>> Maybe I'm very confused, but what about dom0 overwriting the data
>>>>> there, won't it cause issues to runtime services?
>>>>
>>>> If it overwrites it, of course there are going to be issues. Just like
>>>> there are going to be problems from anything else Dom0 does wrong.
>>>
>>> But would dom0 know it's doing something wrong?
>>
>> Yes. Please also see my reply to Andrew.
>>
>>> The region is just marked as E820_ACPI from dom0 PoV, so it doesn't
>>> know it's required by EFI runtime services, and dom0 could
>>> legitimately overwrite the region once it considers all ACPI parsing
>>> done from it's side.
>>
>> PV Dom0 won't ever see E820_ACPI in the relevant E820 map; this type can
>> only appear in the machine E820. In how far PVH Dom0 might need to take
>> special care I can't tell right now (but at least for kexec purposes I
>> expect Linux isn't going to recycle E820_ACPI regions even going forward).
> 
> Even if unlikely, couldn't some dom0 OS look at the machine map after
> processing ACPI and just decide to overwrite the ACPI regions?
> 
> Not that it's useful from an OS PoV, but also we have no statement
> saying that E820_ACPI in the machine memory map shouldn't be
> overwritten.

There are many things we have no statements for, yet we imply certain
behavior or restrictions. The machine memory map, imo, clearly isn't
intended for this kind of use.

Jan
Roger Pau Monné Oct. 4, 2022, 2:01 p.m. UTC | #19
On Tue, Oct 04, 2022 at 03:10:57PM +0200, Jan Beulich wrote:
> On 04.10.2022 14:52, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 02:18:31PM +0200, Jan Beulich wrote:
> >> On 04.10.2022 12:54, Roger Pau Monné wrote:
> >>> On Tue, Oct 04, 2022 at 12:44:16PM +0200, Jan Beulich wrote:
> >>>> On 04.10.2022 12:38, Roger Pau Monné wrote:
> >>>>> On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
> >>>>>> On 04.10.2022 11:33, Roger Pau Monné wrote:
> >>>>>>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
> >>>>>>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
> >>>>>>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
> >>>>>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> >>>>>>>>>> higher priority than the type of the range. To avoid accessing memory at
> >>>>>>>>>> runtime which was re-used for other purposes, make
> >>>>>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> >>>>>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> >>>>>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
> >>>>>>>>>
> >>>>>>>>> What about dom0?  Should it be translated to E820_RESERVED so that
> >>>>>>>>> dom0 doesn't try to use it either?
> >>>>>>>>
> >>>>>>>> I'm afraid I don't understand the questions. Not the least because I
> >>>>>>>> think "it" can't really mean "dom0" from the earlier sentence.
> >>>>>>>
> >>>>>>> Sorry, let me try again:
> >>>>>>>
> >>>>>>> The memory map provided to dom0 will contain E820_ACPI entries for
> >>>>>>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
> >>>>>>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
> >>>>>>> overwriting the data needed for runtime services?
> >>>>>>
> >>>>>> How would Dom0 go about doing so? It has no control over what we hand
> >>>>>> to the page allocator - it can only free pages which were actually
> >>>>>> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
> >>>>>> DomIO - Dom0 can map and access them, but it cannot free them.
> >>>>>
> >>>>> Maybe I'm very confused, but what about dom0 overwriting the data
> >>>>> there, won't it cause issues to runtime services?
> >>>>
> >>>> If it overwrites it, of course there are going to be issues. Just like
> >>>> there are going to be problems from anything else Dom0 does wrong.
> >>>
> >>> But would dom0 know it's doing something wrong?
> >>
> >> Yes. Please also see my reply to Andrew.
> >>
> >>> The region is just marked as E820_ACPI from dom0 PoV, so it doesn't
> >>> know it's required by EFI runtime services, and dom0 could
> >>> legitimately overwrite the region once it considers all ACPI parsing
> >>> done from it's side.
> >>
> >> PV Dom0 won't ever see E820_ACPI in the relevant E820 map; this type can
> >> only appear in the machine E820. In how far PVH Dom0 might need to take
> >> special care I can't tell right now (but at least for kexec purposes I
> >> expect Linux isn't going to recycle E820_ACPI regions even going forward).
> > 
> > Even if unlikely, couldn't some dom0 OS look at the machine map after
> > processing ACPI and just decide to overwrite the ACPI regions?
> > 
> > Not that it's useful from an OS PoV, but also we have no statement
> > saying that E820_ACPI in the machine memory map shouldn't be
> > overwritten.
> 
> There are many things we have no statements for, yet we imply certain
> behavior or restrictions. The machine memory map, imo, clearly isn't
> intended for this kind of use.

There isn't much I can say then.  I do feel we are creating rules out
of thin air.

I do think the commit message should mention that we rely on dom0 not
overwriting the data in the E820_ACPI regions on the machine memory
map.

Thanks, Roger.
Jan Beulich Oct. 4, 2022, 2:39 p.m. UTC | #20
On 04.10.2022 16:01, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 03:10:57PM +0200, Jan Beulich wrote:
>> On 04.10.2022 14:52, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 02:18:31PM +0200, Jan Beulich wrote:
>>>> On 04.10.2022 12:54, Roger Pau Monné wrote:
>>>>> On Tue, Oct 04, 2022 at 12:44:16PM +0200, Jan Beulich wrote:
>>>>>> On 04.10.2022 12:38, Roger Pau Monné wrote:
>>>>>>> On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
>>>>>>>> On 04.10.2022 11:33, Roger Pau Monné wrote:
>>>>>>>>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
>>>>>>>>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
>>>>>>>>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
>>>>>>>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>>>>>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>>>>>>>>> runtime which was re-used for other purposes, make
>>>>>>>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>>>>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>>>>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>>>>>>>>
>>>>>>>>>>> What about dom0?  Should it be translated to E820_RESERVED so that
>>>>>>>>>>> dom0 doesn't try to use it either?
>>>>>>>>>>
>>>>>>>>>> I'm afraid I don't understand the questions. Not the least because I
>>>>>>>>>> think "it" can't really mean "dom0" from the earlier sentence.
>>>>>>>>>
>>>>>>>>> Sorry, let me try again:
>>>>>>>>>
>>>>>>>>> The memory map provided to dom0 will contain E820_ACPI entries for
>>>>>>>>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
>>>>>>>>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
>>>>>>>>> overwriting the data needed for runtime services?
>>>>>>>>
>>>>>>>> How would Dom0 go about doing so? It has no control over what we hand
>>>>>>>> to the page allocator - it can only free pages which were actually
>>>>>>>> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
>>>>>>>> DomIO - Dom0 can map and access them, but it cannot free them.
>>>>>>>
>>>>>>> Maybe I'm very confused, but what about dom0 overwriting the data
>>>>>>> there, won't it cause issues to runtime services?
>>>>>>
>>>>>> If it overwrites it, of course there are going to be issues. Just like
>>>>>> there are going to be problems from anything else Dom0 does wrong.
>>>>>
>>>>> But would dom0 know it's doing something wrong?
>>>>
>>>> Yes. Please also see my reply to Andrew.
>>>>
>>>>> The region is just marked as E820_ACPI from dom0 PoV, so it doesn't
>>>>> know it's required by EFI runtime services, and dom0 could
>>>>> legitimately overwrite the region once it considers all ACPI parsing
>>>>> done from it's side.
>>>>
>>>> PV Dom0 won't ever see E820_ACPI in the relevant E820 map; this type can
>>>> only appear in the machine E820. In how far PVH Dom0 might need to take
>>>> special care I can't tell right now (but at least for kexec purposes I
>>>> expect Linux isn't going to recycle E820_ACPI regions even going forward).
>>>
>>> Even if unlikely, couldn't some dom0 OS look at the machine map after
>>> processing ACPI and just decide to overwrite the ACPI regions?
>>>
>>> Not that it's useful from an OS PoV, but also we have no statement
>>> saying that E820_ACPI in the machine memory map shouldn't be
>>> overwritten.
>>
>> There are many things we have no statements for, yet we imply certain
>> behavior or restrictions. The machine memory map, imo, clearly isn't
>> intended for this kind of use.
> 
> There isn't much I can say then.  I do feel we are creating rules out
> of thin air.
> 
> I do think the commit message should mention that we rely on dom0 not
> overwriting the data in the E820_ACPI regions on the machine memory
> map.

Hmm, am I getting it right that you think I need to add further
justification for a change I'm _not_ making? And which, if we wanted
to change our behavior, would require a similar change (or perhaps a
change elsewhere) in E820 (i.e. non-EFI) handling? The modification
I'm making is solely towards Xen's internal memory management. I'm
really having a hard time seeing how commenting on expected Dom0
behavior would fit here (leaving aside that I'm still puzzled by both
you and Andrew thinking that there's any whatsoever remote indication
anywhere that Dom0 recycling E820_ACPI could be an okay thing in a PV
Dom0 kernel). The more that marking EfiACPIReclaimMemory anything
other than E820_ACPI might, as iirc you did say yourself, also confuse
e.g. the ACPI subsystem of Dom0's kernel.

But well, would extending that sentence to "While on x86 in theory the
same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
E820_ACPI memory there (and it would be a bug if the Dom0 kernel tried
to do so, bypassing Xen's memory management), hence that type's
handling can be left alone" satisfy your request?

Jan
Roger Pau Monné Oct. 4, 2022, 3:20 p.m. UTC | #21
On Tue, Oct 04, 2022 at 04:39:26PM +0200, Jan Beulich wrote:
> On 04.10.2022 16:01, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 03:10:57PM +0200, Jan Beulich wrote:
> >> On 04.10.2022 14:52, Roger Pau Monné wrote:
> >>> On Tue, Oct 04, 2022 at 02:18:31PM +0200, Jan Beulich wrote:
> >>>> On 04.10.2022 12:54, Roger Pau Monné wrote:
> >>>>> On Tue, Oct 04, 2022 at 12:44:16PM +0200, Jan Beulich wrote:
> >>>>>> On 04.10.2022 12:38, Roger Pau Monné wrote:
> >>>>>>> On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
> >>>>>>>> On 04.10.2022 11:33, Roger Pau Monné wrote:
> >>>>>>>>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
> >>>>>>>>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
> >>>>>>>>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
> >>>>>>>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
> >>>>>>>>>>>> higher priority than the type of the range. To avoid accessing memory at
> >>>>>>>>>>>> runtime which was re-used for other purposes, make
> >>>>>>>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
> >>>>>>>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> >>>>>>>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
> >>>>>>>>>>>
> >>>>>>>>>>> What about dom0?  Should it be translated to E820_RESERVED so that
> >>>>>>>>>>> dom0 doesn't try to use it either?
> >>>>>>>>>>
> >>>>>>>>>> I'm afraid I don't understand the questions. Not the least because I
> >>>>>>>>>> think "it" can't really mean "dom0" from the earlier sentence.
> >>>>>>>>>
> >>>>>>>>> Sorry, let me try again:
> >>>>>>>>>
> >>>>>>>>> The memory map provided to dom0 will contain E820_ACPI entries for
> >>>>>>>>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
> >>>>>>>>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
> >>>>>>>>> overwriting the data needed for runtime services?
> >>>>>>>>
> >>>>>>>> How would Dom0 go about doing so? It has no control over what we hand
> >>>>>>>> to the page allocator - it can only free pages which were actually
> >>>>>>>> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
> >>>>>>>> DomIO - Dom0 can map and access them, but it cannot free them.
> >>>>>>>
> >>>>>>> Maybe I'm very confused, but what about dom0 overwriting the data
> >>>>>>> there, won't it cause issues to runtime services?
> >>>>>>
> >>>>>> If it overwrites it, of course there are going to be issues. Just like
> >>>>>> there are going to be problems from anything else Dom0 does wrong.
> >>>>>
> >>>>> But would dom0 know it's doing something wrong?
> >>>>
> >>>> Yes. Please also see my reply to Andrew.
> >>>>
> >>>>> The region is just marked as E820_ACPI from dom0 PoV, so it doesn't
> >>>>> know it's required by EFI runtime services, and dom0 could
> >>>>> legitimately overwrite the region once it considers all ACPI parsing
> >>>>> done from it's side.
> >>>>
> >>>> PV Dom0 won't ever see E820_ACPI in the relevant E820 map; this type can
> >>>> only appear in the machine E820. In how far PVH Dom0 might need to take
> >>>> special care I can't tell right now (but at least for kexec purposes I
> >>>> expect Linux isn't going to recycle E820_ACPI regions even going forward).
> >>>
> >>> Even if unlikely, couldn't some dom0 OS look at the machine map after
> >>> processing ACPI and just decide to overwrite the ACPI regions?
> >>>
> >>> Not that it's useful from an OS PoV, but also we have no statement
> >>> saying that E820_ACPI in the machine memory map shouldn't be
> >>> overwritten.
> >>
> >> There are many things we have no statements for, yet we imply certain
> >> behavior or restrictions. The machine memory map, imo, clearly isn't
> >> intended for this kind of use.
> > 
> > There isn't much I can say then.  I do feel we are creating rules out
> > of thin air.
> > 
> > I do think the commit message should mention that we rely on dom0 not
> > overwriting the data in the E820_ACPI regions on the machine memory
> > map.
> 
> Hmm, am I getting it right that you think I need to add further
> justification for a change I'm _not_ making?

In the commit message you explicitly mentioned 'we don't actually
"reclaim" E820_ACPI memory' and I assumed that "we" in the sentence to
only include Xen.  Now I see that the "we" there seems to include both
Xen and the dom0 kernel.  This wasn't clear to me at first sight.

> And which, if we wanted
> to change our behavior, would require a similar change (or perhaps a
> change elsewhere) in E820 (i.e. non-EFI) handling?

Why would that be required?

Without EFI dom0 should be fine in overwriting (some?) of the data in
E820_ACPI regions once it's finished with all ACPI processing, as a
region of type E820_ACPI is reclaimable and Xen won't try to access it
once handled to dom0.

> The modification
> I'm making is solely towards Xen's internal memory management. I'm
> really having a hard time seeing how commenting on expected Dom0
> behavior would fit here

The type in the e820 memory map also gets propagated to dom0 in the
machine memory map hypercall, so it can have effect outside of Xen
itself.

> (leaving aside that I'm still puzzled by both
> you and Andrew thinking that there's any whatsoever remote indication
> anywhere that Dom0 recycling E820_ACPI could be an okay thing in a PV
> Dom0 kernel). The more that marking EfiACPIReclaimMemory anything
> other than E820_ACPI might, as iirc you did say yourself, also confuse
> e.g. the ACPI subsystem of Dom0's kernel.

Indeed.  There's no good way to convert a region of type
EfiACPIReclaimMemory that has the EFI_MEMORY_RUNTIME attribute set, as
there's no mapping to an e820 type.

One of the quirks of trying to retrofit an EFI memory map into e820
format.

> But well, would extending that sentence to "While on x86 in theory the
> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
> E820_ACPI memory there (and it would be a bug if the Dom0 kernel tried
> to do so, bypassing Xen's memory management), hence that type's
> handling can be left alone" satisfy your request?

I think that would indeed make it clearer.

Thanks, Roger.
Jan Beulich Oct. 4, 2022, 3:55 p.m. UTC | #22
On 04.10.2022 17:20, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 04:39:26PM +0200, Jan Beulich wrote:
>> On 04.10.2022 16:01, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 03:10:57PM +0200, Jan Beulich wrote:
>>>> On 04.10.2022 14:52, Roger Pau Monné wrote:
>>>>> On Tue, Oct 04, 2022 at 02:18:31PM +0200, Jan Beulich wrote:
>>>>>> On 04.10.2022 12:54, Roger Pau Monné wrote:
>>>>>>> On Tue, Oct 04, 2022 at 12:44:16PM +0200, Jan Beulich wrote:
>>>>>>>> On 04.10.2022 12:38, Roger Pau Monné wrote:
>>>>>>>>> On Tue, Oct 04, 2022 at 12:23:23PM +0200, Jan Beulich wrote:
>>>>>>>>>> On 04.10.2022 11:33, Roger Pau Monné wrote:
>>>>>>>>>>> On Tue, Oct 04, 2022 at 10:06:36AM +0200, Jan Beulich wrote:
>>>>>>>>>>>> On 30.09.2022 16:28, Roger Pau Monné wrote:
>>>>>>>>>>>>> On Fri, Sep 30, 2022 at 09:50:40AM +0200, Jan Beulich wrote:
>>>>>>>>>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>>>>>>>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>>>>>>>>>>> runtime which was re-used for other purposes, make
>>>>>>>>>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>>>>>>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>>>>>>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What about dom0?  Should it be translated to E820_RESERVED so that
>>>>>>>>>>>>> dom0 doesn't try to use it either?
>>>>>>>>>>>>
>>>>>>>>>>>> I'm afraid I don't understand the questions. Not the least because I
>>>>>>>>>>>> think "it" can't really mean "dom0" from the earlier sentence.
>>>>>>>>>>>
>>>>>>>>>>> Sorry, let me try again:
>>>>>>>>>>>
>>>>>>>>>>> The memory map provided to dom0 will contain E820_ACPI entries for
>>>>>>>>>>> memory ranges with the EFI_MEMORY_RUNTIME attributes in the EFI memory
>>>>>>>>>>> map.  Is there a risk from dom0 reclaiming such E820_ACPI ranges,
>>>>>>>>>>> overwriting the data needed for runtime services?
>>>>>>>>>>
>>>>>>>>>> How would Dom0 go about doing so? It has no control over what we hand
>>>>>>>>>> to the page allocator - it can only free pages which were actually
>>>>>>>>>> allocated to it. E820_ACPI and E820_RESERVED pages are assigned to
>>>>>>>>>> DomIO - Dom0 can map and access them, but it cannot free them.
>>>>>>>>>
>>>>>>>>> Maybe I'm very confused, but what about dom0 overwriting the data
>>>>>>>>> there, won't it cause issues to runtime services?
>>>>>>>>
>>>>>>>> If it overwrites it, of course there are going to be issues. Just like
>>>>>>>> there are going to be problems from anything else Dom0 does wrong.
>>>>>>>
>>>>>>> But would dom0 know it's doing something wrong?
>>>>>>
>>>>>> Yes. Please also see my reply to Andrew.
>>>>>>
>>>>>>> The region is just marked as E820_ACPI from dom0 PoV, so it doesn't
>>>>>>> know it's required by EFI runtime services, and dom0 could
>>>>>>> legitimately overwrite the region once it considers all ACPI parsing
>>>>>>> done from it's side.
>>>>>>
>>>>>> PV Dom0 won't ever see E820_ACPI in the relevant E820 map; this type can
>>>>>> only appear in the machine E820. In how far PVH Dom0 might need to take
>>>>>> special care I can't tell right now (but at least for kexec purposes I
>>>>>> expect Linux isn't going to recycle E820_ACPI regions even going forward).
>>>>>
>>>>> Even if unlikely, couldn't some dom0 OS look at the machine map after
>>>>> processing ACPI and just decide to overwrite the ACPI regions?
>>>>>
>>>>> Not that it's useful from an OS PoV, but also we have no statement
>>>>> saying that E820_ACPI in the machine memory map shouldn't be
>>>>> overwritten.
>>>>
>>>> There are many things we have no statements for, yet we imply certain
>>>> behavior or restrictions. The machine memory map, imo, clearly isn't
>>>> intended for this kind of use.
>>>
>>> There isn't much I can say then.  I do feel we are creating rules out
>>> of thin air.
>>>
>>> I do think the commit message should mention that we rely on dom0 not
>>> overwriting the data in the E820_ACPI regions on the machine memory
>>> map.
>>
>> Hmm, am I getting it right that you think I need to add further
>> justification for a change I'm _not_ making?
> 
> In the commit message you explicitly mentioned 'we don't actually
> "reclaim" E820_ACPI memory' and I assumed that "we" in the sentence to
> only include Xen.  Now I see that the "we" there seems to include both
> Xen and the dom0 kernel.  This wasn't clear to me at first sight.

It was clear, actually, as I did mean Xen alone. It didn't even occur to
me that one could consider Dom0 potentially trying to do so.

>> And which, if we wanted
>> to change our behavior, would require a similar change (or perhaps a
>> change elsewhere) in E820 (i.e. non-EFI) handling?
> 
> Why would that be required?

Because if EFI can (ab)use that type for other purposes, why couldn't
legacy firmware, too?

> Without EFI dom0 should be fine in overwriting (some?) of the data in
> E820_ACPI regions once it's finished with all ACPI processing, as a
> region of type E820_ACPI is reclaimable and Xen won't try to access it
> once handled to dom0.
> 
>> The modification
>> I'm making is solely towards Xen's internal memory management. I'm
>> really having a hard time seeing how commenting on expected Dom0
>> behavior would fit here
> 
> The type in the e820 memory map also gets propagated to dom0 in the
> machine memory map hypercall, so it can have effect outside of Xen
> itself.

If used beyond the very limited intended purposes, yes.

>> (leaving aside that I'm still puzzled by both
>> you and Andrew thinking that there's any whatsoever remote indication
>> anywhere that Dom0 recycling E820_ACPI could be an okay thing in a PV
>> Dom0 kernel). The more that marking EfiACPIReclaimMemory anything
>> other than E820_ACPI might, as iirc you did say yourself, also confuse
>> e.g. the ACPI subsystem of Dom0's kernel.
> 
> Indeed.  There's no good way to convert a region of type
> EfiACPIReclaimMemory that has the EFI_MEMORY_RUNTIME attribute set, as
> there's no mapping to an e820 type.
> 
> One of the quirks of trying to retrofit an EFI memory map into e820
> format.
> 
>> But well, would extending that sentence to "While on x86 in theory the
>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>> E820_ACPI memory there (and it would be a bug if the Dom0 kernel tried
>> to do so, bypassing Xen's memory management), hence that type's
>> handling can be left alone" satisfy your request?
> 
> I think that would indeed make it clearer.

Okay, I'll make the adjustment then and submit a v2. This will now need
an ack also by Henry anyway.

Jan
Jan Beulich Oct. 4, 2022, 3:58 p.m. UTC | #23
On 30.09.2022 14:51, Bertrand Marquis wrote:
>> On 30 Sep 2022, at 09:50, Jan Beulich <jbeulich@suse.com> wrote:
>>
>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>> higher priority than the type of the range. To avoid accessing memory at
>> runtime which was re-used for other purposes, make
>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>> E820_ACPI memory there and hence that type's handling can be left alone.
>>
>> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
>> Fixes: facac0af87ef ("x86-64: EFI runtime code")
>> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm

Thanks. However ...

>> ---
>> Partly RFC for Arm, for two reasons:
>>
>> On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
>> For one like on x86 such ranges would likely better be retained, as Dom0
>> may (will?) have a need to look at tables placed there. Plus converting
>> such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
>> me as well. I'd be inclined to make the latter adjustment right here
>> (while the other change probably would better be separate, if there
>> aren't actually reasons for the present behavior).

... any views on this WB aspect at least (also Stefano or Julien)? Would be
good to know before I send v2.

Jan
Julien Grall Oct. 5, 2022, 10:44 a.m. UTC | #24
Hi Jan,

On 04/10/2022 16:58, Jan Beulich wrote:
> On 30.09.2022 14:51, Bertrand Marquis wrote:
>>> On 30 Sep 2022, at 09:50, Jan Beulich <jbeulich@suse.com> wrote:
>>>
>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>> higher priority than the type of the range. To avoid accessing memory at
>>> runtime which was re-used for other purposes, make
>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>
>>> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
>>> Fixes: facac0af87ef ("x86-64: EFI runtime code")
>>> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm
> 
> Thanks. However ...
> 
>>> ---
>>> Partly RFC for Arm, for two reasons:
>>>
>>> On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
>>> For one like on x86 such ranges would likely better be retained, as Dom0
>>> may (will?) have a need to look at tables placed there. Plus converting
>>> such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
>>> me as well. I'd be inclined to make the latter adjustment right here
>>> (while the other change probably would better be separate, if there
>>> aren't actually reasons for the present behavior).
> 
> ... any views on this WB aspect at least (also Stefano or Julien)? Would be
> good to know before I send v2.

I don't quite understand what you are questioning here. Looking at the 
code, EfiACPIReclaimMemory will not be converted to RAM but added in a 
separate array.

Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0 
(see acpi_create_efi_mmap_table()).

So to me the code looks correct.

Cheers,
Jan Beulich Oct. 5, 2022, 11:55 a.m. UTC | #25
On 05.10.2022 12:44, Julien Grall wrote:
> On 04/10/2022 16:58, Jan Beulich wrote:
>> On 30.09.2022 14:51, Bertrand Marquis wrote:
>>>> On 30 Sep 2022, at 09:50, Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>> higher priority than the type of the range. To avoid accessing memory at
>>>> runtime which was re-used for other purposes, make
>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>
>>>> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
>>>> Fixes: facac0af87ef ("x86-64: EFI runtime code")
>>>> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm
>>
>> Thanks. However ...
>>
>>>> ---
>>>> Partly RFC for Arm, for two reasons:
>>>>
>>>> On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
>>>> For one like on x86 such ranges would likely better be retained, as Dom0
>>>> may (will?) have a need to look at tables placed there. Plus converting
>>>> such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
>>>> me as well. I'd be inclined to make the latter adjustment right here
>>>> (while the other change probably would better be separate, if there
>>>> aren't actually reasons for the present behavior).
>>
>> ... any views on this WB aspect at least (also Stefano or Julien)? Would be
>> good to know before I send v2.
> 
> I don't quite understand what you are questioning here. Looking at the 
> code, EfiACPIReclaimMemory will not be converted to RAM but added in a 
> separate array.
> 
> Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0 
> (see acpi_create_efi_mmap_table()).
> 
> So to me the code looks correct.

Oh, I've indeed not paid enough attention to the first argument passed
to meminfo_add_bank(). I'm sorry for the extra noise. However, the
question I wanted to have addressed before sending out v3 was that
regarding the present using of memory when EFI_MEMORY_WB is not set.
Is that correct for the EfiACPIReclaimMemory case, i.e. is the
consumer (Dom0) aware that there might be a restriction? And would
this memory then be guaranteed to never be freed into the general pool
of RAM pages?

Jan
Julien Grall Oct. 5, 2022, 6:09 p.m. UTC | #26
Hi Jan,

On 05/10/2022 12:55, Jan Beulich wrote:
> On 05.10.2022 12:44, Julien Grall wrote:
>> On 04/10/2022 16:58, Jan Beulich wrote:
>>> On 30.09.2022 14:51, Bertrand Marquis wrote:
>>>>> On 30 Sep 2022, at 09:50, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>
>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>> runtime which was re-used for other purposes, make
>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>>
>>>>> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
>>>>> Fixes: facac0af87ef ("x86-64: EFI runtime code")
>>>>> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm
>>>
>>> Thanks. However ...
>>>
>>>>> ---
>>>>> Partly RFC for Arm, for two reasons:
>>>>>
>>>>> On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
>>>>> For one like on x86 such ranges would likely better be retained, as Dom0
>>>>> may (will?) have a need to look at tables placed there. Plus converting
>>>>> such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
>>>>> me as well. I'd be inclined to make the latter adjustment right here
>>>>> (while the other change probably would better be separate, if there
>>>>> aren't actually reasons for the present behavior).
>>>
>>> ... any views on this WB aspect at least (also Stefano or Julien)? Would be
>>> good to know before I send v2.
>>
>> I don't quite understand what you are questioning here. Looking at the
>> code, EfiACPIReclaimMemory will not be converted to RAM but added in a
>> separate array.
>>
>> Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0
>> (see acpi_create_efi_mmap_table()).
>>
>> So to me the code looks correct.
> 
> Oh, I've indeed not paid enough attention to the first argument passed
> to meminfo_add_bank(). I'm sorry for the extra noise. However, the
> question I wanted to have addressed before sending out v3 was that
> regarding the present using of memory when EFI_MEMORY_WB is not set.
> Is that correct for the EfiACPIReclaimMemory case, i.e. is the
> consumer (Dom0) aware that there might be a restriction?

Looking at the code, we always set EFI_MEMORY_WB for the reclaimable 
region and the stage-2 mapping will be cachable.

So it looks like there would be a mismatch if EFI_MEMORY_WB is not set. 
However, given the region is reclaimable, shouldn't this imply that the 
flag is always set?

> And would
> this memory then be guaranteed to never be freed into the general pool
> of RAM pages?

The region is not treated as RAM by Xen and not owned by the dom0. 
Therefore, it should not be possible to free the page because 
get_page_from_gfn() would not be able to get a reference.

Cheers,
Jan Beulich Oct. 6, 2022, 8:39 a.m. UTC | #27
On 05.10.2022 20:09, Julien Grall wrote:
> Hi Jan,
> 
> On 05/10/2022 12:55, Jan Beulich wrote:
>> On 05.10.2022 12:44, Julien Grall wrote:
>>> On 04/10/2022 16:58, Jan Beulich wrote:
>>>> On 30.09.2022 14:51, Bertrand Marquis wrote:
>>>>>> On 30 Sep 2022, at 09:50, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>>> runtime which was re-used for other purposes, make
>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>>>
>>>>>> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
>>>>>> Fixes: facac0af87ef ("x86-64: EFI runtime code")
>>>>>> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm
>>>>
>>>> Thanks. However ...
>>>>
>>>>>> ---
>>>>>> Partly RFC for Arm, for two reasons:
>>>>>>
>>>>>> On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
>>>>>> For one like on x86 such ranges would likely better be retained, as Dom0
>>>>>> may (will?) have a need to look at tables placed there. Plus converting
>>>>>> such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
>>>>>> me as well. I'd be inclined to make the latter adjustment right here
>>>>>> (while the other change probably would better be separate, if there
>>>>>> aren't actually reasons for the present behavior).
>>>>
>>>> ... any views on this WB aspect at least (also Stefano or Julien)? Would be
>>>> good to know before I send v2.
>>>
>>> I don't quite understand what you are questioning here. Looking at the
>>> code, EfiACPIReclaimMemory will not be converted to RAM but added in a
>>> separate array.
>>>
>>> Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0
>>> (see acpi_create_efi_mmap_table()).
>>>
>>> So to me the code looks correct.
>>
>> Oh, I've indeed not paid enough attention to the first argument passed
>> to meminfo_add_bank(). I'm sorry for the extra noise. However, the
>> question I wanted to have addressed before sending out v3 was that
>> regarding the present using of memory when EFI_MEMORY_WB is not set.
>> Is that correct for the EfiACPIReclaimMemory case, i.e. is the
>> consumer (Dom0) aware that there might be a restriction?
> 
> Looking at the code, we always set EFI_MEMORY_WB for the reclaimable 
> region and the stage-2 mapping will be cachable.
> 
> So it looks like there would be a mismatch if EFI_MEMORY_WB is not set. 
> However, given the region is reclaimable, shouldn't this imply that the 
> flag is always set?

Possibly (but then again consider [perhaps hypothetical] systems with e.g.
just WT caches, where specifying WB simply wouldn't make sense). In any
event, even if that's the case, being on the safe side and doing

        if ( (desc_ptr->Attribute & EFI_MEMORY_RUNTIME) ||
             !(desc_ptr->Attribute & EFI_MEMORY_WB) )
            /* nothing */;
        else if ( ...

would seem better to me. However, if the mapping you mention above
would be adjusted and ...

>> And would
>> this memory then be guaranteed to never be freed into the general pool
>> of RAM pages?
> 
> The region is not treated as RAM by Xen and not owned by the dom0. 
> Therefore, it should not be possible to free the page because 
> get_page_from_gfn() would not be able to get a reference.

... the space cannot become ordinary RAM, then such a precaution
wouldn't be necessary. After all hiding EfiACPIReclaimMemory from
Dom0 just because it can't be mapped WB wouldn't be very nice
either. I guess I'll submit v2 with this part of the change left
as it was.

Jan
Jan Beulich Oct. 6, 2022, 2:11 p.m. UTC | #28
On 06.10.2022 10:39, Jan Beulich wrote:
> On 05.10.2022 20:09, Julien Grall wrote:
>> Hi Jan,
>>
>> On 05/10/2022 12:55, Jan Beulich wrote:
>>> On 05.10.2022 12:44, Julien Grall wrote:
>>>> On 04/10/2022 16:58, Jan Beulich wrote:
>>>>> On 30.09.2022 14:51, Bertrand Marquis wrote:
>>>>>>> On 30 Sep 2022, at 09:50, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>
>>>>>>> efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
>>>>>>> higher priority than the type of the range. To avoid accessing memory at
>>>>>>> runtime which was re-used for other purposes, make
>>>>>>> efi_arch_process_memory_map() follow suit. While on x86 in theory the
>>>>>>> same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
>>>>>>> E820_ACPI memory there and hence that type's handling can be left alone.
>>>>>>>
>>>>>>> Fixes: bf6501a62e80 ("x86-64: EFI boot code")
>>>>>>> Fixes: facac0af87ef ("x86-64: EFI runtime code")
>>>>>>> Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>
>>>>>> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm
>>>>>
>>>>> Thanks. However ...
>>>>>
>>>>>>> ---
>>>>>>> Partly RFC for Arm, for two reasons:
>>>>>>>
>>>>>>> On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
>>>>>>> For one like on x86 such ranges would likely better be retained, as Dom0
>>>>>>> may (will?) have a need to look at tables placed there. Plus converting
>>>>>>> such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
>>>>>>> me as well. I'd be inclined to make the latter adjustment right here
>>>>>>> (while the other change probably would better be separate, if there
>>>>>>> aren't actually reasons for the present behavior).
>>>>>
>>>>> ... any views on this WB aspect at least (also Stefano or Julien)? Would be
>>>>> good to know before I send v2.
>>>>
>>>> I don't quite understand what you are questioning here. Looking at the
>>>> code, EfiACPIReclaimMemory will not be converted to RAM but added in a
>>>> separate array.
>>>>
>>>> Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0
>>>> (see acpi_create_efi_mmap_table()).
>>>>
>>>> So to me the code looks correct.
>>>
>>> Oh, I've indeed not paid enough attention to the first argument passed
>>> to meminfo_add_bank(). I'm sorry for the extra noise. However, the
>>> question I wanted to have addressed before sending out v3 was that
>>> regarding the present using of memory when EFI_MEMORY_WB is not set.
>>> Is that correct for the EfiACPIReclaimMemory case, i.e. is the
>>> consumer (Dom0) aware that there might be a restriction?
>>
>> Looking at the code, we always set EFI_MEMORY_WB for the reclaimable 
>> region and the stage-2 mapping will be cachable.
>>
>> So it looks like there would be a mismatch if EFI_MEMORY_WB is not set. 
>> However, given the region is reclaimable, shouldn't this imply that the 
>> flag is always set?
> 
> Possibly (but then again consider [perhaps hypothetical] systems with e.g.
> just WT caches, where specifying WB simply wouldn't make sense). In any
> event, even if that's the case, being on the safe side and doing
> 
>         if ( (desc_ptr->Attribute & EFI_MEMORY_RUNTIME) ||
>              !(desc_ptr->Attribute & EFI_MEMORY_WB) )
>             /* nothing */;
>         else if ( ...
> 
> would seem better to me. However, if the mapping you mention above
> would be adjusted and ...
> 
>>> And would
>>> this memory then be guaranteed to never be freed into the general pool
>>> of RAM pages?
>>
>> The region is not treated as RAM by Xen and not owned by the dom0. 
>> Therefore, it should not be possible to free the page because 
>> get_page_from_gfn() would not be able to get a reference.
> 
> ... the space cannot become ordinary RAM, then such a precaution
> wouldn't be necessary. After all hiding EfiACPIReclaimMemory from
> Dom0 just because it can't be mapped WB wouldn't be very nice
> either. I guess I'll submit v2 with this part of the change left
> as it was.

And while already in the process of committing the patch I came to
realize that if the WB conditional isn't supposed to move, isn't
the change done for Arm then wrong as well? Shouldn't it then be

        if ( !(desc_ptr->Attribute & EFI_MEMORY_RUNTIME) &&
             (desc_ptr->Attribute & EFI_MEMORY_WB) &&
             (desc_ptr->Type == EfiConventionalMemory ||
             ...

leaving the EfiACPIReclaimMemory case entirely unaffected by the
change?

Jan
Julien Grall Oct. 8, 2022, 7:08 p.m. UTC | #29
Hi Jan,

On 06/10/2022 15:11, Jan Beulich wrote:
>> ... the space cannot become ordinary RAM, then such a precaution
>> wouldn't be necessary. After all hiding EfiACPIReclaimMemory from
>> Dom0 just because it can't be mapped WB wouldn't be very nice
>> either. I guess I'll submit v2 with this part of the change left
>> as it was.
> 
> And while already in the process of committing the patch I came to
> realize that if the WB conditional isn't supposed to move, isn't
> the change done for Arm then wrong as well? Shouldn't it then be
> 
>          if ( !(desc_ptr->Attribute & EFI_MEMORY_RUNTIME) &&
>               (desc_ptr->Attribute & EFI_MEMORY_WB) &&
>               (desc_ptr->Type == EfiConventionalMemory ||
>               ...
> 
> leaving the EfiACPIReclaimMemory case entirely unaffected by the
> change?

IIUC, the concern is the region EfiACPIReclaimMemory could have the 
attribute EFI_MEMORY_RUNTIME. Is that correct?

Given that the memory is reclaimable, I am not sure why it can also have 
this atribute set (to me it means the opposite). But I guess for 
hardening purpose it would be better to use the version you just suggested.

Bertrand, Stefano, what do you think?

Cheers,
Jan Beulich Oct. 10, 2022, 6:20 a.m. UTC | #30
On 08.10.2022 21:08, Julien Grall wrote:
> On 06/10/2022 15:11, Jan Beulich wrote:
>>> ... the space cannot become ordinary RAM, then such a precaution
>>> wouldn't be necessary. After all hiding EfiACPIReclaimMemory from
>>> Dom0 just because it can't be mapped WB wouldn't be very nice
>>> either. I guess I'll submit v2 with this part of the change left
>>> as it was.
>>
>> And while already in the process of committing the patch I came to
>> realize that if the WB conditional isn't supposed to move, isn't
>> the change done for Arm then wrong as well? Shouldn't it then be
>>
>>          if ( !(desc_ptr->Attribute & EFI_MEMORY_RUNTIME) &&
>>               (desc_ptr->Attribute & EFI_MEMORY_WB) &&
>>               (desc_ptr->Type == EfiConventionalMemory ||
>>               ...
>>
>> leaving the EfiACPIReclaimMemory case entirely unaffected by the
>> change?
> 
> IIUC, the concern is the region EfiACPIReclaimMemory could have the attribute EFI_MEMORY_RUNTIME. Is that correct?

Yes, ...

> Given that the memory is reclaimable, I am not sure why it can also have this atribute set (to me it means the opposite).

... at least on x86 all sorts of strange/bogus type/attribute combinations
have been observed.

Jan

> But I guess for hardening purpose it would be better to use the version you just suggested.
> 
> Bertrand, Stefano, what do you think?
> 
> Cheers,
>
Stefano Stabellini Oct. 10, 2022, 11:58 p.m. UTC | #31
On Mon, 10 Oct 2022, Jan Beulich wrote:
> On 08.10.2022 21:08, Julien Grall wrote:
> > On 06/10/2022 15:11, Jan Beulich wrote:
> >>> ... the space cannot become ordinary RAM, then such a precaution
> >>> wouldn't be necessary. After all hiding EfiACPIReclaimMemory from
> >>> Dom0 just because it can't be mapped WB wouldn't be very nice
> >>> either. I guess I'll submit v2 with this part of the change left
> >>> as it was.
> >>
> >> And while already in the process of committing the patch I came to
> >> realize that if the WB conditional isn't supposed to move, isn't
> >> the change done for Arm then wrong as well? Shouldn't it then be
> >>
> >>          if ( !(desc_ptr->Attribute & EFI_MEMORY_RUNTIME) &&
> >>               (desc_ptr->Attribute & EFI_MEMORY_WB) &&
> >>               (desc_ptr->Type == EfiConventionalMemory ||
> >>               ...
> >>
> >> leaving the EfiACPIReclaimMemory case entirely unaffected by the
> >> change?
> > 
> > IIUC, the concern is the region EfiACPIReclaimMemory could have the attribute EFI_MEMORY_RUNTIME. Is that correct?
> 
> Yes, ...
> 
> > Given that the memory is reclaimable, I am not sure why it can also have this atribute set (to me it means the opposite).
> 
> ... at least on x86 all sorts of strange/bogus type/attribute combinations
> have been observed.

Yeah... it is a good idea to be able to cope with strange and bogus
firmware tables as it is known to happen
Bertrand Marquis Oct. 11, 2022, 7:52 a.m. UTC | #32
Hi,

> On 11 Oct 2022, at 00:58, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Mon, 10 Oct 2022, Jan Beulich wrote:
>> On 08.10.2022 21:08, Julien Grall wrote:
>>> On 06/10/2022 15:11, Jan Beulich wrote:
>>>>> ... the space cannot become ordinary RAM, then such a precaution
>>>>> wouldn't be necessary. After all hiding EfiACPIReclaimMemory from
>>>>> Dom0 just because it can't be mapped WB wouldn't be very nice
>>>>> either. I guess I'll submit v2 with this part of the change left
>>>>> as it was.
>>>> 
>>>> And while already in the process of committing the patch I came to
>>>> realize that if the WB conditional isn't supposed to move, isn't
>>>> the change done for Arm then wrong as well? Shouldn't it then be
>>>> 
>>>>          if ( !(desc_ptr->Attribute & EFI_MEMORY_RUNTIME) &&
>>>>               (desc_ptr->Attribute & EFI_MEMORY_WB) &&
>>>>               (desc_ptr->Type == EfiConventionalMemory ||
>>>>               ...
>>>> 
>>>> leaving the EfiACPIReclaimMemory case entirely unaffected by the
>>>> change?
>>> 
>>> IIUC, the concern is the region EfiACPIReclaimMemory could have the attribute EFI_MEMORY_RUNTIME. Is that correct?
>> 
>> Yes, ...
>> 
>>> Given that the memory is reclaimable, I am not sure why it can also have this atribute set (to me it means the opposite).
>> 
>> ... at least on x86 all sorts of strange/bogus type/attribute combinations
>> have been observed.
> 
> Yeah... it is a good idea to be able to cope with strange and bogus
> firmware tables as it is known to happen

I agree with that but if we make an assumption that something is bogus, we should at least warn the user if possible.

Bertrand
diff mbox series

Patch

--- a/xen/arch/arm/efi/efi-boot.h
+++ b/xen/arch/arm/efi/efi-boot.h
@@ -183,13 +183,15 @@  static EFI_STATUS __init efi_process_mem
 
     for ( Index = 0; Index < (mmap_size / desc_size); Index++ )
     {
-        if ( desc_ptr->Attribute & EFI_MEMORY_WB &&
-             (desc_ptr->Type == EfiConventionalMemory ||
-              desc_ptr->Type == EfiLoaderCode ||
-              desc_ptr->Type == EfiLoaderData ||
-              (!map_bs &&
-               (desc_ptr->Type == EfiBootServicesCode ||
-                desc_ptr->Type == EfiBootServicesData))) )
+        if ( desc_ptr->Attribute & EFI_MEMORY_RUNTIME )
+            /* nothing */;
+        else if ( (desc_ptr->Attribute & EFI_MEMORY_WB) &&
+                  (desc_ptr->Type == EfiConventionalMemory ||
+                   desc_ptr->Type == EfiLoaderCode ||
+                   desc_ptr->Type == EfiLoaderData ||
+                   (!map_bs &&
+                    (desc_ptr->Type == EfiBootServicesCode ||
+                     desc_ptr->Type == EfiBootServicesData))) )
         {
             if ( !meminfo_add_bank(&bootinfo.mem, desc_ptr) )
             {
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -185,7 +185,9 @@  static void __init efi_arch_process_memo
             /* fall through */
         case EfiLoaderCode:
         case EfiLoaderData:
-            if ( desc->Attribute & EFI_MEMORY_WB )
+            if ( desc->Attribute & EFI_MEMORY_RUNTIME )
+                type = E820_RESERVED;
+            else if ( desc->Attribute & EFI_MEMORY_WB )
                 type = E820_RAM;
             else
         case EfiUnusableMemory: