diff mbox series

hvmloader: indicate firmware tables as ACPI NVS in e820

Message ID 1598573599-23792-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive)
State New, archived
Headers show
Series hvmloader: indicate firmware tables as ACPI NVS in e820 | expand

Commit Message

Igor Druzhinin Aug. 28, 2020, 12:13 a.m. UTC
Guest kernel does need to know in some cases where the tables are located
to treat these regions properly. One example is kexec process where
the first kernel needs to pass firmware region locations to the second
kernel which is now a requirement after 02a3e3cdb7f12 ("x86/boot: Parse SRAT
table and count immovable memory regions").

The memory that hvmloader allocates in the reserved region mostly contains
these useful tables and could be safely indicated as ACPI without the need
to designate a sub-region specially for that. Making it non-reclaimable
(ACPI NVS) would avoid potential reuse of this memory by the guest.
Swtiching from Reserved to ACPI NVS type for this memory would also mean
its content is preserved across S4 (which is true for any ACPI type according
to the spec).

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 tools/firmware/hvmloader/e820.c | 21 +++++++++++++++++----
 tools/firmware/hvmloader/util.c |  6 ++++++
 tools/firmware/hvmloader/util.h |  3 +++
 3 files changed, 26 insertions(+), 4 deletions(-)

Comments

Jan Beulich Aug. 28, 2020, 7:51 a.m. UTC | #1
On 28.08.2020 02:13, Igor Druzhinin wrote:
> Guest kernel does need to know in some cases where the tables are located
> to treat these regions properly. One example is kexec process where
> the first kernel needs to pass firmware region locations to the second
> kernel which is now a requirement after 02a3e3cdb7f12 ("x86/boot: Parse SRAT
> table and count immovable memory regions").
> 
> The memory that hvmloader allocates in the reserved region mostly contains
> these useful tables and could be safely indicated as ACPI without the need
> to designate a sub-region specially for that. Making it non-reclaimable
> (ACPI NVS) would avoid potential reuse of this memory by the guest.
> Swtiching from Reserved to ACPI NVS type for this memory would also mean
> its content is preserved across S4 (which is true for any ACPI type according
> to the spec).

Marking the range as ACPI is certainly fine, but why did you choose NVS?
There's nothing non-volatile there afaict. And the part of the last
sentence in parentheses suggests to me that the "normal" ACPI type is as
suitable for the purpose you're after.

> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -155,6 +155,8 @@ int build_e820_table(struct e820entry *e820,
>  {
>      unsigned int nr = 0, i, j;
>      uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
> +    uint32_t firmware_mem_end =
> +        RESERVED_MEMORY_DYNAMIC_START + (mem_mfns_allocated() << PAGE_SHIFT);

Here and elsewhere - please avoid uint32_t and friends when a plain
C type (unsigned int or unsigned long in this case) will do.

> @@ -210,8 +223,8 @@ int build_e820_table(struct e820entry *e820,
>      {
>          uint32_t igd_opregion_base = igd_opregion_pgbase << PAGE_SHIFT;
>  
> -        e820[nr].addr = RESERVED_MEMBASE;
> -        e820[nr].size = (uint32_t) igd_opregion_base - RESERVED_MEMBASE;
> +        e820[nr].addr = firmware_mem_end;
> +        e820[nr].size = (uint32_t) igd_opregion_base - firmware_mem_end;

Any chance I could talk you into also dropping the pointless cast
at this occasion?

Jan
Igor Druzhinin Aug. 28, 2020, 2:45 p.m. UTC | #2
On 28/08/2020 08:51, Jan Beulich wrote:
> On 28.08.2020 02:13, Igor Druzhinin wrote:
>> Guest kernel does need to know in some cases where the tables are located
>> to treat these regions properly. One example is kexec process where
>> the first kernel needs to pass firmware region locations to the second
>> kernel which is now a requirement after 02a3e3cdb7f12 ("x86/boot: Parse SRAT
>> table and count immovable memory regions").
>>
>> The memory that hvmloader allocates in the reserved region mostly contains
>> these useful tables and could be safely indicated as ACPI without the need
>> to designate a sub-region specially for that. Making it non-reclaimable
>> (ACPI NVS) would avoid potential reuse of this memory by the guest.
>> Swtiching from Reserved to ACPI NVS type for this memory would also mean
>> its content is preserved across S4 (which is true for any ACPI type according
>> to the spec).
> 
> Marking the range as ACPI is certainly fine, but why did you choose NVS?
> There's nothing non-volatile there afaict. And the part of the last
> sentence in parentheses suggests to me that the "normal" ACPI type is as
> suitable for the purpose you're after.

The problem with normal ACPI type is that according to the spec it's reclaimable
by the OS:
"This range is available RAM usable by the OS after it reads the ACPI tables."
while NVS type is specifically designated as non-reclaimable. The spec is also
denotes that both normal and NVS types should be preserved across S4 - that sounds
ambiguous to me. But it might mean that non-NVS preservation is not OS
responsibility as it's assumed to be static.

Our region also contains VM86 TSS which we certainly need at runtime (although that
could be allocated from the reserved region above if desirable).

To stay on the safe side and do not rely on perceived sensible OS behavior NVS type
was chosen which OS shouldn't touch under any circumstances.

In fact while writing this response I found that document which confirms some of my
suspicions:
http://www.baldwin.cx/~phoenix/reference/docs/acpi_impguide.pdf

>> --- a/tools/firmware/hvmloader/e820.c
>> +++ b/tools/firmware/hvmloader/e820.c
>> @@ -155,6 +155,8 @@ int build_e820_table(struct e820entry *e820,
>>  {
>>      unsigned int nr = 0, i, j;
>>      uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
>> +    uint32_t firmware_mem_end =
>> +        RESERVED_MEMORY_DYNAMIC_START + (mem_mfns_allocated() << PAGE_SHIFT);
> 
> Here and elsewhere - please avoid uint32_t and friends when a plain
> C type (unsigned int or unsigned long in this case) will do.

Ok, should I also take a chance and convert some of the surroundings?

>> @@ -210,8 +223,8 @@ int build_e820_table(struct e820entry *e820,
>>      {
>>          uint32_t igd_opregion_base = igd_opregion_pgbase << PAGE_SHIFT;
>>  
>> -        e820[nr].addr = RESERVED_MEMBASE;
>> -        e820[nr].size = (uint32_t) igd_opregion_base - RESERVED_MEMBASE;
>> +        e820[nr].addr = firmware_mem_end;
>> +        e820[nr].size = (uint32_t) igd_opregion_base - firmware_mem_end;
> 
> Any chance I could talk you into also dropping the pointless cast
> at this occasion?

Sure.

Igor
Jan Beulich Aug. 28, 2020, 3:50 p.m. UTC | #3
On 28.08.2020 16:45, Igor Druzhinin wrote:
> On 28/08/2020 08:51, Jan Beulich wrote:
>> On 28.08.2020 02:13, Igor Druzhinin wrote:
>>> Guest kernel does need to know in some cases where the tables are located
>>> to treat these regions properly. One example is kexec process where
>>> the first kernel needs to pass firmware region locations to the second
>>> kernel which is now a requirement after 02a3e3cdb7f12 ("x86/boot: Parse SRAT
>>> table and count immovable memory regions").
>>>
>>> The memory that hvmloader allocates in the reserved region mostly contains
>>> these useful tables and could be safely indicated as ACPI without the need
>>> to designate a sub-region specially for that. Making it non-reclaimable
>>> (ACPI NVS) would avoid potential reuse of this memory by the guest.
>>> Swtiching from Reserved to ACPI NVS type for this memory would also mean
>>> its content is preserved across S4 (which is true for any ACPI type according
>>> to the spec).
>>
>> Marking the range as ACPI is certainly fine, but why did you choose NVS?
>> There's nothing non-volatile there afaict. And the part of the last
>> sentence in parentheses suggests to me that the "normal" ACPI type is as
>> suitable for the purpose you're after.
> 
> The problem with normal ACPI type is that according to the spec it's reclaimable
> by the OS:
> "This range is available RAM usable by the OS after it reads the ACPI tables."
> while NVS type is specifically designated as non-reclaimable. The spec is also
> denotes that both normal and NVS types should be preserved across S4 - that sounds
> ambiguous to me. But it might mean that non-NVS preservation is not OS
> responsibility as it's assumed to be static.

I've taken a random system and found that all of its "real" ACPI
tables, in particular the DSDT, live in "ACPI data", not "ACPI NVS".
The DSDT includes AML, which I understand is needed at runtime. So
"after it reads the ACPI tables" is even more ambiguous than just
wrt S4 as you say.

> Our region also contains VM86 TSS which we certainly need at runtime (although that
> could be allocated from the reserved region above if desirable).
> 
> To stay on the safe side and do not rely on perceived sensible OS behavior NVS type
> was chosen which OS shouldn't touch under any circumstances.

Can you at the very least emphasize this in the description?

>>> --- a/tools/firmware/hvmloader/e820.c
>>> +++ b/tools/firmware/hvmloader/e820.c
>>> @@ -155,6 +155,8 @@ int build_e820_table(struct e820entry *e820,
>>>  {
>>>      unsigned int nr = 0, i, j;
>>>      uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
>>> +    uint32_t firmware_mem_end =
>>> +        RESERVED_MEMORY_DYNAMIC_START + (mem_mfns_allocated() << PAGE_SHIFT);
>>
>> Here and elsewhere - please avoid uint32_t and friends when a plain
>> C type (unsigned int or unsigned long in this case) will do.
> 
> Ok, should I also take a chance and convert some of the surroundings?

In general I'd recommend only adjusting code which gets touched
anyway; in a few cases adjacent code better also gets adjusted
so everything together looks reasonably consistent afterwards.
But I didn't think this was necessary here. IOW - I'd suggest
you don't, but in the end it's up to you (at the risk of needing
to undo parts if you end up going too far).

Jan
diff mbox series

Patch

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 4d1c955..ef60007 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -155,6 +155,8 @@  int build_e820_table(struct e820entry *e820,
 {
     unsigned int nr = 0, i, j;
     uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+    uint32_t firmware_mem_end =
+        RESERVED_MEMORY_DYNAMIC_START + (mem_mfns_allocated() << PAGE_SHIFT);
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -199,8 +201,19 @@  int build_e820_table(struct e820entry *e820,
     nr++;
 
     /*
+     * Mark populated reserved memory that contains ACPI and other tables as
+     * ACPI NVS (non-reclaimable) space - that should help the guest to treat
+     * it correctly later (e.g. pass to the next kernel on kexec).
+     */
+
+    e820[nr].addr = RESERVED_MEMBASE;
+    e820[nr].size = firmware_mem_end - RESERVED_MEMBASE;
+    e820[nr].type = E820_NVS;
+    nr++;
+
+    /*
      * Explicitly reserve space for special pages.
-     * This space starts at RESERVED_MEMBASE an extends to cover various
+     * This space starts after ACPI region and extends to cover various
      * fixed hardware mappings (e.g., LAPIC, IOAPIC, default SVGA framebuffer).
      *
      * If igd_opregion_pgbase we need to split the RESERVED region in two.
@@ -210,8 +223,8 @@  int build_e820_table(struct e820entry *e820,
     {
         uint32_t igd_opregion_base = igd_opregion_pgbase << PAGE_SHIFT;
 
-        e820[nr].addr = RESERVED_MEMBASE;
-        e820[nr].size = (uint32_t) igd_opregion_base - RESERVED_MEMBASE;
+        e820[nr].addr = firmware_mem_end;
+        e820[nr].size = (uint32_t) igd_opregion_base - firmware_mem_end;
         e820[nr].type = E820_RESERVED;
         nr++;
 
@@ -227,7 +240,7 @@  int build_e820_table(struct e820entry *e820,
     }
     else
     {
-        e820[nr].addr = RESERVED_MEMBASE;
+        e820[nr].addr = firmware_mem_end;
         e820[nr].size = (uint32_t)-e820[nr].addr;
         e820[nr].type = E820_RESERVED;
         nr++;
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 0c3f2d2..af68862 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -444,6 +444,12 @@  void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 static uint32_t alloc_up = RESERVED_MEMORY_DYNAMIC_START - 1;
 static uint32_t alloc_down = RESERVED_MEMORY_DYNAMIC_END;
 
+uint32_t mem_mfns_allocated(void)
+{
+    return (alloc_up >> PAGE_SHIFT) -
+            ((RESERVED_MEMORY_DYNAMIC_START - 1) >> PAGE_SHIFT);
+}
+
 xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
 {
     alloc_down -= nr_mfns << PAGE_SHIFT;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 7bca641..98d5e02 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -200,6 +200,9 @@  void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns);
 /* Allocate a memory hole below 4GB. */
 xen_pfn_t mem_hole_alloc(uint32_t nr_mfns);
 
+/* Return number of pages allocated */
+uint32_t mem_mfns_allocated(void);
+
 /* Allocate memory in a reserved region below 4GB. */
 void *mem_alloc(uint32_t size, uint32_t align);
 #define virt_to_phys(v) ((unsigned long)(v))