
[v2] hw/arm/virt-acpi - reserve ECAM space as PNP0C02 device

Message ID 1484328738-21149-1-git-send-email-ard.biesheuvel@linaro.org
State New, archived

Commit Message

Ard Biesheuvel Jan. 13, 2017, 5:32 p.m. UTC
Linux for arm64 v4.10 and later will complain if the ECAM config space is
not reserved in the ACPI namespace:

  acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace

The rationale is that OSes that don't consume the MCFG table should still
be able to infer that the PCI config space MMIO region is occupied.

So update the ACPI table generation routine to add this reservation.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 hw/arm/virt-acpi-build.c | 7 +++++++
 1 file changed, 7 insertions(+)
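The range in the log message can be checked against the standard ECAM layout. Each bus takes 1 MiB of config space (32 devices × 8 functions × 4 KiB), so the inclusive window 0x3f000000-0x3fffffff is 16 MiB and describes 16 buses. The C sketch below shows that arithmetic; the helper names are hypothetical and are not QEMU or Linux functions, and the address computation assumes the window starts at bus 0.

```c
#include <assert.h>
#include <stdint.h>

/* ECAM layout: 4 KiB of config space per function, 8 functions per
 * device, 32 devices per bus => 1 MiB per bus. */
#define ECAM_FUNC_SHIFT 12
#define ECAM_DEV_SHIFT  15
#define ECAM_BUS_SHIFT  20

/* Size of an inclusive [start, end] range, as printed in the kernel log. */
static uint64_t range_size(uint64_t start, uint64_t end_inclusive)
{
    assert(end_inclusive >= start);
    return end_inclusive - start + 1;
}

/* Number of buses an ECAM window of 'size' bytes can describe. */
static uint64_t ecam_bus_count(uint64_t size)
{
    return size >> ECAM_BUS_SHIFT;
}

/* Address of config register 'reg' of bus/dev/fn, assuming
 * (hypothetically) that the window at 'base' starts at bus 0. */
static uint64_t ecam_address(uint64_t base, unsigned bus, unsigned dev,
                             unsigned fn, unsigned reg)
{
    assert(dev < 32 && fn < 8 && reg < 4096);
    return base + ((uint64_t)bus << ECAM_BUS_SHIFT)
                + ((uint64_t)dev << ECAM_DEV_SHIFT)
                + ((uint64_t)fn << ECAM_FUNC_SHIFT)
                + reg;
}
```

With the values from the log line, range_size(0x3f000000, 0x3fffffff) is 0x1000000 (16 MiB), and ecam_bus_count() of that is 16.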

Comments

Peter Maydell Jan. 16, 2017, 5:25 p.m. UTC | #1
On 13 January 2017 at 17:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> Linux for arm64 v4.10 and later will complain if the ECAM config space is
> not reserved in the ACPI namespace:
>
>   acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace
>
> The rationale is that OSes that don't consume the MCFG table should still
> be able to infer that the PCI config space MMIO region is occupied.
>
> So update the ACPI table generation routine to add this reservation.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  hw/arm/virt-acpi-build.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 085a61117378..50d52f685f68 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
>      Aml *dev_rp0 = aml_device("%s", "RP0");
>      aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0)));
>      aml_append(dev, dev_rp0);
> +
> +    Aml *dev_res0 = aml_device("%s", "RES0");
> +    aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
> +    crs = aml_resource_template();
> +    aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE));
> +    aml_append(dev_res0, aml_name_decl("_CRS", crs));
> +    aml_append(dev, dev_res0);
>      aml_append(scope, dev);
>  }

This needs to be controlled via the machine class back-compat
machinery in hw/arm/virt.c so that it only happens for virt-2.9
and later.

thanks
-- PMM
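The back-compat approach described above usually takes the shape of a per-machine-class flag: the newest machine type gets the new behaviour, and each older versioned type's options function chains to the next newer one and then switches the new behaviour off. The C sketch below models that pattern in plain C; the struct, the `no_ecam_reservation` field and all function names are hypothetical and are not QEMU's actual types or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a per-version machine class. */
typedef struct {
    const char *name;
    bool no_ecam_reservation;   /* true => keep the pre-2.9 DSDT contents */
} VirtClassSketch;

/* Newest machine type: new behaviour enabled (flag left false). */
static void virt_2_9_options_sketch(VirtClassSketch *vmc)
{
    vmc->name = "virt-2.9";
    vmc->no_ecam_reservation = false;
}

/* Older type: start from the newer defaults, then restore old behaviour. */
static void virt_2_8_options_sketch(VirtClassSketch *vmc)
{
    virt_2_9_options_sketch(vmc);
    vmc->name = "virt-2.8";
    vmc->no_ecam_reservation = true;
}

/* A DSDT builder would consult the flag before emitting RES0. */
static bool dsdt_should_add_res0(const VirtClassSketch *vmc)
{
    return !vmc->no_ecam_reservation;
}
```

Chaining keeps each older type's differences to a short list of explicit overrides, so anything not overridden automatically follows the newer default.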
Ard Biesheuvel Jan. 16, 2017, 5:30 p.m. UTC | #2
On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
> This needs to be controlled via the machine class back-compat
> machinery in hw/arm/virt.c so that it only happens for virt-2.9
> and later.
>

Why exactly?
Peter Maydell Jan. 16, 2017, 6:20 p.m. UTC | #3
On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
>> This needs to be controlled via the machine class back-compat
>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
>> and later.
>>
>
> Why exactly?

Because the "virt-2.8" machine has to present to the guest
exactly what "virt" did as of the QEMU 2.8 release, including
any bugs or missing things we happened to have in our ACPI
tables. This allows cross-version compatibility (including
VM migration). Drew will have a more detailed explanation
if you need it.

thanks
-- PMM
Ard Biesheuvel Jan. 16, 2017, 7:31 p.m. UTC | #4
On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> This needs to be controlled via the machine class back-compat
>>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
>>> and later.
>>>
>>
>> Why exactly?
>
> Because the "virt-2.8" machine has to present to the guest
> exactly what "virt" did as of the QEMU 2.8 release, including
> any bugs or missing things we happened to have in our ACPI
> tables. This allows cross-version compatibility (including
> VM migration). Drew will have a more detailed explanation
> if you need it.
>

I suspected as much.

But in this case, I am not sure if it is worth the trouble: the
generated data is only consumed at boot time by the firmware, and I
suppose migration involves freezing a VM, including whatever resident
firmware image was used to boot the OS, and so this is unlikely to
affect migration.

But I will let Drew explain ...

Thanks,
Ard.
Laszlo Ersek Jan. 16, 2017, 9:13 p.m. UTC | #5
On 01/16/17 20:31, Ard Biesheuvel wrote:
> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>> This needs to be controlled via the machine class back-compat
>>>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
>>>> and later.
>>>>
>>>
>>> Why exactly?
>>
>> Because the "virt-2.8" machine has to present to the guest
>> exactly what "virt" did as of the QEMU 2.8 release, including
>> any bugs or missing things we happened to have in our ACPI
>> tables. This allows cross-version compatibility (including
>> VM migration). Drew will have a more detailed explanation
>> if you need it.
>>
> 
> I suspected as much.
> 
> But in this case, I am not sure if it is worth the trouble: the
> generated data is only consumed at boot time by the firmware, and I
> suppose migration involves freezing a VM, including whatever resident
> firmware image was used to boot the OS, and so this is unlikely to
> affect migration.
> 
> But I will let Drew explain ...

The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table
Description": "The resources can *optionally* be returned in [...]
EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems
to *insist* on this kind of reservation however.

PNP0C02 is "General ID for reserving resources required by PnP
motherboard registers. (Not device specific.)", according to
<http://www.plasma-online.de/english/identify/serial/pnp_id_pnp.html>.
So what this patch does is reserve a memory area through ACPI,
practically as an unspecified "platform resource".
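For reference, the patch's aml_*() calls describe a device equivalent to the ASL rendered by the C sketch below: a RES0 device with _HID "PNP0C02" whose _CRS holds a single Memory32Fixed descriptor covering the ECAM window. `render_res0_asl` is a hypothetical helper for illustration only; QEMU emits binary AML through its builders rather than ASL text.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Render ASL equivalent to the patch's RES0 device into buf.
 * Returns the snprintf() result (the length that would be written). */
static int render_res0_asl(char *buf, size_t len, uint32_t base, uint32_t size)
{
    return snprintf(buf, len,
        "Device (RES0)\n"
        "{\n"
        "    Name (_HID, \"PNP0C02\")\n"
        "    Name (_CRS, ResourceTemplate ()\n"
        "    {\n"
        "        Memory32Fixed (ReadWrite, 0x%08" PRIX32 ", 0x%08" PRIX32 ")\n"
        "    })\n"
        "}\n",
        base, size);
}
```

Note that Memory32Fixed takes a base and a length, which matches the aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE) call in the patch.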

There's an alternative that's contained entirely in the firmware. You
can cover the MMCONFIG area in ArmVirtQemu with an EfiReservedMemoryType
memory map entry (by producing an appropriate memalloc HOB in PEI, or by
calling the appropriate gDS memory space map functions in DXE). OVMF
does the former (memalloc HOB).

In ArmVirtQemu, we grab the MMCONFIG range from "pci-host-ecam-generic",
from QEMU's DTB. If you don't dislike the idea, we could cover the range
as well, right in "ArmVirtPkg/Library/FdtPciPcdProducerLib". That lib
instance already sets the base address PCD, and makes sure that the
relevant code is executed only once (in whatever driver module the
library instance was built into). You could call the gDS functions
mentioned above from that spot. (The library instance is already
restricted to DXE_DRIVER and UEFI_DRIVER modules.)

Thanks!
Laszlo
Ard Biesheuvel Jan. 16, 2017, 9:23 p.m. UTC | #6
On 16 January 2017 at 21:13, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/16/17 20:31, Ard Biesheuvel wrote:
>> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>>> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>>> This needs to be controlled via the machine class back-compat
>>>>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
>>>>> and later.
>>>>>
>>>>
>>>> Why exactly?
>>>
>>> Because the "virt-2.8" machine has to present to the guest
>>> exactly what "virt" did as of the QEMU 2.8 release, including
>>> any bugs or missing things we happened to have in our ACPI
>>> tables. This allows cross-version compatibility (including
>>> VM migration). Drew will have a more detailed explanation
>>> if you need it.
>>>
>>
>> I suspected as much.
>>
>> But in this case, I am not sure if it is worth the trouble: the
>> generated data is only consumed at boot time by the firmware, and I
>> suppose migration involves freezing a VM, including whatever resident
>> firmware image was used to boot the OS, and so this is unlikely to
>> affect migration.
>>
>> But I will let Drew explain ...
>
> The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table
> Description": "The resources can *optionally* be returned in [...]
> EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems
> to *insist* on this kind of reservation however.
>

No, not at the UEFI level but at the ACPI level. Reservations in the
UEFI memory map describe memory, not MMIO space.

> PNP0C02 is "General ID for reserving resources required by PnP
> motherboard registers. (Not device specific.)", according to
> <http://www.plasma-online.de/english/identify/serial/pnp_id_pnp.html>.
> So what this patch does is reserve a memory area through ACPI,
> practically as an unspecified "platform resource".
>

This has been discussed at great length on the Linux mailing lists:

https://patchwork.kernel.org/patch/9453149/

> There's an alternative that's contained entirely in the firmware. You
> can cover the MMCONFIG area in ArmVirtQemu with an EfiReservedMemoryType
> memory map entry (by producing an appropriate memalloc HOB in PEI, or by
> calling the appropriate gDS memory space map functions in DXE). OVMF
> does the former (memalloc HOB).
>
> In ArmVirtQemu, we grab the MMCONFIG range from "pci-host-ecam-generic",
> from QEMU's DTB. If you don't dislike the idea, we could cover the range
> as well, right in "ArmVirtPkg/Library/FdtPciPcdProducerLib". That lib
> instance already sets the base address PCD, and makes sure that the
> relevant code is executed only once (in whatever driver module the
> library instance was built into). You could call the gDS functions
> mentioned above from that spot. (The library instance is already
> restricted to DXE_DRIVER and UEFI_DRIVER modules.)
>

In general, I think describing MMIO in the UEFI memory map is not very
useful, and counter to the spec, which mentions that the memory map
describes memory ("however it is used"), not memory *space* (unless
UEFI itself needs to access it to implement runtime services).
Laszlo Ersek Jan. 16, 2017, 10:35 p.m. UTC | #7
On 01/16/17 22:23, Ard Biesheuvel wrote:
> On 16 January 2017 at 21:13, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 01/16/17 20:31, Ard Biesheuvel wrote:
>>> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>> On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>>>> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>>>> This needs to be controlled via the machine class back-compat
>>>>>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
>>>>>> and later.
>>>>>>
>>>>>
>>>>> Why exactly?
>>>>
>>>> Because the "virt-2.8" machine has to present to the guest
>>>> exactly what "virt" did as of the QEMU 2.8 release, including
>>>> any bugs or missing things we happened to have in our ACPI
>>>> tables. This allows cross-version compatibility (including
>>>> VM migration). Drew will have a more detailed explanation
>>>> if you need it.
>>>>
>>>
>>> I suspected as much.
>>>
>>> But in this case, I am not sure if it is worth the trouble: the
>>> generated data is only consumed at boot time by the firmware, and I
>>> suppose migration involves freezing a VM, including whatever resident
>>> firmware image was used to boot the OS, and so this is unlikely to
>>> affect migration.
>>>
>>> But I will let Drew explain ...
>>
>> The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table
>> Description": "The resources can *optionally* be returned in [...]
>> EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems
>> to *insist* on this kind of reservation however.
>>
> 
> No, not at the UEFI level but at the ACPI level. Reservations in the
> UEFI memory map describe memory, not MMIO space.
> 
>> PNP0C02 is "General ID for reserving resources required by PnP
>> motherboard registers. (Not device specific.)", according to
>> <http://www.plasma-online.de/english/identify/serial/pnp_id_pnp.html>.
>> So what this patch does is reserve a memory area through ACPI,
>> practically as an unspecified "platform resource".
>>
> 
> This has been discussed at great length on the Linux mailing lists:
> 
> https://patchwork.kernel.org/patch/9453149/
> 
>> There's an alternative that's contained entirely in the firmware. You
>> can cover the MMCONFIG area in ArmVirtQemu with an EfiReservedMemoryType
>> memory map entry (by producing an appropriate memalloc HOB in PEI, or by
>> calling the appropriate gDS memory space map functions in DXE). OVMF
>> does the former (memalloc HOB).
>>
>> In ArmVirtQemu, we grab the MMCONFIG range from "pci-host-ecam-generic",
>> from QEMU's DTB. If you don't dislike the idea, we could cover the range
>> as well, right in "ArmVirtPkg/Library/FdtPciPcdProducerLib". That lib
>> instance already sets the base address PCD, and makes sure that the
>> relevant code is executed only once (in whatever driver module the
>> library instance was built into). You could call the gDS functions
>> mentioned above from that spot. (The library instance is already
>> restricted to DXE_DRIVER and UEFI_DRIVER modules.)
>>
> 
> In general, I think describing MMIO in the UEFI memory map is not very
> useful, and counter to the spec, which mentions that the memory map
> describes memory ("however it is used"), not memory *space* (unless
> UEFI itself needs to access it to implement runtime services).
> 

The UEFI memory map will reflect allocations from the GCD memory space,
for the Reserved and MMIO types. See "Figure 2. GCD Memory State
Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec.

See also "9.7.1 UEFI Boot Services Dependencies" in the same spec:

  9.7.1.8 GetMemoryMap()

  The GetMemoryMap() implementation must include into the UEFI memory
  map all GCD map entries of types EfiGcdMemoryTypeReserved and
  EfiPersistentMemory, and all GCD map entries of type
  EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute
  set.

(Note that I wrote Reserved earlier, not MMIO.)
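The quoted PI rule can be written as a small predicate. This is a simplified model: the enum names are invented for illustration (EDK2 has its own EFI_GCD_MEMORY_TYPE values), and only the EFI_MEMORY_RUNTIME bit value is taken from the UEFI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UEFI memory attribute bit marking runtime-mapped regions. */
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

/* Illustrative stand-ins for GCD memory space types. */
typedef enum {
    GCD_NONEXISTENT,
    GCD_SYSTEM_MEMORY,
    GCD_RESERVED,
    GCD_PERSISTENT,
    GCD_MMIO,
} GcdType;

/* Encodes only the quoted rule (9.7.1.8): Reserved and Persistent
 * entries must appear in the UEFI memory map; MMIO entries must appear
 * only if they carry EFI_MEMORY_RUNTIME. System memory is reported
 * under other rules and is deliberately out of scope here. */
static bool pi_rule_requires_memmap_entry(GcdType type, uint64_t attributes)
{
    switch (type) {
    case GCD_RESERVED:
    case GCD_PERSISTENT:
        return true;
    case GCD_MMIO:
        return (attributes & EFI_MEMORY_RUNTIME) != 0;
    default:
        return false;
    }
}
```

Under this rule, an ECAM window added to the GCD map as Reserved surfaces in the UEFI memory map, whereas one added as MMIO without EFI_MEMORY_RUNTIME does not, which is the distinction drawn in the note above.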

However, you are right that *just* the UEFI memmap entry is not
sufficient, according to the PCI firmware spec. (Regardless of the fact
that in practice, just the memmap entry does keep Linux happy. Or is it
about to change?)

Namely, looking again at the spot I quoted above (and it's also quoted
in the kernel docs patch that you linked above, under ref [6]), we find

    If the operating system does not natively comprehend reserving the
    MMCFG region, the MMCFG region must be reserved by firmware. The
    address range reported in the MCFG table or by _CBA method (see
    Section 4.1.3) must be reserved by declaring a motherboard resource.
    For most systems, the motherboard resource would appear at the root
    of the ACPI namespace (under \_SB) in a node with a _HID of EISAID
    (PNP0C02), and the resources in this case should not be claimed in
    the root PCI bus’s _CRS. The resources can optionally be returned in
    Int15 E820 or EFIGetMemoryMap as reserved memory but must always be
    reported through ACPI as a motherboard resource.

Therefore I agree that reserving the MMCONFIG area via a PNP0C02 object
in QEMU's ACPI payload improves spec conformance.
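The quoted requirement reduces to two range checks: the MCFG/_CBA window must lie inside a PNP0C02 motherboard resource, and it should not overlap any window claimed in the root bridge's _CRS. A minimal C sketch, assuming the ranges have already been extracted from the ACPI tables; the Range type and function names are hypothetical, and for simplicity the window must fit inside a single motherboard resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range [start, end]. */
typedef struct {
    uint64_t start;
    uint64_t end;
} Range;

static bool range_contains(Range outer, Range inner)
{
    return outer.start <= inner.start && inner.end <= outer.end;
}

static bool range_overlaps(Range a, Range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* True if 'ecam' is covered by one of the motherboard (PNP0C02)
 * resources and is not claimed by any root-bridge _CRS window. */
static bool ecam_reservation_conforms(Range ecam,
                                      const Range *mb_res, size_t n_mb,
                                      const Range *root_crs, size_t n_crs)
{
    bool reserved = false;

    for (size_t i = 0; i < n_mb; i++) {
        if (range_contains(mb_res[i], ecam)) {
            reserved = true;
            break;
        }
    }
    for (size_t i = 0; i < n_crs; i++) {
        if (range_overlaps(root_crs[i], ecam))
            return false;   /* must not be claimed by the root bus */
    }
    return reserved;
}
```

Without a RES0-style device, the motherboard resource list is empty and the check fails, which corresponds to the condition the kernel warning reports.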

(Actually, the argument can be made for x86/Q35 as well. Adding Marcel
and MST.)

... Beyond the machine-type dependency raised by Peter (which I gather
is still being discussed), I suggest that the commit message of this
patch quote the relevant passage from the PCI fw spec in full (see
above, or in the kernel docs patch).

Thanks!
Laszlo
Ard Biesheuvel Jan. 17, 2017, 7:47 a.m. UTC | #8
On 16 January 2017 at 22:35, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/16/17 22:23, Ard Biesheuvel wrote:
>> On 16 January 2017 at 21:13, Laszlo Ersek <lersek@redhat.com> wrote:
>>> On 01/16/17 20:31, Ard Biesheuvel wrote:
>>>> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>>> On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>>>>> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>>>>> This needs to be controlled via the machine class back-compat
>>>>>>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
>>>>>>> and later.
>>>>>>>
>>>>>>
>>>>>> Why exactly?
>>>>>
>>>>> Because the "virt-2.8" machine has to present to the guest
>>>>> exactly what "virt" did as of the QEMU 2.8 release, including
>>>>> any bugs or missing things we happened to have in our ACPI
>>>>> tables. This allows cross-version compatibility (including
>>>>> VM migration). Drew will have a more detailed explanation
>>>>> if you need it.
>>>>>
>>>>
>>>> I suspected as much.
>>>>
>>>> But in this case, I am not sure if it is worth the trouble: the
>>>> generated data is only consumed at boot time by the firmware, and I
>>>> suppose migration involves freezing a VM, including whatever resident
>>>> firmware image was used to boot the OS, and so this is unlikely to
>>>> affect migration.
>>>>
>>>> But I will let Drew explain ...
>>>
>>> The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table
>>> Description": "The resources can *optionally* be returned in [...]
>>> EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems
>>> to *insist* on this kind of reservation however.
>>>
>>
>> No, not at the UEFI level but at the ACPI level. Reservations in the
>> UEFI memory map describe memory, not MMIO space.
>>
>>> PNP0C02 is "General ID for reserving resources required by PnP
>>> motherboard registers. (Not device specific.)", according to
>>> <http://www.plasma-online.de/english/identify/serial/pnp_id_pnp.html>.
>>> So what this patch does is reserve a memory area through ACPI,
>>> practically as an unspecified "platform resource".
>>>
>>
>> This has been discussed at great length on the Linux mailing lists:
>>
>> https://patchwork.kernel.org/patch/9453149/
>>
>>> There's an alternative that's contained entirely in the firmware. You
>>> can cover the MMCONFIG area in ArmVirtQemu with an EfiReservedMemoryType
>>> memory map entry (by producing an appropriate memalloc HOB in PEI, or by
>>> calling the appropriate gDS memory space map functions in DXE). OVMF
>>> does the former (memalloc HOB).
>>>
>>> In ArmVirtQemu, we grab the MMCONFIG range from "pci-host-ecam-generic",
>>> from QEMU's DTB. If you don't dislike the idea, we could cover the range
>>> as well, right in "ArmVirtPkg/Library/FdtPciPcdProducerLib". That lib
>>> instance already sets the base address PCD, and makes sure that the
>>> relevant code is executed only once (in whatever driver module the
>>> library instance was built into). You could call the gDS functions
>>> mentioned above from that spot. (The library instance is already
>>> restricted to DXE_DRIVER and UEFI_DRIVER modules.)
>>>
>>
>> In general, I think describing MMIO in the UEFI memory map is not very
>> useful, and counter to the spec, which mentions that the memory map
>> describes memory ("however it is used"), not memory *space* (unless
>> UEFI itself needs to access it to implement runtime services).
>>
>
> The UEFI memory map will reflect allocations from the GCD memory space,
> for the Reserved and MMIO types. See "Figure 2. GCD Memory State
> Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec.
>
> See also "9.7.1 UEFI Boot Services Dependencies" in the same spec:
>
>   9.7.1.8 GetMemoryMap()
>
>   The GetMemoryMap() implementation must include into the UEFI memory
>   map all GCD map entries of types EfiGcdMemoryTypeReserved and
>   EfiPersistentMemory, and all GCD map entries of type
>   EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute
>   set.
>
> (Note that I wrote Reserved earlier, not MMIO.)
>

What the PI spec stipulates is irrelevant: the contract between the
firmware and the OS is in the UEFI and ACPI specifications, not in the
PI spec.

> However, you are right that *just* the UEFI memmap entry is not
> sufficient, according to the PCI firmware spec. (Regardless of the fact
> that in practice, just the memmap entry does keep Linux happy. Or is it
> about to change?)
>

The kernel uses the UEFI memory map for two purposes:
- finding out where memory is, and which parts are usable (i.e., non-reserved)
- setting up page tables to allow UEFI runtime services calls, which
  may include MMIO mappings

This means that MMIO regions in the UEFI memory map are *not*
considered reservations. This is in line with the wording of the UEFI
spec, which mentions that the memory map describes memory, not MMIO
(with the exception of MMIO peripherals that the firmware needs to
access to implement the runtime services)

> Namely, looking again at the spot I quoted above (and it's also quoted
> in the kernel docs patch that you linked above, under ref [6]), we find
>
>     If the operating system does not natively comprehend reserving the
>     MMCFG region, the MMCFG region must be reserved by firmware. The
>     address range reported in the MCFG table or by _CBA method (see
>     Section 4.1.3) must be reserved by declaring a motherboard resource.
>     For most systems, the motherboard resource would appear at the root
>     of the ACPI namespace (under \_SB) in a node with a _HID of EISAID
>     (PNP0C02), and the resources in this case should not be claimed in
>     the root PCI bus’s _CRS. The resources can optionally be returned in
>     Int15 E820 or EFIGetMemoryMap as reserved memory but must always be
>     reported through ACPI as a motherboard resource.
>
> Therefore I agree that reserving the MMCONFIG area via a PNP0C02 object
> in QEMU's ACPI payload improves spec conformance.
>

Good.
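The requirement in the quoted PCI firmware spec passage, and the Linux warning that prompted the patch, both amount to asking whether a motherboard resource (a PNP0C02 device's _CRS) covers the whole ECAM window. Below is a minimal C sketch of that kind of check. `MemRange` and `ecam_reserved()` are hypothetical names, and requiring containment in a single range is a simplification. This is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One memory range claimed in the _CRS of a PNP0C02 device. */
typedef struct {
    uint64_t base;
    uint64_t size;
} MemRange;

/* Returns true if [ecam_base, ecam_base + ecam_size) lies entirely
 * within one of the n motherboard-resource ranges. */
static bool ecam_reserved(uint64_t ecam_base, uint64_t ecam_size,
                          const MemRange *res, size_t n)
{
    assert(ecam_size > 0);
    uint64_t ecam_last = ecam_base + ecam_size - 1;

    for (size_t i = 0; i < n; i++) {
        if (res[i].size == 0)
            continue;                       /* empty descriptor */
        uint64_t last = res[i].base + res[i].size - 1;
        if (res[i].base <= ecam_base && ecam_last <= last)
            return true;                    /* fully covered */
    }
    return false;
}
```

With the range from the kernel message (0x3f000000-0x3fffffff, i.e. base 0x3f000000, size 0x01000000), a PNP0C02 resource with exactly that base and size satisfies the check, while an empty list, which is the situation before the patch, does not.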

> (Actually, the argument can be made for x86/Q35 as well. Adding Marcel
> and MST.)
>
> ... Beyond the machine-type dependency raised by Peter (which I gather
> is still being discussed), I suggest that the commit message of this
> patch quote the relevant passage from the PCI fw spec in full (see
> above, or in the kernel docs patch).
>

I will expand the commit message in the next respin.

Thanks (and apologies for not cc'ing you in the first place),
Ard.
Laszlo Ersek Jan. 17, 2017, 8:50 a.m. UTC | #9
(my reply is no longer related to the patch, so maybe I shouldn't send
it... I can't resist, sorry :))

On 01/17/17 08:47, Ard Biesheuvel wrote:
> On 16 January 2017 at 22:35, Laszlo Ersek <lersek@redhat.com> wrote:

>> The UEFI memory map will reflect allocations from the GCD memory space,
>> for the Reserved and MMIO types. See "Figure 2. GCD Memory State
>> Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec.
>>
>> See also "9.7.1 UEFI Boot Services Dependencies" in the same,
>>
>>   9.7.1.8 GetMemoryMap()
>>
>>   The GetMemoryMap() implementation must include into the UEFI memory
>>   map all GCD map entries of types EfiGcdMemoryTypeReserved and
>>   EfiPersistentMemory, and all GCD map entries of type
>>   EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute
>>   set.
>>
>> (Note that I wrote Reserved earlier, not MMIO.)
>>
> 
> What the PI spec stipulates is irrelevant: the contract between the
> firmware and the OS is in the UEFI and ACPI specifications, not in the
> PI spec.

I disagree that what the PI spec stipulates is irrelevant. For platforms
that implement both PI and UEFI, the PI spec expresses additional
requirements for the UEFI implementation (in PI terminology). So what it
says certainly matters for the ArmVirtQemu firmware specifically.

End-to-end, if we want to achieve a particular result in a UEFI OS, we
can certainly work towards that end in the PEI phase (or in the DXE
phase, using the DXE services) in a specific firmware that aims to
conform to both PI and UEFI. Because, the effects that those low-level
operations will have on the UEFI level (and consequently, on the OS) are
well defined in the PI spec.

> 
>> However, you are right that *just* the UEFI memmap entry is not
>> sufficient, according to the PCI firmware spec. (Regardless of the fact
>> that in practice, just the memmap entry does keep Linux happy. Or is it
>> about to change?)
>>
> 
> The kernel uses the UEFI memory map for two purposes:
> - finding out where memory is, and which parts are usable (i.e., non-reserved)
> - setting up page tables to allow UEFI runtime services calls, which
> may include MMIO mappings
> 
> This means that MMIO regions in the UEFI memory map are *not*
> considered reservations. [...]

Yes, I understand that. Now please understand that my suggestion was
never to cover the MMCONFIG area with MMIO type memory; all along I've
been saying "reserved memory".

(Again, this is now independent of the patch.)

Thanks,
Laszlo
Ard Biesheuvel Jan. 17, 2017, 9:06 a.m. UTC | #10
On 17 January 2017 at 08:50, Laszlo Ersek <lersek@redhat.com> wrote:
> (my reply is no longer related to the patch, so maybe I shouldn't send
> it... I can't resist, sorry :))
>
> On 01/17/17 08:47, Ard Biesheuvel wrote:
>> On 16 January 2017 at 22:35, Laszlo Ersek <lersek@redhat.com> wrote:
>
>>> The UEFI memory map will reflect allocations from the GCD memory space,
>>> for the Reserved and MMIO types. See "Figure 2. GCD Memory State
>>> Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec.
>>>
>>> See also "9.7.1 UEFI Boot Services Dependencies" in the same,
>>>
>>>   9.7.1.8 GetMemoryMap()
>>>
>>>   The GetMemoryMap() implementation must include into the UEFI memory
>>>   map all GCD map entries of types EfiGcdMemoryTypeReserved and
>>>   EfiPersistentMemory, and all GCD map entries of type
>>>   EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute
>>>   set.
>>>
>>> (Note that I wrote Reserved earlier, not MMIO.)
>>>
>>
>> What the PI spec stipulates is irrelevant: the contract between the
>> firmware and the OS is in the UEFI and ACPI specifications, not in the
>> PI spec.
>
> I disagree that what the PI spec stipulates is irrelevant. For platforms
> that implement both PI and UEFI, the PI spec expresses additional
> requirements for the UEFI implementation (in PI terminology). So what it
> says certainly matters for the ArmVirtQemu firmware specifically.
>
> End-to-end, if we want to achieve a particular result in a UEFI OS, we
> can certainly work towards that end in the PEI phase (or in the DXE
> phase, using the DXE services) in a specific firmware that aims to
> conform to both PI and UEFI. Because, the effects that those low-level
> operations will have on the UEFI level (and consequently, on the OS) are
> well defined in the PI spec.
>

PI spec should drive the implementation choices we make at the
ArmVirtQemu end, and the ACPI generation is tightly coupled with that,
so in that sense, I agree that the PI spec *is* relevant.

However, the purpose of the patch (which we are no longer discussing
:-)), is to ensure that QEMU + ArmVirtQemu adheres to the pertinent
contracts with the OS, and PI is not one of them.

>>
>>> However, you are right that *just* the UEFI memmap entry is not
>>> sufficient, according to the PCI firmware spec. (Regardless of the fact
>>> that in practice, just the memmap entry does keep Linux happy. Or is it
>>> about to change?)
>>>
>>
>> The kernel uses the UEFI memory map for two purposes:
>> - finding out where memory is, and which parts are usable (i.e., non-reserved)
>> - setting up page tables to allow UEFI runtime services calls, which
>> may include MMIO mappings
>>
>> This means that MMIO regions in the UEFI memory map are *not*
>> considered reservations. [...]
>
> Yes, I understand that. Now please understand that my suggestion was
> never to cover the MMCONFIG area with MMIO type memory; all along I've
> been saying "reserved memory".
>
> (Again, this is now independent of the patch.)
>

I know the various specs are vague and slightly contradictory, but I
would oppose using EfiReservedMemory to describe an MMIO region,
given that the wording of the UEFI spec (which is authoritative imo)
suggests that the memory map should only describe memory (unless we
are dealing with MMIO regions that require a runtime mapping so that
the firmware can use the device while running under the OS).
Laszlo Ersek Jan. 17, 2017, 9:28 a.m. UTC | #11
On 01/17/17 10:06, Ard Biesheuvel wrote:
> On 17 January 2017 at 08:50, Laszlo Ersek <lersek@redhat.com> wrote:
>> (my reply is no longer related to the patch, so maybe I shouldn't send
>> it... I can't resist, sorry :))
>>
>> On 01/17/17 08:47, Ard Biesheuvel wrote:
>>> On 16 January 2017 at 22:35, Laszlo Ersek <lersek@redhat.com> wrote:
>>
>>>> The UEFI memory map will reflect allocations from the GCD memory space,
>>>> for the Reserved and MMIO types. See "Figure 2. GCD Memory State
>>>> Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec.
>>>>
>>>> See also "9.7.1 UEFI Boot Services Dependencies" in the same,
>>>>
>>>>   9.7.1.8 GetMemoryMap()
>>>>
>>>>   The GetMemoryMap() implementation must include into the UEFI memory
>>>>   map all GCD map entries of types EfiGcdMemoryTypeReserved and
>>>>   EfiPersistentMemory, and all GCD map entries of type
>>>>   EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute
>>>>   set.
>>>>
>>>> (Note that I wrote Reserved earlier, not MMIO.)
>>>>
>>>
>>> What the PI spec stipulates is irrelevant: the contract between the
>>> firmware and the OS is in the UEFI and ACPI specifications, not in the
>>> PI spec.
>>
>> I disagree that what the PI spec stipulates is irrelevant. For platforms
>> that implement both PI and UEFI, the PI spec expresses additional
>> requirements for the UEFI implementation (in PI terminology). So what it
>> says certainly matters for the ArmVirtQemu firmware specifically.
>>
>> End-to-end, if we want to achieve a particular result in a UEFI OS, we
>> can certainly work towards that end in the PEI phase (or in the DXE
>> phase, using the DXE services) in a specific firmware that aims to
>> conform to both PI and UEFI. Because, the effects that those low-level
>> operations will have on the UEFI level (and consequently, on the OS) are
>> well defined in the PI spec.
>>
> 
> PI spec should drive the implementation choices we make at the
> ArmVirtQemu end, and the ACPI generation is tightly coupled with that,
> so in that sense, I agree that the PI spec *is* relevant.
> 
> However, the purpose of the patch (which we are no longer discussing
> :-)), is to ensure that QEMU + ArmVirtQemu adheres to the pertinent
> contracts with the OS, and PI is not one of them.
> 
>>>
>>>> However, you are right that *just* the UEFI memmap entry is not
>>>> sufficient, according to the PCI firmware spec. (Regardless of the fact
>>>> that in practice, just the memmap entry does keep Linux happy. Or is it
>>>> about to change?)
>>>>
>>>
>>> The kernel uses the UEFI memory map for two purposes:
>>> - finding out where memory is, and which parts are usable (i.e., non-reserved)
>>> - setting up page tables to allow UEFI runtime services calls, which
>>> may include MMIO mappings
>>>
>>> This means that MMIO regions in the UEFI memory map are *not*
>>> considered reservations. [...]
>>
>> Yes, I understand that. Now please understand that my suggestion was
>> never to cover the MMCONFIG area with MMIO type memory; all along I've
>> been saying "reserved memory".
>>
>> (Again, this is now independent of the patch.)
>>
> 
> I know the various specs are vague and slightly contradictory, but I
> would oppose using EfiReservedMemory to describe an MMIO region,
> given that the wording of the UEFI spec (which is authoritative imo)
> suggests that the memory map should only describe memory (unless we
> are dealing with MMIO regions that require a runtime mapping so that
> the firmware can use the device while running under the OS).
> 

Fair enough, on both counts :)
Andrew Jones Jan. 17, 2017, 9:49 a.m. UTC | #12
On Mon, Jan 16, 2017 at 07:31:33PM +0000, Ard Biesheuvel wrote:
> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
> > On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
> >>> On 13 January 2017 at 17:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >>>> Linux for arm64 v4.10 and later will complain if the ECAM config space is
> >>>> not reserved in the ACPI namespace:
> >>>>
> >>>>   acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace
> >>>>
> >>>> The rationale is that OSes that don't consume the MCFG table should still
> >>>> be able to infer that the PCI config space MMIO region is occupied.
> >>>>
> >>>> So update the ACPI table generation routine to add this reservation.
> >>>>
> >>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >>>> ---
> >>>>  hw/arm/virt-acpi-build.c | 7 +++++++
> >>>>  1 file changed, 7 insertions(+)
> >>>>
> >>>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> >>>> index 085a61117378..50d52f685f68 100644
> >>>> --- a/hw/arm/virt-acpi-build.c
> >>>> +++ b/hw/arm/virt-acpi-build.c
> >>>> @@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
> >>>>      Aml *dev_rp0 = aml_device("%s", "RP0");
> >>>>      aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0)));
> >>>>      aml_append(dev, dev_rp0);
> >>>> +
> >>>> +    Aml *dev_res0 = aml_device("%s", "RES0");
> >>>> +    aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
> >>>> +    crs = aml_resource_template();
> >>>> +    aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE));
> >>>> +    aml_append(dev_res0, aml_name_decl("_CRS", crs));
> >>>> +    aml_append(dev, dev_res0);
> >>>>      aml_append(scope, dev);
> >>>>  }
> >>>
> >>> This needs to be controlled via the machine class back-compat
> >>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
> >>> and later.
> >>>
> >>
> >> Why exactly?
> >
> > Because the "virt-2.8" machine has to present to the guest
> > exactly what "virt" did as of the QEMU 2.8 release, including
> > any bugs or missing things we happened to have in our ACPI
> > tables. This allows cross-version compatibility (including
> > VM migration). Drew will have a more detailed explanation
> > if you need it.
> >
> 
> I suspected as much.
> 
> But in this case, I am not sure if it is worth the trouble: the
> generated data is only consumed at boot time by the firmware, and I
> suppose migration involves freezing a VM, including whatever resident
> firmware image was used to boot the OS, and so this is unlikely to
> affect migration.
> 
> But I will let Drew explain ...
>

In some cases the problem we're solving with the compat guards is
a bit hypothetical, but, IMHO, nonetheless a good practice. While
we may be sure that AAVMF and Linux will be fine with this table
changing under their feet, we can't be sure there aren't other
mach-virt users that have more sensitive firmwares/OSes. An ACPI-
sensitive OS may notice the change on its next reboot after a
migration, and then simply refuse to continue.

Now, that said, I just spoke with Igor in order to learn the x86
practice. He says that the policy has been more lax than what I
suggest above. Hypothetical, low-risk issues are left unguarded,
and only when a bug is found during testing is it then managed.
The idea is to try and reduce the amount of compat variables and
conditions needed in the ACPI generation code, but, of course, at
some level of risk to users expecting their versioned machine type
to always appear the same.

So far we've been strict with mach-virt, guarding all hypothetical
issues. Perhaps this patch is a good example to get a discussion
started on whether or not we should be so strict though.

Thanks,
drew
Peter Maydell Jan. 17, 2017, 10:56 a.m. UTC | #13
On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote:
> In some cases the problem we're solving with the compat guards is
> a bit hypothetical, but, IMHO, nonetheless a good practice. While
> we may be sure that AAVMF and Linux will be fine with this table
> changing under their feet, we can't be sure there aren't other
> mach-virt users that have more sensitive firmwares/OSes. An ACPI-
> sensitive OS may notice the change on its next reboot after a
> migration, and then simply refuse to continue.

There's also the case where you do a VM migration midway through
UEFI booting up, I think, which might cause things to go wrong
if you catch it just at the wrong moment.

> Now, that said, I just spoke with Igor in order to learn the x86
> practice. He says that the policy has been more lax than what I
> suggest above. Hypothetical, low-risk issues are left unguarded,
> and only when a bug is found during testing is it then managed.
> The idea is to try and reduce the amount of compat variables and
> conditions needed in the ACPI generation code, but, of course, at
> some level of risk to users expecting their versioned machine type
> to always appear the same.
>
> So far we've been strict with mach-virt, guarding all hypothetical
> issues. Perhaps this patch is a good example to get a discussion
> started on whether or not we should be so strict though.

That said, I don't have a very strong opinion here, beyond that
we should be consistent at least with x86 practice.

thanks
-- PMM
Michael S. Tsirkin Jan. 17, 2017, 2:46 p.m. UTC | #14
On Mon, Jan 16, 2017 at 11:35:04PM +0100, Laszlo Ersek wrote:
> On 01/16/17 22:23, Ard Biesheuvel wrote:
> > On 16 January 2017 at 21:13, Laszlo Ersek <lersek@redhat.com> wrote:
> >> On 01/16/17 20:31, Ard Biesheuvel wrote:
> >>> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
> >>>> On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >>>>> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
> >>>>>> On 13 January 2017 at 17:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >>>>>>> Linux for arm64 v4.10 and later will complain if the ECAM config space is
> >>>>>>> not reserved in the ACPI namespace:
> >>>>>>>
> >>>>>>>   acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace
> >>>>>>>
> >>>>>>> The rationale is that OSes that don't consume the MCFG table should still
> >>>>>>> be able to infer that the PCI config space MMIO region is occupied.
> >>>>>>>
> >>>>>>> So update the ACPI table generation routine to add this reservation.
> >>>>>>>
> >>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >>>>>>> ---
> >>>>>>>  hw/arm/virt-acpi-build.c | 7 +++++++
> >>>>>>>  1 file changed, 7 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> >>>>>>> index 085a61117378..50d52f685f68 100644
> >>>>>>> --- a/hw/arm/virt-acpi-build.c
> >>>>>>> +++ b/hw/arm/virt-acpi-build.c
> >>>>>>> @@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
> >>>>>>>      Aml *dev_rp0 = aml_device("%s", "RP0");
> >>>>>>>      aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0)));
> >>>>>>>      aml_append(dev, dev_rp0);
> >>>>>>> +
> >>>>>>> +    Aml *dev_res0 = aml_device("%s", "RES0");
> >>>>>>> +    aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
> >>>>>>> +    crs = aml_resource_template();
> >>>>>>> +    aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE));
> >>>>>>> +    aml_append(dev_res0, aml_name_decl("_CRS", crs));
> >>>>>>> +    aml_append(dev, dev_res0);
> >>>>>>>      aml_append(scope, dev);
> >>>>>>>  }
> >>>>>>
> >>>>>> This needs to be controlled via the machine class back-compat
> >>>>>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
> >>>>>> and later.
> >>>>>>
> >>>>>
> >>>>> Why exactly?
> >>>>
> >>>> Because the "virt-2.8" machine has to present to the guest
> >>>> exactly what "virt" did as of the QEMU 2.8 release, including
> >>>> any bugs or missing things we happened to have in our ACPI
> >>>> tables. This allows cross-version compatibility (including
> >>>> VM migration). Drew will have a more detailed explanation
> >>>> if you need it.
> >>>>
> >>>
> >>> I suspected as much.
> >>>
> >>> But in this case, I am not sure if it is worth the trouble: the
> >>> generated data is only consumed at boot time by the firmware, and I
> >>> suppose migration involves freezing a VM, including whatever resident
> >>> firmware image was used to boot the OS, and so this is unlikely to
> >>> affect migration.
> >>>
> >>> But I will let Drew explain ...
> >>
> >> The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table
> >> Description": "The resources can *optionally* be returned in [...]
> >> EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems
> >> to *insist* on this kind of reservation however.
> >>
> > 
> > No, not at the UEFI level but at the ACPI level. Reservations in the
> > UEFI memory map describe memory not MMIO space
> > 
> >> PNP0C02 is "General ID for reserving resources required by PnP
> >> motherboard registers. (Not device specific.)", according to
> >> <http://www.plasma-online.de/english/identify/serial/pnp_id_pnp.html>.
> >> So what this patch does is reserve a memory area through ACPI,
> >> practically as an unspecified "platform resource".
> >>
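For readers who think in ASL rather than in QEMU's AML builder calls, the RES0 device added by the hunk (a string _HID of "PNP0C02" plus a _CRS holding one fixed 32-bit read/write memory range) corresponds roughly to the text produced by the sketch below. `format_res0_asl()` is a hypothetical helper, and the exact layout of a real disassembler's output may differ.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes approximate ASL for a PNP0C02 device reserving the memory range
 * [base, base + size) into buf; returns snprintf's result (the length the
 * full text needs, excluding the terminating NUL). */
static int format_res0_asl(char *buf, size_t len, uint32_t base, uint32_t size)
{
    assert(buf != NULL);
    return snprintf(buf, len,
        "Device (RES0)\n"
        "{\n"
        "    Name (_HID, \"PNP0C02\")\n"
        "    Name (_CRS, ResourceTemplate ()\n"
        "    {\n"
        "        Memory32Fixed (ReadWrite, 0x%08" PRIX32 ", 0x%08" PRIX32 ")\n"
        "    })\n"
        "}\n",
        base, size);
}
```

Called with base 0x3f000000 and size 0x01000000 (the ECAM window from the kernel message), the Memory32Fixed line reads `Memory32Fixed (ReadWrite, 0x3F000000, 0x01000000)`.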
> > 
> > This has been discussed at great length on the linux mailing lists
> > 
> > https://patchwork.kernel.org/patch/9453149/
> > 
> >> There's an alternative that's contained entirely in the firmware. You
> >> can cover the MMCONFIG area in ArmVirtQemu with an EfiReservedMemoryType
> >> memory map entry (by producing an appropriate memalloc HOB in PEI, or by
> >> calling the appropriate gDS memory space map functions in DXE). OVMF
> >> does the former (memalloc HOB).
> >>
> >> In ArmVirtQemu, we grab the MMCONFIG range from "pci-host-ecam-generic",
> >> from QEMU's DTB. If you don't dislike the idea, we could cover the range
> >> as well, right in "ArmVirtPkg/Library/FdtPciPcdProducerLib". That lib
> >> instance already sets the base address PCD, and makes sure that the
> >> relevant code is executed only once (in whatever driver module the
> >> library instance was built into). You could call the gDS functions
> >> mentioned above from that spot. (The library instance is already
> >> restricted to DXE_DRIVER and UEFI_DRIVER modules.)
> >>
> > 
> > In general, I think describing MMIO in the UEFI memory map is not very
> > useful, and counter to the spec, which mentions that the memory map
> > describes memory ("however it is used"), not memory *space* (unless
> > UEFI itself needs to access it to implement runtime services)
> > 
> 
> The UEFI memory map will reflect allocations from the GCD memory space,
> for the Reserved and MMIO types. See "Figure 2. GCD Memory State
> Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec.
> 
> See also "9.7.1 UEFI Boot Services Dependencies" in the same,
> 
>   9.7.1.8 GetMemoryMap()
> 
>   The GetMemoryMap() implementation must include into the UEFI memory
>   map all GCD map entries of types EfiGcdMemoryTypeReserved and
>   EfiPersistentMemory, and all GCD map entries of type
>   EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute
>   set.
> 
> (Note that I wrote Reserved earlier, not MMIO.)
> 
> However, you are right that *just* the UEFI memmap entry is not
> sufficient, according to the PCI firmware spec. (Regardless of the fact
> that in practice, just the memmap entry does keep Linux happy. Or is it
> about to change?)
> 
> Namely, looking again at the spot I quoted above (and it's also quoted
> in the kernel docs patch that you linked above, under ref [6]), we find
> 
>     If the operating system does not natively comprehend reserving the
>     MMCFG region, the MMCFG region must be reserved by firmware. The
>     address range reported in the MCFG table or by _CBA method (see
>     Section 4.1.3) must be reserved by declaring a motherboard resource.
>     For most systems, the motherboard resource would appear at the root
>     of the ACPI namespace (under \_SB) in a node with a _HID of EISAID
>     (PNP0C02), and the resources in this case should not be claimed in
>     the root PCI bus’s _CRS. The resources can optionally be returned in
>     Int15 E820 or EFIGetMemoryMap as reserved memory but must always be
>     reported through ACPI as a motherboard resource.
> 
> Therefore I agree that reserving the MMCONFIG area via a PNP0C02 object
> in QEMU's ACPI payload improves spec conformance.
> 
> (Actually, the argument can be made for x86/Q35 as well. Adding Marcel
> and MST.)

I agree, thanks for pointing this out.
Patch, anyone?

> ... Beyond the machine-type dependency raised by Peter (which I gather
> is still being discussed), I suggest that the commit message of this
> patch quote the relevant passage from the PCI fw spec in full (see
> above, or in the kernel docs patch).
> 
> Thanks!
> Laszlo
Ard Biesheuvel Jan. 18, 2017, 2:49 p.m. UTC | #15
On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote:
> On Mon, Jan 16, 2017 at 07:31:33PM +0000, Ard Biesheuvel wrote:
>> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
>> > On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> >> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
>> >>> On 13 January 2017 at 17:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> >>>> Linux for arm64 v4.10 and later will complain if the ECAM config space is
>> >>>> not reserved in the ACPI namespace:
>> >>>>
>> >>>>   acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace
>> >>>>
>> >>>> The rationale is that OSes that don't consume the MCFG table should still
>> >>>> be able to infer that the PCI config space MMIO region is occupied.
>> >>>>
>> >>>> So update the ACPI table generation routine to add this reservation.
>> >>>>
>> >>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >>>> ---
>> >>>>  hw/arm/virt-acpi-build.c | 7 +++++++
>> >>>>  1 file changed, 7 insertions(+)
>> >>>>
>> >>>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
>> >>>> index 085a61117378..50d52f685f68 100644
>> >>>> --- a/hw/arm/virt-acpi-build.c
>> >>>> +++ b/hw/arm/virt-acpi-build.c
>> >>>> @@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
>> >>>>      Aml *dev_rp0 = aml_device("%s", "RP0");
>> >>>>      aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0)));
>> >>>>      aml_append(dev, dev_rp0);
>> >>>> +
>> >>>> +    Aml *dev_res0 = aml_device("%s", "RES0");
>> >>>> +    aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
>> >>>> +    crs = aml_resource_template();
>> >>>> +    aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE));
>> >>>> +    aml_append(dev_res0, aml_name_decl("_CRS", crs));
>> >>>> +    aml_append(dev, dev_res0);
>> >>>>      aml_append(scope, dev);
>> >>>>  }
>> >>>
>> >>> This needs to be controlled via the machine class back-compat
>> >>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
>> >>> and later.
>> >>>
>> >>
>> >> Why exactly?
>> >
>> > Because the "virt-2.8" machine has to present to the guest
>> > exactly what "virt" did as of the QEMU 2.8 release, including
>> > any bugs or missing things we happened to have in our ACPI
>> > tables. This allows cross-version compatibility (including
>> > VM migration). Drew will have a more detailed explanation
>> > if you need it.
>> >
>>
>> I suspected as much.
>>
>> But in this case, I am not sure if it is worth the trouble: the
>> generated data is only consumed at boot time by the firmware, and I
>> suppose migration involves freezing a VM, including whatever resident
>> firmware image was used to boot the OS, and so this is unlikely to
>> affect migration.
>>
>> But I will let Drew explain ...
>>
>
> In some cases the problem we're solving with the compat guards is
> a bit hypothetical, but, IMHO, nonetheless a good practice. While
> we may be sure that AAVMF and Linux will be fine with this table
> changing under their feet, we can't be sure there aren't other
> mach-virt users that have more sensitive firmwares/OSes. An ACPI-
> sensitive OS may notice the change on its next reboot after a
> migration, and then simply refuse to continue.
>
> Now, that said, I just spoke with Igor in order to learn the x86
> practice. He says that the policy has been more lax than what I
> suggest above. Hypothetical, low-risk issues are left unguarded,
> and only when a bug is found during testing is it then managed.
> The idea is to try and reduce the amount of compat variables and
> conditions needed in the ACPI generation code, but, of course, at
> some level of risk to users expecting their versioned machine type
> to always appear the same.
>
> So far we've been strict with mach-virt, guarding all hypothetical
> issues. Perhaps this patch is a good example to get a discussion
> started on whether or not we should be so strict though.
>

Yes please. I don't mind respinning the patch, but I agree that it
makes sense to consider whether minimal bug fixes like this one
require this treatment in the first place.
Igor Mammedov Jan. 18, 2017, 3:18 p.m. UTC | #16
On Tue, 17 Jan 2017 10:56:53 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote:
> > In some cases the problem we're solving with the compat guards is
> > a bit hypothetical, but, IMHO, nonetheless a good practice. While
> > we may be sure that AAVMF and Linux will be fine with this table
> > changing under their feet, we can't be sure there aren't other
> > mach-virt users that have more sensitive firmwares/OSes. An ACPI-
> > sensitive OS may notice the change on its next reboot after a
> > migration, and then simply refuse to continue.  
> 
> There's also the case where you do a VM migration midway through
> UEFI booting up, I think, which might cause things to go wrong
> if you catch it just at the wrong moment.
ACPI blobs are migrated from the source, so the above won't happen.
The only time the guest will see the new table is on a fresh boot or reboot.

> 
> > Now, that said, I just spoke with Igor in order to learn the x86
> > practice. He says that the policy has been more lax than what I
> > suggest above. Hypothetical, low-risk issues are left unguarded,
> > and only when a bug is found during testing is it then managed.
> > The idea is to try and reduce the amount of compat variables and
> > conditions needed in the ACPI generation code, but, of course, at
> > some level of risk to users expecting their versioned machine type
> > to always appear the same.
> >
> > So far we've been strict with mach-virt, guarding all hypothetical
> > issues. Perhaps this patch is a good example to get a discussion
> > started on whether or not we should be so strict though.  
> 
> That said, I don't have a very strong opinion here, beyond that
> we should be consistent at least with x86 practice.
Another reason why we are trying not to use a strict approach with ACPI
tables is that they are part of the firmware, and we haven't versioned
firmware so far (i.e. a destination host with a newer QEMU will typically
have newer firmware, and a guest with an old machine type migrated to a
host with a newer QEMU will run the new firmware on (re)boot).

> 
> thanks
> -- PMM
Laszlo Ersek Jan. 18, 2017, 3:55 p.m. UTC | #17
On 01/18/17 16:18, Igor Mammedov wrote:
> On Tue, 17 Jan 2017 10:56:53 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
> 
>> On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote:
>>> In some cases the problem we're solving with the compat guards is
>>> a bit hypothetical, but, IMHO, nonetheless a good practice. While
>>> we may be sure that AAVMF and Linux will be fine with this table
>>> changing under their feet, we can't be sure there aren't other
>>> mach-virt users that have more sensitive firmwares/OSes. An ACPI-
>>> sensitive OS may notice the change on its next reboot after a
>>> migration, and then simply refuse to continue.  
>>
>> There's also the case where you do a VM migration midway through
>> UEFI booting up, I think, which might cause things to go wrong
>> if you catch it just at the wrong moment.
> ACPI blobs are migrated from the source, so the above won't happen.
> The guest will only see the new tables on a fresh boot or reboot.
> 
>>
>>> Now, that said, I just spoke with Igor in order to learn the x86
>>> practice. He says that the policy has been more lax than what I
>>> suggest above. Hypothetical, low-risk issues are left unguarded,
>>> and only when a bug is found during testing is it then managed.
>>> The idea is to try and reduce the amount of compat variables and
>>> conditions needed in the ACPI generation code, but, of course, at
>>> some level of risk to users expecting their versioned machine type
>>> to always appear the same.
>>>
>>> So far we've been strict with mach-virt, guarding all hypothetical
>>> issues. Perhaps this patch is a good example to get a discussion
>>> started on whether or not we should be so strict though.  
>>
>> That said, I don't have a very strong opinion here, beyond that
>> we should be consistent at least with x86 practice.
> Another reason why we are trying not to use a strict approach with ACPI
> tables is that they are part of the firmware, and we haven't versioned
> firmware so far (i.e. a destination host with a newer QEMU will typically
> have newer firmware, and a guest with an old machine type migrated to a
> host with a newer QEMU will run the new firmware on (re)boot).

I haven't been aware of this argument, and I'm surprised by it, but I
think it's valid. Regardless of our choice to ultimately compose the
ACPI tables in QEMU, guest OSes definitely consider ACPI as part of the
firmware. So, different ACPI content after a migration + guest reboot on
the target host is not much different from any other firmware-level
changes encountered on the same target host, after reboot.

Laszlo
Ard Biesheuvel Jan. 18, 2017, 5:02 p.m. UTC | #18
On 18 January 2017 at 15:55, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/18/17 16:18, Igor Mammedov wrote:
>> On Tue, 17 Jan 2017 10:56:53 +0000
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>>> On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote:
>>>> In some cases the problem we're solving with the compat guards is
>>>> a bit hypothetical, but, IMHO, nonetheless a good practice. While
>>>> we may be sure that AAVMF and Linux will be fine with this table
>>>> changing under their feet, we can't be sure there aren't other
>>>> mach-virt users that have more sensitive firmwares/OSes. An ACPI-
>>>> sensitive OS may notice the change on its next reboot after a
>>>> migration, and then simply refuse to continue.
>>>
>>> There's also the case where you do a VM migration midway through
>>> UEFI booting up, I think, which might cause things to go wrong
>>> if you catch it just at the wrong moment.
>> ACPI blobs are migrated from the source, so the above won't happen.
>> The guest will only see the new tables on a fresh boot or reboot.
>>
>>>
>>>> Now, that said, I just spoke with Igor in order to learn the x86
>>>> practice. He says that the policy has been more lax than what I
>>>> suggest above. Hypothetical, low-risk issues are left unguarded,
>>>> and only when a bug is found during testing is it then managed.
>>>> The idea is to try and reduce the amount of compat variables and
>>>> conditions needed in the ACPI generation code, but, of course, at
>>>> some level of risk to users expecting their versioned machine type
>>>> to always appear the same.
>>>>
>>>> So far we've been strict with mach-virt, guarding all hypothetical
>>>> issues. Perhaps this patch is a good example to get a discussion
>>>> started on whether or not we should be so strict though.
>>>
>>> That said, I don't have a very strong opinion here, beyond that
>>> we should be consistent at least with x86 practice.
>> Another reason why we are trying not to use a strict approach with ACPI
>> tables is that they are part of the firmware, and we haven't versioned
>> firmware so far (i.e. a destination host with a newer QEMU will typically
>> have newer firmware, and a guest with an old machine type migrated to a
>> host with a newer QEMU will run the new firmware on (re)boot).
>
> I haven't been aware of this argument, and I'm surprised by it, but I
> think it's valid. Regardless of our choice to ultimately compose the
> ACPI tables in QEMU, guest OSes definitely consider ACPI as part of the
> firmware. So, different ACPI content after a migration + guest reboot on
> the target host is not much different from any other firmware-level
> changes encountered on the same target host, after reboot.
>

I agree. But does that imply that this fix should be tightly coupled
to the mach-virt version, considering that the UEFI firmware you run
*inside* such a vm is not versioned either?
Laszlo Ersek Jan. 18, 2017, 5:26 p.m. UTC | #19
On 01/18/17 18:02, Ard Biesheuvel wrote:
> On 18 January 2017 at 15:55, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 01/18/17 16:18, Igor Mammedov wrote:
>>> On Tue, 17 Jan 2017 10:56:53 +0000
>>> Peter Maydell <peter.maydell@linaro.org> wrote:
>>>
>>>> On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote:
>>>>> In some cases the problem we're solving with the compat guards is
>>>>> a bit hypothetical, but, IMHO, nonetheless a good practice. While
>>>>> we may be sure that AAVMF and Linux will be fine with this table
>>>>> changing under their feet, we can't be sure there aren't other
>>>>> mach-virt users that have more sensitive firmwares/OSes. An ACPI-
>>>>> sensitive OS may notice the change on its next reboot after a
>>>>> migration, and then simply refuse to continue.
>>>>
>>>> There's also the case where you do a VM migration midway through
>>>> UEFI booting up, I think, which might cause things to go wrong
>>>> if you catch it just at the wrong moment.
>>> ACPI blobs are migrated from the source, so the above won't happen.
>>> The guest will only see the new tables on a fresh boot or reboot.
>>>
>>>>
>>>>> Now, that said, I just spoke with Igor in order to learn the x86
>>>>> practice. He says that the policy has been more lax than what I
>>>>> suggest above. Hypothetical, low-risk issues are left unguarded,
>>>>> and only when a bug is found during testing is it then managed.
>>>>> The idea is to try and reduce the amount of compat variables and
>>>>> conditions needed in the ACPI generation code, but, of course, at
>>>>> some level of risk to users expecting their versioned machine type
>>>>> to always appear the same.
>>>>>
>>>>> So far we've been strict with mach-virt, guarding all hypothetical
>>>>> issues. Perhaps this patch is a good example to get a discussion
>>>>> started on whether or not we should be so strict though.
>>>>
>>>> That said, I don't have a very strong opinion here, beyond that
>>>> we should be consistent at least with x86 practice.
>>> Another reason why we are trying not to use a strict approach with ACPI
>>> tables is that they are part of the firmware, and we haven't versioned
>>> firmware so far (i.e. a destination host with a newer QEMU will typically
>>> have newer firmware, and a guest with an old machine type migrated to a
>>> host with a newer QEMU will run the new firmware on (re)boot).
>>
>> I haven't been aware of this argument, and I'm surprised by it, but I
>> think it's valid. Regardless of our choice to ultimately compose the
>> ACPI tables in QEMU, guest OSes definitely consider ACPI as part of the
>> firmware. So, different ACPI content after a migration + guest reboot on
>> the target host is not much different from any other firmware-level
>> changes encountered on the same target host, after reboot.
>>
> 
> I agree. But does that imply that this fix should be tightly coupled
> to the mach-virt version, considering that the UEFI firmware you run
> *inside* such a vm is not versioned either?

No, it implies the exact opposite: given that the UEFI firmware is not
versioned, and may very well differ on source host and target host, the
ACPI payload that QEMU generates (and that the guest OS considers part
of the firmware) should be permitted to differ between src and dst host
just the same.

In brief, for one data point, I'd be fine if we didn't tie this change
to machine types.

Thanks
Laszlo
Peter Maydell Jan. 19, 2017, 1:16 p.m. UTC | #20
On 18 January 2017 at 17:26, Laszlo Ersek <lersek@redhat.com> wrote:
> In brief, for one data point, I'd be fine if we didn't tie this change
> to machine types.

We seem to have arrived at a consensus that we don't need
to version-constrain this change, so I'm applying Ard's
patch to target-arm.next.

thanks
-- PMM
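For comparison, the version constraint the thread decided against would amount to a per-machine-type compat flag consulted by the DSDT builder. The sketch below is hypothetical: `MachineClassModel`, its `no_ecam_reservation` field and `want_ecam_reservation()` are invented for illustration and are not QEMU's API, although QEMU does keep compat flags of this kind in its machine class structures. With the patch as applied, the equivalent decision is unconditional: every machine type gets the reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a versioned machine type; not QEMU's real type. */
typedef struct {
    const char *name;          /* machine type name, illustrative only */
    bool no_ecam_reservation;  /* hypothetical compat flag, set for older types */
} MachineClassModel;

/* Would the DSDT builder emit the PNP0C02 "RES0" device for this machine type? */
static bool want_ecam_reservation(const MachineClassModel *mc)
{
    assert(mc != NULL);
    return !mc->no_ecam_reservation;
}
```

Every older machine type would carry the flag set to true, and each later change of the same kind would add another flag and another condition. That growth in compat variables is what the x86 practice described in the thread tries to avoid for low-risk changes.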

Patch

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 085a61117378..50d52f685f68 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
     Aml *dev_rp0 = aml_device("%s", "RP0");
     aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0)));
     aml_append(dev, dev_rp0);
+
+    Aml *dev_res0 = aml_device("%s", "RES0");
+    aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
+    crs = aml_resource_template();
+    aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE));
+    aml_append(dev_res0, aml_name_decl("_CRS", crs));
+    aml_append(dev, dev_res0);
     aml_append(scope, dev);
 }
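The log line in the commit message gives the ECAM window as 0x3f000000-0x3fffffff, i.e. 16 MiB. Under the standard PCIe ECAM layout (4 KiB per function, 8 functions per device, 32 devices per bus, so 1 MiB per bus) that covers 16 buses. The patch reserves exactly `base_ecam`/`size_ecam` through a `Memory32Fixed` descriptor in the `_CRS` of a `PNP0C02` (motherboard resources) device. Below is a minimal C sketch, not QEMU or Linux code, of the address arithmetic and of a simplified version of the containment check the kernel warning implies. `Mem32Fixed`, `ecam_addr()`, `ecam_size()` and `ecam_is_reserved()` are hypothetical names, and the real kernel check may differ in detail (for example, in how it handles a window split across several resources).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PCIe ECAM layout: 4 KiB per function, 8 functions per device,
 * 32 devices per bus => 1 MiB of config space per bus. */
#define ECAM_BUS_SHIFT 20
#define ECAM_DEV_SHIFT 15
#define ECAM_FN_SHIFT  12

/* Config-space address of (bus, dev, fn, reg) inside an ECAM window. */
static uint64_t ecam_addr(uint64_t base, unsigned bus, unsigned dev,
                          unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return base + ((uint64_t)bus << ECAM_BUS_SHIFT)
                + ((uint64_t)dev << ECAM_DEV_SHIFT)
                + ((uint64_t)fn << ECAM_FN_SHIFT) + reg;
}

/* Size of an ECAM window covering buses [0, nr_buses). */
static uint64_t ecam_size(unsigned nr_buses)
{
    return (uint64_t)nr_buses << ECAM_BUS_SHIFT;
}

/* Hypothetical model of a Memory32Fixed descriptor: 32-bit base + length. */
typedef struct {
    uint32_t base;
    uint32_t length;
} Mem32Fixed;

/* Simplified check: is [base, base + size) inside a single reservation? */
static bool ecam_is_reserved(uint64_t base, uint64_t size,
                             const Mem32Fixed *res, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t start = res[i].base;
        uint64_t end = start + res[i].length;  /* exclusive */
        if (base >= start && base + size <= end)
            return true;
    }
    return false;
}
```

With a reservation of base `0x3f000000` and length `0x1000000`, matching the window in the log line, the check succeeds; a reservation covering only the first 8 MiB would fail it, since buses 8 to 15 would fall outside.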