[RFC,v2,0/4] Use ACPI PCI hot-plug for q35

Message ID 20200818215227.181654-1-jusual@redhat.com (mailing list archive)

Message

Julia Suvorova Aug. 18, 2020, 9:52 p.m. UTC
PCIe native hot-plug has numerous problems with racing events and unpredictable
guest behaviour (Windows). Switching to ACPI hot-plug for now.

Tested on RHEL 8 and Windows 2019.
pxb-pcie is not yet supported.
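
For reference, a minimal smoke test of the resulting ACPI hot-plug path
might look like the sketch below (the disk image, device IDs and NIC
model are illustrative assumptions, not part of this series):

  $ qemu-system-x86_64 -machine q35 -m 2G -monitor stdio \
        -device pcie-root-port,id=rp1,slot=1 disk.img
  (qemu) device_add virtio-net-pci,bus=rp1,id=hotnic
  (qemu) info pci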

v2:
    * new ioport range for acpiphp [Gerd]
    * drop find_pci_host() [Igor]
    * explain magic numbers in _OSC [Igor]
    * drop build_q35_pci_hotplug() wrapper [Igor]

Julia Suvorova (4):
  hw/acpi/ich9: Trace ich9_gpe_readb()/writeb()
  hw/i386/acpi-build: Add ACPI PCI hot-plug methods to q35
  hw/i386/acpi-build: Turn off support of PCIe native hot-plug and SHPC
    in _OSC
  hw/acpi/ich9: Enable ACPI PCI hot-plug

 hw/i386/acpi-build.h    | 12 ++++++++++
 include/hw/acpi/ich9.h  |  3 +++
 include/hw/acpi/pcihp.h |  3 ++-
 hw/acpi/ich9.c          | 52 ++++++++++++++++++++++++++++++++++++++++-
 hw/acpi/pcihp.c         | 15 ++++++++----
 hw/acpi/piix4.c         |  2 +-
 hw/i386/acpi-build.c    | 48 +++++++++++++++++++++++--------------
 hw/i386/pc.c            |  1 +
 hw/acpi/trace-events    |  4 ++++
 9 files changed, 114 insertions(+), 26 deletions(-)

Comments

no-reply@patchew.org Aug. 18, 2020, 10:29 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20200818215227.181654-1-jusual@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

acpi-test: Warning! DSDT binary file mismatch. Actual [aml:/tmp/aml-BFBCP0], Expected [aml:tests/data/acpi/pc/DSDT].
See source file tests/qtest/bios-tables-test.c for instructions on how to update expected files.
to see ASL diff between mismatched files install IASL, rebuild QEMU from scratch and re-run tests with V=1 environment variable set**
ERROR:/tmp/qemu-test/src/tests/qtest/bios-tables-test.c:494:test_acpi_asl: assertion failed: (all_tables_match)
ERROR bios-tables-test - Bail out! ERROR:/tmp/qemu-test/src/tests/qtest/bios-tables-test.c:494:test_acpi_asl: assertion failed: (all_tables_match)
make: *** [check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
qemu-system-aarch64: falling back to tcg
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=89a74ccc1bb04197855ac348bd5b106a', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-0hn12f1e/src/docker-src.2020-08-18-18.16.37.21604:/var/tmp/qemu:z,ro', 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=89a74ccc1bb04197855ac348bd5b106a
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-0hn12f1e/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    12m47.487s
user    0m8.668s


The full log is available at
http://patchew.org/logs/20200818215227.181654-1-jusual@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Michael S. Tsirkin Aug. 21, 2020, 3:41 a.m. UTC | #2
On Tue, Aug 18, 2020 at 11:52:23PM +0200, Julia Suvorova wrote:
> PCIe native hot-plug has numerous problems with racing events and unpredictable
> guest behaviour (Windows). Switching to ACPI hot-plug for now.
> 
> Tested on RHEL 8 and Windows 2019.
> pxb-pcie is not yet supported.

Julia, please update the expected ACPI tables; see tests/qtest/bios-tables-test.c
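
For readers unfamiliar with that procedure, the workflow documented at
the top of tests/qtest/bios-tables-test.c is roughly the sketch below
(the helper paths follow that documentation as I remember it; treat them
as assumptions and double-check the in-tree instructions):

  1. List each changed table in tests/qtest/bios-tables-test-allowed-diff.h,
     e.g. "tests/data/acpi/q35/DSDT", and commit, so the mismatch is
     temporarily tolerated.
  2. Apply the code changes, then regenerate the expected binaries:
       $ make check-qtest-x86_64 V=1
       $ tests/data/acpi/rebuild-expected-aml.sh
  3. Empty the allowed-diff list again and commit the updated files.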

> v2:
>     * new ioport range for acpiphp [Gerd]
>     * drop find_pci_host() [Igor]
>     * explain magic numbers in _OSC [Igor]
>     * drop build_q35_pci_hotplug() wrapper [Igor]
> 
> Julia Suvorova (4):
>   hw/acpi/ich9: Trace ich9_gpe_readb()/writeb()
>   hw/i386/acpi-build: Add ACPI PCI hot-plug methods to q35
>   hw/i386/acpi-build: Turn off support of PCIe native hot-plug and SHPC
>     in _OSC
>   hw/acpi/ich9: Enable ACPI PCI hot-plug
> 
>  hw/i386/acpi-build.h    | 12 ++++++++++
>  include/hw/acpi/ich9.h  |  3 +++
>  include/hw/acpi/pcihp.h |  3 ++-
>  hw/acpi/ich9.c          | 52 ++++++++++++++++++++++++++++++++++++++++-
>  hw/acpi/pcihp.c         | 15 ++++++++----
>  hw/acpi/piix4.c         |  2 +-
>  hw/i386/acpi-build.c    | 48 +++++++++++++++++++++++--------------
>  hw/i386/pc.c            |  1 +
>  hw/acpi/trace-events    |  4 ++++
>  9 files changed, 114 insertions(+), 26 deletions(-)
> 
> -- 
> 2.25.4
Igor Mammedov Aug. 21, 2020, 10:30 a.m. UTC | #3
On Tue, 18 Aug 2020 23:52:23 +0200
Julia Suvorova <jusual@redhat.com> wrote:

> PCIe native hot-plug has numerous problems with racing events and unpredictable
> guest behaviour (Windows).
Documenting these mysterious problems, which I asked for in the previous
review, hasn't been addressed.
Please see v1 for comments and add the requested info into the cover
letter at least, or in a commit message.


> Switching to ACPI hot-plug for now.
> 
> Tested on RHEL 8 and Windows 2019.
> pxb-pcie is not yet supported.
> 
> v2:
>     * new ioport range for acpiphp [Gerd]
>     * drop find_pci_host() [Igor]
>     * explain magic numbers in _OSC [Igor]
>     * drop build_q35_pci_hotplug() wrapper [Igor]
> 
> Julia Suvorova (4):
>   hw/acpi/ich9: Trace ich9_gpe_readb()/writeb()
>   hw/i386/acpi-build: Add ACPI PCI hot-plug methods to q35
>   hw/i386/acpi-build: Turn off support of PCIe native hot-plug and SHPC
>     in _OSC
>   hw/acpi/ich9: Enable ACPI PCI hot-plug
> 
>  hw/i386/acpi-build.h    | 12 ++++++++++
>  include/hw/acpi/ich9.h  |  3 +++
>  include/hw/acpi/pcihp.h |  3 ++-
>  hw/acpi/ich9.c          | 52 ++++++++++++++++++++++++++++++++++++++++-
>  hw/acpi/pcihp.c         | 15 ++++++++----
>  hw/acpi/piix4.c         |  2 +-
>  hw/i386/acpi-build.c    | 48 +++++++++++++++++++++++--------------
>  hw/i386/pc.c            |  1 +
>  hw/acpi/trace-events    |  4 ++++
>  9 files changed, 114 insertions(+), 26 deletions(-)
>
Igor Mammedov Aug. 21, 2020, 12:29 p.m. UTC | #4
On Fri, 21 Aug 2020 12:30:07 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Tue, 18 Aug 2020 23:52:23 +0200
> Julia Suvorova <jusual@redhat.com> wrote:
> 
> > PCIe native hot-plug has numerous problems with racing events and unpredictable
> > guest behaviour (Windows).  
> Documenting these mysterious problems, which I asked for in the
> previous review, hasn't been addressed.
> Please see v1 for comments and add the requested info into the cover
> letter at least, or in a commit message.
> 
> 
> > Switching to ACPI hot-plug for now.
In addition to the above, it looks like, the way it's implemented,
it's either all on or all off.
Don't we want to have it per bridge?
What about per port, which is possible with native hot-plug?

We should also document (other than in C code) the expectations
toward this feature.

> > 
> > Tested on RHEL 8 and Windows 2019.
> > pxb-pcie is not yet supported.
> > 
> > v2:
> >     * new ioport range for acpiphp [Gerd]
> >     * drop find_pci_host() [Igor]
> >     * explain magic numbers in _OSC [Igor]
> >     * drop build_q35_pci_hotplug() wrapper [Igor]
> > 
> > Julia Suvorova (4):
> >   hw/acpi/ich9: Trace ich9_gpe_readb()/writeb()
> >   hw/i386/acpi-build: Add ACPI PCI hot-plug methods to q35
> >   hw/i386/acpi-build: Turn off support of PCIe native hot-plug and SHPC
> >     in _OSC
> >   hw/acpi/ich9: Enable ACPI PCI hot-plug
> > 
> >  hw/i386/acpi-build.h    | 12 ++++++++++
> >  include/hw/acpi/ich9.h  |  3 +++
> >  include/hw/acpi/pcihp.h |  3 ++-
> >  hw/acpi/ich9.c          | 52 ++++++++++++++++++++++++++++++++++++++++-
> >  hw/acpi/pcihp.c         | 15 ++++++++----
> >  hw/acpi/piix4.c         |  2 +-
> >  hw/i386/acpi-build.c    | 48 +++++++++++++++++++++++--------------
> >  hw/i386/pc.c            |  1 +
> >  hw/acpi/trace-events    |  4 ++++
> >  9 files changed, 114 insertions(+), 26 deletions(-)
> >   
> 
>
Laszlo Ersek Aug. 22, 2020, 2:25 p.m. UTC | #5
+Marcel, Laine, Daniel

On 08/21/20 12:30, Igor Mammedov wrote:
> On Tue, 18 Aug 2020 23:52:23 +0200
> Julia Suvorova <jusual@redhat.com> wrote:
>
>> PCIe native hot-plug has numerous problems with racing events and
>> unpredictable guest behaviour (Windows).
> Documenting these mysterious problems, which I asked for in the
> previous review, hasn't been addressed.
> Please see v1 for comments and add the requested info into the cover
> letter at least, or in a commit message.

Igor, I assume you are referring to

  http://mid.mail-archive.com/20200715153321.3495e62d@redhat.com

and I couldn't agree more.

I'd like to understand the specific motivation for this patch series.

- I'm very concerned that it could regress various hotplug scenarios
with at least OVMF.

So minimally I'm hoping that the work is being meticulously tested with
OVMF.

- I don't recall testing native PCIe hot-*unplug*, but we had repeatedly
tested native PCIe plug with both Linux and Windows guests, and in the
end, it worked fine. (I recall working with at least Marcel on that; one
historical reference I can find now is
<https://bugzilla.tianocore.org/show_bug.cgi?id=75>.)

I remember users confirming that native PCIe hotplug worked with
assigned physical devices even (e.g. GPUs), assuming they made use of
the resource reservation capability (e.g. they'd reserve large MMIO64
areas during initial enumeration).

- I seem to remember that we had tested hotplug on extra root bridges
(PXB) too; regressing that -- per the pxb-pcie mention in the blurb,
quoted below -- wouldn't be great. At least, please don't flip the big
switch so soon (IIUC, there is a big switch being proposed).

- The documentation at "docs/pcie.txt" and "docs/pcie_pci_bridge.txt" is
chock-full of hotplug references; we had spent days if not weeks for
writing and reviewing those. I hope it's being evaluated how much of
that is going to need an update.

In particular, do we know how this work is going to affect the resource
reservation capability?

$ qemu-system-x86_64 -device pcie-root-port,\? | grep reserve
  bus-reserve=<uint32>   -  (default: 4294967295)
  io-reserve=<size>      -  (default: 18446744073709551615)
  mem-reserve=<size>     -  (default: 18446744073709551615)
  pref32-reserve=<size>  -  (default: 18446744073709551615)
  pref64-reserve=<size>  -  (default: 18446744073709551615)
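
As a concrete, purely hypothetical example of those hints, a root port
can pre-reserve a large 64-bit prefetchable window so that a GPU
hot-plugged later finds space during re-enumeration (the IDs and the 32G
figure are made up for illustration):

  $ qemu-system-x86_64 -machine q35 -m 4G \
        -device pcie-root-port,id=rp1,slot=1,pref64-reserve=32G disk.img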

The OVMF-side code (OvmfPkg/PciHotPlugInitDxe) was tough to write. As
far as I remember, especially commit fe4049471bdf
("OvmfPkg/PciHotPlugInitDxe: translate QEMU's resource reservation
hints", 2017-10-03) had taken a lot of navel-gazing. So the best answer
I'm looking for here is "this series does not affect resource
reservation at all".

- If my message is suggesting that I'm alarmed: that's an
understatement. This stuff is a mine-field. A good example is Gerd's
(correct!) response "Oh no, please don't" to Igor's question in the v1
thread, as to whether the piix4 IO port range could be reused:

  http://mid.mail-archive.com/20200715065751.ogchtdqmnn7cxsyi@sirius.home.kraxel.org

That kind of "reuse" would have been a catastrophe, because for PCI IO
port aperture, OVMF uses [0xC000..0xFFFF] on i440fx, but
[0x6000..0xFFFF] on q35:

  commit bba734ab4c7c9b4386d39420983bf61484f65dda
  Author: Laszlo Ersek <lersek@redhat.com>
  Date:   Mon May 9 22:54:36 2016 +0200

      OvmfPkg/PlatformPei: provide 10 * 4KB of PCI IO Port space on Q35

      This can accommodate 10 bridges (including root bridges, PCIe upstream and
      downstream ports, etc -- see
      <https://bugzilla.redhat.com/show_bug.cgi?id=1333238#c12> for more
      details).

      10 is not a whole lot, but closer to the architectural limit of 15 than
      our current 4, so it can be considered a stop-gap solution until all
      guests manage to migrate to virtio-1.0, and no longer need PCI IO BARs
      behind PCIe downstream ports.

      Cc: Gabriel Somlo <somlo@cmu.edu>
      Cc: Jordan Justen <jordan.l.justen@intel.com>
      Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1333238
      Contributed-under: TianoCore Contribution Agreement 1.0
      Signed-off-by: Laszlo Ersek <lersek@redhat.com>
      Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
      Tested-by: Gabriel Somlo <somlo@cmu.edu>
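
(A quick check of the arithmetic above: the q35 aperture [0x6000..0xFFFF]
spans 0x10000 - 0x6000 = 0xA000 = 40960 bytes, i.e. exactly ten 4 KB
windows, matching the "10 * 4KB" in the quoted commit subject.)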

- If native PCIe hot-unplug is not working well (or at all) right now,
then I guess I can't just summarily say "we had better drop this like a
hot potato".

But then, if we are committed to *juggling* that potato, we should at
least document the use case / motivation / current issues meticulously,
please. Do we have a public BZ at least?

- The other work, with regard to *disabling* unplug, which seems to be
progressing in parallel, is similarly in need of a good explanation, in
my opinion:

  http://mid.mail-archive.com/20200820092157.17792-1-ani@anisinha.ca

Yes, I have read Laine's long email, linked from the QEMU cover letter:

  https://www.redhat.com/archives/libvir-list/2020-February/msg00110.html

The whole use case "prevent guest admins from unplugging virtual
devices" still doesn't make any sense to me. How is "some cloud admins
don't want that" acceptable at face value, without further explanation?

Thanks
Laszlo

>
>
>> Switching to ACPI hot-plug for now.
>>
>> Tested on RHEL 8 and Windows 2019.
>> pxb-pcie is not yet supported.
>>
>> v2:
>>     * new ioport range for acpiphp [Gerd]
>>     * drop find_pci_host() [Igor]
>>     * explain magic numbers in _OSC [Igor]
>>     * drop build_q35_pci_hotplug() wrapper [Igor]
>>
>> Julia Suvorova (4):
>>   hw/acpi/ich9: Trace ich9_gpe_readb()/writeb()
>>   hw/i386/acpi-build: Add ACPI PCI hot-plug methods to q35
>>   hw/i386/acpi-build: Turn off support of PCIe native hot-plug and SHPC
>>     in _OSC
>>   hw/acpi/ich9: Enable ACPI PCI hot-plug
>>
>>  hw/i386/acpi-build.h    | 12 ++++++++++
>>  include/hw/acpi/ich9.h  |  3 +++
>>  include/hw/acpi/pcihp.h |  3 ++-
>>  hw/acpi/ich9.c          | 52 ++++++++++++++++++++++++++++++++++++++++-
>>  hw/acpi/pcihp.c         | 15 ++++++++----
>>  hw/acpi/piix4.c         |  2 +-
>>  hw/i386/acpi-build.c    | 48 +++++++++++++++++++++++--------------
>>  hw/i386/pc.c            |  1 +
>>  hw/acpi/trace-events    |  4 ++++
>>  9 files changed, 114 insertions(+), 26 deletions(-)
>>
>
>
Igor Mammedov Aug. 24, 2020, 11:35 a.m. UTC | #6
On Sat, 22 Aug 2020 16:25:55 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> +Marcel, Laine, Daniel
> 
> On 08/21/20 12:30, Igor Mammedov wrote:
> > On Tue, 18 Aug 2020 23:52:23 +0200
> > Julia Suvorova <jusual@redhat.com> wrote:
> >  
> >> PCIe native hot-plug has numerous problems with racing events and
> >> unpredictable guest behaviour (Windows).  
> > Documenting these mysterious problems, which I asked for in the
> > previous review, hasn't been addressed.
> > Please see v1 for comments and add the requested info into the cover
> > letter at least, or in a commit message.
> 
> Igor, I assume you are referring to
> 
>   http://mid.mail-archive.com/20200715153321.3495e62d@redhat.com
> 
> and I couldn't agree more.
> 
> I'd like to understand the specific motivation for this patch series.
> 
> - I'm very concerned that it could regress various hotplug scenarios
> with at least OVMF.
> 
> So minimally I'm hoping that the work is being meticulously tested with
> OVMF.
> 
> - I don't recall testing native PCIe hot-*unplug*, but we had repeatedly
> tested native PCIe plug with both Linux and Windows guests, and in the
> end, it worked fine. (I recall working with at least Marcel on that; one
> historical reference I can find now is
> <https://bugzilla.tianocore.org/show_bug.cgi?id=75>.)
> 
> I remember users confirming that native PCIe hotplug worked with
> assigned physical devices even (e.g. GPUs), assuming they made use of
> the resource reservation capability (e.g. they'd reserve large MMIO64
> areas during initial enumeration).
> 
> - I seem to remember that we had tested hotplug on extra root bridges
> (PXB) too; regressing that -- per the pxb-pcie mention in the blurb,
> quoted below -- wouldn't be great. At least, please don't flip the big
> switch so soon (IIUC, there is a big switch being proposed).

I'm suggesting making ACPI hotplug on q35 opt-in, because it's only
Windows guests that don't work well with unplug.
Linux guests seem to be just fine with native hotplug.

> - The documentation at "docs/pcie.txt" and "docs/pcie_pci_bridge.txt" is
> chock-full of hotplug references; we had spent days if not weeks for
> writing and reviewing those. I hope it's being evaluated how much of
> that is going to need an update.
> 
> In particular, do we know how this work is going to affect the resource
> reservation capability?
My hunch is that it should not be affected (but I would not bet on it).
ACPI hotplug just changes the route by which the hotplug event is
delivered, and unplug happens via ACPI as well. That works around
drivers offlining/onlining devices in rapid succession during native
unplug (that's a rather crude justification).
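
For reference, that ACPI-mediated unplug route can be exercised from the
monitor, continuing the hypothetical session sketched under the cover
letter above:

  (qemu) device_del hotnic
  ... the guest offlines and ejects the device via ACPI; QEMU then emits
      the DEVICE_DELETED event on QMP ...
  (qemu) info pci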

So I'd like the reasons to be well documented, including what ideas
were tried to fix or work around those issues (besides the ACPI one), so
the next time we look at it we don't have to start from the ground up.

 
> $ qemu-system-x86_64 -device pcie-root-port,\? | grep reserve
>   bus-reserve=<uint32>   -  (default: 4294967295)
>   io-reserve=<size>      -  (default: 18446744073709551615)
>   mem-reserve=<size>     -  (default: 18446744073709551615)
>   pref32-reserve=<size>  -  (default: 18446744073709551615)
>   pref64-reserve=<size>  -  (default: 18446744073709551615)
> 
> The OVMF-side code (OvmfPkg/PciHotPlugInitDxe) was tough to write. As
> far as I remember, especially commit fe4049471bdf
> ("OvmfPkg/PciHotPlugInitDxe: translate QEMU's resource reservation
> hints", 2017-10-03) had taken a lot of navel-gazing. So the best answer
> I'm looking for here is "this series does not affect resource
> reservation at all".
> 
> - If my message is suggesting that I'm alarmed: that's an
> understatement. This stuff is a mine-field. A good example is Gerd's
> (correct!) response "Oh no, please don't" to Igor's question in the v1
> thread, as to whether the piix4 IO port range could be reused:
> 
>   http://mid.mail-archive.com/20200715065751.ogchtdqmnn7cxsyi@sirius.home.kraxel.org
> 
> That kind of "reuse" would have been a catastrophe, because for PCI IO
> port aperture, OVMF uses [0xC000..0xFFFF] on i440fx, but
> [0x6000..0xFFFF] on q35:
> 
>   commit bba734ab4c7c9b4386d39420983bf61484f65dda
>   Author: Laszlo Ersek <lersek@redhat.com>
>   Date:   Mon May 9 22:54:36 2016 +0200
> 
>       OvmfPkg/PlatformPei: provide 10 * 4KB of PCI IO Port space on Q35
> 
>       This can accommodate 10 bridges (including root bridges, PCIe upstream and
>       downstream ports, etc -- see
>       <https://bugzilla.redhat.com/show_bug.cgi?id=1333238#c12> for more
>       details).
> 
>       10 is not a whole lot, but closer to the architectural limit of 15 than
>       our current 4, so it can be considered a stop-gap solution until all
>       guests manage to migrate to virtio-1.0, and no longer need PCI IO BARs
>       behind PCIe downstream ports.
> 
>       Cc: Gabriel Somlo <somlo@cmu.edu>
>       Cc: Jordan Justen <jordan.l.justen@intel.com>
>       Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1333238
>       Contributed-under: TianoCore Contribution Agreement 1.0
>       Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>       Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
>       Tested-by: Gabriel Somlo <somlo@cmu.edu>
> 
> - If native PCIe hot-unplug is not working well (or at all) right now,
> then I guess I can't just summarily say "we had better drop this like a
> hot potato".
> 
> But then, if we are committed to *juggling* that potato, we should at
> least document the use case / motivation / current issues meticulously,
> please. Do we have a public BZ at least?
> 
> - The other work, with regard to *disabling* unplug, which seems to be
> progressing in parallel, is similarly in need of a good explanation, in
> my opinion:
> 
>   http://mid.mail-archive.com/20200820092157.17792-1-ani@anisinha.ca
> 
> Yes, I have read Laine's long email, linked from the QEMU cover letter:
> 
>   https://www.redhat.com/archives/libvir-list/2020-February/msg00110.html
> 
> The whole use case "prevent guest admins from unplugging virtual
> devices" still doesn't make any sense to me. How is "some cloud admins
> don't want that" acceptable at face value, without further explanation?
My take on it is that Windows always exposes an unplug icon and lets VM
users unplug PCI devices. Notably, a user is able to click away the only
NIC the VM was configured with.

Unfortunately the 'feature' can't be fixed on the guest side; that's why
hypervisors implement such a hack to disable ACPI hotplug, which I guess
is backed by demand from users deploying Windows virtual desktops.
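
On the native-hot-plug side a per-port opt-out already exists, if memory
serves, e.g. (treat the property name as an assumption to verify):

  $ qemu-system-x86_64 -machine q35 \
        -device pcie-root-port,id=rp1,slot=1,hotplug=off disk.img

Whether the ACPI path grows an equivalent per-port knob ties into the
per-bridge/per-port question raised earlier in this thread.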

PS:
I'd have made PCI hotplug opt-in, since not everyone needs it.
But that ship sailed long ago.



> 
> Thanks
> Laszlo
> 
> >
> >  
> >> Switching to ACPI hot-plug for now.
> >>
> >> Tested on RHEL 8 and Windows 2019.
> >> pxb-pcie is not yet supported.
> >>
> >> v2:
> >>     * new ioport range for acpiphp [Gerd]
> >>     * drop find_pci_host() [Igor]
> >>     * explain magic numbers in _OSC [Igor]
> >>     * drop build_q35_pci_hotplug() wrapper [Igor]
> >>
> >> Julia Suvorova (4):
> >>   hw/acpi/ich9: Trace ich9_gpe_readb()/writeb()
> >>   hw/i386/acpi-build: Add ACPI PCI hot-plug methods to q35
> >>   hw/i386/acpi-build: Turn off support of PCIe native hot-plug and SHPC
> >>     in _OSC
> >>   hw/acpi/ich9: Enable ACPI PCI hot-plug
> >>
> >>  hw/i386/acpi-build.h    | 12 ++++++++++
> >>  include/hw/acpi/ich9.h  |  3 +++
> >>  include/hw/acpi/pcihp.h |  3 ++-
> >>  hw/acpi/ich9.c          | 52 ++++++++++++++++++++++++++++++++++++++++-
> >>  hw/acpi/pcihp.c         | 15 ++++++++----
> >>  hw/acpi/piix4.c         |  2 +-
> >>  hw/i386/acpi-build.c    | 48 +++++++++++++++++++++++--------------
> >>  hw/i386/pc.c            |  1 +
> >>  hw/acpi/trace-events    |  4 ++++
> >>  9 files changed, 114 insertions(+), 26 deletions(-)
> >>  
> >
> >  
>
Ani Sinha Aug. 24, 2020, 11:51 a.m. UTC | #7
On Mon, Aug 24, 2020 at 5:06 PM Igor Mammedov <imammedo@redhat.com> wrote:
>
> On Sat, 22 Aug 2020 16:25:55 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
>
> > +Marcel, Laine, Daniel
> >
> > On 08/21/20 12:30, Igor Mammedov wrote:
> > > On Tue, 18 Aug 2020 23:52:23 +0200
> > > Julia Suvorova <jusual@redhat.com> wrote:
> > >
> > >> PCIe native hot-plug has numerous problems with racing events and
> > >> unpredictable guest behaviour (Windows).
> > > Documenting these mysterious problems, which I asked for in the
> > > previous review, hasn't been addressed.
> > > Please see v1 for comments and add the requested info into the
> > > cover letter at least, or in a commit message.
> >
> > Igor, I assume you are referring to
> >
> >   http://mid.mail-archive.com/20200715153321.3495e62d@redhat.com
> >
> > and I couldn't agree more.
> >
> > I'd like to understand the specific motivation for this patch series.
> >
> > - I'm very concerned that it could regress various hotplug scenarios
> > with at least OVMF.
> >
> > So minimally I'm hoping that the work is being meticulously tested with
> > OVMF.
> >
> > - I don't recall testing native PCIe hot-*unplug*, but we had repeatedly
> > tested native PCIe plug with both Linux and Windows guests, and in the
> > end, it worked fine. (I recall working with at least Marcel on that; one
> > historical reference I can find now is
> > <https://bugzilla.tianocore.org/show_bug.cgi?id=75>.)
> >
> > I remember users confirming that native PCIe hotplug worked with
> > assigned physical devices even (e.g. GPUs), assuming they made use of
> > the resource reservation capability (e.g. they'd reserve large MMIO64
> > areas during initial enumeration).
> >
> > - I seem to remember that we had tested hotplug on extra root bridges
> > (PXB) too; regressing that -- per the pxb-pcie mention in the blurb,
> > quoted below -- wouldn't be great. At least, please don't flip the big
> > switch so soon (IIUC, there is a big switch being proposed).
>
> I'm suggesting making ACPI hotplug on q35 opt-in, because it's only
> Windows guests that don't work well with unplug.
> Linux guests seem to be just fine with native hotplug.
>
> > - The documentation at "docs/pcie.txt" and "docs/pcie_pci_bridge.txt" is
> > chock-full of hotplug references; we had spent days if not weeks for
> > writing and reviewing those. I hope it's being evaluated how much of
> > that is going to need an update.
> >
> > In particular, do we know how this work is going to affect the resource
> > reservation capability?
> My hunch is that it should not be affected (but I would not bet on
> it). ACPI hotplug just changes the route by which the hotplug event is
> delivered, and unplug happens via ACPI as well. That works around
> drivers offlining/onlining devices in rapid succession during native
> unplug (that's a rather crude justification).
>
> So I'd like the reasons to be well documented, including what ideas
> were tried to fix or work around those issues (besides the ACPI one),
> so the next time we look at it we don't have to start from the ground
> up.
>
>
> > $ qemu-system-x86_64 -device pcie-root-port,\? | grep reserve
> >   bus-reserve=<uint32>   -  (default: 4294967295)
> >   io-reserve=<size>      -  (default: 18446744073709551615)
> >   mem-reserve=<size>     -  (default: 18446744073709551615)
> >   pref32-reserve=<size>  -  (default: 18446744073709551615)
> >   pref64-reserve=<size>  -  (default: 18446744073709551615)
> >
> > The OVMF-side code (OvmfPkg/PciHotPlugInitDxe) was tough to write. As
> > far as I remember, especially commit fe4049471bdf
> > ("OvmfPkg/PciHotPlugInitDxe: translate QEMU's resource reservation
> > hints", 2017-10-03) had taken a lot of navel-gazing. So the best answer
> > I'm looking for here is "this series does not affect resource
> > reservation at all".
> >
> > - If my message is suggesting that I'm alarmed: that's an
> > understatement. This stuff is a mine-field. A good example is Gerd's
> > (correct!) response "Oh no, please don't" to Igor's question in the v1
> > thread, as to whether the piix4 IO port range could be reused:
> >
> >   http://mid.mail-archive.com/20200715065751.ogchtdqmnn7cxsyi@sirius.home.kraxel.org
> >
> > That kind of "reuse" would have been a catastrophe, because for PCI IO
> > port aperture, OVMF uses [0xC000..0xFFFF] on i440fx, but
> > [0x6000..0xFFFF] on q35:
> >
> >   commit bba734ab4c7c9b4386d39420983bf61484f65dda
> >   Author: Laszlo Ersek <lersek@redhat.com>
> >   Date:   Mon May 9 22:54:36 2016 +0200
> >
> >       OvmfPkg/PlatformPei: provide 10 * 4KB of PCI IO Port space on Q35
> >
> >       This can accommodate 10 bridges (including root bridges, PCIe upstream and
> >       downstream ports, etc -- see
> >       <https://bugzilla.redhat.com/show_bug.cgi?id=1333238#c12> for more
> >       details).
> >
> >       10 is not a whole lot, but closer to the architectural limit of 15 than
> >       our current 4, so it can be considered a stop-gap solution until all
> >       guests manage to migrate to virtio-1.0, and no longer need PCI IO BARs
> >       behind PCIe downstream ports.
> >
> >       Cc: Gabriel Somlo <somlo@cmu.edu>
> >       Cc: Jordan Justen <jordan.l.justen@intel.com>
> >       Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1333238
> >       Contributed-under: TianoCore Contribution Agreement 1.0
> >       Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> >       Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
> >       Tested-by: Gabriel Somlo <somlo@cmu.edu>
> >
> > - If native PCIe hot-unplug is not working well (or at all) right now,
> > then I guess I can't just summarily say "we had better drop this like a
> > hot potato".
> >
> > But then, if we are committed to *juggling* that potato, we should at
> > least document the use case / motivation / current issues meticulously,
> > please. Do we have a public BZ at least?
> >
> > - The other work, with regard to *disabling* unplug, which seems to be
> > progressing in parallel, is similarly in need of a good explanation, in
> > my opinion:
> >
> >   http://mid.mail-archive.com/20200820092157.17792-1-ani@anisinha.ca
> >
> > Yes, I have read Laine's long email, linked from the QEMU cover letter:
> >
> >   https://www.redhat.com/archives/libvir-list/2020-February/msg00110.html
> >
> > The whole use case "prevent guest admins from unplugging virtual
> > devices" still doesn't make any sense to me. How is "some cloud admins
> > don't want that" acceptable at face value, without further explanation?
> My take on it is that Windows always exposes an unplug icon and lets
> VM users unplug PCI devices. Notably, a user is able to click away the
> only NIC the VM was configured with.

Correct. Also, sometimes admins may not want some other PCI devices,
such as the balloon device, to be hot-unpluggable.

>
> Unfortunately the 'feature' can't be fixed on the guest side,

It can be, using driver hacks, but they are very operating-system
specific and also need to be applied per VM every time it is powered
on.

> that's why hypervisors implement such a hack to disable ACPI hotplug,
> which I guess is backed by demand from users deploying Windows virtual
> desktops.
>
> PS:
> I'd have made PCI hotplug opt-in, since not everyone needs it.
> But that ship sailed long ago.
>
>
>
> >
> > Thanks
> > Laszlo
> >
> > >
> > >
> > >> Switching to ACPI hot-plug for now.
> > >>
> > >> Tested on RHEL 8 and Windows 2019.
> > >> pxb-pcie is not yet supported.
> > >>
> > >> v2:
> > >>     * new ioport range for acpiphp [Gerd]
> > >>     * drop find_pci_host() [Igor]
> > >>     * explain magic numbers in _OSC [Igor]
> > >>     * drop build_q35_pci_hotplug() wrapper [Igor]
> > >>
> > >> Julia Suvorova (4):
> > >>   hw/acpi/ich9: Trace ich9_gpe_readb()/writeb()
> > >>   hw/i386/acpi-build: Add ACPI PCI hot-plug methods to q35
> > >>   hw/i386/acpi-build: Turn off support of PCIe native hot-plug and SHPC
> > >>     in _OSC
> > >>   hw/acpi/ich9: Enable ACPI PCI hot-plug
> > >>
> > >>  hw/i386/acpi-build.h    | 12 ++++++++++
> > >>  include/hw/acpi/ich9.h  |  3 +++
> > >>  include/hw/acpi/pcihp.h |  3 ++-
> > >>  hw/acpi/ich9.c          | 52 ++++++++++++++++++++++++++++++++++++++++-
> > >>  hw/acpi/pcihp.c         | 15 ++++++++----
> > >>  hw/acpi/piix4.c         |  2 +-
> > >>  hw/i386/acpi-build.c    | 48 +++++++++++++++++++++++--------------
> > >>  hw/i386/pc.c            |  1 +
> > >>  hw/acpi/trace-events    |  4 ++++
> > >>  9 files changed, 114 insertions(+), 26 deletions(-)
> > >>
> > >
> > >
> >
>
Laszlo Ersek Aug. 24, 2020, 3:03 p.m. UTC | #8
On 08/24/20 13:51, Ani Sinha wrote:
> On Mon, Aug 24, 2020 at 5:06 PM Igor Mammedov <imammedo@redhat.com> wrote:
>>
>> On Sat, 22 Aug 2020 16:25:55 +0200
>> Laszlo Ersek <lersek@redhat.com> wrote:
>>
>>> +Marcel, Laine, Daniel
>>>
>>> On 08/21/20 12:30, Igor Mammedov wrote:
>>>> On Tue, 18 Aug 2020 23:52:23 +0200
>>>> Julia Suvorova <jusual@redhat.com> wrote:
>>>>
>>>>> PCIe native hot-plug has numerous problems with racing events and
>>>>> unpredictable guest behaviour (Windows).
>>>> Documenting these mysterious problems, which I asked for in the
>>>> previous review, hasn't been addressed.
>>>> Please see v1 for comments and add the requested info into the
>>>> cover letter at least, or in a commit message.
>>>
>>> Igor, I assume you are referring to
>>>
>>>   http://mid.mail-archive.com/20200715153321.3495e62d@redhat.com
>>>
>>> and I couldn't agree more.
>>>
>>> I'd like to understand the specific motivation for this patch series.
>>>
>>> - I'm very concerned that it could regress various hotplug scenarios
>>> with at least OVMF.
>>>
>>> So minimally I'm hoping that the work is being meticulously tested with
>>> OVMF.
>>>
>>> - I don't recall testing native PCIe hot-*unplug*, but we had repeatedly
>>> tested native PCIe plug with both Linux and Windows guests, and in the
>>> end, it worked fine. (I recall working with at least Marcel on that; one
>>> historical reference I can find now is
>>> <https://bugzilla.tianocore.org/show_bug.cgi?id=75>.)
>>>
>>> I remember users confirming that native PCIe hotplug worked with
>>> assigned physical devices even (e.g. GPUs), assuming they made use of
>>> the resource reservation capability (e.g. they'd reserve large MMIO64
>>> areas during initial enumeration).
>>>
>>> - I seem to remember that we had tested hotplug on extra root bridges
>>> (PXB) too; regressing that -- per the pxb-pcie mention in the blurb,
>>> quoted below -- wouldn't be great. At least, please don't flip the big
>>> switch so soon (IIUC, there is a big switch being proposed).
>>
>> I'm suggesting making ACPI hotplug on q35 opt-in, because it's only
>> Windows guests that don't work well with unplug.
>> Linux guests seem to be just fine with native hotplug.
>>
>>> - The documentation at "docs/pcie.txt" and "docs/pcie_pci_bridge.txt" is
>>> chock-full of hotplug references; we had spent days if not weeks for
>>> writing and reviewing those. I hope it's being evaluated how much of
>>> that is going to need an update.
>>>
>>> In particular, do we know how this work is going to affect the resource
>>> reservation capability?
>> My hunch is that it should not be affected (but I would not bet on
>> it). ACPI hotplug just changes the route by which the hotplug event
>> is delivered, and unplug happens via ACPI as well. That works around
>> drivers offlining/onlining devices in rapid succession during native
>> unplug (that's a rather crude justification).
>>
>> So I'd like the reasons to be well documented, including what ideas
>> were tried to fix or work around those issues (besides the ACPI one),
>> so the next time we look at it we don't have to start from the ground
>> up.
>>
>>
>>> $ qemu-system-x86_64 -device pcie-root-port,\? | grep reserve
>>>   bus-reserve=<uint32>   -  (default: 4294967295)
>>>   io-reserve=<size>      -  (default: 18446744073709551615)
>>>   mem-reserve=<size>     -  (default: 18446744073709551615)
>>>   pref32-reserve=<size>  -  (default: 18446744073709551615)
>>>   pref64-reserve=<size>  -  (default: 18446744073709551615)
>>>
>>> The OVMF-side code (OvmfPkg/PciHotPlugInitDxe) was tough to write. As
>>> far as I remember, especially commit fe4049471bdf
>>> ("OvmfPkg/PciHotPlugInitDxe: translate QEMU's resource reservation
>>> hints", 2017-10-03) had taken a lot of navel-gazing. So the best answer
>>> I'm looking for here is "this series does not affect resource
>>> reservation at all".
>>>
>>> - If my message is suggesting that I'm alarmed: that's an
>>> understatement. This stuff is a mine-field. A good example is Gerd's
>>> (correct!) response "Oh no, please don't" to Igor's question in the v1
>>> thread, as to whether the piix4 IO port range could be reused:
>>>
>>>   http://mid.mail-archive.com/20200715065751.ogchtdqmnn7cxsyi@sirius.home.kraxel.org
>>>
>>> That kind of "reuse" would have been a catastrophe, because for PCI IO
>>> port aperture, OVMF uses [0xC000..0xFFFF] on i440fx, but
>>> [0x6000..0xFFFF] on q35:
>>>
>>>   commit bba734ab4c7c9b4386d39420983bf61484f65dda
>>>   Author: Laszlo Ersek <lersek@redhat.com>
>>>   Date:   Mon May 9 22:54:36 2016 +0200
>>>
>>>       OvmfPkg/PlatformPei: provide 10 * 4KB of PCI IO Port space on Q35
>>>
>>>       This can accommodate 10 bridges (including root bridges, PCIe upstream and
>>>       downstream ports, etc -- see
>>>       <https://bugzilla.redhat.com/show_bug.cgi?id=1333238#c12> for more
>>>       details).
>>>
>>>       10 is not a whole lot, but closer to the architectural limit of 15 than
>>>       our current 4, so it can be considered a stop-gap solution until all
>>>       guests manage to migrate to virtio-1.0, and no longer need PCI IO BARs
>>>       behind PCIe downstream ports.
>>>
>>>       Cc: Gabriel Somlo <somlo@cmu.edu>
>>>       Cc: Jordan Justen <jordan.l.justen@intel.com>
>>>       Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1333238
>>>       Contributed-under: TianoCore Contribution Agreement 1.0
>>>       Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>>>       Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
>>>       Tested-by: Gabriel Somlo <somlo@cmu.edu>
>>>
>>> - If native PCIe hot-unplug is not working well (or at all) right now,
>>> then I guess I can't just summarily say "we had better drop this like a
>>> hot potato".
>>>
>>> But then, if we are committed to *juggling* that potato, we should at
>>> least document the use case / motivation / current issues meticulously,
>>> please. Do we have a public BZ at least?
>>>
>>> - The other work, with regard to *disabling* unplug, which seems to be
>>> progressing in parallel, is similarly in need of a good explanation, in
>>> my opinion:
>>>
>>>   http://mid.mail-archive.com/20200820092157.17792-1-ani@anisinha.ca
>>>
>>> Yes, I have read Laine's long email, linked from the QEMU cover letter:
>>>
>>>   https://www.redhat.com/archives/libvir-list/2020-February/msg00110.html
>>>
>>> The whole use case "prevent guest admins from unplugging virtual
>>> devices" still doesn't make any sense to me. How is "some cloud admins
>>> don't want that" acceptable at face value, without further explanation?
>> My take on it is that Windows always exposes an unplug icon and lets
>> VM users unplug PCI devices. Notably, a user is able to click away the
>> only NIC the VM was configured with.
> 
> Correct. Also, sometimes admins may not want some other PCI devices,
> such as the balloon device, to be hot-unpluggable.
> 
>>
>> Unfortunately the 'feature' can't be fixed on the guest side,
> 
> It can be, using driver hacks, but they are very operating-system
> specific and also need to be applied per VM every time it is powered
> on.
> 
>> that's why hypervisors implement such a hack to disable ACPI hotplug,
>> which I guess is backed by demand from users deploying Windows virtual
>> desktops.
>>
>> PS:
>> I'd have made PCI hotplug opt-in, since not everyone needs it.
>> But that ship sailed long ago.

Thank you both for explaining.

All of these use cases seem justified to me.

Given that they are basically quirks for addressing guest-OS-specific
peculiarities, changing machine type defaults does not seem warranted.
In my opinion, all of these bits should be opt-in. If we need to capture
permanent recommendations, we can always document them, and/or libosinfo
can expose them machine-readably.

Thanks
Laszlo