Message ID | 20220224174411.3296848-3-imammedo@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix broken PCIe device after migration | expand |
On Thu, Feb 24, 2022 at 12:44:09PM -0500, Igor Mammedov wrote: > on creation a PCIDevice has power turned on at the end of pci_qdev_realize() > however later on if PCIe slot isn't populated with any children > it's power is turned off. It's fine if native hotplug is used > as plug callback will power slot on among other things. > However when ACPI hotplug is enabled it replaces native PCIe plug > callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and > as result slot stays powered off. It works fine as ACPI hotplug > on guest side takes care of enumerating/initializing hotplugged > device. But when later guest is migrated, call chain introduced by [1] > > pcie_cap_slot_post_load() > -> pcie_cap_update_power() > -> pcie_set_power_device() > -> pci_set_power() > -> pci_update_mappings() > > will disable earlier initialized BARs for the hotplugged device > in powered off slot due to commit [2] which disables BARs if > power is off. As result guest OS after migration will be very > much confused [3], still thinking that it has working device, > which isn't true anymore due to disabled BARs. > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > only if capability is enabled. Follow up patch will disable > PCI_EXP_SLTCAP_PCP overriding COMPAT_PROP_PCP property when > PCIe slot is under ACPI PCI hotplug control. > > See [3] for reproducer. > > 1) > Fixes: commit d5daff7d312 (pcie: implement slot power control for pcie root ports) > 2) > commit 23786d13441 (pci: implement power state) > 3) > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 > Correct format for the last paragraph: Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports") Fixes: 23786d13441 ("pci: implement power state") Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > hw/pci/pcie.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > index d7d73a31e4..2339729a7c 100644 > --- a/hw/pci/pcie.c > +++ b/hw/pci/pcie.c > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > + pcie_set_power_device, &power); > } > - > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > - pcie_set_power_device, &power); I think this is correct. However, I wonder whether for 6.2 compatiblity as a hack we should sometimes skip the power update even when PCI_EXP_SLTCAP_PCP exists. Will that not work around the issue for these machine types? And assuming we want bug for bug compat anyway, why not just put it here? It seems easier to reason about frankly ... > } > > /* > -- > 2.31.1
On Thu, 24 Feb 2022 13:05:07 -0500 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Thu, Feb 24, 2022 at 12:44:09PM -0500, Igor Mammedov wrote: > > on creation a PCIDevice has power turned on at the end of pci_qdev_realize() > > however later on if PCIe slot isn't populated with any children > > it's power is turned off. It's fine if native hotplug is used > > as plug callback will power slot on among other things. > > However when ACPI hotplug is enabled it replaces native PCIe plug > > callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and > > as result slot stays powered off. It works fine as ACPI hotplug > > on guest side takes care of enumerating/initializing hotplugged > > device. But when later guest is migrated, call chain introduced by [1] > > > > pcie_cap_slot_post_load() > > -> pcie_cap_update_power() > > -> pcie_set_power_device() > > -> pci_set_power() > > -> pci_update_mappings() > > > > will disable earlier initialized BARs for the hotplugged device > > in powered off slot due to commit [2] which disables BARs if > > power is off. As result guest OS after migration will be very > > much confused [3], still thinking that it has working device, > > which isn't true anymore due to disabled BARs. > > > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > > only if capability is enabled. Follow up patch will disable > > PCI_EXP_SLTCAP_PCP overriding COMPAT_PROP_PCP property when > > PCIe slot is under ACPI PCI hotplug control. > > > > See [3] for reproducer. > > > > 1) > > Fixes: commit d5daff7d312 (pcie: implement slot power control for pcie root ports) > > 2) > > commit 23786d13441 (pci: implement power state) > > 3) > > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 > > > > > Correct format for the last paragraph: > > > Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports") > Fixes: 23786d13441 ("pci: implement power state") > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 ok, will fix it up on respin like this to have references: 1) Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports") 2) Fixes: 23786d13441 ("pci: implement power state") Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > hw/pci/pcie.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > index d7d73a31e4..2339729a7c 100644 > > --- a/hw/pci/pcie.c > > +++ b/hw/pci/pcie.c > > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > + pcie_set_power_device, &power); > > } > > - > > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > - pcie_set_power_device, &power); > > I think this is correct. However, I wonder whether for 6.2 compatiblity > as a hack we should sometimes skip the power update even when > PCI_EXP_SLTCAP_PCP exists. Will that not work around the issue for > these machine types? pc-q35-6.2 is broken utterly. With pc-q35-6.1, it's a mess. Here is a ping-pong migration matrix for it v6.1 | v6.2 | Fix v6.1 ok | broken | ok (#1) v6.2 | broken | broken (#2) [1] has PCI_EXP_SLTCAP_PCP due to x-pcihp-enable-pcie-pcp-cap=on i.e. pci_config is exactly the same as in qemu-v6.1 [2] PCI_EXP_SLTCAP_PCP is enabled + empty slot is powered off (+ state is migrated) there are some invariants that might work in one direction, but it won't survive ping-pong migration. And more importantly for upstream we care mostly care for old -> new working, and it's direction that is broken in v6.2. > And assuming we want bug for bug compat anyway, why not just put > it here? It seems easier to reason about frankly ... It should be possible hack PCI core to fixup broken power state on incoming migration at (at postload time), but that would just create more confusion, where in some cases migration would work and in some would not (depending on used qemu versions). Lets just declare v6.2 qemu broken, with upgrade/downgrade to (7.0/6.1) as suggested solution. PS: I'd very much prefer avoid adding hacks for ACPI pcihp sake to PCI core, and let PCI code behave as it's supposed to per spec. It's already bad enough with pcihp layered on top of PCI, making PCI code depend on pcihp will just make it more fragile. > > } > > > > /* > > -- > > 2.31.1 >
On Fri, Feb 25, 2022 at 09:18:30AM +0100, Igor Mammedov wrote: > On Thu, 24 Feb 2022 13:05:07 -0500 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Thu, Feb 24, 2022 at 12:44:09PM -0500, Igor Mammedov wrote: > > > on creation a PCIDevice has power turned on at the end of pci_qdev_realize() > > > however later on if PCIe slot isn't populated with any children > > > it's power is turned off. It's fine if native hotplug is used > > > as plug callback will power slot on among other things. > > > However when ACPI hotplug is enabled it replaces native PCIe plug > > > callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and > > > as result slot stays powered off. It works fine as ACPI hotplug > > > on guest side takes care of enumerating/initializing hotplugged > > > device. But when later guest is migrated, call chain introduced by [1] > > > > > > pcie_cap_slot_post_load() > > > -> pcie_cap_update_power() > > > -> pcie_set_power_device() > > > -> pci_set_power() > > > -> pci_update_mappings() > > > > > > will disable earlier initialized BARs for the hotplugged device > > > in powered off slot due to commit [2] which disables BARs if > > > power is off. As result guest OS after migration will be very > > > much confused [3], still thinking that it has working device, > > > which isn't true anymore due to disabled BARs. > > > > > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > > > only if capability is enabled. Follow up patch will disable > > > PCI_EXP_SLTCAP_PCP overriding COMPAT_PROP_PCP property when > > > PCIe slot is under ACPI PCI hotplug control. > > > > > > See [3] for reproducer. > > > > > > 1) > > > Fixes: commit d5daff7d312 (pcie: implement slot power control for pcie root ports) > > > 2) > > > commit 23786d13441 (pci: implement power state) > > > 3) > > > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 > > > > > > > > > Correct format for the last paragraph: > > > > > > Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports") > > Fixes: 23786d13441 ("pci: implement power state") > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 > > ok, will fix it up on respin like this to have references: > > 1) > Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports") > 2) > Fixes: 23786d13441 ("pci: implement power state") > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 Just drop references, a bit of duplication is not a problem. E.g. in powered off slot due to commit 23786d13441 ("pci: implement power state") which disables BARs if Trailer tags belong in a group at the end with no interruptions, not all tools handle them otherwise. > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > --- > > > hw/pci/pcie.c | 5 ++--- > > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > index d7d73a31e4..2339729a7c 100644 > > > --- a/hw/pci/pcie.c > > > +++ b/hw/pci/pcie.c > > > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > > > > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > > > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > > > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > + pcie_set_power_device, &power); > > > } > > > - > > > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > - pcie_set_power_device, &power); > > > > I think this is correct. However, I wonder whether for 6.2 compatiblity > > as a hack we should sometimes skip the power update even when > > PCI_EXP_SLTCAP_PCP exists. Will that not work around the issue for > > these machine types? > > pc-q35-6.2 is broken utterly. > With pc-q35-6.1, it's a mess. Here is a ping-pong migration matrix for it > > v6.1 | v6.2 | Fix > v6.1 ok | broken | ok (#1) > v6.2 | broken | broken (#2) > > [1] has PCI_EXP_SLTCAP_PCP due to x-pcihp-enable-pcie-pcp-cap=on > i.e. pci_config is exactly the same as in qemu-v6.1 > [2] PCI_EXP_SLTCAP_PCP is enabled + empty slot is powered off > (+ state is migrated) > > there are some invariants that might work in one direction, > but it won't survive ping-pong migration. And more importantly > for upstream we care mostly care for old -> new working, > and it's direction that is broken in v6.2. > > > And assuming we want bug for bug compat anyway, why not just put > > it here? It seems easier to reason about frankly ... > > It should be possible hack PCI core to fixup broken power state > on incoming migration at (at postload time), but that would just > create more confusion, where in some cases migration would work > and in some would not (depending on used qemu versions). > > Lets just declare v6.2 qemu broken, with upgrade/downgrade to > (7.0/6.1) as suggested solution. > > PS: > I'd very much prefer avoid adding hacks for ACPI pcihp sake to > PCI core, and let PCI code behave as it's supposed to per spec. > It's already bad enough with pcihp layered on top of PCI, > making PCI code depend on pcihp will just make it more fragile. > > > > } > > > > > > /* > > > -- > > > 2.31.1 > >
On Thu, Feb 24, 2022 at 12:44:09PM -0500, Igor Mammedov wrote: > on creation a PCIDevice has power turned on at the end of pci_qdev_realize() > however later on if PCIe slot isn't populated with any children > it's power is turned off. It's fine if native hotplug is used > as plug callback will power slot on among other things. > However when ACPI hotplug is enabled it replaces native PCIe plug > callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and > as result slot stays powered off. It works fine as ACPI hotplug > on guest side takes care of enumerating/initializing hotplugged > device. But when later guest is migrated, call chain introduced by [1] > > pcie_cap_slot_post_load() > -> pcie_cap_update_power() > -> pcie_set_power_device() > -> pci_set_power() > -> pci_update_mappings() > > will disable earlier initialized BARs for the hotplugged device > in powered off slot due to commit [2] which disables BARs if > power is off. As result guest OS after migration will be very > much confused [3], still thinking that it has working device, > which isn't true anymore due to disabled BARs. > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > only if capability is enabled. > Follow up patch will disable > PCI_EXP_SLTCAP_PCP overriding COMPAT_PROP_PCP property when > PCIe slot is under ACPI PCI hotplug control. > > See [3] for reproducer. > > 1) > Fixes: commit d5daff7d312 (pcie: implement slot power control for pcie root ports) > 2) > commit 23786d13441 (pci: implement power state) > 3) > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > hw/pci/pcie.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > index d7d73a31e4..2339729a7c 100644 > --- a/hw/pci/pcie.c > +++ b/hw/pci/pcie.c > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > + pcie_set_power_device, &power); > } > - > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > - pcie_set_power_device, &power); Hmm I wrote I like it but now I am not sure I understand how does this patch help fix things. here is the full function: static void pcie_cap_update_power(PCIDevice *hotplug_dev) { uint8_t *exp_cap = hotplug_dev->config + hotplug_dev->exp.exp_cap; PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(hotplug_dev)); uint32_t sltcap = pci_get_long(exp_cap + PCI_EXP_SLTCAP); uint16_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL); bool power = true; if (sltcap & PCI_EXP_SLTCAP_PCP) { power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; } pci_for_each_device(sec_bus, pci_bus_num(sec_bus), pcie_set_power_device, &power); } I understand the follow up patch, but it looks to me that without the capability, power is always on. Why does skipping that help fix anything? Thanks, > } > > /* > -- > 2.31.1
Hi, > pcie_cap_slot_post_load() > -> pcie_cap_update_power() > -> pcie_set_power_device() > -> pci_set_power() > -> pci_update_mappings() > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > only if capability is enabled. > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > index d7d73a31e4..2339729a7c 100644 > --- a/hw/pci/pcie.c > +++ b/hw/pci/pcie.c > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > + pcie_set_power_device, &power); > } > - > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > - pcie_set_power_device, &power); > } The change makes sense, although I don't see how that changes qemu behavior. 'power' defaults to true, so when SLTCAP_PCP is off it should never ever try to power off the devices. And pci_set_power() should figure the state didn't change and instantly return without touching the device. take care, Gerd
On Fri, Feb 25, 2022 at 11:12:59AM +0100, Gerd Hoffmann wrote: > Hi, > > > pcie_cap_slot_post_load() > > -> pcie_cap_update_power() > > -> pcie_set_power_device() > > -> pci_set_power() > > -> pci_update_mappings() > > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > > only if capability is enabled. > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > index d7d73a31e4..2339729a7c 100644 > > --- a/hw/pci/pcie.c > > +++ b/hw/pci/pcie.c > > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > + pcie_set_power_device, &power); > > } > > - > > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > - pcie_set_power_device, &power); > > } > > The change makes sense, although I don't see how that changes qemu > behavior. > > 'power' defaults to true, so when SLTCAP_PCP is off it should never > ever try to power off the devices. And pci_set_power() should figure > the state didn't change and instantly return without touching the > device. > > take care, > Gerd And making sure power is actually up might be a bit cleaner just in case down the road we start plugging devices in a powered off state.
On Fri, 25 Feb 2022 11:12:59 +0100 Gerd Hoffmann <kraxel@redhat.com> wrote: > Hi, > > > pcie_cap_slot_post_load() > > -> pcie_cap_update_power() > > -> pcie_set_power_device() > > -> pci_set_power() > > -> pci_update_mappings() > > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > > only if capability is enabled. > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > index d7d73a31e4..2339729a7c 100644 > > --- a/hw/pci/pcie.c > > +++ b/hw/pci/pcie.c > > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > + pcie_set_power_device, &power); > > } > > - > > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > - pcie_set_power_device, &power); > > } > > The change makes sense, although I don't see how that changes qemu > behavior. looks like I need to fix commit message > > 'power' defaults to true, so when SLTCAP_PCP is off it should never > ever try to power off the devices. And pci_set_power() should figure > the state didn't change and instantly return without touching the > device. SLTCAP_PCP is on by default as well, so empty slot ends up with power disabled PCC state [1]: sltctl & SLTCTL_PCC == 0x400 by the time machine is initialized. Then ACPI pcihp callbacks override native hotplug ones so PCC remains stuck in this state since all power control is out of picture in case of ACPI based hotplug. Guest OS doesn't use/or ignore native PCC. After device hotplug, the device stays in has_power=on default state as pci_qdev_realize() initialized it. [2] However when migrated SLTCTL_PCC is also migrated, and kick in as described in commit message and turns power off [3] > > take care, > Gerd > 1) pci_qdev_realize pci_set_power: extra_root0, d->has_power: 0, new power state: 1 pci_set_power: extra_root0, set has_power to: 1 pcie_cap_slot_reset pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 400 pcie_cap_update_power(extra_root0): updated power: 0 <== has no effect on children since there is none 2) pci_qdev_realize pci_set_power: nic, d->has_power: 0, new power state: 1 pci_set_power: nic, set has_power to: 1 3) pci_qdev_realize pci_set_power: extra_root0, d->has_power: 0, new power state: 1 pci_set_power: extra_root0, set has_power to: 1 pci_qdev_realize pci_set_power: nic, d->has_power: 0, new power state: 1 pci_set_power: nic, set has_power to: 1 pcie_cap_slot_reset pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 0 pcie_cap_update_power(extra_root0): updated power: 1 pci_set_power: nic, d->has_power: 1, new power state: 1 pcie_cap_slot_post_load(extra_root0) pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 400 pcie_cap_update_power(extra_root0): updated power: 0 pci_set_power: nic, d->has_power: 1, new power state: 0 pci_set_power: nic, set has_power to: 0
On Fri, Feb 25, 2022 at 02:02:31PM +0100, Igor Mammedov wrote: > On Fri, 25 Feb 2022 11:12:59 +0100 > Gerd Hoffmann <kraxel@redhat.com> wrote: > > > Hi, > > > > > pcie_cap_slot_post_load() > > > -> pcie_cap_update_power() > > > -> pcie_set_power_device() > > > -> pci_set_power() > > > -> pci_update_mappings() > > > > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > > > only if capability is enabled. > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > index d7d73a31e4..2339729a7c 100644 > > > --- a/hw/pci/pcie.c > > > +++ b/hw/pci/pcie.c > > > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > > > > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > > > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > > > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > + pcie_set_power_device, &power); > > > } > > > - > > > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > - pcie_set_power_device, &power); > > > } > > > > The change makes sense, although I don't see how that changes qemu > > behavior. > > looks like I need to fix commit message > > > > > 'power' defaults to true, so when SLTCAP_PCP is off it should never > > ever try to power off the devices. And pci_set_power() should figure > > the state didn't change and instantly return without touching the > > device. > > > SLTCAP_PCP is on by default as well, so empty slot ends up with > power disabled PCC state [1]: > > sltctl & SLTCTL_PCC == 0x400 > > by the time machine is initialized. > > Then ACPI pcihp callbacks override native hotplug ones > so PCC remains stuck in this state since all power control > is out of picture in case of ACPI based hotplug. Guest OS > doesn't use/or ignore native PCC. So how about when ACPI pcihp overrides native with its callbacks we also set PCC power to on? > > After device hotplug, the device stays in has_power=on default > state as pci_qdev_realize() initialized it. [2] > > However when migrated SLTCTL_PCC is also migrated, and kick in > as described in commit message and turns power off [3] > > > > > take care, > > Gerd > > > > 1) > pci_qdev_realize > pci_set_power: extra_root0, d->has_power: 0, new power state: 1 > pci_set_power: extra_root0, set has_power to: 1 > pcie_cap_slot_reset > pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 400 > pcie_cap_update_power(extra_root0): updated power: 0 <== has no effect on children since there is none > > 2) > pci_qdev_realize > pci_set_power: nic, d->has_power: 0, new power state: 1 > pci_set_power: nic, set has_power to: 1 > > > 3) > pci_qdev_realize > pci_set_power: extra_root0, d->has_power: 0, new power state: 1 > pci_set_power: extra_root0, set has_power to: 1 > pci_qdev_realize > pci_set_power: nic, d->has_power: 0, new power state: 1 > pci_set_power: nic, set has_power to: 1 > pcie_cap_slot_reset > pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 0 > pcie_cap_update_power(extra_root0): updated power: 1 > pci_set_power: nic, d->has_power: 1, new power state: 1 > pcie_cap_slot_post_load(extra_root0) > pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 400 > pcie_cap_update_power(extra_root0): updated power: 0 > pci_set_power: nic, d->has_power: 1, new power state: 0 > pci_set_power: nic, set has_power to: 0
On Fri, 25 Feb 2022 08:08:57 -0500 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, Feb 25, 2022 at 02:02:31PM +0100, Igor Mammedov wrote: > > On Fri, 25 Feb 2022 11:12:59 +0100 > > Gerd Hoffmann <kraxel@redhat.com> wrote: > > > > > Hi, > > > > > > > pcie_cap_slot_post_load() > > > > -> pcie_cap_update_power() > > > > -> pcie_set_power_device() > > > > -> pci_set_power() > > > > -> pci_update_mappings() > > > > > > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > > > > only if capability is enabled. > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > > index d7d73a31e4..2339729a7c 100644 > > > > --- a/hw/pci/pcie.c > > > > +++ b/hw/pci/pcie.c > > > > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > > > > > > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > > > > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > > > > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > > + pcie_set_power_device, &power); > > > > } > > > > - > > > > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > > - pcie_set_power_device, &power); > > > > } > > > > > > The change makes sense, although I don't see how that changes qemu > > > behavior. > > > > looks like I need to fix commit message > > > > > > > > 'power' defaults to true, so when SLTCAP_PCP is off it should never > > > ever try to power off the devices. And pci_set_power() should figure > > > the state didn't change and instantly return without touching the > > > device. > > > > > > SLTCAP_PCP is on by default as well, so empty slot ends up with > > power disabled PCC state [1]: > > > > sltctl & SLTCTL_PCC == 0x400 > > > > by the time machine is initialized. > > > > Then ACPI pcihp callbacks override native hotplug ones > > so PCC remains stuck in this state since all power control > > is out of picture in case of ACPI based hotplug. Guest OS > > doesn't use/or ignore native PCC. > > So how about when ACPI pcihp overrides native with its callbacks we also > set PCC power to on? with some reworks it should work (i.e. adding an extra knob that will tell PCI core not to power off when it should, looks fragile and very hacky). It has the same migration implications as this patch, so I'd rather go after disabling whole SLTCAP_PCP thing to be correct and keeping PCI code free from ACPI hacks. > > > > > After device hotplug, the device stays in has_power=on default > > state as pci_qdev_realize() initialized it. [2] > > > > However when migrated SLTCTL_PCC is also migrated, and kick in > > as described in commit message and turns power off [3] > > > > > > > > take care, > > > Gerd > > > > > > > 1) > > pci_qdev_realize > > pci_set_power: extra_root0, d->has_power: 0, new power state: 1 > > pci_set_power: extra_root0, set has_power to: 1 > > pcie_cap_slot_reset > > pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 400 > > pcie_cap_update_power(extra_root0): updated power: 0 <== has no effect on children since there is none > > > > 2) > > pci_qdev_realize > > pci_set_power: nic, d->has_power: 0, new power state: 1 > > pci_set_power: nic, set has_power to: 1 > > > > > > 3) > > pci_qdev_realize > > pci_set_power: extra_root0, d->has_power: 0, new power state: 1 > > pci_set_power: extra_root0, set has_power to: 1 > > pci_qdev_realize > > pci_set_power: nic, d->has_power: 0, new power state: 1 > > pci_set_power: nic, set has_power to: 1 > > pcie_cap_slot_reset > > pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 0 > > pcie_cap_update_power(extra_root0): updated power: 1 > > pci_set_power: nic, d->has_power: 1, new power state: 1 > > pcie_cap_slot_post_load(extra_root0) > > pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 400 > > pcie_cap_update_power(extra_root0): updated power: 0 > > pci_set_power: nic, d->has_power: 1, new power state: 0 > > pci_set_power: nic, set has_power to: 0 >
On Fri, Feb 25, 2022 at 02:35:28PM +0100, Igor Mammedov wrote: > On Fri, 25 Feb 2022 08:08:57 -0500 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Fri, Feb 25, 2022 at 02:02:31PM +0100, Igor Mammedov wrote: > > > On Fri, 25 Feb 2022 11:12:59 +0100 > > > Gerd Hoffmann <kraxel@redhat.com> wrote: > > > > > > > Hi, > > > > > > > > > pcie_cap_slot_post_load() > > > > > -> pcie_cap_update_power() > > > > > -> pcie_set_power_device() > > > > > -> pci_set_power() > > > > > -> pci_update_mappings() > > > > > > > > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > > > > > only if capability is enabled. > > > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > > > index d7d73a31e4..2339729a7c 100644 > > > > > --- a/hw/pci/pcie.c > > > > > +++ b/hw/pci/pcie.c > > > > > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > > > > > > > > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > > > > > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > > > > > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > > > + pcie_set_power_device, &power); > > > > > } > > > > > - > > > > > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > > > - pcie_set_power_device, &power); > > > > > } > > > > > > > > The change makes sense, although I don't see how that changes qemu > > > > behavior. > > > > > > looks like I need to fix commit message > > > > > > > > > > > 'power' defaults to true, so when SLTCAP_PCP is off it should never > > > > ever try to power off the devices. And pci_set_power() should figure > > > > the state didn't change and instantly return without touching the > > > > device. > > > > > > > > > SLTCAP_PCP is on by default as well, so empty slot ends up with > > > power disabled PCC state [1]: > > > > > > sltctl & SLTCTL_PCC == 0x400 > > > > > > by the time machine is initialized. > > > > > > Then ACPI pcihp callbacks override native hotplug ones > > > so PCC remains stuck in this state since all power control > > > is out of picture in case of ACPI based hotplug. Guest OS > > > doesn't use/or ignore native PCC. > > > > So how about when ACPI pcihp overrides native with its callbacks we also > > set PCC power to on? > > with some reworks it should work (i.e. adding an extra knob that will tell > PCI core not to power off when it should, looks fragile and very hacky). > It has the same migration implications as this patch, so I'd rather go > after disabling whole SLTCAP_PCP thing to be correct and keeping PCI > code free from ACPI hacks. Hmm I don't get it. I literally mean this: diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index d7d73a31e4..72de72ce7a 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -389,6 +389,17 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) pcie_set_power_device, &power); } +void pcie_cap_enable_power(PCIDevice *hotplug_dev) +{ + uint8_t *exp_cap = hotplug_dev->config + hotplug_dev->exp.exp_cap; + uint32_t sltcap = pci_get_long(exp_cap + PCI_EXP_SLTCAP); + + if (sltcap & PCI_EXP_SLTCAP_PCP) { + pci_set_word_by_mask(exp_cap + PCI_EXP_SLTCTL, + PCI_EXP_SLTCTL_PCC, PCI_EXP_SLTCTL_PWR_ON); + } +} + /* * A PCI Express Hot-Plug Event has occurred, so update slot status register * and notify OS of the event if necessary. Then call this from ACPI. How would this have any migration implications at all? And why do we need a knob not to power off then? Power will just stay on since there's nothing turning it off.
On Fri, 25 Feb 2022 08:48:13 -0500 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, Feb 25, 2022 at 02:35:28PM +0100, Igor Mammedov wrote: > > On Fri, 25 Feb 2022 08:08:57 -0500 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Fri, Feb 25, 2022 at 02:02:31PM +0100, Igor Mammedov wrote: > > > > On Fri, 25 Feb 2022 11:12:59 +0100 > > > > Gerd Hoffmann <kraxel@redhat.com> wrote: > > > > > > > > > Hi, > > > > > > > > > > > pcie_cap_slot_post_load() > > > > > > -> pcie_cap_update_power() > > > > > > -> pcie_set_power_device() > > > > > > -> pci_set_power() > > > > > > -> pci_update_mappings() > > > > > > > > > > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status > > > > > > only if capability is enabled. > > > > > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > > > > index d7d73a31e4..2339729a7c 100644 > > > > > > --- a/hw/pci/pcie.c > > > > > > +++ b/hw/pci/pcie.c > > > > > > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > > > > > > > > > > > > if (sltcap & PCI_EXP_SLTCAP_PCP) { > > > > > > power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; > > > > > > + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > > > > + pcie_set_power_device, &power); > > > > > > } > > > > > > - > > > > > > - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > > > > > > - pcie_set_power_device, &power); > > > > > > } > > > > > > > > > > The change makes sense, although I don't see how that changes qemu > > > > > behavior. > > > > > > > > looks like I need to fix commit message > > > > > > > > > > > > > > 'power' defaults to true, so when SLTCAP_PCP is off it should never > > > > > ever try to power off the devices. And pci_set_power() should figure > > > > > the state didn't change and instantly return without touching the > > > > > device. > > > > > > > > > > > > SLTCAP_PCP is on by default as well, so empty slot ends up with > > > > power disabled PCC state [1]: > > > > > > > > sltctl & SLTCTL_PCC == 0x400 > > > > > > > > by the time machine is initialized. > > > > > > > > Then ACPI pcihp callbacks override native hotplug ones > > > > so PCC remains stuck in this state since all power control > > > > is out of picture in case of ACPI based hotplug. Guest OS > > > > doesn't use/or ignore native PCC. > > > > > > So how about when ACPI pcihp overrides native with its callbacks we also > > > set PCC power to on? > > > > with some reworks it should work (i.e. adding an extra knob that will tell > > PCI core not to power off when it should, looks fragile and very hacky). > > It has the same migration implications as this patch, so I'd rather go > > after disabling whole SLTCAP_PCP thing to be correct and keeping PCI > > code free from ACPI hacks. > > Hmm I don't get it. I literally mean this: I was thinking about the time when we do override native callbacks. which happens for every root port at its realize time. Start up sequence on src: acpi_pcihp_device_pre_plug_cb(dev: extra_root0) pci_qdev_realize(extra_root0) pci_set_power: extra_root0, d->has_power: 0, new power state: 1 pci_set_power: extra_root0, set has_power to: 1 acpi_pcihp_device_plug_cb(dev: extra_root0) <== lets assume we call pcie_cap_enable_power(dev) here it's all good wrt layering as we are rewiring being initialized root port to another hp controller from context of its parent device pcie_cap_slot_reset <== then here we hit "if (populated) {} else {}" which kills whatever above has done since slot is not populated and a knob would be needed to prevent reset (i.e. don't touch power state as it's 'managed' by ACPI) pcie_cap_update_power(extra_root0): sltcap & PCI_EXP_SLTCAP_PCP: 2, sltctl & SLTCTL_PCC: 400 pcie_cap_update_power(extra_root0): updated power: 0 Though I haven't thought about end-device hotplug time: (qemu) device_add e1000e,bus=extra_root0,id=nic acpi_pcihp_device_pre_plug_cb(dev: nic) pci_qdev_realize(nic) pci_set_power: nic, d->has_power: 0, new power state: 1 pci_set_power: nic, set has_power to: 1 acpi_pcihp_device_plug_cb(dev: nic) <== here we have a chance to power on no longer empty slot pcie_cap_enable_power(hotplug_dev) then when target loads state it will see SLTCTL_PCC: 0 and keep slot powered on. pci_set_power: nic, d->has_power: 1, new power state: 1 This where I wasn't comfortable with idea of calling random PCIe code chunks and thought about chaining callbacks so that pcie_cap_slot_[pre_]plug_cb() would do necessary PCIe steps and acpi_pcihp_device_[pre_]plug_cb() do ACPI specific things not intruding on each other, but that requires telling PCIe code that it should not issue native hotplug event to guest. > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > index d7d73a31e4..72de72ce7a 100644 > --- a/hw/pci/pcie.c > +++ b/hw/pci/pcie.c > @@ -389,6 +389,17 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) > pcie_set_power_device, &power); > } > > +void pcie_cap_enable_power(PCIDevice *hotplug_dev) > +{ > + uint8_t *exp_cap = hotplug_dev->config + hotplug_dev->exp.exp_cap; > + uint32_t sltcap = pci_get_long(exp_cap + PCI_EXP_SLTCAP); > + > + if (sltcap & PCI_EXP_SLTCAP_PCP) { > + pci_set_word_by_mask(exp_cap + PCI_EXP_SLTCTL, > + PCI_EXP_SLTCTL_PCC, PCI_EXP_SLTCTL_PWR_ON); > + } > +} > + > /* > * A PCI Express Hot-Plug Event has occurred, so update slot status register > * and notify OS of the event if necessary. > > Then call this from ACPI. How would this have any migration > implications at all? And why do we need a knob not to power off then? > Power will just stay on since there's nothing turning it off. It still changes pci_config, the similar to disabling SLTCAP_PCP, so I think we still need migration compat knob to have the same device state in cross version migration case.
Hi, > This where I wasn't comfortable with idea of calling random PCIe code > chunks and thought about chaining callbacks so that > pcie_cap_slot_[pre_]plug_cb() would do necessary PCIe steps > and acpi_pcihp_device_[pre_]plug_cb() do ACPI specific things not > intruding on each other, but that requires telling PCIe code that > it should not issue native hotplug event to guest. I think with both acpi and pcie hotplug being active it surely makes sense that both are in sync when it comes to device state. So acpihp updating pcie slot state (power control, maybe also device presence) looks sane to me. Not sure whenever it would be better to call into pcie code or just update the pci config space bits directly to make sure pcie doesn't take unwanted actions like sending out events. take care, Gerd
On Mon, 28 Feb 2022 08:39:57 +0100 Gerd Hoffmann <kraxel@redhat.com> wrote: > Hi, > > > This where I wasn't comfortable with idea of calling random PCIe code > > chunks and thought about chaining callbacks so that > > pcie_cap_slot_[pre_]plug_cb() would do necessary PCIe steps > > and acpi_pcihp_device_[pre_]plug_cb() do ACPI specific things not > > intruding on each other, but that requires telling PCIe code that > > it should not issue native hotplug event to guest. > > I think with both acpi and pcie hotplug being active it surely makes > sense that both are in sync when it comes to device state. So acpihp > updating pcie slot state (power control, maybe also device presence) > looks sane to me. > > Not sure whenever it would be better to call into pcie code or just > update the pci config space bits directly to make sure pcie doesn't > take unwanted actions like sending out events. If changing power state is preferred over disabling power control, I can respin series with what Michael has suggested earlier and drop 2/4 as not necessary. I'll wait for a day to see if there would more ideas/suggestions > take care, > Gerd >
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index d7d73a31e4..2339729a7c 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) if (sltcap & PCI_EXP_SLTCAP_PCP) { power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), + pcie_set_power_device, &power); } - - pci_for_each_device(sec_bus, pci_bus_num(sec_bus), - pcie_set_power_device, &power); } /*
on creation a PCIDevice has power turned on at the end of pci_qdev_realize() however later on if PCIe slot isn't populated with any children it's power is turned off. It's fine if native hotplug is used as plug callback will power slot on among other things. However when ACPI hotplug is enabled it replaces native PCIe plug callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and as result slot stays powered off. It works fine as ACPI hotplug on guest side takes care of enumerating/initializing hotplugged device. But when later guest is migrated, call chain introduced by [1] pcie_cap_slot_post_load() -> pcie_cap_update_power() -> pcie_set_power_device() -> pci_set_power() -> pci_update_mappings() will disable earlier initialized BARs for the hotplugged device in powered off slot due to commit [2] which disables BARs if power is off. As result guest OS after migration will be very much confused [3], still thinking that it has working device, which isn't true anymore due to disabled BARs. Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status only if capability is enabled. Follow up patch will disable PCI_EXP_SLTCAP_PCP overriding COMPAT_PROP_PCP property when PCIe slot is under ACPI PCI hotplug control. See [3] for reproducer. 1) Fixes: commit d5daff7d312 (pcie: implement slot power control for pcie root ports) 2) commit 23786d13441 (pci: implement power state) 3) Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- hw/pci/pcie.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)