Message ID | 20250206142307.921070-1-eric.auger@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | Fix vIOMMU reset order | expand |
On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: > This is a follow-up of Peter's attempt to fix the fact that > vIOMMUs are likely to be reset before the device they protect: > > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/ > > This is especially observed with virtio devices when a qmp system_reset > command is sent but also with VFIO devices. > > This series puts the vIOMMU reset in the 3-phase exit callback. > > This scheme was tested successful with virtio-devices and some > VFIO devices. Nevertheless not all the topologies have been > tested yet. > > Best Regards > > Eric Looks good. Acked-by: Michael S. Tsirkin <mst@redhat.com> How should this be merged? I supposed I can merge the 1st three and the other two by the respective maintainers? I don't think there's a dependency here, right? > This series can be found at: > https://github.com/eauger/qemu/tree/viommu-3phase-reset-v1 > > Eric Auger (4): > hw/virtio/virtio-iommu: Migrate to 3-phase reset > hw/i386/intel-iommu: Migrate to 3-phase reset > hw/arm/smmuv3: Move reset to exit phase > hw/vfio/common: Add a trace point in vfio_reset_handler > > Peter Xu (1): > hw/i386/intel_iommu: Tear down address spaces before IOMMU reset > > hw/arm/smmuv3.c | 9 +++++---- > hw/i386/intel_iommu.c | 10 ++++++---- > hw/vfio/common.c | 1 + > hw/virtio/virtio-iommu.c | 9 +++++---- > hw/arm/trace-events | 1 + > hw/i386/trace-events | 1 + > hw/vfio/trace-events | 1 + > hw/virtio/trace-events | 2 +- > 8 files changed, 21 insertions(+), 13 deletions(-) > > -- > 2.47.1
On Fri, 7 Feb 2025 at 11:10, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: > > This is a follow-up of Peter's attempt to fix the fact that > > vIOMMUs are likely to be reset before the device they protect: > > > > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/ > > > > This is especially observed with virtio devices when a qmp system_reset > > command is sent but also with VFIO devices. > > > > This series puts the vIOMMU reset in the 3-phase exit callback. > > > > This scheme was tested successful with virtio-devices and some > > VFIO devices. Nevertheless not all the topologies have been > > tested yet. > > > > Best Regards > > > > Eric > > > > Looks good. > > > Acked-by: Michael S. Tsirkin <mst@redhat.com> > > How should this be merged? > I supposed I can merge the 1st three and the other > two by the respective maintainers? > I don't think there's a dependency here, right? If we're happy with the design of the series I think it would be simpler to take the whole thing through one tree, rather than split it up. I had a question on the smmu patch which is mostly about clarifying what the issue is that we're running into, but in principle I'm happy for you to take that patch as well. -- PMM
Hi Peter, Michael, On 2/7/25 5:40 PM, Peter Maydell wrote: > On Fri, 7 Feb 2025 at 11:10, Michael S. Tsirkin <mst@redhat.com> wrote: >> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: >>> This is a follow-up of Peter's attempt to fix the fact that >>> vIOMMUs are likely to be reset before the device they protect: >>> >>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices >>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/ >>> >>> This is especially observed with virtio devices when a qmp system_reset >>> command is sent but also with VFIO devices. >>> >>> This series puts the vIOMMU reset in the 3-phase exit callback. >>> >>> This scheme was tested successful with virtio-devices and some >>> VFIO devices. Nevertheless not all the topologies have been >>> tested yet. >>> >>> Best Regards >>> >>> Eric >> >> >> Looks good. >> >> >> Acked-by: Michael S. Tsirkin <mst@redhat.com> >> >> How should this be merged? >> I supposed I can merge the 1st three and the other >> two by the respective maintainers? >> I don't think there's a dependency here, right? > If we're happy with the design of the series I think it > would be simpler to take the whole thing through one > tree, rather than split it up. I had a question on the > smmu patch which is mostly about clarifying what the > issue is that we're running into, but in principle > I'm happy for you to take that patch as well. Thank you for the swift review. I will respin to add some comments along with the reset function and the relevance of exit phase + add more details in the cover letter. Thanks Erc > > -- PMM >
On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: > This is a follow-up of Peter's attempt to fix the fact that > vIOMMUs are likely to be reset before the device they protect: > > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/ > > This is especially observed with virtio devices when a qmp system_reset > command is sent but also with VFIO devices. > > This series puts the vIOMMU reset in the 3-phase exit callback. > > This scheme was tested successful with virtio-devices and some > VFIO devices. Nevertheless not all the topologies have been > tested yet. Eric, It's great to know that we seem to be able to fix everything in such small changeset! I would like to double check two things with you here: - For VFIO's reset hook, looks like we have landed more changes so that vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the reset during "hold" phase only (via legacy_reset_hold()). That part will make sure vIOMMU (if switching to exit()-only reset) will order properly with VFIO. Is my understanding correct here? - Is it possible if some PCIe devices that will provide its own phase.exit(), would it matter on the order of PCIe device's phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use exit()-only approach like this one)? PS: it would be great to attach such information in either cover letter or commit message. But definitely not a request to repost the patchset, if Michael would have Message-ID when merge that'll be far enough to help anyone find this discussion again. Thanks!
On Fri, 7 Feb 2025 at 16:54, Peter Xu <peterx@redhat.com> wrote: > > On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: > > This is a follow-up of Peter's attempt to fix the fact that > > vIOMMUs are likely to be reset before the device they protect: > > > > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/ > > > > This is especially observed with virtio devices when a qmp system_reset > > command is sent but also with VFIO devices. > > > > This series puts the vIOMMU reset in the 3-phase exit callback. > > > > This scheme was tested successful with virtio-devices and some > > VFIO devices. Nevertheless not all the topologies have been > > tested yet. > > Eric, > > It's great to know that we seem to be able to fix everything in such small > changeset! > > I would like to double check two things with you here: > > - For VFIO's reset hook, looks like we have landed more changes so that > vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the > reset during "hold" phase only (via legacy_reset_hold()). That part > will make sure vIOMMU (if switching to exit()-only reset) will order > properly with VFIO. Is my understanding correct here? Yes, we now do a reset of the whole system as a three-phase setup, and the old pre-three-phase reset APIs like qemu_register_reset() and device_class_set_legacy_reset() all happen during the "hold" phase. > - Is it possible if some PCIe devices that will provide its own > phase.exit(), would it matter on the order of PCIe device's > phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use > exit()-only approach like this one)? It's certainly possible for a PCIe device to implement a three-phase reset which does things in the exit phase. However I think I would say that such a device which didn't cancel all outstanding DMA operations during either 'enter' or 'hold' phases would be broken. If it did some other things during the 'exit' phase I don't think the ordering of those vs the iommu 'exit' handling should matter. (To some extent the splitting into three phases is trying to set up a consistent model as outlined in docs/devel/reset.rst and to some extent it's just a convenient way to get a basic "this reset thing I need to do must happen after some other device has done its reset things" which you can achieve by ad-hoc putting them in different phases. Ideally we get mostly the former and a little pragmatic dose of the latter, but the consistent model is not very solidly nailed down so I have a feeling the proportions may not be quite as lopsided as we'd like :-) ) thanks -- PMM
On 2/7/25 17:54, Peter Xu wrote: > On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: >> This is a follow-up of Peter's attempt to fix the fact that >> vIOMMUs are likely to be reset before the device they protect: >> >> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices >> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/ >> >> This is especially observed with virtio devices when a qmp system_reset >> command is sent but also with VFIO devices. >> >> This series puts the vIOMMU reset in the 3-phase exit callback. >> >> This scheme was tested successful with virtio-devices and some >> VFIO devices. Nevertheless not all the topologies have been >> tested yet. > > Eric, > > It's great to know that we seem to be able to fix everything in such small > changeset! > > I would like to double check two things with you here: > > - For VFIO's reset hook, looks like we have landed more changes so that > vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the > reset during "hold" phase only (via legacy_reset_hold()). That part > will make sure vIOMMU (if switching to exit()-only reset) will order > properly with VFIO. Is my understanding correct here? Eric, We were still seeing DMA errors from VFIO devices : VFIO_MAP_DMA failed: Bad address with this series at shutdown (machine or OS) when using an intel_iommu device. We could see that the VIOMMU was reset and the device DMAs were still alive. Do you know why now ? Thanks, C. > > - Is it possible if some PCIe devices that will provide its own > phase.exit(), would it matter on the order of PCIe device's > phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use > exit()-only approach like this one)? > > PS: it would be great to attach such information in either cover letter or > commit message. But definitely not a request to repost the patchset, if > Michael would have Message-ID when merge that'll be far enough to help > anyone find this discussion again. > > Thanks! >
On Fri, Feb 07, 2025 at 05:06:20PM +0000, Peter Maydell wrote: > On Fri, 7 Feb 2025 at 16:54, Peter Xu <peterx@redhat.com> wrote: > > > > On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: > > > This is a follow-up of Peter's attempt to fix the fact that > > > vIOMMUs are likely to be reset before the device they protect: > > > > > > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices > > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/ > > > > > > This is especially observed with virtio devices when a qmp system_reset > > > command is sent but also with VFIO devices. > > > > > > This series puts the vIOMMU reset in the 3-phase exit callback. > > > > > > This scheme was tested successful with virtio-devices and some > > > VFIO devices. Nevertheless not all the topologies have been > > > tested yet. > > > > Eric, > > > > It's great to know that we seem to be able to fix everything in such small > > changeset! > > > > I would like to double check two things with you here: > > > > - For VFIO's reset hook, looks like we have landed more changes so that > > vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the > > reset during "hold" phase only (via legacy_reset_hold()). That part > > will make sure vIOMMU (if switching to exit()-only reset) will order > > properly with VFIO. Is my understanding correct here? > > Yes, we now do a reset of the whole system as a three-phase setup, > and the old pre-three-phase reset APIs like qemu_register_reset() and > device_class_set_legacy_reset() all happen during the "hold" phase. > > > - Is it possible if some PCIe devices that will provide its own > > phase.exit(), would it matter on the order of PCIe device's > > phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use > > exit()-only approach like this one)? > > It's certainly possible for a PCIe device to implement > a three-phase reset which does things in the exit phase. However > I think I would say that such a device which didn't cancel all > outstanding DMA operations during either 'enter' or 'hold' > phases would be broken. If it did some other things during > the 'exit' phase I don't think the ordering of those vs the > iommu 'exit' handling should matter. Yes, this sounds fair. > > (To some extent the splitting into three phases is trying > to set up a consistent model as outlined in docs/devel/reset.rst > and to some extent it's just a convenient way to get a basic > "this reset thing I need to do must happen after some other > device has done its reset things" which you can achieve > by ad-hoc putting them in different phases. Ideally we get > mostly the former and a little pragmatic dose of the latter, > but the consistent model is not very solidly nailed down > so I have a feeling the proportions may not be quite as > lopsided as we'd like :-) ) Yes, it's a good move that we can have other ways to fix all the problems without major surgery, and it also looks solid and clean if we have plan to fix any outlier PCIe devices. If there will be a repost after all, not sure if Eric would like to add some of above discussions into either some commit messages or cover letter. Or some comment in the code might be even better. Thanks!
Hi, On 2/7/25 6:31 PM, Peter Xu wrote: > On Fri, Feb 07, 2025 at 05:06:20PM +0000, Peter Maydell wrote: >> On Fri, 7 Feb 2025 at 16:54, Peter Xu <peterx@redhat.com> wrote: >>> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: >>>> This is a follow-up of Peter's attempt to fix the fact that >>>> vIOMMUs are likely to be reset before the device they protect: >>>> >>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices >>>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/ >>>> >>>> This is especially observed with virtio devices when a qmp system_reset >>>> command is sent but also with VFIO devices. >>>> >>>> This series puts the vIOMMU reset in the 3-phase exit callback. >>>> >>>> This scheme was tested successful with virtio-devices and some >>>> VFIO devices. Nevertheless not all the topologies have been >>>> tested yet. >>> Eric, >>> >>> It's great to know that we seem to be able to fix everything in such small >>> changeset! >>> >>> I would like to double check two things with you here: >>> >>> - For VFIO's reset hook, looks like we have landed more changes so that >>> vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the >>> reset during "hold" phase only (via legacy_reset_hold()). That part >>> will make sure vIOMMU (if switching to exit()-only reset) will order >>> properly with VFIO. Is my understanding correct here? >> Yes, we now do a reset of the whole system as a three-phase setup, >> and the old pre-three-phase reset APIs like qemu_register_reset() and >> device_class_set_legacy_reset() all happen during the "hold" phase. >> >>> - Is it possible if some PCIe devices that will provide its own >>> phase.exit(), would it matter on the order of PCIe device's >>> phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use >>> exit()-only approach like this one)? >> It's certainly possible for a PCIe device to implement >> a three-phase reset which does things in the exit phase. However >> I think I would say that such a device which didn't cancel all >> outstanding DMA operations during either 'enter' or 'hold' >> phases would be broken. If it did some other things during >> the 'exit' phase I don't think the ordering of those vs the >> iommu 'exit' handling should matter. > Yes, this sounds fair. > >> (To some extent the splitting into three phases is trying >> to set up a consistent model as outlined in docs/devel/reset.rst >> and to some extent it's just a convenient way to get a basic >> "this reset thing I need to do must happen after some other >> device has done its reset things" which you can achieve >> by ad-hoc putting them in different phases. Ideally we get >> mostly the former and a little pragmatic dose of the latter, >> but the consistent model is not very solidly nailed down >> so I have a feeling the proportions may not be quite as >> lopsided as we'd like :-) ) > Yes, it's a good move that we can have other ways to fix all the problems > without major surgery, and it also looks solid and clean if we have plan to > fix any outlier PCIe devices. > > If there will be a repost after all, not sure if Eric would like to add > some of above discussions into either some commit messages or cover letter. > Or some comment in the code might be even better. Yes I will definitively augment commit msgs/cover letter with all those considerations. Thank you very much for this rich discussion! Eric > > Thanks! >
Hi Cédric, On 2/7/25 6:25 PM, Cédric Le Goater wrote: > On 2/7/25 17:54, Peter Xu wrote: >> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote: >>> This is a follow-up of Peter's attempt to fix the fact that >>> vIOMMUs are likely to be reset before the device they protect: >>> >>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices >>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/ >>> >>> This is especially observed with virtio devices when a qmp system_reset >>> command is sent but also with VFIO devices. >>> >>> This series puts the vIOMMU reset in the 3-phase exit callback. >>> >>> This scheme was tested successful with virtio-devices and some >>> VFIO devices. Nevertheless not all the topologies have been >>> tested yet. >> >> Eric, >> >> It's great to know that we seem to be able to fix everything in such >> small >> changeset! >> >> I would like to double check two things with you here: >> >> - For VFIO's reset hook, looks like we have landed more changes so >> that >> vfio's reset function is now a TYPE_LEGACY_RESET, and it always >> do the >> reset during "hold" phase only (via legacy_reset_hold()). That >> part >> will make sure vIOMMU (if switching to exit()-only reset) will >> order >> properly with VFIO. Is my understanding correct here? > > > Eric, > > We were still seeing DMA errors from VFIO devices : > > VFIO_MAP_DMA failed: Bad address > > with this series at shutdown (machine or OS) when using an intel_iommu > device. We could see that the VIOMMU was reset and the device DMAs > were still alive. Do you know why now ? I have started debugging this other case. At first sight this looks like a different problem. First this occurs on a qmp system_powerdown The error messages do not occur on qemu reset but rather as a result of the guest disabling the intel iommu anc curiously when the aliased IOMMU MR (nodma) is re-enabled. I need more time to debug this. Eric > > Thanks, > > C. > > >> >> - Is it possible if some PCIe devices that will provide its own >> phase.exit(), would it matter on the order of PCIe device's >> phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use >> exit()-only approach like this one)? >> >> PS: it would be great to attach such information in either cover >> letter or >> commit message. But definitely not a request to repost the patchset, if >> Michael would have Message-ID when merge that'll be far enough to help >> anyone find this discussion again. >> >> Thanks! >> >