mbox series

[0/5] Fix vIOMMU reset order

Message ID 20250206142307.921070-1-eric.auger@redhat.com (mailing list archive)
Headers show
Series Fix vIOMMU reset order | expand

Message

Eric Auger Feb. 6, 2025, 2:21 p.m. UTC
This is a follow-up of Peter's attempt to fix the fact that
vIOMMUs are likely to be reset before the device they protect:

[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/

This is especially observed with virtio devices when a qmp system_reset
command is sent but also with VFIO devices.

This series puts the vIOMMU reset in the 3-phase exit callback.

This scheme was tested successful with virtio-devices and some
VFIO devices. Nevertheless not all the topologies have been
tested yet.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/viommu-3phase-reset-v1

Eric Auger (4):
  hw/virtio/virtio-iommu: Migrate to 3-phase reset
  hw/i386/intel-iommu: Migrate to 3-phase reset
  hw/arm/smmuv3: Move reset to exit phase
  hw/vfio/common: Add a trace point in vfio_reset_handler

Peter Xu (1):
  hw/i386/intel_iommu: Tear down address spaces before IOMMU reset

 hw/arm/smmuv3.c          |  9 +++++----
 hw/i386/intel_iommu.c    | 10 ++++++----
 hw/vfio/common.c         |  1 +
 hw/virtio/virtio-iommu.c |  9 +++++----
 hw/arm/trace-events      |  1 +
 hw/i386/trace-events     |  1 +
 hw/vfio/trace-events     |  1 +
 hw/virtio/trace-events   |  2 +-
 8 files changed, 21 insertions(+), 13 deletions(-)

Comments

Michael S. Tsirkin Feb. 7, 2025, 11:09 a.m. UTC | #1
On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> This is a follow-up of Peter's attempt to fix the fact that
> vIOMMUs are likely to be reset before the device they protect:
> 
> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> 
> This is especially observed with virtio devices when a qmp system_reset
> command is sent but also with VFIO devices.
> 
> This series puts the vIOMMU reset in the 3-phase exit callback.
> 
> This scheme was tested successful with virtio-devices and some
> VFIO devices. Nevertheless not all the topologies have been
> tested yet.
> 
> Best Regards
> 
> Eric



Looks good.


Acked-by: Michael S. Tsirkin <mst@redhat.com>

How should this be merged?
I supposed I can merge the 1st three and the other
two by the respective maintainers?
I don't think there's a dependency here, right?

> This series can be found at:
> https://github.com/eauger/qemu/tree/viommu-3phase-reset-v1
> 
> Eric Auger (4):
>   hw/virtio/virtio-iommu: Migrate to 3-phase reset
>   hw/i386/intel-iommu: Migrate to 3-phase reset
>   hw/arm/smmuv3: Move reset to exit phase
>   hw/vfio/common: Add a trace point in vfio_reset_handler
> 
> Peter Xu (1):
>   hw/i386/intel_iommu: Tear down address spaces before IOMMU reset
> 
>  hw/arm/smmuv3.c          |  9 +++++----
>  hw/i386/intel_iommu.c    | 10 ++++++----
>  hw/vfio/common.c         |  1 +
>  hw/virtio/virtio-iommu.c |  9 +++++----
>  hw/arm/trace-events      |  1 +
>  hw/i386/trace-events     |  1 +
>  hw/vfio/trace-events     |  1 +
>  hw/virtio/trace-events   |  2 +-
>  8 files changed, 21 insertions(+), 13 deletions(-)
> 
> -- 
> 2.47.1
Peter Maydell Feb. 7, 2025, 4:40 p.m. UTC | #2
On Fri, 7 Feb 2025 at 11:10, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> > This is a follow-up of Peter's attempt to fix the fact that
> > vIOMMUs are likely to be reset before the device they protect:
> >
> > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> >
> > This is especially observed with virtio devices when a qmp system_reset
> > command is sent but also with VFIO devices.
> >
> > This series puts the vIOMMU reset in the 3-phase exit callback.
> >
> > This scheme was tested successful with virtio-devices and some
> > VFIO devices. Nevertheless not all the topologies have been
> > tested yet.
> >
> > Best Regards
> >
> > Eric
>
>
>
> Looks good.
>
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> How should this be merged?
> I supposed I can merge the 1st three and the other
> two by the respective maintainers?
> I don't think there's a dependency here, right?

If we're happy with the design of the series I think it
would be simpler to take the whole thing through one
tree, rather than split it up. I had a question on the
smmu patch which is mostly about clarifying what the
issue is that we're running into, but in principle
I'm happy for you to take that patch as well.

-- PMM
Eric Auger Feb. 7, 2025, 4:52 p.m. UTC | #3
Hi Peter, Michael,


On 2/7/25 5:40 PM, Peter Maydell wrote:
> On Fri, 7 Feb 2025 at 11:10, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
>>> This is a follow-up of Peter's attempt to fix the fact that
>>> vIOMMUs are likely to be reset before the device they protect:
>>>
>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
>>>
>>> This is especially observed with virtio devices when a qmp system_reset
>>> command is sent but also with VFIO devices.
>>>
>>> This series puts the vIOMMU reset in the 3-phase exit callback.
>>>
>>> This scheme was tested successful with virtio-devices and some
>>> VFIO devices. Nevertheless not all the topologies have been
>>> tested yet.
>>>
>>> Best Regards
>>>
>>> Eric
>>
>>
>> Looks good.
>>
>>
>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> How should this be merged?
>> I supposed I can merge the 1st three and the other
>> two by the respective maintainers?
>> I don't think there's a dependency here, right?
> If we're happy with the design of the series I think it
> would be simpler to take the whole thing through one
> tree, rather than split it up. I had a question on the
> smmu patch which is mostly about clarifying what the
> issue is that we're running into, but in principle
> I'm happy for you to take that patch as well.
Thank you for the swift review. I will respin to add some comments along
with the reset function and the relevance of exit phase + add more
details in the cover letter.

Thanks

Erc
>
> -- PMM
>
Peter Xu Feb. 7, 2025, 4:54 p.m. UTC | #4
On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> This is a follow-up of Peter's attempt to fix the fact that
> vIOMMUs are likely to be reset before the device they protect:
> 
> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> 
> This is especially observed with virtio devices when a qmp system_reset
> command is sent but also with VFIO devices.
> 
> This series puts the vIOMMU reset in the 3-phase exit callback.
> 
> This scheme was tested successful with virtio-devices and some
> VFIO devices. Nevertheless not all the topologies have been
> tested yet.

Eric,

It's great to know that we seem to be able to fix everything in such small
changeset!

I would like to double check two things with you here:

  - For VFIO's reset hook, looks like we have landed more changes so that
    vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
    reset during "hold" phase only (via legacy_reset_hold()).  That part
    will make sure vIOMMU (if switching to exit()-only reset) will order
    properly with VFIO.  Is my understanding correct here?

  - Is it possible if some PCIe devices that will provide its own
    phase.exit(), would it matter on the order of PCIe device's
    phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
    exit()-only approach like this one)?

PS: it would be great to attach such information in either cover letter or
commit message.  But definitely not a request to repost the patchset, if
Michael would have Message-ID when merge that'll be far enough to help
anyone find this discussion again.

Thanks!
Peter Maydell Feb. 7, 2025, 5:06 p.m. UTC | #5
On Fri, 7 Feb 2025 at 16:54, Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> > This is a follow-up of Peter's attempt to fix the fact that
> > vIOMMUs are likely to be reset before the device they protect:
> >
> > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> >
> > This is especially observed with virtio devices when a qmp system_reset
> > command is sent but also with VFIO devices.
> >
> > This series puts the vIOMMU reset in the 3-phase exit callback.
> >
> > This scheme was tested successful with virtio-devices and some
> > VFIO devices. Nevertheless not all the topologies have been
> > tested yet.
>
> Eric,
>
> It's great to know that we seem to be able to fix everything in such small
> changeset!
>
> I would like to double check two things with you here:
>
>   - For VFIO's reset hook, looks like we have landed more changes so that
>     vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
>     reset during "hold" phase only (via legacy_reset_hold()).  That part
>     will make sure vIOMMU (if switching to exit()-only reset) will order
>     properly with VFIO.  Is my understanding correct here?

Yes, we now do a reset of the whole system as a three-phase setup,
and the old pre-three-phase reset APIs like qemu_register_reset() and
device_class_set_legacy_reset() all happen during the "hold" phase.

>   - Is it possible if some PCIe devices that will provide its own
>     phase.exit(), would it matter on the order of PCIe device's
>     phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
>     exit()-only approach like this one)?

It's certainly possible for a PCIe device to implement
a three-phase reset which does things in the exit phase. However
I think I would say that such a device which didn't cancel all
outstanding DMA operations during either 'enter' or 'hold'
phases would be broken. If it did some other things during
the 'exit' phase I don't think the ordering of those vs the
iommu 'exit' handling should matter.

(To some extent the splitting into three phases is trying
to set up a consistent model as outlined in docs/devel/reset.rst
and to some extent it's just a convenient way to get a basic
"this reset thing I need to do must happen after some other
device has done its reset things" which you can achieve
by ad-hoc putting them in different phases. Ideally we get
mostly the former and a little pragmatic dose of the latter,
but the consistent model is not very solidly nailed down
so I have a feeling the proportions may not be quite as
lopsided as we'd like :-) )

thanks
-- PMM
Cédric Le Goater Feb. 7, 2025, 5:25 p.m. UTC | #6
On 2/7/25 17:54, Peter Xu wrote:
> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
>> This is a follow-up of Peter's attempt to fix the fact that
>> vIOMMUs are likely to be reset before the device they protect:
>>
>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
>>
>> This is especially observed with virtio devices when a qmp system_reset
>> command is sent but also with VFIO devices.
>>
>> This series puts the vIOMMU reset in the 3-phase exit callback.
>>
>> This scheme was tested successful with virtio-devices and some
>> VFIO devices. Nevertheless not all the topologies have been
>> tested yet.
> 
> Eric,
> 
> It's great to know that we seem to be able to fix everything in such small
> changeset!
> 
> I would like to double check two things with you here:
> 
>    - For VFIO's reset hook, looks like we have landed more changes so that
>      vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
>      reset during "hold" phase only (via legacy_reset_hold()).  That part
>      will make sure vIOMMU (if switching to exit()-only reset) will order
>      properly with VFIO.  Is my understanding correct here?


Eric,

We were still seeing DMA errors from VFIO devices :

   VFIO_MAP_DMA failed: Bad address

with this series at shutdown (machine or OS) when using an intel_iommu
device. We could see that the VIOMMU was reset and the device DMAs
were still alive. Do you know why now ?

Thanks,

C.


> 
>    - Is it possible if some PCIe devices that will provide its own
>      phase.exit(), would it matter on the order of PCIe device's
>      phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
>      exit()-only approach like this one)?
> 
> PS: it would be great to attach such information in either cover letter or
> commit message.  But definitely not a request to repost the patchset, if
> Michael would have Message-ID when merge that'll be far enough to help
> anyone find this discussion again.
> 
> Thanks!
>
Peter Xu Feb. 7, 2025, 5:31 p.m. UTC | #7
On Fri, Feb 07, 2025 at 05:06:20PM +0000, Peter Maydell wrote:
> On Fri, 7 Feb 2025 at 16:54, Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> > > This is a follow-up of Peter's attempt to fix the fact that
> > > vIOMMUs are likely to be reset before the device they protect:
> > >
> > > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> > >
> > > This is especially observed with virtio devices when a qmp system_reset
> > > command is sent but also with VFIO devices.
> > >
> > > This series puts the vIOMMU reset in the 3-phase exit callback.
> > >
> > > This scheme was tested successful with virtio-devices and some
> > > VFIO devices. Nevertheless not all the topologies have been
> > > tested yet.
> >
> > Eric,
> >
> > It's great to know that we seem to be able to fix everything in such small
> > changeset!
> >
> > I would like to double check two things with you here:
> >
> >   - For VFIO's reset hook, looks like we have landed more changes so that
> >     vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
> >     reset during "hold" phase only (via legacy_reset_hold()).  That part
> >     will make sure vIOMMU (if switching to exit()-only reset) will order
> >     properly with VFIO.  Is my understanding correct here?
> 
> Yes, we now do a reset of the whole system as a three-phase setup,
> and the old pre-three-phase reset APIs like qemu_register_reset() and
> device_class_set_legacy_reset() all happen during the "hold" phase.
> 
> >   - Is it possible if some PCIe devices that will provide its own
> >     phase.exit(), would it matter on the order of PCIe device's
> >     phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
> >     exit()-only approach like this one)?
> 
> It's certainly possible for a PCIe device to implement
> a three-phase reset which does things in the exit phase. However
> I think I would say that such a device which didn't cancel all
> outstanding DMA operations during either 'enter' or 'hold'
> phases would be broken. If it did some other things during
> the 'exit' phase I don't think the ordering of those vs the
> iommu 'exit' handling should matter.

Yes, this sounds fair.

> 
> (To some extent the splitting into three phases is trying
> to set up a consistent model as outlined in docs/devel/reset.rst
> and to some extent it's just a convenient way to get a basic
> "this reset thing I need to do must happen after some other
> device has done its reset things" which you can achieve
> by ad-hoc putting them in different phases. Ideally we get
> mostly the former and a little pragmatic dose of the latter,
> but the consistent model is not very solidly nailed down
> so I have a feeling the proportions may not be quite as
> lopsided as we'd like :-) )

Yes, it's a good move that we can have other ways to fix all the problems
without major surgery, and it also looks solid and clean if we have plan to
fix any outlier PCIe devices.

If there will be a repost after all, not sure if Eric would like to add
some of above discussions into either some commit messages or cover letter.
Or some comment in the code might be even better.

Thanks!
Eric Auger Feb. 10, 2025, 8:45 a.m. UTC | #8
Hi,

On 2/7/25 6:31 PM, Peter Xu wrote:
> On Fri, Feb 07, 2025 at 05:06:20PM +0000, Peter Maydell wrote:
>> On Fri, 7 Feb 2025 at 16:54, Peter Xu <peterx@redhat.com> wrote:
>>> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
>>>> This is a follow-up of Peter's attempt to fix the fact that
>>>> vIOMMUs are likely to be reset before the device they protect:
>>>>
>>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>>>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
>>>>
>>>> This is especially observed with virtio devices when a qmp system_reset
>>>> command is sent but also with VFIO devices.
>>>>
>>>> This series puts the vIOMMU reset in the 3-phase exit callback.
>>>>
>>>> This scheme was tested successful with virtio-devices and some
>>>> VFIO devices. Nevertheless not all the topologies have been
>>>> tested yet.
>>> Eric,
>>>
>>> It's great to know that we seem to be able to fix everything in such small
>>> changeset!
>>>
>>> I would like to double check two things with you here:
>>>
>>>   - For VFIO's reset hook, looks like we have landed more changes so that
>>>     vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
>>>     reset during "hold" phase only (via legacy_reset_hold()).  That part
>>>     will make sure vIOMMU (if switching to exit()-only reset) will order
>>>     properly with VFIO.  Is my understanding correct here?
>> Yes, we now do a reset of the whole system as a three-phase setup,
>> and the old pre-three-phase reset APIs like qemu_register_reset() and
>> device_class_set_legacy_reset() all happen during the "hold" phase.
>>
>>>   - Is it possible if some PCIe devices that will provide its own
>>>     phase.exit(), would it matter on the order of PCIe device's
>>>     phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
>>>     exit()-only approach like this one)?
>> It's certainly possible for a PCIe device to implement
>> a three-phase reset which does things in the exit phase. However
>> I think I would say that such a device which didn't cancel all
>> outstanding DMA operations during either 'enter' or 'hold'
>> phases would be broken. If it did some other things during
>> the 'exit' phase I don't think the ordering of those vs the
>> iommu 'exit' handling should matter.
> Yes, this sounds fair.
>
>> (To some extent the splitting into three phases is trying
>> to set up a consistent model as outlined in docs/devel/reset.rst
>> and to some extent it's just a convenient way to get a basic
>> "this reset thing I need to do must happen after some other
>> device has done its reset things" which you can achieve
>> by ad-hoc putting them in different phases. Ideally we get
>> mostly the former and a little pragmatic dose of the latter,
>> but the consistent model is not very solidly nailed down
>> so I have a feeling the proportions may not be quite as
>> lopsided as we'd like :-) )
> Yes, it's a good move that we can have other ways to fix all the problems
> without major surgery, and it also looks solid and clean if we have plan to
> fix any outlier PCIe devices.
>
> If there will be a repost after all, not sure if Eric would like to add
> some of above discussions into either some commit messages or cover letter.
> Or some comment in the code might be even better.
Yes I will definitively augment commit msgs/cover letter with all those
considerations. Thank you very much for this rich discussion!

Eric
>
> Thanks!
>
Eric Auger Feb. 10, 2025, 8:49 a.m. UTC | #9
Hi Cédric,


On 2/7/25 6:25 PM, Cédric Le Goater wrote:
> On 2/7/25 17:54, Peter Xu wrote:
>> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
>>> This is a follow-up of Peter's attempt to fix the fact that
>>> vIOMMUs are likely to be reset before the device they protect:
>>>
>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
>>>
>>> This is especially observed with virtio devices when a qmp system_reset
>>> command is sent but also with VFIO devices.
>>>
>>> This series puts the vIOMMU reset in the 3-phase exit callback.
>>>
>>> This scheme was tested successful with virtio-devices and some
>>> VFIO devices. Nevertheless not all the topologies have been
>>> tested yet.
>>
>> Eric,
>>
>> It's great to know that we seem to be able to fix everything in such
>> small
>> changeset!
>>
>> I would like to double check two things with you here:
>>
>>    - For VFIO's reset hook, looks like we have landed more changes so
>> that
>>      vfio's reset function is now a TYPE_LEGACY_RESET, and it always
>> do the
>>      reset during "hold" phase only (via legacy_reset_hold()).  That
>> part
>>      will make sure vIOMMU (if switching to exit()-only reset) will
>> order
>>      properly with VFIO.  Is my understanding correct here?
>
>
> Eric,
>
> We were still seeing DMA errors from VFIO devices :
>
>   VFIO_MAP_DMA failed: Bad address
>
> with this series at shutdown (machine or OS) when using an intel_iommu
> device. We could see that the VIOMMU was reset and the device DMAs
> were still alive. Do you know why now ?

I have started debugging this other case. At first sight this looks like
a different problem. First this occurs on a qmp system_powerdown
The error messages do not occur on qemu reset but rather as a result of
the guest disabling the intel iommu anc curiously when the aliased IOMMU
MR (nodma) is re-enabled. I need more time to debug this.

Eric

>
> Thanks,
>
> C.
>
>
>>
>>    - Is it possible if some PCIe devices that will provide its own
>>      phase.exit(), would it matter on the order of PCIe device's
>>      phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
>>      exit()-only approach like this one)?
>>
>> PS: it would be great to attach such information in either cover
>> letter or
>> commit message.  But definitely not a request to repost the patchset, if
>> Michael would have Message-ID when merge that'll be far enough to help
>> anyone find this discussion again.
>>
>> Thanks!
>>
>