Message ID | 1461245745-6710-1-git-send-email-mst@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Add Stefano and Anthony On Thu, Apr 21, 2016 at 04:43:45PM +0300, Michael S. Tsirkin wrote: > This adds a flag to enable/disable bypassing the IOMMU by > virtio devices. > > This is on top of patch > http://article.gmane.org/gmane.comp.emulators.qemu/403467 > virtio: convert to use DMA api > > Tested with patchset > http://article.gmane.org/gmane.linux.kernel.virtualization/27545 > virtio-pci: iommu support (note: bit number has been kept at 34 > intentionally to match posted guest code. a non-RFC version will > renumber bits to be contigious). > > changes from v1: > drop PASSTHROUGH flag > > The interaction between virtio and DMA API is messy. > > On most systems with virtio, physical addresses match bus addresses, > and it doesn't particularly matter whether we use the DMA API. > > On some systems, including Xen and any system with a physical device > that speaks virtio behind a physical IOMMU, we must use the DMA API > for virtio DMA to work at all. > > Add a feature bit to detect that: VIRTIO_F_IOMMU_PLATFORM. > > If not there, we preserve historic behavior and bypass the DMA > API unless within Xen guest. This is actually required for > systems, including SPARC and PPC64, where virtio-pci devices are > enumerated as though they are behind an IOMMU, but the virtio host > ignores the IOMMU, so we must either pretend that the IOMMU isn't > there or somehow map everything as the identity. > > Re: non-virtio devices. > > It turns out that on old QEMU hosts, only emulated devices which were > part of QEMU use the IOMMU. Should we want to bypass the IOMMU for such > devices *only*, it would be rather easy to detect them by looking at > subsystem vendor and device ID. Thus, no new interfaces are required > except for virtio which always uses the same subsystem vendor and device ID. > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > include/hw/virtio/virtio-access.h | 3 ++- > include/hw/virtio/virtio.h | 4 +++- > include/standard-headers/linux/virtio_config.h | 2 ++ > 3 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h > index 967cc75..bb6f34e 100644 > --- a/include/hw/virtio/virtio-access.h > +++ b/include/hw/virtio/virtio-access.h > @@ -23,7 +23,8 @@ static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev) > BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); > VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > > - if (k->get_dma_as) { > + if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) && > + k->get_dma_as) { > return k->get_dma_as(qbus->parent); > } > return &address_space_memory; > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index b12faa9..44f3788 100644 > --- a/include/hw/virtio/virtio.h > +++ b/include/hw/virtio/virtio.h > @@ -228,7 +228,9 @@ typedef struct VirtIORNGConf VirtIORNGConf; > DEFINE_PROP_BIT64("notify_on_empty", _state, _field, \ > VIRTIO_F_NOTIFY_ON_EMPTY, true), \ > DEFINE_PROP_BIT64("any_layout", _state, _field, \ > - VIRTIO_F_ANY_LAYOUT, true) > + VIRTIO_F_ANY_LAYOUT, true), \ > + DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ > + VIRTIO_F_IOMMU_PLATFORM, false) > > hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n); > hwaddr virtio_queue_get_avail_addr(VirtIODevice *vdev, int n); > diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h > index bcc445b..3fcfbb1 100644 > --- a/include/standard-headers/linux/virtio_config.h > +++ b/include/standard-headers/linux/virtio_config.h > @@ -61,4 +61,6 @@ > /* v1.0 compliant. */ > #define VIRTIO_F_VERSION_1 32 > > +/* Do not bypass the IOMMU (if configured) */ > +#define VIRTIO_F_IOMMU_PLATFORM 34 > #endif /* _LINUX_VIRTIO_CONFIG_H */ > -- > MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 21, 2016 at 04:43:45PM +0300, Michael S. Tsirkin wrote: > This adds a flag to enable/disable bypassing the IOMMU by > virtio devices. > > This is on top of patch > http://article.gmane.org/gmane.comp.emulators.qemu/403467 > virtio: convert to use DMA api > > Tested with patchset > http://article.gmane.org/gmane.linux.kernel.virtualization/27545 > virtio-pci: iommu support (note: bit number has been kept at 34 > intentionally to match posted guest code. a non-RFC version will > renumber bits to be contigious). > > changes from v1: > drop PASSTHROUGH flag > > The interaction between virtio and DMA API is messy. > > On most systems with virtio, physical addresses match bus addresses, > and it doesn't particularly matter whether we use the DMA API. > > On some systems, including Xen and any system with a physical device > that speaks virtio behind a physical IOMMU, we must use the DMA API > for virtio DMA to work at all. > > Add a feature bit to detect that: VIRTIO_F_IOMMU_PLATFORM. > > If not there, we preserve historic behavior and bypass the DMA > API unless within Xen guest. This is actually required for > systems, including SPARC and PPC64, where virtio-pci devices are > enumerated as though they are behind an IOMMU, but the virtio host > ignores the IOMMU, so we must either pretend that the IOMMU isn't > there or somehow map everything as the identity. > > Re: non-virtio devices. > > It turns out that on old QEMU hosts, only emulated devices which were > part of QEMU use the IOMMU. Should we want to bypass the IOMMU for such > devices *only*, it would be rather easy to detect them by looking at > subsystem vendor and device ID. Thus, no new interfaces are required > except for virtio which always uses the same subsystem vendor and device ID. > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > include/hw/virtio/virtio-access.h | 3 ++- > include/hw/virtio/virtio.h | 4 +++- > include/standard-headers/linux/virtio_config.h | 2 ++ > 3 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h > index 967cc75..bb6f34e 100644 > --- a/include/hw/virtio/virtio-access.h > +++ b/include/hw/virtio/virtio-access.h > @@ -23,7 +23,8 @@ static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev) > BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); > VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > > - if (k->get_dma_as) { > + if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) && > + k->get_dma_as) { > return k->get_dma_as(qbus->parent); > } > return &address_space_memory; > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index b12faa9..44f3788 100644 > --- a/include/hw/virtio/virtio.h > +++ b/include/hw/virtio/virtio.h > @@ -228,7 +228,9 @@ typedef struct VirtIORNGConf VirtIORNGConf; > DEFINE_PROP_BIT64("notify_on_empty", _state, _field, \ > VIRTIO_F_NOTIFY_ON_EMPTY, true), \ > DEFINE_PROP_BIT64("any_layout", _state, _field, \ > - VIRTIO_F_ANY_LAYOUT, true) > + VIRTIO_F_ANY_LAYOUT, true), \ > + DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ > + VIRTIO_F_IOMMU_PLATFORM, false) Looks like the impact of this patch is that users who relied on k->get_dma_as today may now have to explicitly add iommu_platform=on. Are there any such users (e.g. Xen)? Instead of breaking the command-line for these users you could invert the flag's meaning ("iommu_bypass=on") and set it in the SPARC/PPC machine types. Stefan
On Thu, Apr 21, 2016 at 03:56:53PM +0100, Stefan Hajnoczi wrote: > On Thu, Apr 21, 2016 at 04:43:45PM +0300, Michael S. Tsirkin wrote: > > This adds a flag to enable/disable bypassing the IOMMU by > > virtio devices. > > > > This is on top of patch > > http://article.gmane.org/gmane.comp.emulators.qemu/403467 > > virtio: convert to use DMA api > > > > Tested with patchset > > http://article.gmane.org/gmane.linux.kernel.virtualization/27545 > > virtio-pci: iommu support (note: bit number has been kept at 34 > > intentionally to match posted guest code. a non-RFC version will > > renumber bits to be contigious). > > > > changes from v1: > > drop PASSTHROUGH flag > > > > The interaction between virtio and DMA API is messy. > > > > On most systems with virtio, physical addresses match bus addresses, > > and it doesn't particularly matter whether we use the DMA API. > > > > On some systems, including Xen and any system with a physical device > > that speaks virtio behind a physical IOMMU, we must use the DMA API > > for virtio DMA to work at all. > > > > Add a feature bit to detect that: VIRTIO_F_IOMMU_PLATFORM. > > > > If not there, we preserve historic behavior and bypass the DMA > > API unless within Xen guest. This is actually required for > > systems, including SPARC and PPC64, where virtio-pci devices are > > enumerated as though they are behind an IOMMU, but the virtio host > > ignores the IOMMU, so we must either pretend that the IOMMU isn't > > there or somehow map everything as the identity. > > > > Re: non-virtio devices. > > > > It turns out that on old QEMU hosts, only emulated devices which were > > part of QEMU use the IOMMU. Should we want to bypass the IOMMU for such > > devices *only*, it would be rather easy to detect them by looking at > > subsystem vendor and device ID. Thus, no new interfaces are required > > except for virtio which always uses the same subsystem vendor and device ID. > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > --- > > include/hw/virtio/virtio-access.h | 3 ++- > > include/hw/virtio/virtio.h | 4 +++- > > include/standard-headers/linux/virtio_config.h | 2 ++ > > 3 files changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h > > index 967cc75..bb6f34e 100644 > > --- a/include/hw/virtio/virtio-access.h > > +++ b/include/hw/virtio/virtio-access.h > > @@ -23,7 +23,8 @@ static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev) > > BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); > > VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > > > > - if (k->get_dma_as) { > > + if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) && > > + k->get_dma_as) { > > return k->get_dma_as(qbus->parent); > > } > > return &address_space_memory; > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > > index b12faa9..44f3788 100644 > > --- a/include/hw/virtio/virtio.h > > +++ b/include/hw/virtio/virtio.h > > @@ -228,7 +228,9 @@ typedef struct VirtIORNGConf VirtIORNGConf; > > DEFINE_PROP_BIT64("notify_on_empty", _state, _field, \ > > VIRTIO_F_NOTIFY_ON_EMPTY, true), \ > > DEFINE_PROP_BIT64("any_layout", _state, _field, \ > > - VIRTIO_F_ANY_LAYOUT, true) > > + VIRTIO_F_ANY_LAYOUT, true), \ > > + DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ > > + VIRTIO_F_IOMMU_PLATFORM, false) > > Looks like the impact of this patch is that users who relied on > k->get_dma_as today may now have to explicitly add iommu_platform=on. > Are there any such users (e.g. Xen)? No because upstream this is ignored. This is an incremental patch on top of Jason's one. > Instead of breaking the command-line for these users you could invert > the flag's meaning ("iommu_bypass=on") and set it in the SPARC/PPC > machine types. > > Stefan I hope I made it clear that there are no such users. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 21, 2016 at 06:11:40PM +0300, Michael S. Tsirkin wrote: > On Thu, Apr 21, 2016 at 03:56:53PM +0100, Stefan Hajnoczi wrote: > > On Thu, Apr 21, 2016 at 04:43:45PM +0300, Michael S. Tsirkin wrote: > > > This adds a flag to enable/disable bypassing the IOMMU by > > > virtio devices. > > > > > > This is on top of patch > > > http://article.gmane.org/gmane.comp.emulators.qemu/403467 > > > virtio: convert to use DMA api > > > > > > Tested with patchset > > > http://article.gmane.org/gmane.linux.kernel.virtualization/27545 > > > virtio-pci: iommu support (note: bit number has been kept at 34 > > > intentionally to match posted guest code. a non-RFC version will > > > renumber bits to be contigious). > > > > > > changes from v1: > > > drop PASSTHROUGH flag > > > > > > The interaction between virtio and DMA API is messy. > > > > > > On most systems with virtio, physical addresses match bus addresses, > > > and it doesn't particularly matter whether we use the DMA API. > > > > > > On some systems, including Xen and any system with a physical device > > > that speaks virtio behind a physical IOMMU, we must use the DMA API > > > for virtio DMA to work at all. > > > > > > Add a feature bit to detect that: VIRTIO_F_IOMMU_PLATFORM. > > > > > > If not there, we preserve historic behavior and bypass the DMA > > > API unless within Xen guest. This is actually required for > > > systems, including SPARC and PPC64, where virtio-pci devices are > > > enumerated as though they are behind an IOMMU, but the virtio host > > > ignores the IOMMU, so we must either pretend that the IOMMU isn't > > > there or somehow map everything as the identity. > > > > > > Re: non-virtio devices. > > > > > > It turns out that on old QEMU hosts, only emulated devices which were > > > part of QEMU use the IOMMU. Should we want to bypass the IOMMU for such > > > devices *only*, it would be rather easy to detect them by looking at > > > subsystem vendor and device ID. Thus, no new interfaces are required > > > except for virtio which always uses the same subsystem vendor and device ID. > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > --- > > > include/hw/virtio/virtio-access.h | 3 ++- > > > include/hw/virtio/virtio.h | 4 +++- > > > include/standard-headers/linux/virtio_config.h | 2 ++ > > > 3 files changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h > > > index 967cc75..bb6f34e 100644 > > > --- a/include/hw/virtio/virtio-access.h > > > +++ b/include/hw/virtio/virtio-access.h > > > @@ -23,7 +23,8 @@ static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev) > > > BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); > > > VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > > > > > > - if (k->get_dma_as) { > > > + if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) && > > > + k->get_dma_as) { > > > return k->get_dma_as(qbus->parent); > > > } > > > return &address_space_memory; > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > > > index b12faa9..44f3788 100644 > > > --- a/include/hw/virtio/virtio.h > > > +++ b/include/hw/virtio/virtio.h > > > @@ -228,7 +228,9 @@ typedef struct VirtIORNGConf VirtIORNGConf; > > > DEFINE_PROP_BIT64("notify_on_empty", _state, _field, \ > > > VIRTIO_F_NOTIFY_ON_EMPTY, true), \ > > > DEFINE_PROP_BIT64("any_layout", _state, _field, \ > > > - VIRTIO_F_ANY_LAYOUT, true) > > > + VIRTIO_F_ANY_LAYOUT, true), \ > > > + DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ > > > + VIRTIO_F_IOMMU_PLATFORM, false) > > > > Looks like the impact of this patch is that users who relied on > > k->get_dma_as today may now have to explicitly add iommu_platform=on. > > Are there any such users (e.g. Xen)? > > No because upstream this is ignored. This is an incremental patch > on top of Jason's one. > > > Instead of breaking the command-line for these users you could invert > > the flag's meaning ("iommu_bypass=on") and set it in the SPARC/PPC > > machine types. > > > > Stefan > > I hope I made it clear that there are no such users. Okay, thanks! Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > On some systems, including Xen and any system with a physical device > > that speaks virtio behind a physical IOMMU, we must use the DMA API > > for virtio DMA to work at all. > > > > Add a feature bit to detect that: VIRTIO_F_IOMMU_PLATFORM. > > > > If not there, we preserve historic behavior and bypass the DMA > > API unless within Xen guest. This is actually required for > > systems, including SPARC and PPC64, where virtio-pci devices are > > enumerated as though they are behind an IOMMU, but the virtio host > > ignores the IOMMU, so we must either pretend that the IOMMU isn't > > there or somehow map everything as the identity. > > > > Re: non-virtio devices. > > > > It turns out that on old QEMU hosts, only emulated devices which were > > part of QEMU use the IOMMU. Should we want to bypass the IOMMU for such > > devices *only*, it would be rather easy to detect them by looking at > > subsystem vendor and device ID. Thus, no new interfaces are required > > except for virtio which always uses the same subsystem vendor and device ID. Apologies for dropping this thread; I've been travelling. But seriously, NO! I understand why you want to see this as a virtio-specific issue, but it isn't. And we don't *want* it to be. In the guest, drivers SHALL use the DMA API. And the DMA API SHALL do the right thing for each device according to its needs. So any information passed from qemu to the guest should be directed at the platform IOMMU code (or handled by qemu-detection quirks in the guest, if we must). It is *not* acceptable for the virtio drivers in the guest to just eschew the DMA API completely, triggered by some device-specific flag. The qemu implementation is, of course, monolithic. In qemu the fact that virtio doesn't get translated by the emulated IOMMU *is* actually down to code in the virtio implementation. I get that. But then again, it's not just virtio. *Any* device which we emulate for the guest could have that same issue, and appear as untranslated. (And assigned PCI devices currently do). Let's think about the parallel with a system-on-chip. Let's say we have a peripheral which got included, but which was wired up such that it bypasses the IOMMU and gets to do direct physical DMA. Is that a feature of that specific peripheral? Do we hack its drivers to make the distinction between this incarnation, and a normal discrete version of the same device? No! It's a feature of the *system* and needs to be conveyed to the OS IOMMU code to do the right thing. Not to the driver. In my opinion, adding the VIRTIO_F_IOMMU_PLATFORM feature bit is absolutely the wrong thing to do. What we *should* do is a patchset in the guest which both fixes virtio drivers to *always* use the DMA API, and fixes the DMA API to DTRT at the same time — by detecting qemu and installing no-op DMA ops for the appropriate devices, perhaps. Then we can look at giving qemu a way to properly indicate which devices it actually does DMA mapping for, so we can remove those heuristic assumptions. But that flag does *not* live in the virtio host??guest ABI.
On Wed, Apr 27, 2016 at 01:18:21PM +0100, David Woodhouse wrote: > > > > On some systems, including Xen and any system with a physical device > > > that speaks virtio behind a physical IOMMU, we must use the DMA API > > > for virtio DMA to work at all. > > > > > > Add a feature bit to detect that: VIRTIO_F_IOMMU_PLATFORM. > > > > > > If not there, we preserve historic behavior and bypass the DMA > > > API unless within Xen guest. This is actually required for > > > systems, including SPARC and PPC64, where virtio-pci devices are > > > enumerated as though they are behind an IOMMU, but the virtio host > > > ignores the IOMMU, so we must either pretend that the IOMMU isn't > > > there or somehow map everything as the identity. > > > > > > Re: non-virtio devices. > > > > > > It turns out that on old QEMU hosts, only emulated devices which were > > > part of QEMU use the IOMMU. Should we want to bypass the IOMMU for such > > > devices *only*, it would be rather easy to detect them by looking at > > > subsystem vendor and device ID. Thus, no new interfaces are required > > > except for virtio which always uses the same subsystem vendor and device ID. > > Apologies for dropping this thread; I've been travelling. > > But seriously, NO! > > I understand why you want to see this as a virtio-specific issue, but > it isn't. And we don't *want* it to be. > > In the guest, drivers SHALL use the DMA API. And the DMA API SHALL do > the right thing for each device according to its needs. > > So any information passed from qemu to the guest should be directed at > the platform IOMMU code (or handled by qemu-detection quirks in the > guest, if we must). > > It is *not* acceptable for the virtio drivers in the guest to just > eschew the DMA API completely, triggered by some device-specific flag. > > The qemu implementation is, of course, monolithic. In qemu the fact > that virtio doesn't get translated by the emulated IOMMU *is* actually > down to code in the virtio implementation. I get that. > > But then again, it's not just virtio. *Any* device which we emulate for > the guest could have that same issue, and appear as untranslated. (And > assigned PCI devices currently do). > > Let's think about the parallel with a system-on-chip. Let's say we have > a peripheral which got included, but which was wired up such that it > bypasses the IOMMU and gets to do direct physical DMA. Is that a > feature of that specific peripheral? Do we hack its drivers to make the > distinction between this incarnation, and a normal discrete version of > the same device? No! It's a feature of the *system* One correction: it's a feature of the device in the system. There could be a mix of devices bypassing and not bypassing the IOMMU. > and needs to be > conveyed to the OS IOMMU code to do the right thing. Not to the driver. > > In my opinion, adding the VIRTIO_F_IOMMU_PLATFORM feature bit is > absolutely the wrong thing to do. > > What we *should* do is a patchset in the guest which both fixes virtio > drivers to *always* use the DMA API, and fixes the DMA API to DTRT at > the same time — by detecting qemu and installing no-op DMA ops for the > appropriate devices, perhaps. Sounds good. And a way to detect appropriate devices could be by looking at the feature flag, perhaps? > Then we can look at giving qemu a way to properly indicate which > devices it actually does DMA mapping for, so we can remove those > heuristic assumptions. > > But that flag does *not* live in the virtio host??guest ABI. > > -- > David Woodhouse Open Source Technology Centre > David.Woodhouse@intel.com Intel Corporation > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 27, 2016 at 04:37:04PM +0300, Michael S. Tsirkin wrote: > One correction: it's a feature of the device in the system. > There could be a mix of devices bypassing and not > bypassing the IOMMU. No, it really is not. A device can't chose to bypass the IOMMU. But the IOMMU can chose to let the device bypass. So any fix here belongs into the platform/iommu code too and not into some driver. > Sounds good. And a way to detect appropriate devices could > be by looking at the feature flag, perhaps? Again, no! The way to detect that is to look into the iommu description structures provided by the firmware. They provide everything necessary to tell the iommu code which devices are not translated. Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 27, 2016 at 7:23 AM, Joerg Roedel <joro@8bytes.org> wrote: > On Wed, Apr 27, 2016 at 04:37:04PM +0300, Michael S. Tsirkin wrote: >> One correction: it's a feature of the device in the system. >> There could be a mix of devices bypassing and not >> bypassing the IOMMU. > > No, it really is not. A device can't chose to bypass the IOMMU. But the > IOMMU can chose to let the device bypass. So any fix here belongs > into the platform/iommu code too and not into some driver. > >> Sounds good. And a way to detect appropriate devices could >> be by looking at the feature flag, perhaps? > > Again, no! The way to detect that is to look into the iommu description > structures provided by the firmware. They provide everything necessary > to tell the iommu code which devices are not translated. > Except on PPC and SPARC. As far as I know, those are the only problematic platforms. Is it too late to *disable* QEMU's q35-iommu thingy until it can be fixed to report correct data in the DMAR tables? --Andy -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 27, 2016 at 04:23:32PM +0200, Joerg Roedel wrote: > On Wed, Apr 27, 2016 at 04:37:04PM +0300, Michael S. Tsirkin wrote: > > One correction: it's a feature of the device in the system. > > There could be a mix of devices bypassing and not > > bypassing the IOMMU. > > No, it really is not. A device can't chose to bypass the IOMMU. But the > IOMMU can chose to let the device bypass. QEMU can choose to bypass IOMMU for one device and not the other. IOMMU in QEMU isn't involved when it's bypassed. > So any fix here belongs > into the platform/iommu code too and not into some driver. Fine but this is beside the point. Almost all virtio devices bypass IOMMU and what this patch does is create a way to detect devices that don't. This code can maybe go into platform. > > Sounds good. And a way to detect appropriate devices could > > be by looking at the feature flag, perhaps? > > Again, no! The way to detect that is to look into the iommu description > structures provided by the firmware. They provide everything necessary > to tell the iommu code which devices are not translated. > > > > Joerg It would be easy if they did but they don't do this on all systems. In particular the idea for firmware interface was clearly that a given bus either is connected through IOMMU or bypassing it. Whether virtio bypasses the iommu is unrelated to the bus it's on.
On Wed, Apr 27, 2016 at 07:31:43AM -0700, Andy Lutomirski wrote: > On Wed, Apr 27, 2016 at 7:23 AM, Joerg Roedel <joro@8bytes.org> wrote: > > On Wed, Apr 27, 2016 at 04:37:04PM +0300, Michael S. Tsirkin wrote: > >> One correction: it's a feature of the device in the system. > >> There could be a mix of devices bypassing and not > >> bypassing the IOMMU. > > > > No, it really is not. A device can't chose to bypass the IOMMU. But the > > IOMMU can chose to let the device bypass. So any fix here belongs > > into the platform/iommu code too and not into some driver. > > > >> Sounds good. And a way to detect appropriate devices could > >> be by looking at the feature flag, perhaps? > > > > Again, no! The way to detect that is to look into the iommu description > > structures provided by the firmware. They provide everything necessary > > to tell the iommu code which devices are not translated. > > > > Except on PPC and SPARC. As far as I know, those are the only > problematic platforms. > > Is it too late to *disable* QEMU's q35-iommu thingy until it can be > fixed to report correct data in the DMAR tables? > > --Andy Meaning virtio or assigned devices? For virtio - it's way too late since these are working configurations. For assigned devices - they don't work on x86 so it doesn't have to be disabled, it's safe to ignore.
On Wed, Apr 27, 2016 at 7:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Wed, Apr 27, 2016 at 07:31:43AM -0700, Andy Lutomirski wrote: >> On Wed, Apr 27, 2016 at 7:23 AM, Joerg Roedel <joro@8bytes.org> wrote: >> > On Wed, Apr 27, 2016 at 04:37:04PM +0300, Michael S. Tsirkin wrote: >> >> One correction: it's a feature of the device in the system. >> >> There could be a mix of devices bypassing and not >> >> bypassing the IOMMU. >> > >> > No, it really is not. A device can't chose to bypass the IOMMU. But the >> > IOMMU can chose to let the device bypass. So any fix here belongs >> > into the platform/iommu code too and not into some driver. >> > >> >> Sounds good. And a way to detect appropriate devices could >> >> be by looking at the feature flag, perhaps? >> > >> > Again, no! The way to detect that is to look into the iommu description >> > structures provided by the firmware. They provide everything necessary >> > to tell the iommu code which devices are not translated. >> > >> >> Except on PPC and SPARC. As far as I know, those are the only >> problematic platforms. >> >> Is it too late to *disable* QEMU's q35-iommu thingy until it can be >> fixed to report correct data in the DMAR tables? >> >> --Andy > > Meaning virtio or assigned devices? > For virtio - it's way too late since these are working configurations. > For assigned devices - they don't work on x86 so it doesn't have > to be disabled, it's safe to ignore. I mean actually prevent QEMU from running in q35-iommu mode with any virtio devices attached or maybe even turn off q35-iommu mode entirely [1]. Doesn't it require that the user literally pass the word "experimental" into QEMU right now? It did at some point IIRC. The reason I'm asking is that, other than q35-iommu, QEMU's virtio devices *don't* bypass the IOMMU except on PPC and SPARC, simply because there is no other configuration AFAICT that has virtio and and IOMMU. So maybe the right solution is to fix q35-iommu to use DMAR correctly (thus breaking q35-iommu users with older guest kernels, which hopefully don't actually exist) and to come up with a PPC- and SPARC-specific solution, or maybe OpenFirmware-specific solution, to handle PPC and SPARC down the road. [1] I'm pretty sure I emailed the QEMU list before q35-iommu ever showed up in a release asking the QEMU team to please not do that until this issue was resolved. Sadly, that email was ignored :( --Andy -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 27, 2016 at 07:43:07AM -0700, Andy Lutomirski wrote: > On Wed, Apr 27, 2016 at 7:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Wed, Apr 27, 2016 at 07:31:43AM -0700, Andy Lutomirski wrote: > >> On Wed, Apr 27, 2016 at 7:23 AM, Joerg Roedel <joro@8bytes.org> wrote: > >> > On Wed, Apr 27, 2016 at 04:37:04PM +0300, Michael S. Tsirkin wrote: > >> >> One correction: it's a feature of the device in the system. > >> >> There could be a mix of devices bypassing and not > >> >> bypassing the IOMMU. > >> > > >> > No, it really is not. A device can't chose to bypass the IOMMU. But the > >> > IOMMU can chose to let the device bypass. So any fix here belongs > >> > into the platform/iommu code too and not into some driver. > >> > > >> >> Sounds good. And a way to detect appropriate devices could > >> >> be by looking at the feature flag, perhaps? > >> > > >> > Again, no! The way to detect that is to look into the iommu description > >> > structures provided by the firmware. They provide everything necessary > >> > to tell the iommu code which devices are not translated. > >> > > >> > >> Except on PPC and SPARC. As far as I know, those are the only > >> problematic platforms. > >> > >> Is it too late to *disable* QEMU's q35-iommu thingy until it can be > >> fixed to report correct data in the DMAR tables? > >> > >> --Andy > > > > Meaning virtio or assigned devices? > > For virtio - it's way too late since these are working configurations. > > For assigned devices - they don't work on x86 so it doesn't have > > to be disabled, it's safe to ignore. > > I mean actually prevent QEMU from running in q35-iommu mode with any > virtio devices attached or maybe even turn off q35-iommu mode entirely > [1]. Doesn't it require that the user literally pass the word > "experimental" into QEMU right now? It did at some point IIRC. > > The reason I'm asking is that, other than q35-iommu, QEMU's virtio > devices *don't* bypass the IOMMU except on PPC and SPARC, simply > because there is no other configuration AFAICT that has virtio and and > IOMMU. So maybe the right solution is to fix q35-iommu to use DMAR > correctly (thus breaking q35-iommu users with older guest kernels, > which hopefully don't actually exist) and to come up with a PPC- and > SPARC-specific solution, or maybe OpenFirmware-specific solution, to > handle PPC and SPARC down the road. > > [1] I'm pretty sure I emailed the QEMU list before q35-iommu ever > showed up in a release asking the QEMU team to please not do that > until this issue was resolved. Sadly, that email was ignored :( > > --Andy Sorry, I didn't make myself clear. Point is, QEMU is not the only virtio implementation out there. So we can't know no virtio implementations have an IOMMU as long as linux supports this IOMMU. virtio always used physical addresses since it was born and if it changes that it must do this in a way that does not break existing users.
On Wed, Apr 27, 2016 at 05:34:30PM +0300, Michael S. Tsirkin wrote: > On Wed, Apr 27, 2016 at 04:23:32PM +0200, Joerg Roedel wrote: > QEMU can choose to bypass IOMMU for one device and not the other. > IOMMU in QEMU isn't involved when it's bypassed. And it is QEMU's task to tell the OS, right? And the correct way to do this is via the firmware ACPI tables. > Fine but this is beside the point. Almost all virtio devices > bypass IOMMU and what this patch does is create a way > to detect devices that don't. This code can maybe go into > platform. Again, the way to detect this is in platform code must not be device specific. This is what the DMAR and IVRS tables on x86 are for. When there is no way to do this in the firmware (or there is no firmware at all), we have to do a quirk in the platform code for it. Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 27, 2016 at 05:54:57PM +0300, Michael S. Tsirkin wrote: > Point is, QEMU is not the only virtio implementation out there. > So we can't know no virtio implementations have an IOMMU as long as > linux supports this IOMMU. > virtio always used physical addresses since it was born and if it > changes that it must do this in a way that does not break existing > users. FWIW, virtio in qemu can continue to just use physical addresses. But qemu needs to advertise that fact correctly to the OS in the DMAR table. This way old kernels (where virtio does not use DMA-API) will also continue to work on the fixed qemu. Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 27, 2016 at 04:56:32PM +0200, Joerg Roedel wrote: > On Wed, Apr 27, 2016 at 05:34:30PM +0300, Michael S. Tsirkin wrote: > > On Wed, Apr 27, 2016 at 04:23:32PM +0200, Joerg Roedel wrote: > > > QEMU can choose to bypass IOMMU for one device and not the other. > > IOMMU in QEMU isn't involved when it's bypassed. > > And it is QEMU's task to tell the OS, right? And the correct way to do > this is via the firmware ACPI tables. Going forward, this might be reasonable. Of course it didn't in the past and no one cared because virtio devices used physical addresses. We have to keep these setups working. > > Fine but this is beside the point. Almost all virtio devices > > bypass IOMMU and what this patch does is create a way > > to detect devices that don't. This code can maybe go into > > platform. > > Again, the way to detect this is in platform code must not be device > specific. This is what the DMAR and IVRS tables on x86 are for. > > When there is no way to do this in the firmware (or there is no firmware > at all), we have to do a quirk in the platform code for it. > > > > Joerg I really don't get it. There's exactly one device that works now and needs the work-around and so that we need to support, and that is virtio. It happens to have exactly the same issue on all platforms. Why would we want to work hard to build platform-specific solutions to a problem that can be solved in 5 lines of generic code?
On Wed, Apr 27, 2016 at 04:58:51PM +0200, Joerg Roedel wrote: > On Wed, Apr 27, 2016 at 05:54:57PM +0300, Michael S. Tsirkin wrote: > > Point is, QEMU is not the only virtio implementation out there. > > So we can't know no virtio implementations have an IOMMU as long as > > linux supports this IOMMU. > > virtio always used physical addresses since it was born and if it > > changes that it must do this in a way that does not break existing > > users. > > FWIW, virtio in qemu can continue to just use physical addresses. But > qemu needs to advertise that fact correctly to the OS in the DMAR table. > This way old kernels (where virtio does not use DMA-API) will also > continue to work on the fixed qemu. > > > > Joerg It's not clear it can do this since DMAR tables seem to assume a given slot is either bypassing IOMMU or going through it. QEMU allows reusing same slot for virtio and non-virtio devices. Besides, this patch is not about it, it's a flag for QEMU to tell guest that it can trust DMAR tables.
On Wed, Apr 27, 2016 at 7:54 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Wed, Apr 27, 2016 at 07:43:07AM -0700, Andy Lutomirski wrote: >> On Wed, Apr 27, 2016 at 7:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Wed, Apr 27, 2016 at 07:31:43AM -0700, Andy Lutomirski wrote: >> >> On Wed, Apr 27, 2016 at 7:23 AM, Joerg Roedel <joro@8bytes.org> wrote: >> >> > On Wed, Apr 27, 2016 at 04:37:04PM +0300, Michael S. Tsirkin wrote: >> >> >> One correction: it's a feature of the device in the system. >> >> >> There could be a mix of devices bypassing and not >> >> >> bypassing the IOMMU. >> >> > >> >> > No, it really is not. A device can't chose to bypass the IOMMU. But the >> >> > IOMMU can chose to let the device bypass. So any fix here belongs >> >> > into the platform/iommu code too and not into some driver. >> >> > >> >> >> Sounds good. And a way to detect appropriate devices could >> >> >> be by looking at the feature flag, perhaps? >> >> > >> >> > Again, no! The way to detect that is to look into the iommu description >> >> > structures provided by the firmware. They provide everything necessary >> >> > to tell the iommu code which devices are not translated. >> >> > >> >> >> >> Except on PPC and SPARC. As far as I know, those are the only >> >> problematic platforms. >> >> >> >> Is it too late to *disable* QEMU's q35-iommu thingy until it can be >> >> fixed to report correct data in the DMAR tables? >> >> >> >> --Andy >> > >> > Meaning virtio or assigned devices? >> > For virtio - it's way too late since these are working configurations. >> > For assigned devices - they don't work on x86 so it doesn't have >> > to be disabled, it's safe to ignore. >> >> I mean actually prevent QEMU from running in q35-iommu mode with any >> virtio devices attached or maybe even turn off q35-iommu mode entirely >> [1]. Doesn't it require that the user literally pass the word >> "experimental" into QEMU right now? It did at some point IIRC. >> >> The reason I'm asking is that, other than q35-iommu, QEMU's virtio >> devices *don't* bypass the IOMMU except on PPC and SPARC, simply >> because there is no other configuration AFAICT that has virtio and and >> IOMMU. So maybe the right solution is to fix q35-iommu to use DMAR >> correctly (thus breaking q35-iommu users with older guest kernels, >> which hopefully don't actually exist) and to come up with a PPC- and >> SPARC-specific solution, or maybe OpenFirmware-specific solution, to >> handle PPC and SPARC down the road. >> >> [1] I'm pretty sure I emailed the QEMU list before q35-iommu ever >> showed up in a release asking the QEMU team to please not do that >> until this issue was resolved. Sadly, that email was ignored :( >> >> --Andy > > Sorry, I didn't make myself clear. > Point is, QEMU is not the only virtio implementation out there. > So we can't know no virtio implementations have an IOMMU as long as > linux supports this IOMMU. > virtio always used physical addresses since it was born and if it > changes that it must do this in a way that does not break existing > users. Is there any non-QEMU virtio implementation can provide an IOMMU-bypassing virtio device on a platform that has a nontrivial IOMMU? --Andy -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2016-04-27 at 18:05 +0300, Michael S. Tsirkin wrote: > > I really don't get it. > > There's exactly one device that works now and needs the work-around and > so that we need to support, and that is virtio. It happens to have > exactly the same issue on all platforms. False. We have other devices which are currently *not* translated by the emulated IOMMU and which aren't going to be in the short term either. We also have other devices (emulated hardware NICs) to which precisely the same "we don't need protection" arguments apply, and which we *could* expose to the guest without an IOMMU translation if we really wanted to. It makes as much sense as exposing virtio without an IOMMU, going forward. > Why would we want to work hard to build platform-specific > solutions to a problem that can be solved in 5 lines of > generic code? Because it's a dirty hack in the *wrong* place.
On Wed, Apr 27, 2016 at 04:15:35PM +0100, David Woodhouse wrote: > On Wed, 2016-04-27 at 18:05 +0300, Michael S. Tsirkin wrote: > > > > I really don't get it. > > > > There's exactly one device that works now and needs the work-around and > > so that we need to support, and that is virtio. It happens to have > > exactly the same issue on all platforms. > > False. We have other devices which are currently *not* translated by > the emulated IOMMU and which aren't going to be in the short term > either. > > We also have other devices (emulated hardware NICs) to which precisely > the same "we don't need protection" arguments apply, and which we > *could* expose to the guest without an IOMMU translation if we really > wanted to. It makes as much sense as exposing virtio without an IOMMU, > going forward. The reasons for virtio are mostly dealing legacy. We don't need protection is a separate issue that I'd rather drop for now. > > Why would we want to work hard to build platform-specific > > solutions to a problem that can be solved in 5 lines of > > generic code? > > Because it's a dirty hack in the *wrong* place. No one came up with a better one so far :( > -- > dwmw2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2016-04-27 at 21:17 +0300, Michael S. Tsirkin wrote: > > > Because it's a dirty hack in the *wrong* place. > > No one came up with a better one so far :( Seriously? Take a look at drivers/iommu/intel-iommu.c. It has quirks for all kinds of shitty devices that have to be put in passthrough mode or otherwise excluded. We don't actually *need* it for the Intel IOMMU; all we need is for QEMU to stop lying in its DMAR tables. We *do* want the same kind of quirks in the relevant POWER and ARM IOMMU code in the kernel. Do that (hell, a simple quirk for all virtio devices will suffice, but NOT in the virtio driver) at the same moment you fix the virtio devices to use the DMA API. Job done. Some time *later* we can work on *refining* that quirk, and a way for QEMU to tell the guest (via something generic like fwcfg, maybe) that some devices are and aren't translated. Actually, I'm about to look at moving dma_ops into struct device and cleaning up the way we detect which IOMMU is attached, at device instantiation time. Perhaps I can shove the virtio-exception quirk in there while I'm at it...
On Wed, Apr 27, 2016 at 08:16:57PM +0100, David Woodhouse wrote: > On Wed, 2016-04-27 at 21:17 +0300, Michael S. Tsirkin wrote: > > > > > Because it's a dirty hack in the *wrong* place. > > > > No one came up with a better one so far :( > > Seriously? > > Take a look at drivers/iommu/intel-iommu.c. It has quirks for all kinds > of shitty devices that have to be put in passthrough mode or otherwise > excluded. I see work-arounds for broken IOMMUs but not for individual devices. Could you point me to a more specific example? > We don't actually *need* it for the Intel IOMMU; all we need is for > QEMU to stop lying in its DMAR tables. We need it for legacy QEMU anyway, and it's not easy for QEMU to stop lying about virtio, so we'll need it for a while. I think it's easy for QEMU to stop lying about assigned devices, so we don't need it for non-virtio devices. > We *do* want the same kind of quirks in the relevant POWER and ARM > IOMMU code in the kernel. Do that (hell, a simple quirk for all virtio > devices will suffice, but NOT in the virtio driver Sure - that works. It does not have to be in the driver. >) at the same moment > you fix the virtio devices to use the DMA API. Job done. > > Some time *later* we can work on *refining* that quirk, and a way for > QEMU to tell the guest (via something generic like fwcfg, maybe) that > some devices are and aren't translated. I don't see why how fwcfg can work here. It's a static thing, devices can come and go with hotplug. > Actually, I'm about to look at moving dma_ops into struct device and > cleaning up the way we detect which IOMMU is attached, at device > instantiation time. Perhaps I can shove the virtio-exception quirk in > there while I'm at it... Sounds good. > -- > dwmw2 > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-04-28 at 17:34 +0300, Michael S. Tsirkin wrote: > I see work-arounds for broken IOMMUs but not for > individual devices. Could you point me to a more specific > example? I think the closest example is probably quirk_ioat_snb_local_iommu(). If we see this particular device, we *know* what the topology actually looks like. We check the hardware setup, and if we're *not* being told the truth, then we stick it in bypass mode because we know it *isn't* actually being translated. Actually, that's almost *identical* to what we want, isn't it? Except instead of checking undocumented chipset registers, it wants to be checking "am I on a version of qemu known to lie about virtio being translated?" > > We don't actually *need* it for the Intel IOMMU; all we need is for > > QEMU to stop lying in its DMAR tables. > We need it for legacy QEMU anyway, and it's not easy for QEMU to stop > lying about virtio, so we'll need it for a while. > I think it's easy for QEMU to stop lying about assigned devices, > so we don't need it for non-virtio devices. Why is it easier for QEMU to tell the truth about assigned devices, than it is for virtio? Assuming they both remain actually untranslated for now, why's it easier to fix the DMAR table for one and not the other? (Implementing translation of assigned devices is on my list, but it's a long way off). > I don't see why how fwcfg can work here. It's a static thing, > devices can come and go with hotplug. This touches on something you said elsewhere, that it's painful/impossible to hot-unplug a translated device and hot-plug an untranslated device in the same slot (and vice versa). So let's assume for now that a given slot is indeed static, and either translated or untranslated. Like the DMAR table, the fwcfg can just give a list of slot which are (or aren't) translated. And then you can *only* add a translated device to a translated slot, or an untranslated device to an untranslated slot. All the internally-emulated devices *can* be either translated or untranslated. That's just a matter of software. Surely, you currently *can't* have translated assigned devices (until someone implements the whole VT-d page table shadowing or whatever), so you'll be barred from assigning a device to a slot which *previously* had an untranslated device. But so what? Put it in a different slot instead.
On Thu, Apr 28, 2016 at 04:11:54PM +0100, David Woodhouse wrote: > On Thu, 2016-04-28 at 17:34 +0300, Michael S. Tsirkin wrote: > > I see work-arounds for broken IOMMUs but not for > > individual devices. Could you point me to a more specific > > example? > > I think the closest example is probably quirk_ioat_snb_local_iommu(). OK, so for intel, it seems that it's enough to set pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO; for the device. Do I have to poke at each iommu implementation to find a way to do this, or is there some way to do it portably? > If we see this particular device, we *know* what the topology actually > looks like. We check the hardware setup, and if we're *not* being told > the truth, then we stick it in bypass mode because we know it *isn't* > actually being translated. > > Actually, that's almost *identical* to what we want, isn't it? > > Except instead of checking undocumented chipset registers, it wants to > be checking "am I on a version of qemu known to lie about virtio being > translated?" Not exactly - I think that future versions of qemu might lie about some devices but not others. > > > We don't actually *need* it for the Intel IOMMU; all we need is for > > > QEMU to stop lying in its DMAR tables. > > We need it for legacy QEMU anyway, and it's not easy for QEMU to stop > > lying about virtio, so we'll need it for a while. > > I think it's easy for QEMU to stop lying about assigned devices, > > so we don't need it for non-virtio devices. > > Why is it easier for QEMU to tell the truth about assigned devices, > than it is for virtio? Assuming they both remain actually untranslated > for now, why's it easier to fix the DMAR table for one and not the > other? > > (Implementing translation of assigned devices is on my list, but it's a > long way off). DMAR is unfortunately not a good match for what people do with QEMU. There is a patchset on list fixing translation of assigned devices. So the fix for these will simply be to do translation for all assigned devices. It's harder for virtio as it isn't always processed in QEMU - there's vhost in kernel and an out of process vhost-user plugin. So we can end up e.g. with modern QEMU which does translate in-process virtio but not out of process one. > > I don't see why how fwcfg can work here. It's a static thing, > > devices can come and go with hotplug. > > This touches on something you said elsewhere, that it's > painful/impossible to hot-unplug a translated device and hot-plug an > untranslated device in the same slot (and vice versa). > > So let's assume for now that a given slot is indeed static, and either > translated or untranslated. Like the DMAR table, the fwcfg can just > give a list of slot which are (or aren't) translated. > > And then you can *only* add a translated device to a translated slot, > or an untranslated device to an untranslated slot. > > All the internally-emulated devices *can* be either translated or > untranslated. That's just a matter of software. Surely, you currently > *can't* have translated assigned devices (until someone implements the > whole VT-d page table shadowing or whatever), so you'll be barred from > assigning a device to a slot which *previously* had an untranslated > device. But so what? Put it in a different slot instead. Unfortunately people got used to be able to put any device in any slot, and built external tools around that ability. It's rather painful to break this assumption. > -- > dwmw2 > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-04-28 at 18:37 +0300, Michael S. Tsirkin wrote: > OK, so for intel, it seems that it's enough to set > pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO; > for the device. Yes, currently. Although that's vile. In fact what we *want* to happen is for the intel-iommu code simply to decline to provide DMA ops for this device, and let it fall back to the swiotlb or no-op DMA ops, as appropriate. As it is, we have the intel-iommu DMA ops *unconditionally, and they have a hack to manually fall back to calling swiotlb. It's all just horrid, which is why I want to clean it up with nice per-device DMA ops and discovery thereof :) > Do I have to poke at each iommu implementation to find > a way to do this, or is there some way to do it > portably? There *will* be.... Christoph has already done some of the cleanup in this space, and I need to take stock of what he's already done, and finish off the parts I want to build on top of it. > Not exactly - I think that future versions of qemu might lie > about some devices but not others. Can we keep this simple? QEMU currently lies about some devices. Let's implement a heuristic for the guest OS to know about that, and react accordingly. Then let's fix QEMU to tell the truth. All the time, unconditionally. Even on POWER/ARM where there's no obvious *way* for it to tell the truth (because you don't have the flexibility that DMAR tables do), and we need to devise a way to put it in the device-tree or fwcfg or something else. And only once QEMU consistently tells the *truth*, then we can start to do new stuff and let it actually change its behaviour. > DMAR is unfortunately not a good match for what people do with QEMU. > > There is a patchset on list fixing translation of assigned > devices. So the fix for these will simply be to do translation for > all assigned devices. It's harder for virtio as it isn't always > processed in QEMU - there's vhost in kernel and an out of process > vhost-user plugin. So we can end up e.g. with modern QEMU which > does translate in-process virtio but not out of process one. Right... just stop. Fix QEMU to tell the truth first, and *then* once we can trust it, we can start to change its behaviour. :) > Unfortunately people got used to be able to put any device > in any slot, and built external tools around that ability. > It's rather painful to break this assumption. Well, if you just said you have a patch set which allows translation of assigned devices then you are most of the way there, aren't you? We just need to fix the out-of-process virtio case, and everything can be either translated or untranslated?
On Thu, Apr 28, 2016 at 04:48:25PM +0100, David Woodhouse wrote: > On Thu, 2016-04-28 at 18:37 +0300, Michael S. Tsirkin wrote: > > OK, so for intel, it seems that it's enough to set > > pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO; > > for the device. > > Yes, currently. Although that's vile. In fact what we *want* to happen > is for the intel-iommu code simply to decline to provide DMA ops for > this device, and let it fall back to the swiotlb or no-op DMA ops, as > appropriate. > > As it is, we have the intel-iommu DMA ops *unconditionally, and they > have a hack to manually fall back to calling swiotlb. It's all just > horrid, which is why I want to clean it up with nice per-device DMA ops > and discovery thereof :) > > > Do I have to poke at each iommu implementation to find > > a way to do this, or is there some way to do it > > portably? > > There *will* be.... Christoph has already done some of the cleanup in > this space, and I need to take stock of what he's already done, and > finish off the parts I want to build on top of it. > > > Not exactly - I think that future versions of qemu might lie > > about some devices but not others. > > Can we keep this simple? > > QEMU currently lies about some devices. Let's implement a heuristic for > the guest OS to know about that, and react accordingly. > > Then let's fix QEMU to tell the truth. All the time, unconditionally. > Even on POWER/ARM where there's no obvious *way* for it to tell the > truth (because you don't have the flexibility that DMAR tables do), and > we need to devise a way to put it in the device-tree or fwcfg or > something else. Right. Unfortunately all these aren't easy to implement at all. So I'm inclined to go the "something else" route. It has the added benefit of giving us a heuristic for free. > And only once QEMU consistently tells the *truth*, then we can start to > do new stuff and let it actually change its behaviour. > > > DMAR is unfortunately not a good match for what people do with QEMU. > > > > There is a patchset on list fixing translation of assigned > > devices. So the fix for these will simply be to do translation for > > all assigned devices. It's harder for virtio as it isn't always > > processed in QEMU - there's vhost in kernel and an out of process > > vhost-user plugin. So we can end up e.g. with modern QEMU which > > does translate in-process virtio but not out of process one. > > Right... just stop. Fix QEMU to tell the truth first, and *then* once > we can trust it, we can start to change its behaviour. :) > > > Unfortunately people got used to be able to put any device > > in any slot, and built external tools around that ability. > > It's rather painful to break this assumption. > > Well, if you just said you have a patch set which allows translation of > assigned devices then you are most of the way there, aren't you? We > just need to fix the out-of-process virtio case, and everything can be > either translated or untranslated? Absolutely. But that "just" will take a while. With out of process there's always a chance that remote doesn't implement translation. E.g. new QEMU running on an old host kernel. > -- > dwmw2 > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 28/04/2016 17:37, Michael S. Tsirkin wrote: > > All the internally-emulated devices *can* be either translated or > > untranslated. That's just a matter of software. Surely, you currently > > *can't* have translated assigned devices (until someone implements the > > whole VT-d page table shadowing or whatever), so you'll be barred from > > assigning a device to a slot which *previously* had an untranslated > > device. But so what? Put it in a different slot instead. > > Unfortunately people got used to be able to put any device > in any slot, and built external tools around that ability. > It's rather painful to break this assumption. Once you move to PCIe, a lot of things become more complicated. This is just one of them; instead of needing half a dozen PCI bridges, you'll need half a dozen plus one. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index 967cc75..bb6f34e 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -23,7 +23,8 @@ static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev) BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); - if (k->get_dma_as) { + if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) && + k->get_dma_as) { return k->get_dma_as(qbus->parent); } return &address_space_memory; diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index b12faa9..44f3788 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -228,7 +228,9 @@ typedef struct VirtIORNGConf VirtIORNGConf; DEFINE_PROP_BIT64("notify_on_empty", _state, _field, \ VIRTIO_F_NOTIFY_ON_EMPTY, true), \ DEFINE_PROP_BIT64("any_layout", _state, _field, \ - VIRTIO_F_ANY_LAYOUT, true) + VIRTIO_F_ANY_LAYOUT, true), \ + DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ + VIRTIO_F_IOMMU_PLATFORM, false) hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n); hwaddr virtio_queue_get_avail_addr(VirtIODevice *vdev, int n); diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h index bcc445b..3fcfbb1 100644 --- a/include/standard-headers/linux/virtio_config.h +++ b/include/standard-headers/linux/virtio_config.h @@ -61,4 +61,6 @@ /* v1.0 compliant. */ #define VIRTIO_F_VERSION_1 32 +/* Do not bypass the IOMMU (if configured) */ +#define VIRTIO_F_IOMMU_PLATFORM 34 #endif /* _LINUX_VIRTIO_CONFIG_H */
This adds a flag to enable/disable bypassing the IOMMU by virtio devices. This is on top of patch http://article.gmane.org/gmane.comp.emulators.qemu/403467 virtio: convert to use DMA api Tested with patchset http://article.gmane.org/gmane.linux.kernel.virtualization/27545 virtio-pci: iommu support (note: bit number has been kept at 34 intentionally to match posted guest code. a non-RFC version will renumber bits to be contigious). changes from v1: drop PASSTHROUGH flag The interaction between virtio and DMA API is messy. On most systems with virtio, physical addresses match bus addresses, and it doesn't particularly matter whether we use the DMA API. On some systems, including Xen and any system with a physical device that speaks virtio behind a physical IOMMU, we must use the DMA API for virtio DMA to work at all. Add a feature bit to detect that: VIRTIO_F_IOMMU_PLATFORM. If not there, we preserve historic behavior and bypass the DMA API unless within Xen guest. This is actually required for systems, including SPARC and PPC64, where virtio-pci devices are enumerated as though they are behind an IOMMU, but the virtio host ignores the IOMMU, so we must either pretend that the IOMMU isn't there or somehow map everything as the identity. Re: non-virtio devices. It turns out that on old QEMU hosts, only emulated devices which were part of QEMU use the IOMMU. Should we want to bypass the IOMMU for such devices *only*, it would be rather easy to detect them by looking at subsystem vendor and device ID. Thus, no new interfaces are required except for virtio which always uses the same subsystem vendor and device ID. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- include/hw/virtio/virtio-access.h | 3 ++- include/hw/virtio/virtio.h | 4 +++- include/standard-headers/linux/virtio_config.h | 2 ++ 3 files changed, 7 insertions(+), 2 deletions(-)