diff mbox series

[V7,vfio,9/9] vfio/virtio: Introduce a vfio driver over virtio devices

Message ID 20231207102820.74820-10-yishaih@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Introduce a vfio driver over virtio devices | expand

Commit Message

Yishai Hadas Dec. 7, 2023, 10:28 a.m. UTC
Introduce a vfio driver over virtio devices to support the legacy
interface functionality for VFs.

Background, from the virtio spec [1].
--------------------------------------------------------------------
In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.
--------------------------------------------------------------------

Specifically, this driver adds support for a virtio-net VF to be exposed
as a transitional device to a guest driver and allows the legacy IO BAR
functionality on top.

This allows a VM which uses a legacy virtio-net driver in the guest to
work transparently over a VF which its driver in the host is that new
driver.

The driver can be extended easily to support some other types of virtio
devices (e.g virtio-blk), by adding in a few places the specific type
properties as was done for virtio-net.

For now, only the virtio-net use case was tested and as such we introduce
the support only for such a device.

Practically,
Upon probing a VF for a virtio-net device, in case its PF supports
legacy access over the virtio admin commands and the VF doesn't have BAR
0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
transitional device with I/O BAR in BAR 0.

The existence of the simulated I/O bar is reported later on by
overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
exposes itself as a transitional device by overwriting some properties
upon reading its config space.

Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
guest may use it via read/write calls according to the virtio
specification.

Any read/write towards the control parts of the BAR will be captured by
the new driver and will be translated into admin commands towards the
device.

Any data path read/write access (i.e. virtio driver notifications) will
be forwarded to the physical BAR which its properties were supplied by
the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
probing/init flow.

With that code in place a legacy driver in the guest has the look and
feel as if having a transitional device with legacy support for both its
control and data path flows.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 MAINTAINERS                      |   7 +
 drivers/vfio/pci/Kconfig         |   2 +
 drivers/vfio/pci/Makefile        |   2 +
 drivers/vfio/pci/virtio/Kconfig  |  16 +
 drivers/vfio/pci/virtio/Makefile |   4 +
 drivers/vfio/pci/virtio/main.c   | 567 +++++++++++++++++++++++++++++++
 6 files changed, 598 insertions(+)
 create mode 100644 drivers/vfio/pci/virtio/Kconfig
 create mode 100644 drivers/vfio/pci/virtio/Makefile
 create mode 100644 drivers/vfio/pci/virtio/main.c

Comments

Tian, Kevin Dec. 13, 2023, 8:23 a.m. UTC | #1
> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Thursday, December 7, 2023 6:28 PM
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> probing/init flow.

this is still captured by the new driver. Just the difference between
using admin cmds vs. directly accessing bar when emulating the access.

> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO NET PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO NET VF devices which
> support
> +          legacy IO access, using the VFIO framework that can work with a
> legacy
> +          virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.

"thus, ..." duplicates with the former part.

> +          As of that this driver emulated I/O BAR in software to let a VF be

s/emulated/emulates/

> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.

VFIO is not specific to the guest. a native application including a legacy
virtio driver could also benefit. let's not write it in a way specific to virt.

> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)

this name only talks about the behavior for VIRTIO_PCI_QUEUE_NOTIFY.

for legacy admin cmd it's unclear whether it's actually conveyed to a
mem bar.

is it clearer to call it virtiovf_pci_bar0_rw()?

> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	size_t register_offset;
> +	loff_t copy_offset;
> +	size_t copy_count;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				  &copy_offset, &copy_count,
> &register_offset)) {
> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
> register_offset, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				  &copy_offset, &copy_count,
> &register_offset)) {
> +		if (copy_from_user((void *)&val16 + register_offset, buf +
> copy_offset,
> +				   copy_count))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
> register_offset,
> +				 copy_count))
> +			return -EFAULT;
> +	}

the write handler calls vfio_pci_core_write() for PCI_COMMAND so
the core vconfig should have the latest copy of the IO bit value which
is copied to the user buffer by vfio_pci_core_read(). then not necessary
to update it again.

btw the approach in this patch sounds a bit hackish - it modifies the
result before/after vfio pci core emulation instead of directly injecting
its specific emulation logic in vfio vconfig. It's probably being that
vfio vconfig currently has a global permission/handler scheme for
all pci devices. Extending it to support per-device tweak might need
lots of change.

So I'm not advocating that big change at this point, especially when
only this driver imposes such requirement now. But in the future when
more drivers e.g. Ankit's nvgrace-gpu want to do similar tweak we
may consider such possibility.

> +
> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
> sizeof(val32),
> +				  &copy_offset, &copy_count,
> &register_offset)) {
> +		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
> +		u32 pci_base_addr_0 = le32_to_cpu(virtvdev-
> >pci_base_addr_0);
> +
> +		val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) |
> PCI_BASE_ADDRESS_SPACE_IO);
> +		if (copy_to_user(buf + copy_offset, (void *)&val32 +
> register_offset, copy_count))
> +			return -EFAULT;
> +	}

Do we care about the initial value of bar0? this patch leaves it as 0,
unlike other real bars initialized with the hw value. In reality this
may not be a problem as software usually writes all 1's to detect
the size as the first step.

raise it just in case others may see a potential issue.

> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user
> *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		size_t register_offset;
> +		loff_t copy_offset;
> +		size_t copy_count;
> +
> +		if (range_intersect_range(pos, count, PCI_COMMAND,
> sizeof(virtvdev->pci_cmd),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev->pci_cmd +
> register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}
> +
> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
> +					  sizeof(virtvdev->pci_base_addr_0),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev-
> >pci_base_addr_0 + register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}
> +	}

wrap above into virtiovf_pci_write_config() to be symmetric with
the read path.

> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	/* Being ready with a buffer that supports MSIX */
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> +				virtiovf_get_device_config_size(pdev-
> >device);

which code is relevant to MSIX?


> +
> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
> +	.name = "virtio-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,

could be vfio_pci_core_open_device(). Given virtiovf specific init func
is not called  virtiovf_pci_open_device() is essentially same as the
core func.

> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
> +	    !virtiovf_bar0_exists(pdev))
> +		ops = &virtiovf_vfio_pci_tran_ops;

I have a confusion here.

why do we want to allow this driver binding to non-matching VF or
even PF?

if that is the intention then the naming/description should be adjusted
to not specific to vf throughout this patch.

e.g. don't use "virtiovf_" prefix...

the config option is generic:

+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO NET PCI devices"

but the description is specific to vf:

+          This provides support for exposing VIRTIO NET VF devices which support
+          legacy IO access, using the VFIO framework that can work with a legacy
+          virtio driver in the guest.

then the module description is generic again:

+MODULE_DESCRIPTION(
+	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
Yishai Hadas Dec. 13, 2023, 12:25 p.m. UTC | #2
On 13/12/2023 10:23, Tian, Kevin wrote:
>> From: Yishai Hadas <yishaih@nvidia.com>
>> Sent: Thursday, December 7, 2023 6:28 PM
>>
>> Any read/write towards the control parts of the BAR will be captured by
>> the new driver and will be translated into admin commands towards the
>> device.
>>
>> Any data path read/write access (i.e. virtio driver notifications) will
>> be forwarded to the physical BAR which its properties were supplied by
>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
>> probing/init flow.
> 
> this is still captured by the new driver. Just the difference between
> using admin cmds vs. directly accessing bar when emulating the access.

OK, I can rephrase the above to clarify that.

> 
>> +config VIRTIO_VFIO_PCI
>> +        tristate "VFIO support for VIRTIO NET PCI devices"
>> +        depends on VIRTIO_PCI
>> +        select VFIO_PCI_CORE
>> +        help
>> +          This provides support for exposing VIRTIO NET VF devices which
>> support
>> +          legacy IO access, using the VFIO framework that can work with a
>> legacy
>> +          virtio driver in the guest.
>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>> +          not indicate I/O Space.
> 
> "thus, ..." duplicates with the former part.
> 
>> +          As of that this driver emulated I/O BAR in software to let a VF be
> 
> s/emulated/emulates/

OK
> 
>> +          seen as a transitional device in the guest and let it work with
>> +          a legacy driver.
> 
> VFIO is not specific to the guest. a native application including a legacy
> virtio driver could also benefit. let's not write it in a way specific to virt.
> 

OK, we can rephrase to the below.
" .. to let a VF be seen as a transitional device by its users and .."


>> +
>> +static int
>> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
>> +			    loff_t pos, char __user *buf,
>> +			    size_t count, bool read)
> 
> this name only talks about the behavior for VIRTIO_PCI_QUEUE_NOTIFY.
> 
> for legacy admin cmd it's unclear whether it's actually conveyed to a
> mem bar.
> 
> is it clearer to call it virtiovf_pci_bar0_rw()?

I'm fine with your rename suggestion, will be part of V8.

> 
>> +
>> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
>> +					char __user *buf, size_t count,
>> +					loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	size_t register_offset;
>> +	loff_t copy_offset;
>> +	size_t copy_count;
>> +	__le32 val32;
>> +	__le16 val16;
>> +	u8 val8;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
>> +				  &copy_offset, &copy_count,
>> &register_offset)) {
>> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
>> register_offset, copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
>> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
>> +				  &copy_offset, &copy_count,
>> &register_offset)) {
>> +		if (copy_from_user((void *)&val16 + register_offset, buf +
>> copy_offset,
>> +				   copy_count))
>> +			return -EFAULT;
>> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
>> register_offset,
>> +				 copy_count))
>> +			return -EFAULT;
>> +	}
> 
> the write handler calls vfio_pci_core_write() for PCI_COMMAND so
> the core vconfig should have the latest copy of the IO bit value which
> is copied to the user buffer by vfio_pci_core_read(). then not necessary
> to update it again.

You assume the the 'vconfig' mechanism/flow is always applicable for 
that specific field, this should be double-checked.
However, as for now the driver doesn't rely / use the vconfig for other 
fields as it doesn't match and need a big refactor, I prefer to not rely 
on it at all and have it here.

> 
> btw the approach in this patch sounds a bit hackish - it modifies the
> result before/after vfio pci core emulation instead of directly injecting
> its specific emulation logic in vfio vconfig. It's probably being that
> vfio vconfig currently has a global permission/handler scheme for
> all pci devices. Extending it to support per-device tweak might need
> lots of change.

Right, the vconfig is not ready for that and might require lots of 
change, for now all is done by the driver layer.

> 
> So I'm not advocating that big change at this point, especially when
> only this driver imposes such requirement now. But in the future when
> more drivers e.g. Ankit's nvgrace-gpu want to do similar tweak we
> may consider such possibility.

This can be some orthogonal future refactoring once we'll have it.

> 
>> +
>> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
>> sizeof(val32),
>> +				  &copy_offset, &copy_count,
>> &register_offset)) {
>> +		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
>> +		u32 pci_base_addr_0 = le32_to_cpu(virtvdev-
>>> pci_base_addr_0);
>> +
>> +		val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) |
>> PCI_BASE_ADDRESS_SPACE_IO);
>> +		if (copy_to_user(buf + copy_offset, (void *)&val32 +
>> register_offset, copy_count))
>> +			return -EFAULT;
>> +	}
> 
> Do we care about the initial value of bar0? this patch leaves it as 0,
> unlike other real bars initialized with the hw value. In reality this
> may not be a problem as software usually writes all 1's to detect
> the size as the first step.

> 
> raise it just in case others may see a potential issue.

We don't see here an issue as you mentioned above.

> 
>> +
>> +static ssize_t
>> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user
>> *buf,
>> +			size_t count, loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +
>> +	if (!count)
>> +		return 0;
>> +
>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
>> +		size_t register_offset;
>> +		loff_t copy_offset;
>> +		size_t copy_count;
>> +
>> +		if (range_intersect_range(pos, count, PCI_COMMAND,
>> sizeof(virtvdev->pci_cmd),
>> +					  &copy_offset, &copy_count,
>> +					  &register_offset)) {
>> +			if (copy_from_user((void *)&virtvdev->pci_cmd +
>> register_offset,
>> +					   buf + copy_offset,
>> +					   copy_count))
>> +				return -EFAULT;
>> +		}
>> +
>> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
>> +					  sizeof(virtvdev->pci_base_addr_0),
>> +					  &copy_offset, &copy_count,
>> +					  &register_offset)) {
>> +			if (copy_from_user((void *)&virtvdev-
>>> pci_base_addr_0 + register_offset,
>> +					   buf + copy_offset,
>> +					   copy_count))
>> +				return -EFAULT;
>> +		}
>> +	}
> 
> wrap above into virtiovf_pci_write_config() to be symmetric with
> the read path.

Upon the read path, we do the full flow and return to the user. Here we 
just save some data and continue to call the core, so I'm not sure that 
this worth a dedicated function.

However, this can be done, do you still suggest it for V8 ?

> 
>> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_init_dev(core_vdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	pdev = virtvdev->core_device.pdev;
>> +	ret = virtiovf_read_notify_info(virtvdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Being ready with a buffer that supports MSIX */
>> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
>> +				virtiovf_get_device_config_size(pdev-
>>> device);
> 
> which code is relevant to MSIX?

The buffer size must include the MSIX part to match the virtio-net 
specification.

As part of virtiovf_issue_legacy_rw_cmd() we may use the full buffer 
upon read/write.

> 
> 
>> +
>> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
>> +	.name = "virtio-vfio-pci",
>> +	.init = vfio_pci_core_init_dev,
>> +	.release = vfio_pci_core_release_dev,
>> +	.open_device = virtiovf_pci_open_device,
> 
> could be vfio_pci_core_open_device(). Given virtiovf specific init func
> is not called  virtiovf_pci_open_device() is essentially same as the
> core func.

We don't have today vfio_pci_core_open_device() as an exported function.

The virtiovf_pci_open_device() matches both cases so I don't see a real 
reason to export it now.

By the way, it follows other drivers in vfio, see for example here [1].

[1] 
https://elixir.bootlin.com/linux/v6.7-rc5/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1383


> 
>> +
>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>> +			      const struct pci_device_id *id)
>> +{
>> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
>> +	struct virtiovf_pci_core_device *virtvdev;
>> +	int ret;
>> +
>> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
>> +	    !virtiovf_bar0_exists(pdev))
>> +		ops = &virtiovf_vfio_pci_tran_ops;
> 
> I have a confusion here.
> 
> why do we want to allow this driver binding to non-matching VF or
> even PF?

The intention is to allow the binding of any virtio-net device (i.e. PF, 
VF which is not transitional capable) to have a single driver over VFIO 
for all virtio-net devices.

This enables any user space application to bind and use any virtio-net 
device without the need to care.

In case the device is not transitional capable, it will simply use the 
generic vfio functionality.

> 
> if that is the intention then the naming/description should be adjusted
> to not specific to vf throughout this patch.
> 
> e.g. don't use "virtiovf_" prefix...

The main functionality is to supply the transitional device to user 
space for the VF, hence the prefix and the description for that driver 
refers to VF.

Let's stay with that.

> 
> the config option is generic:
> 
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO NET PCI devices"
> 
> but the description is specific to vf:
> 
> +          This provides support for exposing VIRTIO NET VF devices which support
> +          legacy IO access, using the VFIO framework that can work with a legacy
> +          virtio driver in the guest.
> 
> then the module description is generic again:
> 
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
> 

Yes, as the binding allows that, it looks fine to me.

Thanks,
Yishai
Alex Williamson Dec. 13, 2023, 8:23 p.m. UTC | #3
On Wed, 13 Dec 2023 14:25:10 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 13/12/2023 10:23, Tian, Kevin wrote:
 
> >> +
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> +			      const struct pci_device_id *id)
> >> +{
> >> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
> >> +	struct virtiovf_pci_core_device *virtvdev;
> >> +	int ret;
> >> +
> >> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
> >> +	    !virtiovf_bar0_exists(pdev))
> >> +		ops = &virtiovf_vfio_pci_tran_ops;  
> > 
> > I have a confusion here.
> > 
> > why do we want to allow this driver binding to non-matching VF or
> > even PF?  
> 
> The intention is to allow the binding of any virtio-net device (i.e. PF, 
> VF which is not transitional capable) to have a single driver over VFIO 
> for all virtio-net devices.
> 
> This enables any user space application to bind and use any virtio-net 
> device without the need to care.
> 
> In case the device is not transitional capable, it will simply use the 
> generic vfio functionality.

The algorithm we've suggested for finding the most appropriate variant
driver for the device doesn't include a step of moving on to another
driver if the binding fails.  We lose determinism at that point.
Therefore this driver needs to handle all devices matching the id table.
The fact that virtio dictates various config space fields limits our
ability to refine the match from the id table. Thanks,

Alex
Tian, Kevin Dec. 14, 2023, 5:52 a.m. UTC | #4
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, December 14, 2023 4:24 AM
> 
> On Wed, 13 Dec 2023 14:25:10 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
> > On 13/12/2023 10:23, Tian, Kevin wrote:
> 
> > >> +
> > >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > >> +			      const struct pci_device_id *id)
> > >> +{
> > >> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
> > >> +	struct virtiovf_pci_core_device *virtvdev;
> > >> +	int ret;
> > >> +
> > >> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
> > >> +	    !virtiovf_bar0_exists(pdev))
> > >> +		ops = &virtiovf_vfio_pci_tran_ops;
> > >
> > > I have a confusion here.
> > >
> > > why do we want to allow this driver binding to non-matching VF or
> > > even PF?
> >
> > The intention is to allow the binding of any virtio-net device (i.e. PF,
> > VF which is not transitional capable) to have a single driver over VFIO
> > for all virtio-net devices.
> >
> > This enables any user space application to bind and use any virtio-net
> > device without the need to care.
> >
> > In case the device is not transitional capable, it will simply use the
> > generic vfio functionality.
> 
> The algorithm we've suggested for finding the most appropriate variant
> driver for the device doesn't include a step of moving on to another
> driver if the binding fails.  We lose determinism at that point.
> Therefore this driver needs to handle all devices matching the id table.
> The fact that virtio dictates various config space fields limits our
> ability to refine the match from the id table. Thanks,
> 

OK, that makes sense.
Tian, Kevin Dec. 14, 2023, 6:07 a.m. UTC | #5
> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Wednesday, December 13, 2023 8:25 PM
> 
> On 13/12/2023 10:23, Tian, Kevin wrote:
> >> From: Yishai Hadas <yishaih@nvidia.com>
> >> Sent: Thursday, December 7, 2023 6:28 PM
> >>
> >> +
> >> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> >> +					char __user *buf, size_t count,
> >> +					loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	size_t register_offset;
> >> +	loff_t copy_offset;
> >> +	size_t copy_count;
> >> +	__le32 val32;
> >> +	__le16 val16;
> >> +	u8 val8;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> >> +				  &copy_offset, &copy_count,
> >> &register_offset)) {
> >> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
> >> register_offset, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
> >> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> >> +				  &copy_offset, &copy_count,
> >> &register_offset)) {
> >> +		if (copy_from_user((void *)&val16 + register_offset, buf +
> >> copy_offset,
> >> +				   copy_count))
> >> +			return -EFAULT;
> >> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
> >> register_offset,
> >> +				 copy_count))
> >> +			return -EFAULT;
> >> +	}
> >
> > the write handler calls vfio_pci_core_write() for PCI_COMMAND so
> > the core vconfig should have the latest copy of the IO bit value which
> > is copied to the user buffer by vfio_pci_core_read(). then not necessary
> > to update it again.
> 
> You assume the the 'vconfig' mechanism/flow is always applicable for
> that specific field, this should be double-checked.
> However, as for now the driver doesn't rely / use the vconfig for other
> fields as it doesn't match and need a big refactor, I prefer to not rely
> on it at all and have it here.

iiuc this driver does relies on vconfig for other fields. It first calls
vfio_pci_core_read() and then modifies selected fields which needs
special tweak in this driver.

btw what virtio-net specific tweak is applied to PCI_COMMAND? You
basically cache the cmd value and then set PCI_COMMAND_IO if
it's set in the cached value. But this is already covered by pci
vconfig. If anything is broken there then we already have a big
trouble.

The trick for bar0 makes sense as it doesn't exist physically then
the vconfig mechanism may not give the expected result. But
I didn't see the same rationale for PCI_COMMAND.

> >> +
> >> +static ssize_t
> >> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user
> >> *buf,
> >> +			size_t count, loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> >> +		size_t register_offset;
> >> +		loff_t copy_offset;
> >> +		size_t copy_count;
> >> +
> >> +		if (range_intersect_range(pos, count, PCI_COMMAND,
> >> sizeof(virtvdev->pci_cmd),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev->pci_cmd +
> >> register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}
> >> +
> >> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
> >> +					  sizeof(virtvdev->pci_base_addr_0),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev-
> >>> pci_base_addr_0 + register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}
> >> +	}
> >
> > wrap above into virtiovf_pci_write_config() to be symmetric with
> > the read path.
> 
> Upon the read path, we do the full flow and return to the user. Here we
> just save some data and continue to call the core, so I'm not sure that
> this worth a dedicated function.

I don't see a real difference.

for the read path you first call vfio_pci_core_read() then modifies some
fields.

for the write path you save some data then call vfio_pci_core_write().

> 
> However, this can be done, do you still suggest it for V8 ?

yes

> >> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_init_dev(core_vdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	pdev = virtvdev->core_device.pdev;
> >> +	ret = virtiovf_read_notify_info(virtvdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	/* Being ready with a buffer that supports MSIX */
> >> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> >> +				virtiovf_get_device_config_size(pdev-
> >>> device);
> >
> > which code is relevant to MSIX?
> 
> The buffer size must include the MSIX part to match the virtio-net
> specification.
> 
> As part of virtiovf_issue_legacy_rw_cmd() we may use the full buffer
> upon read/write.

at least mention that MSIX is in the device config region otherwise
it's not helpful to people w/o virtio background.

> >> +
> >> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
> >> +	.name = "virtio-vfio-pci",
> >> +	.init = vfio_pci_core_init_dev,
> >> +	.release = vfio_pci_core_release_dev,
> >> +	.open_device = virtiovf_pci_open_device,
> >
> > could be vfio_pci_core_open_device(). Given virtiovf specific init func
> > is not called  virtiovf_pci_open_device() is essentially same as the
> > core func.
> 
> We don't have today vfio_pci_core_open_device() as an exported function.
> 
> The virtiovf_pci_open_device() matches both cases so I don't see a real
> reason to export it now.
> 
> By the way, it follows other drivers in vfio, see for example here [1].
> 
> [1]
> https://elixir.bootlin.com/linux/v6.7-
> rc5/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1383

ah, yes.

> >> +
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> +			      const struct pci_device_id *id)
> >> +{
> >> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
> >> +	struct virtiovf_pci_core_device *virtvdev;
> >> +	int ret;
> >> +
> >> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
> >> +	    !virtiovf_bar0_exists(pdev))
> >> +		ops = &virtiovf_vfio_pci_tran_ops;
> >
> > I have a confusion here.
> >
> > why do we want to allow this driver binding to non-matching VF or
> > even PF?
> 
> The intention is to allow the binding of any virtio-net device (i.e. PF,
> VF which is not transitional capable) to have a single driver over VFIO
> for all virtio-net devices.
> 
> This enables any user space application to bind and use any virtio-net
> device without the need to care.
> 
> In case the device is not transitional capable, it will simply use the
> generic vfio functionality.

Is it useful to print a message here?

> 
> >
> > if that is the intention then the naming/description should be adjusted
> > to not specific to vf throughout this patch.
> >
> > e.g. don't use "virtiovf_" prefix...
> 
> The main functionality is to supply the transitional device to user
> space for the VF, hence the prefix and the description for that driver
> refers to VF.
> 
> Let's stay with that.

ok

> 
> >
> > the config option is generic:
> >
> > +config VIRTIO_VFIO_PCI
> > +        tristate "VFIO support for VIRTIO NET PCI devices"
> >
> > but the description is specific to vf:
> >
> > +          This provides support for exposing VIRTIO NET VF devices which
> support
> > +          legacy IO access, using the VFIO framework that can work with a
> legacy
> > +          virtio driver in the guest.
> >
> > then the module description is generic again:
> >
> > +MODULE_DESCRIPTION(
> > +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
> >
> 
> Yes, as the binding allows that, it looks fine to me.
> 

what about revising the kconfig message to make it clear that it's
for all virtio-net device with special trick to make VF as a
transitional device?
Michael S. Tsirkin Dec. 14, 2023, 6:38 a.m. UTC | #6
On Thu, Dec 07, 2023 at 12:28:20PM +0200, Yishai Hadas wrote:
> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF which its driver in the host is that new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   7 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  16 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/main.c   | 567 +++++++++++++++++++++++++++++++
>  6 files changed, 598 insertions(+)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 012df8ccf34e..b246b769092d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +L:	virtualization@lists.linux-foundation.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..3a6707639220
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,16 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO NET PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO NET VF devices which support
> +          legacy IO access, using the VFIO framework that can work with a legacy
> +          virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          As of that this driver emulated I/O BAR in software to let a VF be
> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.
> +
> +          If you don't know what to do here, say N.

BTW shouldn't this driver be limited to X86? Things like lack of memory
barriers will make legacy virtio racy on e.g. ARM will they not?
And endian-ness will be broken on PPC ...
Yishai Hadas Dec. 14, 2023, 8:57 a.m. UTC | #7
On 14/12/2023 8:07, Tian, Kevin wrote:
>> From: Yishai Hadas <yishaih@nvidia.com>
>> Sent: Wednesday, December 13, 2023 8:25 PM
>>
>> On 13/12/2023 10:23, Tian, Kevin wrote:
>>>> From: Yishai Hadas <yishaih@nvidia.com>
>>>> Sent: Thursday, December 7, 2023 6:28 PM
>>>>
>>>> +
>>>> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
>>>> +					char __user *buf, size_t count,
>>>> +					loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	size_t register_offset;
>>>> +	loff_t copy_offset;
>>>> +	size_t copy_count;
>>>> +	__le32 val32;
>>>> +	__le16 val16;
>>>> +	u8 val8;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
>>>> +				  &copy_offset, &copy_count,
>>>> &register_offset)) {
>>>> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
>>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
>>>> register_offset, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
>>>> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
>>>> +				  &copy_offset, &copy_count,
>>>> &register_offset)) {
>>>> +		if (copy_from_user((void *)&val16 + register_offset, buf +
>>>> copy_offset,
>>>> +				   copy_count))
>>>> +			return -EFAULT;
>>>> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
>>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
>>>> register_offset,
>>>> +				 copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>
>>> the write handler calls vfio_pci_core_write() for PCI_COMMAND so
>>> the core vconfig should have the latest copy of the IO bit value which
>>> is copied to the user buffer by vfio_pci_core_read(). then not necessary
>>> to update it again.
>>
>> You assume the the 'vconfig' mechanism/flow is always applicable for
>> that specific field, this should be double-checked.
>> However, as for now the driver doesn't rely / use the vconfig for other
>> fields as it doesn't match and need a big refactor, I prefer to not rely
>> on it at all and have it here.
> 
> iiuc this driver does relies on vconfig for other fields. It first calls
> vfio_pci_core_read() and then modifies selected fields which needs
> special tweak in this driver.

No, there is no dependency at all on vconfig for other fields in the driver.

vfio_pci_core_read() for most of its fields including the PCI_COMMAND 
goes directly over the PCI API/flow to the device and doesn't use the 
vconfig.

So, we must save/restore the PCI_COMMAND on the driver context to have 
it properly reported/emulated the PCI_COMMAND_IO bit.

> 
> btw what virtio-net specific tweak is applied to PCI_COMMAND? You
> basically cache the cmd value and then set PCI_COMMAND_IO if
> it's set in the cached value. But this is already covered by pci
> vconfig. If anything is broken there then we already have a big
> trouble.

As I wrote above, this field (i.e.PCI_COMMAND) is not managed at all by 
vconfig, so we need to emulate it in the driver.

> 
> The trick for bar0 makes sense as it doesn't exist physically then
> the vconfig mechanism may not give the expected result. But
> I didn't see the same rationale for PCI_COMMAND.
> 
>>>> +
>>>> +static ssize_t
>>>> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user
>>>> *buf,
>>>> +			size_t count, loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +
>>>> +	if (!count)
>>>> +		return 0;
>>>> +
>>>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
>>>> +		size_t register_offset;
>>>> +		loff_t copy_offset;
>>>> +		size_t copy_count;
>>>> +
>>>> +		if (range_intersect_range(pos, count, PCI_COMMAND,
>>>> sizeof(virtvdev->pci_cmd),
>>>> +					  &copy_offset, &copy_count,
>>>> +					  &register_offset)) {
>>>> +			if (copy_from_user((void *)&virtvdev->pci_cmd +
>>>> register_offset,
>>>> +					   buf + copy_offset,
>>>> +					   copy_count))
>>>> +				return -EFAULT;
>>>> +		}
>>>> +
>>>> +		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
>>>> +					  sizeof(virtvdev->pci_base_addr_0),
>>>> +					  &copy_offset, &copy_count,
>>>> +					  &register_offset)) {
>>>> +			if (copy_from_user((void *)&virtvdev-
>>>>> pci_base_addr_0 + register_offset,
>>>> +					   buf + copy_offset,
>>>> +					   copy_count))
>>>> +				return -EFAULT;
>>>> +		}
>>>> +	}
>>>
>>> wrap above into virtiovf_pci_write_config() to be symmetric with
>>> the read path.
>>
>> Upon the read path, we do the full flow and return to the user. Here we
>> just save some data and continue to call the core, so I'm not sure that
>> this worth a dedicated function.
> 
> I don't see a real difference.
> 
> for the read path you first call vfio_pci_core_read() then modifies some
> fields.
> 
> for the write path you save some data then call vfio_pci_core_write().
> 
>>
>> However, this can be done, do you still suggest it for V8 ?
> 
> yes

OK, will add as part of V8.

> 
>>>> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_init_dev(core_vdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	pdev = virtvdev->core_device.pdev;
>>>> +	ret = virtiovf_read_notify_info(virtvdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	/* Being ready with a buffer that supports MSIX */
>>>> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
>>>> +				virtiovf_get_device_config_size(pdev-
>>>>> device);
>>>
>>> which code is relevant to MSIX?
>>
>> The buffer size must include the MSIX part to match the virtio-net
>> specification.
>>
>> As part of virtiovf_issue_legacy_rw_cmd() we may use the full buffer
>> upon read/write.
> 
> at least mention that MSIX is in the device config region otherwise
> it's not helpful to people w/o virtio background.

MSIX is not in the device configuration region, it follows the common one.
In any case, it looks like this comment doesn't give any real value, 
I'll simply drop it in V8.

> 
>>>> +
>>>> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
>>>> +	.name = "virtio-vfio-pci",
>>>> +	.init = vfio_pci_core_init_dev,
>>>> +	.release = vfio_pci_core_release_dev,
>>>> +	.open_device = virtiovf_pci_open_device,
>>>
>>> could be vfio_pci_core_open_device(). Given virtiovf specific init func
>>> is not called  virtiovf_pci_open_device() is essentially same as the
>>> core func.
>>
>> We don't have today vfio_pci_core_open_device() as an exported function.
>>
>> The virtiovf_pci_open_device() matches both cases so I don't see a real
>> reason to export it now.
>>
>> By the way, it follows other drivers in vfio, see for example here [1].
>>
>> [1]
>> https://elixir.bootlin.com/linux/v6.7-
>> rc5/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1383
> 
> ah, yes.
> 
>>>> +
>>>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>>>> +			      const struct pci_device_id *id)
>>>> +{
>>>> +	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
>>>> +	struct virtiovf_pci_core_device *virtvdev;
>>>> +	int ret;
>>>> +
>>>> +	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
>>>> +	    !virtiovf_bar0_exists(pdev))
>>>> +		ops = &virtiovf_vfio_pci_tran_ops;
>>>
>>> I have a confusion here.
>>>
>>> why do we want to allow this driver binding to non-matching VF or
>>> even PF?
>>
>> The intention is to allow the binding of any virtio-net device (i.e. PF,
>> VF which is not transitional capable) to have a single driver over VFIO
>> for all virtio-net devices.
>>
>> This enables any user space application to bind and use any virtio-net
>> device without the need to care.
>>
>> In case the device is not transitional capable, it will simply use the
>> generic vfio functionality.
> 
> Is it useful to print a message here?

I don't think so, we usually don't print such messages.
User space will work based on caps to know what is supported, as of 
other cases.

> 
>>
>>>
>>> if that is the intention then the naming/description should be adjusted
>>> to not specific to vf throughout this patch.
>>>
>>> e.g. don't use "virtiovf_" prefix...
>>
>> The main functionality is to supply the transitional device to user
>> space for the VF, hence the prefix and the description for that driver
>> refers to VF.
>>
>> Let's stay with that.
> 
> ok
> 
>>
>>>
>>> the config option is generic:
>>>
>>> +config VIRTIO_VFIO_PCI
>>> +        tristate "VFIO support for VIRTIO NET PCI devices"
>>>
>>> but the description is specific to vf:
>>>
>>> +          This provides support for exposing VIRTIO NET VF devices which
>> support
>>> +          legacy IO access, using the VFIO framework that can work with a
>> legacy
>>> +          virtio driver in the guest.
>>>
>>> then the module description is generic again:
>>>
>>> +MODULE_DESCRIPTION(
>>> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
>>>
>>
>> Yes, as the binding allows that, it looks fine to me.
>>
> 
> what about revising the kconfig message to make it clear that it's
> for all virtio-net device with special trick to make VF as a
> transitional device?

Kconfig doesn't need to get into the details of how things were done.
We already mention that it emulates I/O BAR, this seems to be good enough.

However, I'll improve it as you already suggested in the previous mail.

Yishai
Yishai Hadas Dec. 14, 2023, 9:03 a.m. UTC | #8
On 14/12/2023 8:38, Michael S. Tsirkin wrote:
> On Thu, Dec 07, 2023 at 12:28:20PM +0200, Yishai Hadas wrote:
>> Introduce a vfio driver over virtio devices to support the legacy
>> interface functionality for VFs.
>>
>> Background, from the virtio spec [1].
>> --------------------------------------------------------------------
>> In some systems, there is a need to support a virtio legacy driver with
>> a device that does not directly support the legacy interface. In such
>> scenarios, a group owner device can provide the legacy interface
>> functionality for the group member devices. The driver of the owner
>> device can then access the legacy interface of a member device on behalf
>> of the legacy member device driver.
>>
>> For example, with the SR-IOV group type, group members (VFs) can not
>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>> legacy pci driver. If the legacy driver is running inside a virtual
>> machine, the hypervisor executing the virtual machine can present a
>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>> legacy driver accesses to this I/O BAR and forwards them to the group
>> owner device (PF) using group administration commands.
>> --------------------------------------------------------------------
>>
>> Specifically, this driver adds support for a virtio-net VF to be exposed
>> as a transitional device to a guest driver and allows the legacy IO BAR
>> functionality on top.
>>
>> This allows a VM which uses a legacy virtio-net driver in the guest to
>> work transparently over a VF which its driver in the host is that new
>> driver.
>>
>> The driver can be extended easily to support some other types of virtio
>> devices (e.g virtio-blk), by adding in a few places the specific type
>> properties as was done for virtio-net.
>>
>> For now, only the virtio-net use case was tested and as such we introduce
>> the support only for such a device.
>>
>> Practically,
>> Upon probing a VF for a virtio-net device, in case its PF supports
>> legacy access over the virtio admin commands and the VF doesn't have BAR
>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
>> transitional device with I/O BAR in BAR 0.
>>
>> The existence of the simulated I/O bar is reported later on by
>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
>> exposes itself as a transitional device by overwriting some properties
>> upon reading its config space.
>>
>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
>> guest may use it via read/write calls according to the virtio
>> specification.
>>
>> Any read/write towards the control parts of the BAR will be captured by
>> the new driver and will be translated into admin commands towards the
>> device.
>>
>> Any data path read/write access (i.e. virtio driver notifications) will
>> be forwarded to the physical BAR which its properties were supplied by
>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
>> probing/init flow.
>>
>> With that code in place a legacy driver in the guest has the look and
>> feel as if having a transitional device with legacy support for both its
>> control and data path flows.
>>
>> [1]
>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   MAINTAINERS                      |   7 +
>>   drivers/vfio/pci/Kconfig         |   2 +
>>   drivers/vfio/pci/Makefile        |   2 +
>>   drivers/vfio/pci/virtio/Kconfig  |  16 +
>>   drivers/vfio/pci/virtio/Makefile |   4 +
>>   drivers/vfio/pci/virtio/main.c   | 567 +++++++++++++++++++++++++++++++
>>   6 files changed, 598 insertions(+)
>>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>   create mode 100644 drivers/vfio/pci/virtio/Makefile
>>   create mode 100644 drivers/vfio/pci/virtio/main.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 012df8ccf34e..b246b769092d 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
>>   S:	Maintained
>>   F:	drivers/vfio/pci/mlx5/
>>   
>> +VFIO VIRTIO PCI DRIVER
>> +M:	Yishai Hadas <yishaih@nvidia.com>
>> +L:	kvm@vger.kernel.org
>> +L:	virtualization@lists.linux-foundation.org
>> +S:	Maintained
>> +F:	drivers/vfio/pci/virtio
>> +
>>   VFIO PCI DEVICE SPECIFIC DRIVERS
>>   R:	Jason Gunthorpe <jgg@nvidia.com>
>>   R:	Yishai Hadas <yishaih@nvidia.com>
>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>> index 8125e5f37832..18c397df566d 100644
>> --- a/drivers/vfio/pci/Kconfig
>> +++ b/drivers/vfio/pci/Kconfig
>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>>   
>>   source "drivers/vfio/pci/pds/Kconfig"
>>   
>> +source "drivers/vfio/pci/virtio/Kconfig"
>> +
>>   endmenu
>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
>> index 45167be462d8..046139a4eca5 100644
>> --- a/drivers/vfio/pci/Makefile
>> +++ b/drivers/vfio/pci/Makefile
>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>>   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>>   
>>   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>> +
>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
>> new file mode 100644
>> index 000000000000..3a6707639220
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/Kconfig
>> @@ -0,0 +1,16 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +config VIRTIO_VFIO_PCI
>> +        tristate "VFIO support for VIRTIO NET PCI devices"
>> +        depends on VIRTIO_PCI
>> +        select VFIO_PCI_CORE
>> +        help
>> +          This provides support for exposing VIRTIO NET VF devices which support
>> +          legacy IO access, using the VFIO framework that can work with a legacy
>> +          virtio driver in the guest.
>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>> +          not indicate I/O Space.
>> +          As of that this driver emulated I/O BAR in software to let a VF be
>> +          seen as a transitional device in the guest and let it work with
>> +          a legacy driver.
>> +
>> +          If you don't know what to do here, say N.
> 
> BTW shouldn't this driver be limited to X86? Things like lack of memory
> barriers will make legacy virtio racy on e.g. ARM will they not?
> And endian-ness will be broken on PPC ...
> 

OK, if so, we can come with the below extra code.
Makes sense ?

I'll squash it as part of V8 to the relevant patch.

diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index 37a0035f8381..b652e91b9df4 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev 
*pdev)
         struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
         struct virtio_pci_device *vp_dev;

+#ifndef CONFIG_X86
+       return false;
+#endif
         if (!virtio_dev)
                 return false;

Yishai
Michael S. Tsirkin Dec. 14, 2023, 9:19 a.m. UTC | #9
On Thu, Dec 14, 2023 at 11:03:49AM +0200, Yishai Hadas wrote:
> On 14/12/2023 8:38, Michael S. Tsirkin wrote:
> > On Thu, Dec 07, 2023 at 12:28:20PM +0200, Yishai Hadas wrote:
> > > Introduce a vfio driver over virtio devices to support the legacy
> > > interface functionality for VFs.
> > > 
> > > Background, from the virtio spec [1].
> > > --------------------------------------------------------------------
> > > In some systems, there is a need to support a virtio legacy driver with
> > > a device that does not directly support the legacy interface. In such
> > > scenarios, a group owner device can provide the legacy interface
> > > functionality for the group member devices. The driver of the owner
> > > device can then access the legacy interface of a member device on behalf
> > > of the legacy member device driver.
> > > 
> > > For example, with the SR-IOV group type, group members (VFs) can not
> > > present the legacy interface in an I/O BAR in BAR0 as expected by the
> > > legacy pci driver. If the legacy driver is running inside a virtual
> > > machine, the hypervisor executing the virtual machine can present a
> > > virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> > > legacy driver accesses to this I/O BAR and forwards them to the group
> > > owner device (PF) using group administration commands.
> > > --------------------------------------------------------------------
> > > 
> > > Specifically, this driver adds support for a virtio-net VF to be exposed
> > > as a transitional device to a guest driver and allows the legacy IO BAR
> > > functionality on top.
> > > 
> > > This allows a VM which uses a legacy virtio-net driver in the guest to
> > > work transparently over a VF which its driver in the host is that new
> > > driver.
> > > 
> > > The driver can be extended easily to support some other types of virtio
> > > devices (e.g virtio-blk), by adding in a few places the specific type
> > > properties as was done for virtio-net.
> > > 
> > > For now, only the virtio-net use case was tested and as such we introduce
> > > the support only for such a device.
> > > 
> > > Practically,
> > > Upon probing a VF for a virtio-net device, in case its PF supports
> > > legacy access over the virtio admin commands and the VF doesn't have BAR
> > > 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> > > transitional device with I/O BAR in BAR 0.
> > > 
> > > The existence of the simulated I/O bar is reported later on by
> > > overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> > > exposes itself as a transitional device by overwriting some properties
> > > upon reading its config space.
> > > 
> > > Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> > > guest may use it via read/write calls according to the virtio
> > > specification.
> > > 
> > > Any read/write towards the control parts of the BAR will be captured by
> > > the new driver and will be translated into admin commands towards the
> > > device.
> > > 
> > > Any data path read/write access (i.e. virtio driver notifications) will
> > > be forwarded to the physical BAR which its properties were supplied by
> > > the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> > > probing/init flow.
> > > 
> > > With that code in place a legacy driver in the guest has the look and
> > > feel as if having a transitional device with legacy support for both its
> > > control and data path flows.
> > > 
> > > [1]
> > > https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> > > 
> > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > ---
> > >   MAINTAINERS                      |   7 +
> > >   drivers/vfio/pci/Kconfig         |   2 +
> > >   drivers/vfio/pci/Makefile        |   2 +
> > >   drivers/vfio/pci/virtio/Kconfig  |  16 +
> > >   drivers/vfio/pci/virtio/Makefile |   4 +
> > >   drivers/vfio/pci/virtio/main.c   | 567 +++++++++++++++++++++++++++++++
> > >   6 files changed, 598 insertions(+)
> > >   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> > >   create mode 100644 drivers/vfio/pci/virtio/Makefile
> > >   create mode 100644 drivers/vfio/pci/virtio/main.c
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 012df8ccf34e..b246b769092d 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
> > >   S:	Maintained
> > >   F:	drivers/vfio/pci/mlx5/
> > > +VFIO VIRTIO PCI DRIVER
> > > +M:	Yishai Hadas <yishaih@nvidia.com>
> > > +L:	kvm@vger.kernel.org
> > > +L:	virtualization@lists.linux-foundation.org
> > > +S:	Maintained
> > > +F:	drivers/vfio/pci/virtio
> > > +
> > >   VFIO PCI DEVICE SPECIFIC DRIVERS
> > >   R:	Jason Gunthorpe <jgg@nvidia.com>
> > >   R:	Yishai Hadas <yishaih@nvidia.com>
> > > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> > > index 8125e5f37832..18c397df566d 100644
> > > --- a/drivers/vfio/pci/Kconfig
> > > +++ b/drivers/vfio/pci/Kconfig
> > > @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
> > >   source "drivers/vfio/pci/pds/Kconfig"
> > > +source "drivers/vfio/pci/virtio/Kconfig"
> > > +
> > >   endmenu
> > > diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> > > index 45167be462d8..046139a4eca5 100644
> > > --- a/drivers/vfio/pci/Makefile
> > > +++ b/drivers/vfio/pci/Makefile
> > > @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
> > >   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> > >   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> > > +
> > > +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> > > diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> > > new file mode 100644
> > > index 000000000000..3a6707639220
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/Kconfig
> > > @@ -0,0 +1,16 @@
> > > +# SPDX-License-Identifier: GPL-2.0-only
> > > +config VIRTIO_VFIO_PCI
> > > +        tristate "VFIO support for VIRTIO NET PCI devices"
> > > +        depends on VIRTIO_PCI
> > > +        select VFIO_PCI_CORE
> > > +        help
> > > +          This provides support for exposing VIRTIO NET VF devices which support
> > > +          legacy IO access, using the VFIO framework that can work with a legacy
> > > +          virtio driver in the guest.
> > > +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> > > +          not indicate I/O Space.
> > > +          As of that this driver emulated I/O BAR in software to let a VF be
> > > +          seen as a transitional device in the guest and let it work with
> > > +          a legacy driver.
> > > +
> > > +          If you don't know what to do here, say N.
> > 
> > BTW shouldn't this driver be limited to X86? Things like lack of memory
> > barriers will make legacy virtio racy on e.g. ARM will they not?
> > And endian-ness will be broken on PPC ...
> > 
> 
> OK, if so, we can come with the below extra code.
> Makes sense ?
> 
> I'll squash it as part of V8 to the relevant patch.
> 
> diff --git a/drivers/virtio/virtio_pci_modern.c
> b/drivers/virtio/virtio_pci_modern.c
> index 37a0035f8381..b652e91b9df4 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> *pdev)
>         struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         struct virtio_pci_device *vp_dev;
> 
> +#ifndef CONFIG_X86
> +       return false;
> +#endif
>         if (!virtio_dev)
>                 return false;
> 
> Yishai

Isn't there going to be a bunch more dead code that compiler won't be
able to elide?
Yishai Hadas Dec. 14, 2023, 9:37 a.m. UTC | #10
On 14/12/2023 11:19, Michael S. Tsirkin wrote:
> On Thu, Dec 14, 2023 at 11:03:49AM +0200, Yishai Hadas wrote:
>> On 14/12/2023 8:38, Michael S. Tsirkin wrote:
>>> On Thu, Dec 07, 2023 at 12:28:20PM +0200, Yishai Hadas wrote:
>>>> Introduce a vfio driver over virtio devices to support the legacy
>>>> interface functionality for VFs.
>>>>
>>>> Background, from the virtio spec [1].
>>>> --------------------------------------------------------------------
>>>> In some systems, there is a need to support a virtio legacy driver with
>>>> a device that does not directly support the legacy interface. In such
>>>> scenarios, a group owner device can provide the legacy interface
>>>> functionality for the group member devices. The driver of the owner
>>>> device can then access the legacy interface of a member device on behalf
>>>> of the legacy member device driver.
>>>>
>>>> For example, with the SR-IOV group type, group members (VFs) can not
>>>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>>>> legacy pci driver. If the legacy driver is running inside a virtual
>>>> machine, the hypervisor executing the virtual machine can present a
>>>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>>>> legacy driver accesses to this I/O BAR and forwards them to the group
>>>> owner device (PF) using group administration commands.
>>>> --------------------------------------------------------------------
>>>>
>>>> Specifically, this driver adds support for a virtio-net VF to be exposed
>>>> as a transitional device to a guest driver and allows the legacy IO BAR
>>>> functionality on top.
>>>>
>>>> This allows a VM which uses a legacy virtio-net driver in the guest to
>>>> work transparently over a VF which its driver in the host is that new
>>>> driver.
>>>>
>>>> The driver can be extended easily to support some other types of virtio
>>>> devices (e.g virtio-blk), by adding in a few places the specific type
>>>> properties as was done for virtio-net.
>>>>
>>>> For now, only the virtio-net use case was tested and as such we introduce
>>>> the support only for such a device.
>>>>
>>>> Practically,
>>>> Upon probing a VF for a virtio-net device, in case its PF supports
>>>> legacy access over the virtio admin commands and the VF doesn't have BAR
>>>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
>>>> transitional device with I/O BAR in BAR 0.
>>>>
>>>> The existence of the simulated I/O bar is reported later on by
>>>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
>>>> exposes itself as a transitional device by overwriting some properties
>>>> upon reading its config space.
>>>>
>>>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
>>>> guest may use it via read/write calls according to the virtio
>>>> specification.
>>>>
>>>> Any read/write towards the control parts of the BAR will be captured by
>>>> the new driver and will be translated into admin commands towards the
>>>> device.
>>>>
>>>> Any data path read/write access (i.e. virtio driver notifications) will
>>>> be forwarded to the physical BAR which its properties were supplied by
>>>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
>>>> probing/init flow.
>>>>
>>>> With that code in place a legacy driver in the guest has the look and
>>>> feel as if having a transitional device with legacy support for both its
>>>> control and data path flows.
>>>>
>>>> [1]
>>>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>>>>
>>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>> ---
>>>>    MAINTAINERS                      |   7 +
>>>>    drivers/vfio/pci/Kconfig         |   2 +
>>>>    drivers/vfio/pci/Makefile        |   2 +
>>>>    drivers/vfio/pci/virtio/Kconfig  |  16 +
>>>>    drivers/vfio/pci/virtio/Makefile |   4 +
>>>>    drivers/vfio/pci/virtio/main.c   | 567 +++++++++++++++++++++++++++++++
>>>>    6 files changed, 598 insertions(+)
>>>>    create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>>>    create mode 100644 drivers/vfio/pci/virtio/Makefile
>>>>    create mode 100644 drivers/vfio/pci/virtio/main.c
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index 012df8ccf34e..b246b769092d 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
>>>>    S:	Maintained
>>>>    F:	drivers/vfio/pci/mlx5/
>>>> +VFIO VIRTIO PCI DRIVER
>>>> +M:	Yishai Hadas <yishaih@nvidia.com>
>>>> +L:	kvm@vger.kernel.org
>>>> +L:	virtualization@lists.linux-foundation.org
>>>> +S:	Maintained
>>>> +F:	drivers/vfio/pci/virtio
>>>> +
>>>>    VFIO PCI DEVICE SPECIFIC DRIVERS
>>>>    R:	Jason Gunthorpe <jgg@nvidia.com>
>>>>    R:	Yishai Hadas <yishaih@nvidia.com>
>>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>>>> index 8125e5f37832..18c397df566d 100644
>>>> --- a/drivers/vfio/pci/Kconfig
>>>> +++ b/drivers/vfio/pci/Kconfig
>>>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>>>>    source "drivers/vfio/pci/pds/Kconfig"
>>>> +source "drivers/vfio/pci/virtio/Kconfig"
>>>> +
>>>>    endmenu
>>>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
>>>> index 45167be462d8..046139a4eca5 100644
>>>> --- a/drivers/vfio/pci/Makefile
>>>> +++ b/drivers/vfio/pci/Makefile
>>>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>>>>    obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>>>>    obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>>>> +
>>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
>>>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
>>>> new file mode 100644
>>>> index 000000000000..3a6707639220
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/Kconfig
>>>> @@ -0,0 +1,16 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +config VIRTIO_VFIO_PCI
>>>> +        tristate "VFIO support for VIRTIO NET PCI devices"
>>>> +        depends on VIRTIO_PCI
>>>> +        select VFIO_PCI_CORE
>>>> +        help
>>>> +          This provides support for exposing VIRTIO NET VF devices which support
>>>> +          legacy IO access, using the VFIO framework that can work with a legacy
>>>> +          virtio driver in the guest.
>>>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>>>> +          not indicate I/O Space.
>>>> +          As of that this driver emulated I/O BAR in software to let a VF be
>>>> +          seen as a transitional device in the guest and let it work with
>>>> +          a legacy driver.
>>>> +
>>>> +          If you don't know what to do here, say N.
>>>
>>> BTW shouldn't this driver be limited to X86? Things like lack of memory
>>> barriers will make legacy virtio racy on e.g. ARM will they not?
>>> And endian-ness will be broken on PPC ...
>>>
>>
>> OK, if so, we can come with the below extra code.
>> Makes sense ?
>>
>> I'll squash it as part of V8 to the relevant patch.
>>
>> diff --git a/drivers/virtio/virtio_pci_modern.c
>> b/drivers/virtio/virtio_pci_modern.c
>> index 37a0035f8381..b652e91b9df4 100644
>> --- a/drivers/virtio/virtio_pci_modern.c
>> +++ b/drivers/virtio/virtio_pci_modern.c
>> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>> *pdev)
>>          struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>          struct virtio_pci_device *vp_dev;
>>
>> +#ifndef CONFIG_X86
>> +       return false;
>> +#endif
>>          if (!virtio_dev)
>>                  return false;
>>
>> Yishai
> 
> Isn't there going to be a bunch more dead code that compiler won't be
> able to elide?
> 

On my setup the compiler didn't complain about dead-code (I simulated it 
by using ifdef CONFIG_X86 return false).

However, if we suspect that some compiler might complain, we can come 
with the below instead.

Do you prefer that ?

index 37a0035f8381..53e29824d404 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct 
virtio_device *vdev)
          BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
          BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))

+#ifdef CONFIG_X86
  /*
   * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
   * commands are supported
@@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev 
*pdev)
                 return true;
         return false;
  }
+#else
+bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
+{
+       return false;
+}
+#endif
  EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);

Yishai
Alex Williamson Dec. 14, 2023, 2:59 p.m. UTC | #11
On Thu, 14 Dec 2023 11:37:10 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 14/12/2023 11:19, Michael S. Tsirkin wrote:
> > On Thu, Dec 14, 2023 at 11:03:49AM +0200, Yishai Hadas wrote:  
> >> On 14/12/2023 8:38, Michael S. Tsirkin wrote:  
> >>> On Thu, Dec 07, 2023 at 12:28:20PM +0200, Yishai Hadas wrote:  
> >>>> Introduce a vfio driver over virtio devices to support the legacy
> >>>> interface functionality for VFs.
> >>>>
> >>>> Background, from the virtio spec [1].
> >>>> --------------------------------------------------------------------
> >>>> In some systems, there is a need to support a virtio legacy driver with
> >>>> a device that does not directly support the legacy interface. In such
> >>>> scenarios, a group owner device can provide the legacy interface
> >>>> functionality for the group member devices. The driver of the owner
> >>>> device can then access the legacy interface of a member device on behalf
> >>>> of the legacy member device driver.
> >>>>
> >>>> For example, with the SR-IOV group type, group members (VFs) can not
> >>>> present the legacy interface in an I/O BAR in BAR0 as expected by the
> >>>> legacy pci driver. If the legacy driver is running inside a virtual
> >>>> machine, the hypervisor executing the virtual machine can present a
> >>>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> >>>> legacy driver accesses to this I/O BAR and forwards them to the group
> >>>> owner device (PF) using group administration commands.
> >>>> --------------------------------------------------------------------
> >>>>
> >>>> Specifically, this driver adds support for a virtio-net VF to be exposed
> >>>> as a transitional device to a guest driver and allows the legacy IO BAR
> >>>> functionality on top.
> >>>>
> >>>> This allows a VM which uses a legacy virtio-net driver in the guest to
> >>>> work transparently over a VF which its driver in the host is that new
> >>>> driver.
> >>>>
> >>>> The driver can be extended easily to support some other types of virtio
> >>>> devices (e.g virtio-blk), by adding in a few places the specific type
> >>>> properties as was done for virtio-net.
> >>>>
> >>>> For now, only the virtio-net use case was tested and as such we introduce
> >>>> the support only for such a device.
> >>>>
> >>>> Practically,
> >>>> Upon probing a VF for a virtio-net device, in case its PF supports
> >>>> legacy access over the virtio admin commands and the VF doesn't have BAR
> >>>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> >>>> transitional device with I/O BAR in BAR 0.
> >>>>
> >>>> The existence of the simulated I/O bar is reported later on by
> >>>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> >>>> exposes itself as a transitional device by overwriting some properties
> >>>> upon reading its config space.
> >>>>
> >>>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> >>>> guest may use it via read/write calls according to the virtio
> >>>> specification.
> >>>>
> >>>> Any read/write towards the control parts of the BAR will be captured by
> >>>> the new driver and will be translated into admin commands towards the
> >>>> device.
> >>>>
> >>>> Any data path read/write access (i.e. virtio driver notifications) will
> >>>> be forwarded to the physical BAR which its properties were supplied by
> >>>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> >>>> probing/init flow.
> >>>>
> >>>> With that code in place a legacy driver in the guest has the look and
> >>>> feel as if having a transitional device with legacy support for both its
> >>>> control and data path flows.
> >>>>
> >>>> [1]
> >>>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> >>>>
> >>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> >>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> >>>> ---
> >>>>    MAINTAINERS                      |   7 +
> >>>>    drivers/vfio/pci/Kconfig         |   2 +
> >>>>    drivers/vfio/pci/Makefile        |   2 +
> >>>>    drivers/vfio/pci/virtio/Kconfig  |  16 +
> >>>>    drivers/vfio/pci/virtio/Makefile |   4 +
> >>>>    drivers/vfio/pci/virtio/main.c   | 567 +++++++++++++++++++++++++++++++
> >>>>    6 files changed, 598 insertions(+)
> >>>>    create mode 100644 drivers/vfio/pci/virtio/Kconfig
> >>>>    create mode 100644 drivers/vfio/pci/virtio/Makefile
> >>>>    create mode 100644 drivers/vfio/pci/virtio/main.c
> >>>>
> >>>> diff --git a/MAINTAINERS b/MAINTAINERS
> >>>> index 012df8ccf34e..b246b769092d 100644
> >>>> --- a/MAINTAINERS
> >>>> +++ b/MAINTAINERS
> >>>> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
> >>>>    S:	Maintained
> >>>>    F:	drivers/vfio/pci/mlx5/
> >>>> +VFIO VIRTIO PCI DRIVER
> >>>> +M:	Yishai Hadas <yishaih@nvidia.com>
> >>>> +L:	kvm@vger.kernel.org
> >>>> +L:	virtualization@lists.linux-foundation.org
> >>>> +S:	Maintained
> >>>> +F:	drivers/vfio/pci/virtio
> >>>> +
> >>>>    VFIO PCI DEVICE SPECIFIC DRIVERS
> >>>>    R:	Jason Gunthorpe <jgg@nvidia.com>
> >>>>    R:	Yishai Hadas <yishaih@nvidia.com>
> >>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> >>>> index 8125e5f37832..18c397df566d 100644
> >>>> --- a/drivers/vfio/pci/Kconfig
> >>>> +++ b/drivers/vfio/pci/Kconfig
> >>>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
> >>>>    source "drivers/vfio/pci/pds/Kconfig"
> >>>> +source "drivers/vfio/pci/virtio/Kconfig"
> >>>> +
> >>>>    endmenu
> >>>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> >>>> index 45167be462d8..046139a4eca5 100644
> >>>> --- a/drivers/vfio/pci/Makefile
> >>>> +++ b/drivers/vfio/pci/Makefile
> >>>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
> >>>>    obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> >>>>    obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> >>>> +
> >>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> >>>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> >>>> new file mode 100644
> >>>> index 000000000000..3a6707639220
> >>>> --- /dev/null
> >>>> +++ b/drivers/vfio/pci/virtio/Kconfig
> >>>> @@ -0,0 +1,16 @@
> >>>> +# SPDX-License-Identifier: GPL-2.0-only
> >>>> +config VIRTIO_VFIO_PCI
> >>>> +        tristate "VFIO support for VIRTIO NET PCI devices"
> >>>> +        depends on VIRTIO_PCI
> >>>> +        select VFIO_PCI_CORE
> >>>> +        help
> >>>> +          This provides support for exposing VIRTIO NET VF devices which support
> >>>> +          legacy IO access, using the VFIO framework that can work with a legacy
> >>>> +          virtio driver in the guest.
> >>>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> >>>> +          not indicate I/O Space.
> >>>> +          As of that this driver emulated I/O BAR in software to let a VF be
> >>>> +          seen as a transitional device in the guest and let it work with
> >>>> +          a legacy driver.
> >>>> +
> >>>> +          If you don't know what to do here, say N.  
> >>>
> >>> BTW shouldn't this driver be limited to X86? Things like lack of memory
> >>> barriers will make legacy virtio racy on e.g. ARM will they not?
> >>> And endian-ness will be broken on PPC ...
> >>>  
> >>
> >> OK, if so, we can come with the below extra code.
> >> Makes sense ?
> >>
> >> I'll squash it as part of V8 to the relevant patch.
> >>
> >> diff --git a/drivers/virtio/virtio_pci_modern.c
> >> b/drivers/virtio/virtio_pci_modern.c
> >> index 37a0035f8381..b652e91b9df4 100644
> >> --- a/drivers/virtio/virtio_pci_modern.c
> >> +++ b/drivers/virtio/virtio_pci_modern.c
> >> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> >> *pdev)
> >>          struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> >>          struct virtio_pci_device *vp_dev;
> >>
> >> +#ifndef CONFIG_X86
> >> +       return false;
> >> +#endif
> >>          if (!virtio_dev)
> >>                  return false;
> >>
> >> Yishai  
> > 
> > Isn't there going to be a bunch more dead code that compiler won't be
> > able to elide?
> >   
> 
> On my setup the compiler didn't complain about dead-code (I simulated it 
> by using ifdef CONFIG_X86 return false).
> 
> However, if we suspect that some compiler might complain, we can come 
> with the below instead.
> 
> Do you prefer that ?
> 
> index 37a0035f8381..53e29824d404 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct 
> virtio_device *vdev)
>           BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>           BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> 
> +#ifdef CONFIG_X86
>   /*
>    * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
>    * commands are supported
> @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev 
> *pdev)
>                  return true;
>          return false;
>   }
> +#else
> +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
> +{
> +       return false;
> +}
> +#endif
>   EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);

Doesn't this also raise the question of the purpose of virtio-vfio-pci
on non-x86?  Without any other features it offers nothing over vfio-pci
and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
Thanks,

Alex
Michael S. Tsirkin Dec. 14, 2023, 3:05 p.m. UTC | #12
On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:
> On Thu, 14 Dec 2023 11:37:10 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
> > On 14/12/2023 11:19, Michael S. Tsirkin wrote:
> > > On Thu, Dec 14, 2023 at 11:03:49AM +0200, Yishai Hadas wrote:  
> > >> On 14/12/2023 8:38, Michael S. Tsirkin wrote:  
> > >>> On Thu, Dec 07, 2023 at 12:28:20PM +0200, Yishai Hadas wrote:  
> > >>>> Introduce a vfio driver over virtio devices to support the legacy
> > >>>> interface functionality for VFs.
> > >>>>
> > >>>> Background, from the virtio spec [1].
> > >>>> --------------------------------------------------------------------
> > >>>> In some systems, there is a need to support a virtio legacy driver with
> > >>>> a device that does not directly support the legacy interface. In such
> > >>>> scenarios, a group owner device can provide the legacy interface
> > >>>> functionality for the group member devices. The driver of the owner
> > >>>> device can then access the legacy interface of a member device on behalf
> > >>>> of the legacy member device driver.
> > >>>>
> > >>>> For example, with the SR-IOV group type, group members (VFs) can not
> > >>>> present the legacy interface in an I/O BAR in BAR0 as expected by the
> > >>>> legacy pci driver. If the legacy driver is running inside a virtual
> > >>>> machine, the hypervisor executing the virtual machine can present a
> > >>>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> > >>>> legacy driver accesses to this I/O BAR and forwards them to the group
> > >>>> owner device (PF) using group administration commands.
> > >>>> --------------------------------------------------------------------
> > >>>>
> > >>>> Specifically, this driver adds support for a virtio-net VF to be exposed
> > >>>> as a transitional device to a guest driver and allows the legacy IO BAR
> > >>>> functionality on top.
> > >>>>
> > >>>> This allows a VM which uses a legacy virtio-net driver in the guest to
> > >>>> work transparently over a VF which its driver in the host is that new
> > >>>> driver.
> > >>>>
> > >>>> The driver can be extended easily to support some other types of virtio
> > >>>> devices (e.g virtio-blk), by adding in a few places the specific type
> > >>>> properties as was done for virtio-net.
> > >>>>
> > >>>> For now, only the virtio-net use case was tested and as such we introduce
> > >>>> the support only for such a device.
> > >>>>
> > >>>> Practically,
> > >>>> Upon probing a VF for a virtio-net device, in case its PF supports
> > >>>> legacy access over the virtio admin commands and the VF doesn't have BAR
> > >>>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> > >>>> transitional device with I/O BAR in BAR 0.
> > >>>>
> > >>>> The existence of the simulated I/O bar is reported later on by
> > >>>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> > >>>> exposes itself as a transitional device by overwriting some properties
> > >>>> upon reading its config space.
> > >>>>
> > >>>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> > >>>> guest may use it via read/write calls according to the virtio
> > >>>> specification.
> > >>>>
> > >>>> Any read/write towards the control parts of the BAR will be captured by
> > >>>> the new driver and will be translated into admin commands towards the
> > >>>> device.
> > >>>>
> > >>>> Any data path read/write access (i.e. virtio driver notifications) will
> > >>>> be forwarded to the physical BAR which its properties were supplied by
> > >>>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> > >>>> probing/init flow.
> > >>>>
> > >>>> With that code in place a legacy driver in the guest has the look and
> > >>>> feel as if having a transitional device with legacy support for both its
> > >>>> control and data path flows.
> > >>>>
> > >>>> [1]
> > >>>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> > >>>>
> > >>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > >>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > >>>> ---
> > >>>>    MAINTAINERS                      |   7 +
> > >>>>    drivers/vfio/pci/Kconfig         |   2 +
> > >>>>    drivers/vfio/pci/Makefile        |   2 +
> > >>>>    drivers/vfio/pci/virtio/Kconfig  |  16 +
> > >>>>    drivers/vfio/pci/virtio/Makefile |   4 +
> > >>>>    drivers/vfio/pci/virtio/main.c   | 567 +++++++++++++++++++++++++++++++
> > >>>>    6 files changed, 598 insertions(+)
> > >>>>    create mode 100644 drivers/vfio/pci/virtio/Kconfig
> > >>>>    create mode 100644 drivers/vfio/pci/virtio/Makefile
> > >>>>    create mode 100644 drivers/vfio/pci/virtio/main.c
> > >>>>
> > >>>> diff --git a/MAINTAINERS b/MAINTAINERS
> > >>>> index 012df8ccf34e..b246b769092d 100644
> > >>>> --- a/MAINTAINERS
> > >>>> +++ b/MAINTAINERS
> > >>>> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
> > >>>>    S:	Maintained
> > >>>>    F:	drivers/vfio/pci/mlx5/
> > >>>> +VFIO VIRTIO PCI DRIVER
> > >>>> +M:	Yishai Hadas <yishaih@nvidia.com>
> > >>>> +L:	kvm@vger.kernel.org
> > >>>> +L:	virtualization@lists.linux-foundation.org
> > >>>> +S:	Maintained
> > >>>> +F:	drivers/vfio/pci/virtio
> > >>>> +
> > >>>>    VFIO PCI DEVICE SPECIFIC DRIVERS
> > >>>>    R:	Jason Gunthorpe <jgg@nvidia.com>
> > >>>>    R:	Yishai Hadas <yishaih@nvidia.com>
> > >>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> > >>>> index 8125e5f37832..18c397df566d 100644
> > >>>> --- a/drivers/vfio/pci/Kconfig
> > >>>> +++ b/drivers/vfio/pci/Kconfig
> > >>>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
> > >>>>    source "drivers/vfio/pci/pds/Kconfig"
> > >>>> +source "drivers/vfio/pci/virtio/Kconfig"
> > >>>> +
> > >>>>    endmenu
> > >>>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> > >>>> index 45167be462d8..046139a4eca5 100644
> > >>>> --- a/drivers/vfio/pci/Makefile
> > >>>> +++ b/drivers/vfio/pci/Makefile
> > >>>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
> > >>>>    obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> > >>>>    obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> > >>>> +
> > >>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> > >>>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> > >>>> new file mode 100644
> > >>>> index 000000000000..3a6707639220
> > >>>> --- /dev/null
> > >>>> +++ b/drivers/vfio/pci/virtio/Kconfig
> > >>>> @@ -0,0 +1,16 @@
> > >>>> +# SPDX-License-Identifier: GPL-2.0-only
> > >>>> +config VIRTIO_VFIO_PCI
> > >>>> +        tristate "VFIO support for VIRTIO NET PCI devices"
> > >>>> +        depends on VIRTIO_PCI
> > >>>> +        select VFIO_PCI_CORE
> > >>>> +        help
> > >>>> +          This provides support for exposing VIRTIO NET VF devices which support
> > >>>> +          legacy IO access, using the VFIO framework that can work with a legacy
> > >>>> +          virtio driver in the guest.
> > >>>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> > >>>> +          not indicate I/O Space.
> > >>>> +          As of that this driver emulated I/O BAR in software to let a VF be
> > >>>> +          seen as a transitional device in the guest and let it work with
> > >>>> +          a legacy driver.
> > >>>> +
> > >>>> +          If you don't know what to do here, say N.  
> > >>>
> > >>> BTW shouldn't this driver be limited to X86? Things like lack of memory
> > >>> barriers will make legacy virtio racy on e.g. ARM will they not?
> > >>> And endian-ness will be broken on PPC ...
> > >>>  
> > >>
> > >> OK, if so, we can come with the below extra code.
> > >> Makes sense ?
> > >>
> > >> I'll squash it as part of V8 to the relevant patch.
> > >>
> > >> diff --git a/drivers/virtio/virtio_pci_modern.c
> > >> b/drivers/virtio/virtio_pci_modern.c
> > >> index 37a0035f8381..b652e91b9df4 100644
> > >> --- a/drivers/virtio/virtio_pci_modern.c
> > >> +++ b/drivers/virtio/virtio_pci_modern.c
> > >> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> > >> *pdev)
> > >>          struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >>          struct virtio_pci_device *vp_dev;
> > >>
> > >> +#ifndef CONFIG_X86
> > >> +       return false;
> > >> +#endif
> > >>          if (!virtio_dev)
> > >>                  return false;
> > >>
> > >> Yishai  
> > > 
> > > Isn't there going to be a bunch more dead code that compiler won't be
> > > able to elide?
> > >   
> > 
> > On my setup the compiler didn't complain about dead-code (I simulated it 
> > by using ifdef CONFIG_X86 return false).
> > 
> > However, if we suspect that some compiler might complain, we can come 
> > with the below instead.
> > 
> > Do you prefer that ?
> > 
> > index 37a0035f8381..53e29824d404 100644
> > --- a/drivers/virtio/virtio_pci_modern.c
> > +++ b/drivers/virtio/virtio_pci_modern.c
> > @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct 
> > virtio_device *vdev)
> >           BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> >           BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> > 
> > +#ifdef CONFIG_X86
> >   /*
> >    * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
> >    * commands are supported
> > @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev 
> > *pdev)
> >                  return true;
> >          return false;
> >   }
> > +#else
> > +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
> > +{
> > +       return false;
> > +}
> > +#endif
> >   EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
> 
> Doesn't this also raise the question of the purpose of virtio-vfio-pci
> on non-x86?  Without any other features it offers nothing over vfio-pci
> and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
> Thanks,
> 
> Alex

Kconfig dependency is what I had in mind, yes. The X86 specific code in
virtio_pci_modern.c can be moved to a separate file then use makefile
tricks to skip it on other platforms.
Yishai Hadas Dec. 14, 2023, 4:03 p.m. UTC | #13
On 14/12/2023 17:05, Michael S. Tsirkin wrote:
> On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:
>> On Thu, 14 Dec 2023 11:37:10 +0200
>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>
>>> On 14/12/2023 11:19, Michael S. Tsirkin wrote:
>>>> On Thu, Dec 14, 2023 at 11:03:49AM +0200, Yishai Hadas wrote:
>>>>> On 14/12/2023 8:38, Michael S. Tsirkin wrote:
>>>>>> On Thu, Dec 07, 2023 at 12:28:20PM +0200, Yishai Hadas wrote:
>>>>>>> Introduce a vfio driver over virtio devices to support the legacy
>>>>>>> interface functionality for VFs.
>>>>>>>
>>>>>>> Background, from the virtio spec [1].
>>>>>>> --------------------------------------------------------------------
>>>>>>> In some systems, there is a need to support a virtio legacy driver with
>>>>>>> a device that does not directly support the legacy interface. In such
>>>>>>> scenarios, a group owner device can provide the legacy interface
>>>>>>> functionality for the group member devices. The driver of the owner
>>>>>>> device can then access the legacy interface of a member device on behalf
>>>>>>> of the legacy member device driver.
>>>>>>>
>>>>>>> For example, with the SR-IOV group type, group members (VFs) can not
>>>>>>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>>>>>>> legacy pci driver. If the legacy driver is running inside a virtual
>>>>>>> machine, the hypervisor executing the virtual machine can present a
>>>>>>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>>>>>>> legacy driver accesses to this I/O BAR and forwards them to the group
>>>>>>> owner device (PF) using group administration commands.
>>>>>>> --------------------------------------------------------------------
>>>>>>>
>>>>>>> Specifically, this driver adds support for a virtio-net VF to be exposed
>>>>>>> as a transitional device to a guest driver and allows the legacy IO BAR
>>>>>>> functionality on top.
>>>>>>>
>>>>>>> This allows a VM which uses a legacy virtio-net driver in the guest to
>>>>>>> work transparently over a VF which its driver in the host is that new
>>>>>>> driver.
>>>>>>>
>>>>>>> The driver can be extended easily to support some other types of virtio
>>>>>>> devices (e.g virtio-blk), by adding in a few places the specific type
>>>>>>> properties as was done for virtio-net.
>>>>>>>
>>>>>>> For now, only the virtio-net use case was tested and as such we introduce
>>>>>>> the support only for such a device.
>>>>>>>
>>>>>>> Practically,
>>>>>>> Upon probing a VF for a virtio-net device, in case its PF supports
>>>>>>> legacy access over the virtio admin commands and the VF doesn't have BAR
>>>>>>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
>>>>>>> transitional device with I/O BAR in BAR 0.
>>>>>>>
>>>>>>> The existence of the simulated I/O bar is reported later on by
>>>>>>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
>>>>>>> exposes itself as a transitional device by overwriting some properties
>>>>>>> upon reading its config space.
>>>>>>>
>>>>>>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
>>>>>>> guest may use it via read/write calls according to the virtio
>>>>>>> specification.
>>>>>>>
>>>>>>> Any read/write towards the control parts of the BAR will be captured by
>>>>>>> the new driver and will be translated into admin commands towards the
>>>>>>> device.
>>>>>>>
>>>>>>> Any data path read/write access (i.e. virtio driver notifications) will
>>>>>>> be forwarded to the physical BAR which its properties were supplied by
>>>>>>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
>>>>>>> probing/init flow.
>>>>>>>
>>>>>>> With that code in place a legacy driver in the guest has the look and
>>>>>>> feel as if having a transitional device with legacy support for both its
>>>>>>> control and data path flows.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>>>>>>>
>>>>>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>>>>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>>>>> ---
>>>>>>>     MAINTAINERS                      |   7 +
>>>>>>>     drivers/vfio/pci/Kconfig         |   2 +
>>>>>>>     drivers/vfio/pci/Makefile        |   2 +
>>>>>>>     drivers/vfio/pci/virtio/Kconfig  |  16 +
>>>>>>>     drivers/vfio/pci/virtio/Makefile |   4 +
>>>>>>>     drivers/vfio/pci/virtio/main.c   | 567 +++++++++++++++++++++++++++++++
>>>>>>>     6 files changed, 598 insertions(+)
>>>>>>>     create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>>>>>>     create mode 100644 drivers/vfio/pci/virtio/Makefile
>>>>>>>     create mode 100644 drivers/vfio/pci/virtio/main.c
>>>>>>>
>>>>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>>>>> index 012df8ccf34e..b246b769092d 100644
>>>>>>> --- a/MAINTAINERS
>>>>>>> +++ b/MAINTAINERS
>>>>>>> @@ -22872,6 +22872,13 @@ L:	kvm@vger.kernel.org
>>>>>>>     S:	Maintained
>>>>>>>     F:	drivers/vfio/pci/mlx5/
>>>>>>> +VFIO VIRTIO PCI DRIVER
>>>>>>> +M:	Yishai Hadas <yishaih@nvidia.com>
>>>>>>> +L:	kvm@vger.kernel.org
>>>>>>> +L:	virtualization@lists.linux-foundation.org
>>>>>>> +S:	Maintained
>>>>>>> +F:	drivers/vfio/pci/virtio
>>>>>>> +
>>>>>>>     VFIO PCI DEVICE SPECIFIC DRIVERS
>>>>>>>     R:	Jason Gunthorpe <jgg@nvidia.com>
>>>>>>>     R:	Yishai Hadas <yishaih@nvidia.com>
>>>>>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>>>>>>> index 8125e5f37832..18c397df566d 100644
>>>>>>> --- a/drivers/vfio/pci/Kconfig
>>>>>>> +++ b/drivers/vfio/pci/Kconfig
>>>>>>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>>>>>>>     source "drivers/vfio/pci/pds/Kconfig"
>>>>>>> +source "drivers/vfio/pci/virtio/Kconfig"
>>>>>>> +
>>>>>>>     endmenu
>>>>>>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
>>>>>>> index 45167be462d8..046139a4eca5 100644
>>>>>>> --- a/drivers/vfio/pci/Makefile
>>>>>>> +++ b/drivers/vfio/pci/Makefile
>>>>>>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>>>>>>>     obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>>>>>>>     obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>>>>>>> +
>>>>>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
>>>>>>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..3a6707639220
>>>>>>> --- /dev/null
>>>>>>> +++ b/drivers/vfio/pci/virtio/Kconfig
>>>>>>> @@ -0,0 +1,16 @@
>>>>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>>>>> +config VIRTIO_VFIO_PCI
>>>>>>> +        tristate "VFIO support for VIRTIO NET PCI devices"
>>>>>>> +        depends on VIRTIO_PCI
>>>>>>> +        select VFIO_PCI_CORE
>>>>>>> +        help
>>>>>>> +          This provides support for exposing VIRTIO NET VF devices which support
>>>>>>> +          legacy IO access, using the VFIO framework that can work with a legacy
>>>>>>> +          virtio driver in the guest.
>>>>>>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>>>>>>> +          not indicate I/O Space.
>>>>>>> +          As of that this driver emulated I/O BAR in software to let a VF be
>>>>>>> +          seen as a transitional device in the guest and let it work with
>>>>>>> +          a legacy driver.
>>>>>>> +
>>>>>>> +          If you don't know what to do here, say N.
>>>>>>
>>>>>> BTW shouldn't this driver be limited to X86? Things like lack of memory
>>>>>> barriers will make legacy virtio racy on e.g. ARM will they not?
>>>>>> And endian-ness will be broken on PPC ...
>>>>>>   
>>>>>
>>>>> OK, if so, we can come with the below extra code.
>>>>> Makes sense ?
>>>>>
>>>>> I'll squash it as part of V8 to the relevant patch.
>>>>>
>>>>> diff --git a/drivers/virtio/virtio_pci_modern.c
>>>>> b/drivers/virtio/virtio_pci_modern.c
>>>>> index 37a0035f8381..b652e91b9df4 100644
>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>>>> *pdev)
>>>>>           struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>>           struct virtio_pci_device *vp_dev;
>>>>>
>>>>> +#ifndef CONFIG_X86
>>>>> +       return false;
>>>>> +#endif
>>>>>           if (!virtio_dev)
>>>>>                   return false;
>>>>>
>>>>> Yishai
>>>>
>>>> Isn't there going to be a bunch more dead code that compiler won't be
>>>> able to elide?
>>>>    
>>>
>>> On my setup the compiler didn't complain about dead-code (I simulated it
>>> by using ifdef CONFIG_X86 return false).
>>>
>>> However, if we suspect that some compiler might complain, we can come
>>> with the below instead.
>>>
>>> Do you prefer that ?
>>>
>>> index 37a0035f8381..53e29824d404 100644
>>> --- a/drivers/virtio/virtio_pci_modern.c
>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>> @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct
>>> virtio_device *vdev)
>>>            BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>>>            BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>>>
>>> +#ifdef CONFIG_X86
>>>    /*
>>>     * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
>>>     * commands are supported
>>> @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>> *pdev)
>>>                   return true;
>>>           return false;
>>>    }
>>> +#else
>>> +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
>>> +{
>>> +       return false;
>>> +}
>>> +#endif
>>>    EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
>>
>> Doesn't this also raise the question of the purpose of virtio-vfio-pci
>> on non-x86?  Without any other features it offers nothing over vfio-pci
>> and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
>> Thanks,
>>
>> Alex
> 
> Kconfig dependency is what I had in mind, yes. The X86 specific code in
> virtio_pci_modern.c can be moved to a separate file then use makefile
> tricks to skip it on other platforms.
> 

The next feature for that driver will be the live migration support over 
virtio, once the specification which is WIP those day will be accepted.

The migration functionality is not X86 dependent and doesn't have the 
legacy virtio driver limitations that enforced us to run only on X86.

So, by that time we may need to enable in VFIO the loading of 
virtio-vfio-pci driver and put back the ifdef X86 inside VIRTIO, only on 
the legacy IO API, as I did already in V8.

So using a KCONFIG solution in VFIO is a short term one, which will be 
reverted just later on.

In addition, the virtio_pci_admin_has_legacy_io() API can be used in the 
future not only by VFIO, this was one of the reasons to put it inside 
VIRTIO.

As of that, we may expect that VIRTIO will consider its limitation and 
return false in systems that are not X86 and won't expose its 
concerns/knowledge to other modules.

As of the above, I believe that better staying with the solution inside 
VIRTIO as was suggested here and is part of V8 already.

Makes sense ?

Yishai
Alex Williamson Dec. 14, 2023, 4:15 p.m. UTC | #14
On Thu, 14 Dec 2023 18:03:30 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 14/12/2023 17:05, Michael S. Tsirkin wrote:
> > On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:  
> >> On Thu, 14 Dec 2023 11:37:10 +0200
> >> Yishai Hadas <yishaih@nvidia.com> wrote:
> >>>>> OK, if so, we can come with the below extra code.
> >>>>> Makes sense ?
> >>>>>
> >>>>> I'll squash it as part of V8 to the relevant patch.
> >>>>>
> >>>>> diff --git a/drivers/virtio/virtio_pci_modern.c
> >>>>> b/drivers/virtio/virtio_pci_modern.c
> >>>>> index 37a0035f8381..b652e91b9df4 100644
> >>>>> --- a/drivers/virtio/virtio_pci_modern.c
> >>>>> +++ b/drivers/virtio/virtio_pci_modern.c
> >>>>> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> >>>>> *pdev)
> >>>>>           struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> >>>>>           struct virtio_pci_device *vp_dev;
> >>>>>
> >>>>> +#ifndef CONFIG_X86
> >>>>> +       return false;
> >>>>> +#endif
> >>>>>           if (!virtio_dev)
> >>>>>                   return false;
> >>>>>
> >>>>> Yishai  
> >>>>
> >>>> Isn't there going to be a bunch more dead code that compiler won't be
> >>>> able to elide?
> >>>>      
> >>>
> >>> On my setup the compiler didn't complain about dead-code (I simulated it
> >>> by using ifdef CONFIG_X86 return false).
> >>>
> >>> However, if we suspect that some compiler might complain, we can come
> >>> with the below instead.
> >>>
> >>> Do you prefer that ?
> >>>
> >>> index 37a0035f8381..53e29824d404 100644
> >>> --- a/drivers/virtio/virtio_pci_modern.c
> >>> +++ b/drivers/virtio/virtio_pci_modern.c
> >>> @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct
> >>> virtio_device *vdev)
> >>>            BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> >>>            BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> >>>
> >>> +#ifdef CONFIG_X86
> >>>    /*
> >>>     * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
> >>>     * commands are supported
> >>> @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> >>> *pdev)
> >>>                   return true;
> >>>           return false;
> >>>    }
> >>> +#else
> >>> +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
> >>> +{
> >>> +       return false;
> >>> +}
> >>> +#endif
> >>>    EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);  
> >>
> >> Doesn't this also raise the question of the purpose of virtio-vfio-pci
> >> on non-x86?  Without any other features it offers nothing over vfio-pci
> >> and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
> >> Thanks,
> >>
> >> Alex  
> > 
> > Kconfig dependency is what I had in mind, yes. The X86 specific code in
> > virtio_pci_modern.c can be moved to a separate file then use makefile
> > tricks to skip it on other platforms.
> >   
> 
> The next feature for that driver will be the live migration support over 
> virtio, once the specification which is WIP those day will be accepted.
> 
> The migration functionality is not X86 dependent and doesn't have the 
> legacy virtio driver limitations that enforced us to run only on X86.
> 
> So, by that time we may need to enable in VFIO the loading of 
> virtio-vfio-pci driver and put back the ifdef X86 inside VIRTIO, only on 
> the legacy IO API, as I did already in V8.
> 
> So using a KCONFIG solution in VFIO is a short term one, which will be 
> reverted just later on.

I understand the intent, but I don't think that justifies building a
driver that serves no purpose in the interim.  IF and when migration
support becomes a reality, it's trivial to update the depends line.

> In addition, the virtio_pci_admin_has_legacy_io() API can be used in the 
> future not only by VFIO, this was one of the reasons to put it inside 
> VIRTIO.

Maybe this should be governed by a new Kconfig option which would be
selected by drivers like this.  Thanks,

Alex
Yishai Hadas Dec. 14, 2023, 4:25 p.m. UTC | #15
On 14/12/2023 18:15, Alex Williamson wrote:
> On Thu, 14 Dec 2023 18:03:30 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
>> On 14/12/2023 17:05, Michael S. Tsirkin wrote:
>>> On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:
>>>> On Thu, 14 Dec 2023 11:37:10 +0200
>>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>>>>> OK, if so, we can come with the below extra code.
>>>>>>> Makes sense ?
>>>>>>>
>>>>>>> I'll squash it as part of V8 to the relevant patch.
>>>>>>>
>>>>>>> diff --git a/drivers/virtio/virtio_pci_modern.c
>>>>>>> b/drivers/virtio/virtio_pci_modern.c
>>>>>>> index 37a0035f8381..b652e91b9df4 100644
>>>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>>>> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>>>>>> *pdev)
>>>>>>>            struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>>>>            struct virtio_pci_device *vp_dev;
>>>>>>>
>>>>>>> +#ifndef CONFIG_X86
>>>>>>> +       return false;
>>>>>>> +#endif
>>>>>>>            if (!virtio_dev)
>>>>>>>                    return false;
>>>>>>>
>>>>>>> Yishai
>>>>>>
>>>>>> Isn't there going to be a bunch more dead code that compiler won't be
>>>>>> able to elide?
>>>>>>       
>>>>>
>>>>> On my setup the compiler didn't complain about dead-code (I simulated it
>>>>> by using ifdef CONFIG_X86 return false).
>>>>>
>>>>> However, if we suspect that some compiler might complain, we can come
>>>>> with the below instead.
>>>>>
>>>>> Do you prefer that ?
>>>>>
>>>>> index 37a0035f8381..53e29824d404 100644
>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>> @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct
>>>>> virtio_device *vdev)
>>>>>             BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>>>>>             BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>>>>>
>>>>> +#ifdef CONFIG_X86
>>>>>     /*
>>>>>      * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
>>>>>      * commands are supported
>>>>> @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>>>> *pdev)
>>>>>                    return true;
>>>>>            return false;
>>>>>     }
>>>>> +#else
>>>>> +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
>>>>> +{
>>>>> +       return false;
>>>>> +}
>>>>> +#endif
>>>>>     EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
>>>>
>>>> Doesn't this also raise the question of the purpose of virtio-vfio-pci
>>>> on non-x86?  Without any other features it offers nothing over vfio-pci
>>>> and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
>>>> Thanks,
>>>>
>>>> Alex
>>>
>>> Kconfig dependency is what I had in mind, yes. The X86 specific code in
>>> virtio_pci_modern.c can be moved to a separate file then use makefile
>>> tricks to skip it on other platforms.
>>>    
>>
>> The next feature for that driver will be the live migration support over
>> virtio, once the specification which is WIP those day will be accepted.
>>
>> The migration functionality is not X86 dependent and doesn't have the
>> legacy virtio driver limitations that enforced us to run only on X86.
>>
>> So, by that time we may need to enable in VFIO the loading of
>> virtio-vfio-pci driver and put back the ifdef X86 inside VIRTIO, only on
>> the legacy IO API, as I did already in V8.
>>
>> So using a KCONFIG solution in VFIO is a short term one, which will be
>> reverted just later on.
> 
> I understand the intent, but I don't think that justifies building a
> driver that serves no purpose in the interim.  IF and when migration
> support becomes a reality, it's trivial to update the depends line.
> 

OK, so I'll add a KCONFIG dependency on X86 as you suggested as part of 
V9 inside VFIO.

>> In addition, the virtio_pci_admin_has_legacy_io() API can be used in the
>> future not only by VFIO, this was one of the reasons to put it inside
>> VIRTIO.
> 
> Maybe this should be governed by a new Kconfig option which would be
> selected by drivers like this.  Thanks,
> 

We can still keep the simple ifdef X86 inside VIRTIO for future 
users/usage which is not only VFIO.

Michael,
Can that work for you ?

Yishai

> Alex
>
Michael S. Tsirkin Dec. 14, 2023, 4:40 p.m. UTC | #16
On Thu, Dec 14, 2023 at 06:25:25PM +0200, Yishai Hadas wrote:
> On 14/12/2023 18:15, Alex Williamson wrote:
> > On Thu, 14 Dec 2023 18:03:30 +0200
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> > 
> > > On 14/12/2023 17:05, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:
> > > > > On Thu, 14 Dec 2023 11:37:10 +0200
> > > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > > > > > OK, if so, we can come with the below extra code.
> > > > > > > > Makes sense ?
> > > > > > > > 
> > > > > > > > I'll squash it as part of V8 to the relevant patch.
> > > > > > > > 
> > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > index 37a0035f8381..b652e91b9df4 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> > > > > > > > *pdev)
> > > > > > > >            struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > > > > > >            struct virtio_pci_device *vp_dev;
> > > > > > > > 
> > > > > > > > +#ifndef CONFIG_X86
> > > > > > > > +       return false;
> > > > > > > > +#endif
> > > > > > > >            if (!virtio_dev)
> > > > > > > >                    return false;
> > > > > > > > 
> > > > > > > > Yishai
> > > > > > > 
> > > > > > > Isn't there going to be a bunch more dead code that compiler won't be
> > > > > > > able to elide?
> > > > > > 
> > > > > > On my setup the compiler didn't complain about dead-code (I simulated it
> > > > > > by using ifdef CONFIG_X86 return false).
> > > > > > 
> > > > > > However, if we suspect that some compiler might complain, we can come
> > > > > > with the below instead.
> > > > > > 
> > > > > > Do you prefer that ?
> > > > > > 
> > > > > > index 37a0035f8381..53e29824d404 100644
> > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct
> > > > > > virtio_device *vdev)
> > > > > >             BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> > > > > >             BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> > > > > > 
> > > > > > +#ifdef CONFIG_X86
> > > > > >     /*
> > > > > >      * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
> > > > > >      * commands are supported
> > > > > > @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> > > > > > *pdev)
> > > > > >                    return true;
> > > > > >            return false;
> > > > > >     }
> > > > > > +#else
> > > > > > +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
> > > > > > +{
> > > > > > +       return false;
> > > > > > +}
> > > > > > +#endif
> > > > > >     EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
> > > > > 
> > > > > Doesn't this also raise the question of the purpose of virtio-vfio-pci
> > > > > on non-x86?  Without any other features it offers nothing over vfio-pci
> > > > > and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
> > > > > Thanks,
> > > > > 
> > > > > Alex
> > > > 
> > > > Kconfig dependency is what I had in mind, yes. The X86 specific code in
> > > > virtio_pci_modern.c can be moved to a separate file then use makefile
> > > > tricks to skip it on other platforms.
> > > 
> > > The next feature for that driver will be the live migration support over
> > > virtio, once the specification which is WIP those day will be accepted.
> > > 
> > > The migration functionality is not X86 dependent and doesn't have the
> > > legacy virtio driver limitations that enforced us to run only on X86.
> > > 
> > > So, by that time we may need to enable in VFIO the loading of
> > > virtio-vfio-pci driver and put back the ifdef X86 inside VIRTIO, only on
> > > the legacy IO API, as I did already in V8.
> > > 
> > > So using a KCONFIG solution in VFIO is a short term one, which will be
> > > reverted just later on.
> > 
> > I understand the intent, but I don't think that justifies building a
> > driver that serves no purpose in the interim.  IF and when migration
> > support becomes a reality, it's trivial to update the depends line.
> > 
> 
> OK, so I'll add a KCONFIG dependency on X86 as you suggested as part of V9
> inside VFIO.
> 
> > > In addition, the virtio_pci_admin_has_legacy_io() API can be used in the
> > > future not only by VFIO, this was one of the reasons to put it inside
> > > VIRTIO.
> > 
> > Maybe this should be governed by a new Kconfig option which would be
> > selected by drivers like this.  Thanks,
> > 
> 
> We can still keep the simple ifdef X86 inside VIRTIO for future users/usage
> which is not only VFIO.
> 
> Michael,
> Can that work for you ?
> 
> Yishai
> 
> > Alex
> > 

I am not sure what is proposed exactly. General admin q infrastructure
can be kept as is. The legacy things however can never work outside X86.
Best way to limit it to x86 is to move it to a separate file and
only build that on X86. This way the only ifdef we need is where
we set the flags to enable legacy commands.
Tian, Kevin Dec. 15, 2023, 12:32 a.m. UTC | #17
> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Thursday, December 14, 2023 4:58 PM
> 
> On 14/12/2023 8:07, Tian, Kevin wrote:
> >> From: Yishai Hadas <yishaih@nvidia.com>
> >> Sent: Wednesday, December 13, 2023 8:25 PM
> >>
> >> On 13/12/2023 10:23, Tian, Kevin wrote:
> >>>> From: Yishai Hadas <yishaih@nvidia.com>
> >>>> Sent: Thursday, December 7, 2023 6:28 PM
> >>>>
> >>>> +
> >>>> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> >>>> +					char __user *buf, size_t count,
> >>>> +					loff_t *ppos)
> >>>> +{
> >>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >>>> +	size_t register_offset;
> >>>> +	loff_t copy_offset;
> >>>> +	size_t copy_count;
> >>>> +	__le32 val32;
> >>>> +	__le16 val16;
> >>>> +	u8 val8;
> >>>> +	int ret;
> >>>> +
> >>>> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> >>>> +	if (ret < 0)
> >>>> +		return ret;
> >>>> +
> >>>> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> >>>> +				  &copy_offset, &copy_count,
> >>>> &register_offset)) {
> >>>> +		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
> >>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
> >>>> register_offset, copy_count))
> >>>> +			return -EFAULT;
> >>>> +	}
> >>>> +
> >>>> +	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
> >>>> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> >>>> +				  &copy_offset, &copy_count,
> >>>> &register_offset)) {
> >>>> +		if (copy_from_user((void *)&val16 + register_offset, buf +
> >>>> copy_offset,
> >>>> +				   copy_count))
> >>>> +			return -EFAULT;
> >>>> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> >>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 +
> >>>> register_offset,
> >>>> +				 copy_count))
> >>>> +			return -EFAULT;
> >>>> +	}
> >>>
> >>> the write handler calls vfio_pci_core_write() for PCI_COMMAND so
> >>> the core vconfig should have the latest copy of the IO bit value which
> >>> is copied to the user buffer by vfio_pci_core_read(). then not necessary
> >>> to update it again.
> >>
> >> You assume the the 'vconfig' mechanism/flow is always applicable for
> >> that specific field, this should be double-checked.
> >> However, as for now the driver doesn't rely / use the vconfig for other
> >> fields as it doesn't match and need a big refactor, I prefer to not rely
> >> on it at all and have it here.
> >
> > iiuc this driver does relies on vconfig for other fields. It first calls
> > vfio_pci_core_read() and then modifies selected fields which needs
> > special tweak in this driver.
> 
> No, there is no dependency at all on vconfig for other fields in the driver.
> 
> vfio_pci_core_read() for most of its fields including the PCI_COMMAND
> goes directly over the PCI API/flow to the device and doesn't use the
> vconfig.
> 
> So, we must save/restore the PCI_COMMAND on the driver context to have
> it properly reported/emulated the PCI_COMMAND_IO bit.

you are right. sorry that I ignored this fact.
Yishai Hadas Dec. 17, 2023, 10:39 a.m. UTC | #18
On 14/12/2023 18:40, Michael S. Tsirkin wrote:
> On Thu, Dec 14, 2023 at 06:25:25PM +0200, Yishai Hadas wrote:
>> On 14/12/2023 18:15, Alex Williamson wrote:
>>> On Thu, 14 Dec 2023 18:03:30 +0200
>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>
>>>> On 14/12/2023 17:05, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:
>>>>>> On Thu, 14 Dec 2023 11:37:10 +0200
>>>>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>>>>>>> OK, if so, we can come with the below extra code.
>>>>>>>>> Makes sense ?
>>>>>>>>>
>>>>>>>>> I'll squash it as part of V8 to the relevant patch.
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/virtio/virtio_pci_modern.c
>>>>>>>>> b/drivers/virtio/virtio_pci_modern.c
>>>>>>>>> index 37a0035f8381..b652e91b9df4 100644
>>>>>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>>>>>> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>>>>>>>> *pdev)
>>>>>>>>>             struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>>>>>>             struct virtio_pci_device *vp_dev;
>>>>>>>>>
>>>>>>>>> +#ifndef CONFIG_X86
>>>>>>>>> +       return false;
>>>>>>>>> +#endif
>>>>>>>>>             if (!virtio_dev)
>>>>>>>>>                     return false;
>>>>>>>>>
>>>>>>>>> Yishai
>>>>>>>>
>>>>>>>> Isn't there going to be a bunch more dead code that compiler won't be
>>>>>>>> able to elide?
>>>>>>>
>>>>>>> On my setup the compiler didn't complain about dead-code (I simulated it
>>>>>>> by using ifdef CONFIG_X86 return false).
>>>>>>>
>>>>>>> However, if we suspect that some compiler might complain, we can come
>>>>>>> with the below instead.
>>>>>>>
>>>>>>> Do you prefer that ?
>>>>>>>
>>>>>>> index 37a0035f8381..53e29824d404 100644
>>>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>>>> @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct
>>>>>>> virtio_device *vdev)
>>>>>>>              BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>>>>>>>              BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>>>>>>>
>>>>>>> +#ifdef CONFIG_X86
>>>>>>>      /*
>>>>>>>       * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
>>>>>>>       * commands are supported
>>>>>>> @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>>>>>> *pdev)
>>>>>>>                     return true;
>>>>>>>             return false;
>>>>>>>      }
>>>>>>> +#else
>>>>>>> +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
>>>>>>> +{
>>>>>>> +       return false;
>>>>>>> +}
>>>>>>> +#endif
>>>>>>>      EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
>>>>>>
>>>>>> Doesn't this also raise the question of the purpose of virtio-vfio-pci
>>>>>> on non-x86?  Without any other features it offers nothing over vfio-pci
>>>>>> and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
>>>>>> Thanks,
>>>>>>
>>>>>> Alex
>>>>>
>>>>> Kconfig dependency is what I had in mind, yes. The X86 specific code in
>>>>> virtio_pci_modern.c can be moved to a separate file then use makefile
>>>>> tricks to skip it on other platforms.
>>>>
>>>> The next feature for that driver will be the live migration support over
>>>> virtio, once the specification which is WIP those day will be accepted.
>>>>
>>>> The migration functionality is not X86 dependent and doesn't have the
>>>> legacy virtio driver limitations that enforced us to run only on X86.
>>>>
>>>> So, by that time we may need to enable in VFIO the loading of
>>>> virtio-vfio-pci driver and put back the ifdef X86 inside VIRTIO, only on
>>>> the legacy IO API, as I did already in V8.
>>>>
>>>> So using a KCONFIG solution in VFIO is a short term one, which will be
>>>> reverted just later on.
>>>
>>> I understand the intent, but I don't think that justifies building a
>>> driver that serves no purpose in the interim.  IF and when migration
>>> support becomes a reality, it's trivial to update the depends line.
>>>
>>
>> OK, so I'll add a KCONFIG dependency on X86 as you suggested as part of V9
>> inside VFIO.
>>
>>>> In addition, the virtio_pci_admin_has_legacy_io() API can be used in the
>>>> future not only by VFIO, this was one of the reasons to put it inside
>>>> VIRTIO.
>>>
>>> Maybe this should be governed by a new Kconfig option which would be
>>> selected by drivers like this.  Thanks,
>>>
>>
>> We can still keep the simple ifdef X86 inside VIRTIO for future users/usage
>> which is not only VFIO.
>>
>> Michael,
>> Can that work for you ?
>>
>> Yishai
>>
>>> Alex
>>>
> 
> I am not sure what is proposed exactly. General admin q infrastructure
> can be kept as is. The legacy things however can never work outside X86.
> Best way to limit it to x86 is to move it to a separate file and
> only build that on X86. This way the only ifdef we need is where
> we set the flags to enable legacy commands.
> 
> 

In VFIO we already agreed to add a dependency on X86 [1] as Alex asked.

As VIRTIO should be ready for other clients and be self contained, I 
thought to keep things simple and just return false from 
virtio_pci_admin_has_legacy_io() in non X86 systems as was sent in V8.

However, we can go with your approach as well and compile out all the 
legacy IO stuff in non X86 systems by moving its code to a separate file 
(i.e. virtio_pci_admin_legacy_io.c) and control this file upon the 
Makefile. In addition, you suggested to control the 'supported_cmds' by 
an ifdef. This will let the device know that we don't support legacy IO 
as well on non X86 systems.

Please be aware that the above approach requires another ifdef on the H 
file which exposes the 6 exported symbols and some further changes 
inside virtio as of making vp_modern_admin_cmd_exec() non static as now 
we move the legacy IO stuff to another C file, etc.

Please see below how [2] it will look like.

If you prefer that, so OK, it will be part of V9.
Please let me know.


[1] diff --git a/drivers/vfio/pci/virtio/Kconfig 
b/drivers/vfio/pci/virtio/Kconfig
index 050473b0e5df..a3e5d8ea22a0 100644
--- a/drivers/vfio/pci/virtio/Kconfig
+++ b/drivers/vfio/pci/virtio/Kconfig
@@ -1,7 +1,7 @@
  # SPDX-License-Identifier: GPL-2.0-only
  config VIRTIO_VFIO_PCI
          tristate "VFIO support for VIRTIO NET PCI devices"
-        depends on VIRTIO_PCI
+        depends on X86 && VIRTIO_PCI
          select VFIO_PCI_CORE
          help
            This provides support for exposing VIRTIO NET VF devices 
which support

[2] diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 8e98d24917cc..a73358bb4ebb 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
+virtio_pci-$(CONFIG_X86) += virtio_pci_admin_legacy_io.o
  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
  obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index af676b3b9907..9963e5d0e881 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -158,4 +158,14 @@ void virtio_pci_modern_remove(struct 
virtio_pci_device *);

  struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);

+#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
+       (BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
+        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
+        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
+        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
+        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
+
+int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
+                            struct virtio_admin_cmd *cmd);
+
  #endif
diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index 53e29824d404..defb6282e1d7 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -75,8 +75,8 @@ static int virtqueue_exec_admin_cmd(struct 
virtio_pci_admin_vq *admin_vq,
         return 0;
  }

-static int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
-                                   struct virtio_admin_cmd *cmd)
+int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
+                            struct virtio_admin_cmd *cmd)
  {
         struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -172,6 +172,9 @@ static void virtio_pci_admin_cmd_list_init(struct 
virtio_device *virtio_dev)
         if (ret)
                 goto end;

+#ifndef CONFIG_X86
+       *data &= ~(cpu_to_le64(VIRTIO_LEGACY_ADMIN_CMD_BITMAP));
+#endif
         sg_init_one(&data_sg, data, sizeof(*data));
         cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
         cmd.data_sg = &data_sg;
@@ -775,257 +778,6 @@ static void vp_modern_destroy_avq(struct 
virtio_device *vdev)
         vp_dev->del_vq(&vp_dev->admin_vq.info);
  }

-#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
-       (BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
-        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
-        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
-        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
-        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
-
-#ifdef CONFIG_X86
-/*
- * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
- * commands are supported
- * @dev: VF pci_dev
- *
- * Returns true on success.
- */
-bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
-{
-       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
-       struct virtio_pci_device *vp_dev;
-
-       if (!virtio_dev)
-               return false;
-
-       if (!virtio_has_feature(virtio_dev, VIRTIO_F_ADMIN_VQ))
-               return false;


<other deletion to the new file>
<other deletion to the new file>
..
..

diff --git a/drivers/virtio/virtio_pci_admin_legacy_io.c 
b/drivers/virtio/virtio_pci_admin_legacy_io.c
new file mode 100644
index 000000000000..c48eaaa7c086
--- /dev/null
+++ b/drivers/virtio/virtio_pci_admin_legacy_io.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include "virtio_pci_common.h"
+
+/*
+ * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
+ * commands are supported
+ * @dev: VF pci_dev
+ *
+ * Returns true on success.
+ */
+bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
+{
+       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+       struct virtio_pci_device *vp_dev;
+
+       if (!virtio_dev)
+               return false;
+
+       if (!virtio_has_feature(virtio_dev, VIRTIO_F_ADMIN_VQ))
+               return false;
+
+       vp_dev = to_vp_device(virtio_dev);
+
+       if ((vp_dev->admin_vq.supported_cmds & 
VIRTIO_LEGACY_ADMIN_CMD_BITMAP) ==
+               VIRTIO_LEGACY_ADMIN_CMD_BITMAP)
+               return true;
+       return false;
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);


<other legacy IO code>
<other legacy IO code>
...
...


diff --git a/include/linux/virtio_pci_admin.h 
b/include/linux/virtio_pci_admin.h
index 446ced8cb050..0c9c1f336d3f 100644
--- a/include/linux/virtio_pci_admin.h
+++ b/include/linux/virtio_pci_admin.h
@@ -5,6 +5,7 @@
  #include <linux/types.h>
  #include <linux/pci.h>

+#ifdef CONFIG_X86
  bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev);
  int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 
offset,
                                             u8 size, u8 *buf);
@@ -17,5 +18,6 @@ int virtio_pci_admin_legacy_device_io_read(struct 
pci_dev *pdev, u8 offset,
  int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
                                            u8 req_bar_flags, u8 *bar,
                                            u64 *bar_offset);
+#endif

Yishai
Michael S. Tsirkin Dec. 17, 2023, 12:20 p.m. UTC | #19
On Sun, Dec 17, 2023 at 12:39:48PM +0200, Yishai Hadas wrote:
> On 14/12/2023 18:40, Michael S. Tsirkin wrote:
> > On Thu, Dec 14, 2023 at 06:25:25PM +0200, Yishai Hadas wrote:
> > > On 14/12/2023 18:15, Alex Williamson wrote:
> > > > On Thu, 14 Dec 2023 18:03:30 +0200
> > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > 
> > > > > On 14/12/2023 17:05, Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:
> > > > > > > On Thu, 14 Dec 2023 11:37:10 +0200
> > > > > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > > > > > > > OK, if so, we can come with the below extra code.
> > > > > > > > > > Makes sense ?
> > > > > > > > > > 
> > > > > > > > > > I'll squash it as part of V8 to the relevant patch.
> > > > > > > > > > 
> > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > index 37a0035f8381..b652e91b9df4 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> > > > > > > > > > *pdev)
> > > > > > > > > >             struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > > > > > > > >             struct virtio_pci_device *vp_dev;
> > > > > > > > > > 
> > > > > > > > > > +#ifndef CONFIG_X86
> > > > > > > > > > +       return false;
> > > > > > > > > > +#endif
> > > > > > > > > >             if (!virtio_dev)
> > > > > > > > > >                     return false;
> > > > > > > > > > 
> > > > > > > > > > Yishai
> > > > > > > > > 
> > > > > > > > > Isn't there going to be a bunch more dead code that compiler won't be
> > > > > > > > > able to elide?
> > > > > > > > 
> > > > > > > > On my setup the compiler didn't complain about dead-code (I simulated it
> > > > > > > > by using ifdef CONFIG_X86 return false).
> > > > > > > > 
> > > > > > > > However, if we suspect that some compiler might complain, we can come
> > > > > > > > with the below instead.
> > > > > > > > 
> > > > > > > > Do you prefer that ?
> > > > > > > > 
> > > > > > > > index 37a0035f8381..53e29824d404 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct
> > > > > > > > virtio_device *vdev)
> > > > > > > >              BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> > > > > > > >              BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> > > > > > > > 
> > > > > > > > +#ifdef CONFIG_X86
> > > > > > > >      /*
> > > > > > > >       * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
> > > > > > > >       * commands are supported
> > > > > > > > @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> > > > > > > > *pdev)
> > > > > > > >                     return true;
> > > > > > > >             return false;
> > > > > > > >      }
> > > > > > > > +#else
> > > > > > > > +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
> > > > > > > > +{
> > > > > > > > +       return false;
> > > > > > > > +}
> > > > > > > > +#endif
> > > > > > > >      EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
> > > > > > > 
> > > > > > > Doesn't this also raise the question of the purpose of virtio-vfio-pci
> > > > > > > on non-x86?  Without any other features it offers nothing over vfio-pci
> > > > > > > and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > Alex
> > > > > > 
> > > > > > Kconfig dependency is what I had in mind, yes. The X86 specific code in
> > > > > > virtio_pci_modern.c can be moved to a separate file then use makefile
> > > > > > tricks to skip it on other platforms.
> > > > > 
> > > > > The next feature for that driver will be the live migration support over
> > > > > virtio, once the specification which is WIP those day will be accepted.
> > > > > 
> > > > > The migration functionality is not X86 dependent and doesn't have the
> > > > > legacy virtio driver limitations that enforced us to run only on X86.
> > > > > 
> > > > > So, by that time we may need to enable in VFIO the loading of
> > > > > virtio-vfio-pci driver and put back the ifdef X86 inside VIRTIO, only on
> > > > > the legacy IO API, as I did already in V8.
> > > > > 
> > > > > So using a KCONFIG solution in VFIO is a short term one, which will be
> > > > > reverted just later on.
> > > > 
> > > > I understand the intent, but I don't think that justifies building a
> > > > driver that serves no purpose in the interim.  IF and when migration
> > > > support becomes a reality, it's trivial to update the depends line.
> > > > 
> > > 
> > > OK, so I'll add a KCONFIG dependency on X86 as you suggested as part of V9
> > > inside VFIO.
> > > 
> > > > > In addition, the virtio_pci_admin_has_legacy_io() API can be used in the
> > > > > future not only by VFIO, this was one of the reasons to put it inside
> > > > > VIRTIO.
> > > > 
> > > > Maybe this should be governed by a new Kconfig option which would be
> > > > selected by drivers like this.  Thanks,
> > > > 
> > > 
> > > We can still keep the simple ifdef X86 inside VIRTIO for future users/usage
> > > which is not only VFIO.
> > > 
> > > Michael,
> > > Can that work for you ?
> > > 
> > > Yishai
> > > 
> > > > Alex
> > > > 
> > 
> > I am not sure what is proposed exactly. General admin q infrastructure
> > can be kept as is. The legacy things however can never work outside X86.
> > Best way to limit it to x86 is to move it to a separate file and
> > only build that on X86. This way the only ifdef we need is where
> > we set the flags to enable legacy commands.
> > 
> > 
> 
> In VFIO we already agreed to add a dependency on X86 [1] as Alex asked.
> 
> As VIRTIO should be ready for other clients and be self contained, I thought
> to keep things simple and just return false from
> virtio_pci_admin_has_legacy_io() in non X86 systems as was sent in V8.
> 
> However, we can go with your approach as well and compile out all the legacy
> IO stuff in non X86 systems by moving its code to a separate file (i.e.
> virtio_pci_admin_legacy_io.c) and control this file upon the Makefile. In
> addition, you suggested to control the 'supported_cmds' by an ifdef. This
> will let the device know that we don't support legacy IO as well on non X86
> systems.
> 
> Please be aware that the above approach requires another ifdef on the H file
> which exposes the 6 exported symbols and some further changes inside virtio
> as of making vp_modern_admin_cmd_exec() non static as now we move the legacy
> IO stuff to another C file, etc.
> 
> Please see below how [2] it will look like.
> 
> If you prefer that, so OK, it will be part of V9.
> Please let me know.
> 
> 
> [1] diff --git a/drivers/vfio/pci/virtio/Kconfig
> b/drivers/vfio/pci/virtio/Kconfig
> index 050473b0e5df..a3e5d8ea22a0 100644
> --- a/drivers/vfio/pci/virtio/Kconfig
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  config VIRTIO_VFIO_PCI
>          tristate "VFIO support for VIRTIO NET PCI devices"
> -        depends on VIRTIO_PCI
> +        depends on X86 && VIRTIO_PCI
>          select VFIO_PCI_CORE
>          help
>            This provides support for exposing VIRTIO NET VF devices which
> support
> 
> [2] diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 8e98d24917cc..a73358bb4ebb 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> +virtio_pci-$(CONFIG_X86) += virtio_pci_admin_legacy_io.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>  obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
> diff --git a/drivers/virtio/virtio_pci_common.h
> b/drivers/virtio/virtio_pci_common.h
> index af676b3b9907..9963e5d0e881 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -158,4 +158,14 @@ void virtio_pci_modern_remove(struct virtio_pci_device
> *);
> 
>  struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
> 
> +#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
> +       (BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> +


I'd add something like:

#ifdef CONFIG_X86
#define VIRTIO_ADMIN_CMD_BITMAP VIRTIO_LEGACY_ADMIN_CMD_BITMAP
#else
#define VIRTIO_ADMIN_CMD_BITMAP 0
#endif

Add a comment explaining why, please.


> +int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
> +                            struct virtio_admin_cmd *cmd);
> +
>  #endif
> diff --git a/drivers/virtio/virtio_pci_modern.c
> b/drivers/virtio/virtio_pci_modern.c
> index 53e29824d404..defb6282e1d7 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -75,8 +75,8 @@ static int virtqueue_exec_admin_cmd(struct
> virtio_pci_admin_vq *admin_vq,
>         return 0;
>  }
> 
> -static int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
> -                                   struct virtio_admin_cmd *cmd)
> +int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
> +                            struct virtio_admin_cmd *cmd)
>  {
>         struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
>         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> @@ -172,6 +172,9 @@ static void virtio_pci_admin_cmd_list_init(struct
> virtio_device *virtio_dev)
>         if (ret)
>                 goto end;
> 
> +#ifndef CONFIG_X86
> +       *data &= ~(cpu_to_le64(VIRTIO_LEGACY_ADMIN_CMD_BITMAP));
> +#endif

Then here we don't need an ifdef just use VIRTIO_ADMIN_CMD_BITMAP.

>         sg_init_one(&data_sg, data, sizeof(*data));
>         cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
>         cmd.data_sg = &data_sg;
> @@ -775,257 +778,6 @@ static void vp_modern_destroy_avq(struct virtio_device
> *vdev)
>         vp_dev->del_vq(&vp_dev->admin_vq.info);
>  }
> 
> -#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
> -       (BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> -        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> -        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> -        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> -        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> -
> -#ifdef CONFIG_X86
> -/*
> - * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
> - * commands are supported
> - * @dev: VF pci_dev
> - *
> - * Returns true on success.
> - */
> -bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
> -{
> -       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> -       struct virtio_pci_device *vp_dev;
> -
> -       if (!virtio_dev)
> -               return false;
> -
> -       if (!virtio_has_feature(virtio_dev, VIRTIO_F_ADMIN_VQ))
> -               return false;
> 
> 
> <other deletion to the new file>
> <other deletion to the new file>
> ..
> ..
> 
> diff --git a/drivers/virtio/virtio_pci_admin_legacy_io.c
> b/drivers/virtio/virtio_pci_admin_legacy_io.c
> new file mode 100644
> index 000000000000..c48eaaa7c086
> --- /dev/null
> +++ b/drivers/virtio/virtio_pci_admin_legacy_io.c
> @@ -0,0 +1,244 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "virtio_pci_common.h"
> +
> +/*
> + * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
> + * commands are supported
> + * @dev: VF pci_dev
> + *
> + * Returns true on success.
> + */
> +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
> +{
> +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +       struct virtio_pci_device *vp_dev;
> +
> +       if (!virtio_dev)
> +               return false;
> +
> +       if (!virtio_has_feature(virtio_dev, VIRTIO_F_ADMIN_VQ))
> +               return false;
> +
> +       vp_dev = to_vp_device(virtio_dev);
> +
> +       if ((vp_dev->admin_vq.supported_cmds &
> VIRTIO_LEGACY_ADMIN_CMD_BITMAP) ==
> +               VIRTIO_LEGACY_ADMIN_CMD_BITMAP)
> +               return true;
> +       return false;
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
> 
> 
> <other legacy IO code>
> <other legacy IO code>
> ...
> ...
> 
> 
> diff --git a/include/linux/virtio_pci_admin.h
> b/include/linux/virtio_pci_admin.h
> index 446ced8cb050..0c9c1f336d3f 100644
> --- a/include/linux/virtio_pci_admin.h
> +++ b/include/linux/virtio_pci_admin.h
> @@ -5,6 +5,7 @@
>  #include <linux/types.h>
>  #include <linux/pci.h>
> 
> +#ifdef CONFIG_X86
>  bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev);
>  int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8
> offset,
>                                             u8 size, u8 *buf);
> @@ -17,5 +18,6 @@ int virtio_pci_admin_legacy_device_io_read(struct pci_dev
> *pdev, u8 offset,
>  int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>                                            u8 req_bar_flags, u8 *bar,
>                                            u64 *bar_offset);
> +#endif
> 
> Yishai
Yishai Hadas Dec. 17, 2023, 1:20 p.m. UTC | #20
On 17/12/2023 14:20, Michael S. Tsirkin wrote:
> On Sun, Dec 17, 2023 at 12:39:48PM +0200, Yishai Hadas wrote:
>> On 14/12/2023 18:40, Michael S. Tsirkin wrote:
>>> On Thu, Dec 14, 2023 at 06:25:25PM +0200, Yishai Hadas wrote:
>>>> On 14/12/2023 18:15, Alex Williamson wrote:
>>>>> On Thu, 14 Dec 2023 18:03:30 +0200
>>>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>>>
>>>>>> On 14/12/2023 17:05, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:
>>>>>>>> On Thu, 14 Dec 2023 11:37:10 +0200
>>>>>>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>>>>>>>>> OK, if so, we can come with the below extra code.
>>>>>>>>>>> Makes sense ?
>>>>>>>>>>>
>>>>>>>>>>> I'll squash it as part of V8 to the relevant patch.
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>> b/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>> index 37a0035f8381..b652e91b9df4 100644
>>>>>>>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>>>>>>>>>> *pdev)
>>>>>>>>>>>              struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>>>>>>>>              struct virtio_pci_device *vp_dev;
>>>>>>>>>>>
>>>>>>>>>>> +#ifndef CONFIG_X86
>>>>>>>>>>> +       return false;
>>>>>>>>>>> +#endif
>>>>>>>>>>>              if (!virtio_dev)
>>>>>>>>>>>                      return false;
>>>>>>>>>>>
>>>>>>>>>>> Yishai
>>>>>>>>>>
>>>>>>>>>> Isn't there going to be a bunch more dead code that compiler won't be
>>>>>>>>>> able to elide?
>>>>>>>>>
>>>>>>>>> On my setup the compiler didn't complain about dead-code (I simulated it
>>>>>>>>> by using ifdef CONFIG_X86 return false).
>>>>>>>>>
>>>>>>>>> However, if we suspect that some compiler might complain, we can come
>>>>>>>>> with the below instead.
>>>>>>>>>
>>>>>>>>> Do you prefer that ?
>>>>>>>>>
>>>>>>>>> index 37a0035f8381..53e29824d404 100644
>>>>>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>>>>>> @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct
>>>>>>>>> virtio_device *vdev)
>>>>>>>>>               BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>>>>>>>>>               BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>>>>>>>>>
>>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>>>       /*
>>>>>>>>>        * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
>>>>>>>>>        * commands are supported
>>>>>>>>> @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>>>>>>>> *pdev)
>>>>>>>>>                      return true;
>>>>>>>>>              return false;
>>>>>>>>>       }
>>>>>>>>> +#else
>>>>>>>>> +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
>>>>>>>>> +{
>>>>>>>>> +       return false;
>>>>>>>>> +}
>>>>>>>>> +#endif
>>>>>>>>>       EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
>>>>>>>>
>>>>>>>> Doesn't this also raise the question of the purpose of virtio-vfio-pci
>>>>>>>> on non-x86?  Without any other features it offers nothing over vfio-pci
>>>>>>>> and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Alex
>>>>>>>
>>>>>>> Kconfig dependency is what I had in mind, yes. The X86 specific code in
>>>>>>> virtio_pci_modern.c can be moved to a separate file then use makefile
>>>>>>> tricks to skip it on other platforms.
>>>>>>
>>>>>> The next feature for that driver will be the live migration support over
>>>>>> virtio, once the specification which is WIP those day will be accepted.
>>>>>>
>>>>>> The migration functionality is not X86 dependent and doesn't have the
>>>>>> legacy virtio driver limitations that enforced us to run only on X86.
>>>>>>
>>>>>> So, by that time we may need to enable in VFIO the loading of
>>>>>> virtio-vfio-pci driver and put back the ifdef X86 inside VIRTIO, only on
>>>>>> the legacy IO API, as I did already in V8.
>>>>>>
>>>>>> So using a KCONFIG solution in VFIO is a short term one, which will be
>>>>>> reverted just later on.
>>>>>
>>>>> I understand the intent, but I don't think that justifies building a
>>>>> driver that serves no purpose in the interim.  IF and when migration
>>>>> support becomes a reality, it's trivial to update the depends line.
>>>>>
>>>>
>>>> OK, so I'll add a KCONFIG dependency on X86 as you suggested as part of V9
>>>> inside VFIO.
>>>>
>>>>>> In addition, the virtio_pci_admin_has_legacy_io() API can be used in the
>>>>>> future not only by VFIO, this was one of the reasons to put it inside
>>>>>> VIRTIO.
>>>>>
>>>>> Maybe this should be governed by a new Kconfig option which would be
>>>>> selected by drivers like this.  Thanks,
>>>>>
>>>>
>>>> We can still keep the simple ifdef X86 inside VIRTIO for future users/usage
>>>> which is not only VFIO.
>>>>
>>>> Michael,
>>>> Can that work for you ?
>>>>
>>>> Yishai
>>>>
>>>>> Alex
>>>>>
>>>
>>> I am not sure what is proposed exactly. General admin q infrastructure
>>> can be kept as is. The legacy things however can never work outside X86.
>>> Best way to limit it to x86 is to move it to a separate file and
>>> only build that on X86. This way the only ifdef we need is where
>>> we set the flags to enable legacy commands.
>>>
>>>
>>
>> In VFIO we already agreed to add a dependency on X86 [1] as Alex asked.
>>
>> As VIRTIO should be ready for other clients and be self contained, I thought
>> to keep things simple and just return false from
>> virtio_pci_admin_has_legacy_io() in non X86 systems as was sent in V8.
>>
>> However, we can go with your approach as well and compile out all the legacy
>> IO stuff in non X86 systems by moving its code to a separate file (i.e.
>> virtio_pci_admin_legacy_io.c) and control this file upon the Makefile. In
>> addition, you suggested to control the 'supported_cmds' by an ifdef. This
>> will let the device know that we don't support legacy IO as well on non X86
>> systems.
>>
>> Please be aware that the above approach requires another ifdef on the H file
>> which exposes the 6 exported symbols and some further changes inside virtio
>> as of making vp_modern_admin_cmd_exec() non static as now we move the legacy
>> IO stuff to another C file, etc.
>>
>> Please see below how [2] it will look like.
>>
>> If you prefer that, so OK, it will be part of V9.
>> Please let me know.
>>
>>
>> [1] diff --git a/drivers/vfio/pci/virtio/Kconfig
>> b/drivers/vfio/pci/virtio/Kconfig
>> index 050473b0e5df..a3e5d8ea22a0 100644
>> --- a/drivers/vfio/pci/virtio/Kconfig
>> +++ b/drivers/vfio/pci/virtio/Kconfig
>> @@ -1,7 +1,7 @@
>>   # SPDX-License-Identifier: GPL-2.0-only
>>   config VIRTIO_VFIO_PCI
>>           tristate "VFIO support for VIRTIO NET PCI devices"
>> -        depends on VIRTIO_PCI
>> +        depends on X86 && VIRTIO_PCI
>>           select VFIO_PCI_CORE
>>           help
>>             This provides support for exposing VIRTIO NET VF devices which
>> support
>>
>> [2] diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
>> index 8e98d24917cc..a73358bb4ebb 100644
>> --- a/drivers/virtio/Makefile
>> +++ b/drivers/virtio/Makefile
>> @@ -7,6 +7,7 @@ obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>>   obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>>   virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>>   virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>> +virtio_pci-$(CONFIG_X86) += virtio_pci_admin_legacy_io.o
>>   obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>>   obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>>   obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
>> diff --git a/drivers/virtio/virtio_pci_common.h
>> b/drivers/virtio/virtio_pci_common.h
>> index af676b3b9907..9963e5d0e881 100644
>> --- a/drivers/virtio/virtio_pci_common.h
>> +++ b/drivers/virtio/virtio_pci_common.h
>> @@ -158,4 +158,14 @@ void virtio_pci_modern_remove(struct virtio_pci_device
>> *);
>>
>>   struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
>>
>> +#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
>> +       (BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
>> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
>> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
>> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>> +
> 
> 
> I'd add something like:
> 
> #ifdef CONFIG_X86
> #define VIRTIO_ADMIN_CMD_BITMAP VIRTIO_LEGACY_ADMIN_CMD_BITMAP
> #else
> #define VIRTIO_ADMIN_CMD_BITMAP 0
> #endif
> 

This new macro (i.e. VIRTIO_ADMIN_CMD_BITMAP) will hold in the future 
another sets of supported bits (e.g. Live migration), right ?

> Add a comment explaining why, please.

OK

How about the below ?

"As of some limitations in the legacy driver (e.g. lack of memory
barriers in ARM, endian-ness is broken in PPC) the supported legacy IO
commands are masked out on non x86 systems."

> 
> 
>> +int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
>> +                            struct virtio_admin_cmd *cmd);
>> +
>>   #endif
>> diff --git a/drivers/virtio/virtio_pci_modern.c
>> b/drivers/virtio/virtio_pci_modern.c
>> index 53e29824d404..defb6282e1d7 100644
>> --- a/drivers/virtio/virtio_pci_modern.c
>> +++ b/drivers/virtio/virtio_pci_modern.c
>> @@ -75,8 +75,8 @@ static int virtqueue_exec_admin_cmd(struct
>> virtio_pci_admin_vq *admin_vq,
>>          return 0;
>>   }
>>
>> -static int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
>> -                                   struct virtio_admin_cmd *cmd)
>> +int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
>> +                            struct virtio_admin_cmd *cmd)
>>   {
>>          struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
>>          struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> @@ -172,6 +172,9 @@ static void virtio_pci_admin_cmd_list_init(struct
>> virtio_device *virtio_dev)
>>          if (ret)
>>                  goto end;
>>
>> +#ifndef CONFIG_X86
>> +       *data &= ~(cpu_to_le64(VIRTIO_LEGACY_ADMIN_CMD_BITMAP));
>> +#endif
> 
> Then here we don't need an ifdef just use VIRTIO_ADMIN_CMD_BITMAP.
> 

Following your suggestion, this will become,
*data &= cpu_to_le64(VIRTIO_ADMIN_CMD_BITMAP);

(no ifdef, and no ~).

Yishai
Michael S. Tsirkin Dec. 17, 2023, 1:42 p.m. UTC | #21
On Sun, Dec 17, 2023 at 03:20:30PM +0200, Yishai Hadas wrote:
> On 17/12/2023 14:20, Michael S. Tsirkin wrote:
> > On Sun, Dec 17, 2023 at 12:39:48PM +0200, Yishai Hadas wrote:
> > > On 14/12/2023 18:40, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 14, 2023 at 06:25:25PM +0200, Yishai Hadas wrote:
> > > > > On 14/12/2023 18:15, Alex Williamson wrote:
> > > > > > On Thu, 14 Dec 2023 18:03:30 +0200
> > > > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > > > 
> > > > > > > On 14/12/2023 17:05, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:
> > > > > > > > > On Thu, 14 Dec 2023 11:37:10 +0200
> > > > > > > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > > > > > > > > > OK, if so, we can come with the below extra code.
> > > > > > > > > > > > Makes sense ?
> > > > > > > > > > > > 
> > > > > > > > > > > > I'll squash it as part of V8 to the relevant patch.
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > > > b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > > > index 37a0035f8381..b652e91b9df4 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > > > @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> > > > > > > > > > > > *pdev)
> > > > > > > > > > > >              struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > > > > > > > > > >              struct virtio_pci_device *vp_dev;
> > > > > > > > > > > > 
> > > > > > > > > > > > +#ifndef CONFIG_X86
> > > > > > > > > > > > +       return false;
> > > > > > > > > > > > +#endif
> > > > > > > > > > > >              if (!virtio_dev)
> > > > > > > > > > > >                      return false;
> > > > > > > > > > > > 
> > > > > > > > > > > > Yishai
> > > > > > > > > > > 
> > > > > > > > > > > Isn't there going to be a bunch more dead code that compiler won't be
> > > > > > > > > > > able to elide?
> > > > > > > > > > 
> > > > > > > > > > On my setup the compiler didn't complain about dead-code (I simulated it
> > > > > > > > > > by using ifdef CONFIG_X86 return false).
> > > > > > > > > > 
> > > > > > > > > > However, if we suspect that some compiler might complain, we can come
> > > > > > > > > > with the below instead.
> > > > > > > > > > 
> > > > > > > > > > Do you prefer that ?
> > > > > > > > > > 
> > > > > > > > > > index 37a0035f8381..53e29824d404 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > > > @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct
> > > > > > > > > > virtio_device *vdev)
> > > > > > > > > >               BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> > > > > > > > > >               BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> > > > > > > > > > 
> > > > > > > > > > +#ifdef CONFIG_X86
> > > > > > > > > >       /*
> > > > > > > > > >        * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
> > > > > > > > > >        * commands are supported
> > > > > > > > > > @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
> > > > > > > > > > *pdev)
> > > > > > > > > >                      return true;
> > > > > > > > > >              return false;
> > > > > > > > > >       }
> > > > > > > > > > +#else
> > > > > > > > > > +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
> > > > > > > > > > +{
> > > > > > > > > > +       return false;
> > > > > > > > > > +}
> > > > > > > > > > +#endif
> > > > > > > > > >       EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
> > > > > > > > > 
> > > > > > > > > Doesn't this also raise the question of the purpose of virtio-vfio-pci
> > > > > > > > > on non-x86?  Without any other features it offers nothing over vfio-pci
> > > > > > > > > and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
> > > > > > > > > Thanks,
> > > > > > > > > 
> > > > > > > > > Alex
> > > > > > > > 
> > > > > > > > Kconfig dependency is what I had in mind, yes. The X86 specific code in
> > > > > > > > virtio_pci_modern.c can be moved to a separate file then use makefile
> > > > > > > > tricks to skip it on other platforms.
> > > > > > > 
> > > > > > > The next feature for that driver will be the live migration support over
> > > > > > > virtio, once the specification which is WIP those day will be accepted.
> > > > > > > 
> > > > > > > The migration functionality is not X86 dependent and doesn't have the
> > > > > > > legacy virtio driver limitations that enforced us to run only on X86.
> > > > > > > 
> > > > > > > So, by that time we may need to enable in VFIO the loading of
> > > > > > > virtio-vfio-pci driver and put back the ifdef X86 inside VIRTIO, only on
> > > > > > > the legacy IO API, as I did already in V8.
> > > > > > > 
> > > > > > > So using a KCONFIG solution in VFIO is a short term one, which will be
> > > > > > > reverted just later on.
> > > > > > 
> > > > > > I understand the intent, but I don't think that justifies building a
> > > > > > driver that serves no purpose in the interim.  IF and when migration
> > > > > > support becomes a reality, it's trivial to update the depends line.
> > > > > > 
> > > > > 
> > > > > OK, so I'll add a KCONFIG dependency on X86 as you suggested as part of V9
> > > > > inside VFIO.
> > > > > 
> > > > > > > In addition, the virtio_pci_admin_has_legacy_io() API can be used in the
> > > > > > > future not only by VFIO, this was one of the reasons to put it inside
> > > > > > > VIRTIO.
> > > > > > 
> > > > > > Maybe this should be governed by a new Kconfig option which would be
> > > > > > selected by drivers like this.  Thanks,
> > > > > > 
> > > > > 
> > > > > We can still keep the simple ifdef X86 inside VIRTIO for future users/usage
> > > > > which is not only VFIO.
> > > > > 
> > > > > Michael,
> > > > > Can that work for you ?
> > > > > 
> > > > > Yishai
> > > > > 
> > > > > > Alex
> > > > > > 
> > > > 
> > > > I am not sure what is proposed exactly. General admin q infrastructure
> > > > can be kept as is. The legacy things however can never work outside X86.
> > > > Best way to limit it to x86 is to move it to a separate file and
> > > > only build that on X86. This way the only ifdef we need is where
> > > > we set the flags to enable legacy commands.
> > > > 
> > > > 
> > > 
> > > In VFIO we already agreed to add a dependency on X86 [1] as Alex asked.
> > > 
> > > As VIRTIO should be ready for other clients and be self contained, I thought
> > > to keep things simple and just return false from
> > > virtio_pci_admin_has_legacy_io() in non X86 systems as was sent in V8.
> > > 
> > > However, we can go with your approach as well and compile out all the legacy
> > > IO stuff in non X86 systems by moving its code to a separate file (i.e.
> > > virtio_pci_admin_legacy_io.c) and control this file upon the Makefile. In
> > > addition, you suggested to control the 'supported_cmds' by an ifdef. This
> > > will let the device know that we don't support legacy IO as well on non X86
> > > systems.
> > > 
> > > Please be aware that the above approach requires another ifdef on the H file
> > > which exposes the 6 exported symbols and some further changes inside virtio
> > > as of making vp_modern_admin_cmd_exec() non static as now we move the legacy
> > > IO stuff to another C file, etc.
> > > 
> > > Please see below how [2] it will look like.
> > > 
> > > If you prefer that, so OK, it will be part of V9.
> > > Please let me know.
> > > 
> > > 
> > > [1] diff --git a/drivers/vfio/pci/virtio/Kconfig
> > > b/drivers/vfio/pci/virtio/Kconfig
> > > index 050473b0e5df..a3e5d8ea22a0 100644
> > > --- a/drivers/vfio/pci/virtio/Kconfig
> > > +++ b/drivers/vfio/pci/virtio/Kconfig
> > > @@ -1,7 +1,7 @@
> > >   # SPDX-License-Identifier: GPL-2.0-only
> > >   config VIRTIO_VFIO_PCI
> > >           tristate "VFIO support for VIRTIO NET PCI devices"
> > > -        depends on VIRTIO_PCI
> > > +        depends on X86 && VIRTIO_PCI
> > >           select VFIO_PCI_CORE
> > >           help
> > >             This provides support for exposing VIRTIO NET VF devices which
> > > support
> > > 
> > > [2] diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > index 8e98d24917cc..a73358bb4ebb 100644
> > > --- a/drivers/virtio/Makefile
> > > +++ b/drivers/virtio/Makefile
> > > @@ -7,6 +7,7 @@ obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
> > >   obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
> > >   virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > >   virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > > +virtio_pci-$(CONFIG_X86) += virtio_pci_admin_legacy_io.o
> > >   obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > >   obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > >   obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
> > > diff --git a/drivers/virtio/virtio_pci_common.h
> > > b/drivers/virtio/virtio_pci_common.h
> > > index af676b3b9907..9963e5d0e881 100644
> > > --- a/drivers/virtio/virtio_pci_common.h
> > > +++ b/drivers/virtio/virtio_pci_common.h
> > > @@ -158,4 +158,14 @@ void virtio_pci_modern_remove(struct virtio_pci_device
> > > *);
> > > 
> > >   struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
> > > 
> > > +#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
> > > +       (BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> > > +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> > > +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> > > +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> > > +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> > > +
> > 
> > 
> > I'd add something like:
> > 
> > #ifdef CONFIG_X86
> > #define VIRTIO_ADMIN_CMD_BITMAP VIRTIO_LEGACY_ADMIN_CMD_BITMAP
> > #else
> > #define VIRTIO_ADMIN_CMD_BITMAP 0
> > #endif
> > 
> 
> This new macro (i.e. VIRTIO_ADMIN_CMD_BITMAP) will hold in the future
> another sets of supported bits (e.g. Live migration), right ?


That's my idea, yes.

> > Add a comment explaining why, please.
> 
> OK
> 
> How about the below ?
> 
> "As of some limitations in the legacy driver (e.g. lack of memory
> barriers in ARM, endian-ness is broken in PPC) the supported legacy IO
> commands are masked out on non x86 systems."

Some grammar corrections and clarifications:

Unlike modern drivers which support hardware virtio devices, legacy
drivers assume software-based devices: e.g. they don't use proper memory
barriers on ARM, use big endian on PPC, etc. X86 drivers are mostly ok
though, more or less by chance. For now, only support legacy IO on x86.


> > 
> > 
> > > +int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
> > > +                            struct virtio_admin_cmd *cmd);
> > > +
> > >   #endif
> > > diff --git a/drivers/virtio/virtio_pci_modern.c
> > > b/drivers/virtio/virtio_pci_modern.c
> > > index 53e29824d404..defb6282e1d7 100644
> > > --- a/drivers/virtio/virtio_pci_modern.c
> > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > @@ -75,8 +75,8 @@ static int virtqueue_exec_admin_cmd(struct
> > > virtio_pci_admin_vq *admin_vq,
> > >          return 0;
> > >   }
> > > 
> > > -static int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
> > > -                                   struct virtio_admin_cmd *cmd)
> > > +int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
> > > +                            struct virtio_admin_cmd *cmd)
> > >   {
> > >          struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
> > >          struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > @@ -172,6 +172,9 @@ static void virtio_pci_admin_cmd_list_init(struct
> > > virtio_device *virtio_dev)
> > >          if (ret)
> > >                  goto end;
> > > 
> > > +#ifndef CONFIG_X86
> > > +       *data &= ~(cpu_to_le64(VIRTIO_LEGACY_ADMIN_CMD_BITMAP));
> > > +#endif
> > 
> > Then here we don't need an ifdef just use VIRTIO_ADMIN_CMD_BITMAP.
> > 
> 
> Following your suggestion, this will become,
> *data &= cpu_to_le64(VIRTIO_ADMIN_CMD_BITMAP);
> 
> (no ifdef, and no ~).
> 
> Yishai

right
Yishai Hadas Dec. 17, 2023, 2:18 p.m. UTC | #22
On 17/12/2023 15:42, Michael S. Tsirkin wrote:
> On Sun, Dec 17, 2023 at 03:20:30PM +0200, Yishai Hadas wrote:
>> On 17/12/2023 14:20, Michael S. Tsirkin wrote:
>>> On Sun, Dec 17, 2023 at 12:39:48PM +0200, Yishai Hadas wrote:
>>>> On 14/12/2023 18:40, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 14, 2023 at 06:25:25PM +0200, Yishai Hadas wrote:
>>>>>> On 14/12/2023 18:15, Alex Williamson wrote:
>>>>>>> On Thu, 14 Dec 2023 18:03:30 +0200
>>>>>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>>>>>
>>>>>>>> On 14/12/2023 17:05, Michael S. Tsirkin wrote:
>>>>>>>>> On Thu, Dec 14, 2023 at 07:59:05AM -0700, Alex Williamson wrote:
>>>>>>>>>> On Thu, 14 Dec 2023 11:37:10 +0200
>>>>>>>>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>>>>>>>>>>> OK, if so, we can come with the below extra code.
>>>>>>>>>>>>> Makes sense ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll squash it as part of V8 to the relevant patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>>>> b/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>>>> index 37a0035f8381..b652e91b9df4 100644
>>>>>>>>>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>>>> @@ -794,6 +794,9 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>>>>>>>>>>>> *pdev)
>>>>>>>>>>>>>               struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>>>>>>>>>>               struct virtio_pci_device *vp_dev;
>>>>>>>>>>>>>
>>>>>>>>>>>>> +#ifndef CONFIG_X86
>>>>>>>>>>>>> +       return false;
>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>               if (!virtio_dev)
>>>>>>>>>>>>>                       return false;
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yishai
>>>>>>>>>>>>
>>>>>>>>>>>> Isn't there going to be a bunch more dead code that compiler won't be
>>>>>>>>>>>> able to elide?
>>>>>>>>>>>
>>>>>>>>>>> On my setup the compiler didn't complain about dead-code (I simulated it
>>>>>>>>>>> by using ifdef CONFIG_X86 return false).
>>>>>>>>>>>
>>>>>>>>>>> However, if we suspect that some compiler might complain, we can come
>>>>>>>>>>> with the below instead.
>>>>>>>>>>>
>>>>>>>>>>> Do you prefer that ?
>>>>>>>>>>>
>>>>>>>>>>> index 37a0035f8381..53e29824d404 100644
>>>>>>>>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>>>>>>>>> @@ -782,6 +782,7 @@ static void vp_modern_destroy_avq(struct
>>>>>>>>>>> virtio_device *vdev)
>>>>>>>>>>>                BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>>>>>>>>>>>                BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>>>>>>>>>>>
>>>>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>>>>>        /*
>>>>>>>>>>>         * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
>>>>>>>>>>>         * commands are supported
>>>>>>>>>>> @@ -807,6 +808,12 @@ bool virtio_pci_admin_has_legacy_io(struct pci_dev
>>>>>>>>>>> *pdev)
>>>>>>>>>>>                       return true;
>>>>>>>>>>>               return false;
>>>>>>>>>>>        }
>>>>>>>>>>> +#else
>>>>>>>>>>> +bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
>>>>>>>>>>> +{
>>>>>>>>>>> +       return false;
>>>>>>>>>>> +}
>>>>>>>>>>> +#endif
>>>>>>>>>>>        EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
>>>>>>>>>>
>>>>>>>>>> Doesn't this also raise the question of the purpose of virtio-vfio-pci
>>>>>>>>>> on non-x86?  Without any other features it offers nothing over vfio-pci
>>>>>>>>>> and we should likely adjust the Kconfig for x86 or COMPILE_TEST.
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>> Kconfig dependency is what I had in mind, yes. The X86 specific code in
>>>>>>>>> virtio_pci_modern.c can be moved to a separate file then use makefile
>>>>>>>>> tricks to skip it on other platforms.
>>>>>>>>
>>>>>>>> The next feature for that driver will be the live migration support over
>>>>>>>> virtio, once the specification which is WIP those day will be accepted.
>>>>>>>>
>>>>>>>> The migration functionality is not X86 dependent and doesn't have the
>>>>>>>> legacy virtio driver limitations that enforced us to run only on X86.
>>>>>>>>
>>>>>>>> So, by that time we may need to enable in VFIO the loading of
>>>>>>>> virtio-vfio-pci driver and put back the ifdef X86 inside VIRTIO, only on
>>>>>>>> the legacy IO API, as I did already in V8.
>>>>>>>>
>>>>>>>> So using a KCONFIG solution in VFIO is a short term one, which will be
>>>>>>>> reverted just later on.
>>>>>>>
>>>>>>> I understand the intent, but I don't think that justifies building a
>>>>>>> driver that serves no purpose in the interim.  IF and when migration
>>>>>>> support becomes a reality, it's trivial to update the depends line.
>>>>>>>
>>>>>>
>>>>>> OK, so I'll add a KCONFIG dependency on X86 as you suggested as part of V9
>>>>>> inside VFIO.
>>>>>>
>>>>>>>> In addition, the virtio_pci_admin_has_legacy_io() API can be used in the
>>>>>>>> future not only by VFIO, this was one of the reasons to put it inside
>>>>>>>> VIRTIO.
>>>>>>>
>>>>>>> Maybe this should be governed by a new Kconfig option which would be
>>>>>>> selected by drivers like this.  Thanks,
>>>>>>>
>>>>>>
>>>>>> We can still keep the simple ifdef X86 inside VIRTIO for future users/usage
>>>>>> which is not only VFIO.
>>>>>>
>>>>>> Michael,
>>>>>> Can that work for you ?
>>>>>>
>>>>>> Yishai
>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>
>>>>> I am not sure what is proposed exactly. General admin q infrastructure
>>>>> can be kept as is. The legacy things however can never work outside X86.
>>>>> Best way to limit it to x86 is to move it to a separate file and
>>>>> only build that on X86. This way the only ifdef we need is where
>>>>> we set the flags to enable legacy commands.
>>>>>
>>>>>
>>>>
>>>> In VFIO we already agreed to add a dependency on X86 [1] as Alex asked.
>>>>
>>>> As VIRTIO should be ready for other clients and be self contained, I thought
>>>> to keep things simple and just return false from
>>>> virtio_pci_admin_has_legacy_io() in non X86 systems as was sent in V8.
>>>>
>>>> However, we can go with your approach as well and compile out all the legacy
>>>> IO stuff in non X86 systems by moving its code to a separate file (i.e.
>>>> virtio_pci_admin_legacy_io.c) and control this file upon the Makefile. In
>>>> addition, you suggested to control the 'supported_cmds' by an ifdef. This
>>>> will let the device know that we don't support legacy IO as well on non X86
>>>> systems.
>>>>
>>>> Please be aware that the above approach requires another ifdef on the H file
>>>> which exposes the 6 exported symbols and some further changes inside virtio
>>>> as of making vp_modern_admin_cmd_exec() non static as now we move the legacy
>>>> IO stuff to another C file, etc.
>>>>
>>>> Please see below how [2] it will look like.
>>>>
>>>> If you prefer that, so OK, it will be part of V9.
>>>> Please let me know.
>>>>
>>>>
>>>> [1] diff --git a/drivers/vfio/pci/virtio/Kconfig
>>>> b/drivers/vfio/pci/virtio/Kconfig
>>>> index 050473b0e5df..a3e5d8ea22a0 100644
>>>> --- a/drivers/vfio/pci/virtio/Kconfig
>>>> +++ b/drivers/vfio/pci/virtio/Kconfig
>>>> @@ -1,7 +1,7 @@
>>>>    # SPDX-License-Identifier: GPL-2.0-only
>>>>    config VIRTIO_VFIO_PCI
>>>>            tristate "VFIO support for VIRTIO NET PCI devices"
>>>> -        depends on VIRTIO_PCI
>>>> +        depends on X86 && VIRTIO_PCI
>>>>            select VFIO_PCI_CORE
>>>>            help
>>>>              This provides support for exposing VIRTIO NET VF devices which
>>>> support
>>>>
>>>> [2] diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
>>>> index 8e98d24917cc..a73358bb4ebb 100644
>>>> --- a/drivers/virtio/Makefile
>>>> +++ b/drivers/virtio/Makefile
>>>> @@ -7,6 +7,7 @@ obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>>>>    obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>>>>    virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>>>>    virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>>>> +virtio_pci-$(CONFIG_X86) += virtio_pci_admin_legacy_io.o
>>>>    obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>>>>    obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>>>>    obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
>>>> diff --git a/drivers/virtio/virtio_pci_common.h
>>>> b/drivers/virtio/virtio_pci_common.h
>>>> index af676b3b9907..9963e5d0e881 100644
>>>> --- a/drivers/virtio/virtio_pci_common.h
>>>> +++ b/drivers/virtio/virtio_pci_common.h
>>>> @@ -158,4 +158,14 @@ void virtio_pci_modern_remove(struct virtio_pci_device
>>>> *);
>>>>
>>>>    struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
>>>>
>>>> +#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
>>>> +       (BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
>>>> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
>>>> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
>>>> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>>>> +        BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>>>> +
>>>
>>>
>>> I'd add something like:
>>>
>>> #ifdef CONFIG_X86
>>> #define VIRTIO_ADMIN_CMD_BITMAP VIRTIO_LEGACY_ADMIN_CMD_BITMAP
>>> #else
>>> #define VIRTIO_ADMIN_CMD_BITMAP 0
>>> #endif
>>>
>>
>> This new macro (i.e. VIRTIO_ADMIN_CMD_BITMAP) will hold in the future
>> another sets of supported bits (e.g. Live migration), right ?
> 
> 
> That's my idea, yes.
> 
>>> Add a comment explaining why, please.
>>
>> OK
>>
>> How about the below ?
>>
>> "As of some limitations in the legacy driver (e.g. lack of memory
>> barriers in ARM, endian-ness is broken in PPC) the supported legacy IO
>> commands are masked out on non x86 systems."
> 
> Some grammar corrections and clarifications:
> 
> Unlike modern drivers which support hardware virtio devices, legacy
> drivers assume software-based devices: e.g. they don't use proper memory
> barriers on ARM, use big endian on PPC, etc. X86 drivers are mostly ok
> though, more or less by chance. For now, only support legacy IO on x86.

OK, thanks, I'll use the above.

Note:
I may use it also as part of the commit log (next patch) where we'll 
introduce the legacy IO APIs only on X86 systems, to let it be claer why 
it was done.

Yishai
diff mbox series

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 012df8ccf34e..b246b769092d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22872,6 +22872,13 @@  L:	kvm@vger.kernel.org
 S:	Maintained
 F:	drivers/vfio/pci/mlx5/
 
+VFIO VIRTIO PCI DRIVER
+M:	Yishai Hadas <yishaih@nvidia.com>
+L:	kvm@vger.kernel.org
+L:	virtualization@lists.linux-foundation.org
+S:	Maintained
+F:	drivers/vfio/pci/virtio
+
 VFIO PCI DEVICE SPECIFIC DRIVERS
 R:	Jason Gunthorpe <jgg@nvidia.com>
 R:	Yishai Hadas <yishaih@nvidia.com>
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 8125e5f37832..18c397df566d 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -65,4 +65,6 @@  source "drivers/vfio/pci/hisilicon/Kconfig"
 
 source "drivers/vfio/pci/pds/Kconfig"
 
+source "drivers/vfio/pci/virtio/Kconfig"
+
 endmenu
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 45167be462d8..046139a4eca5 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -13,3 +13,5 @@  obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
 obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
 
 obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
new file mode 100644
index 000000000000..3a6707639220
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Kconfig
@@ -0,0 +1,16 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO NET PCI devices"
+        depends on VIRTIO_PCI
+        select VFIO_PCI_CORE
+        help
+          This provides support for exposing VIRTIO NET VF devices which support
+          legacy IO access, using the VFIO framework that can work with a legacy
+          virtio driver in the guest.
+          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
+          not indicate I/O Space.
+          As of that this driver emulated I/O BAR in software to let a VF be
+          seen as a transitional device in the guest and let it work with
+          a legacy driver.
+
+          If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
new file mode 100644
index 000000000000..2039b39fb723
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Makefile
@@ -0,0 +1,4 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
+virtio-vfio-pci-y := main.o
+
diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
new file mode 100644
index 000000000000..9c316c115ce8
--- /dev/null
+++ b/drivers/vfio/pci/virtio/main.c
@@ -0,0 +1,567 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/virtio_pci.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_pci_admin.h>
+
+struct virtiovf_pci_core_device {
+	struct vfio_pci_core_device core_device;
+	u8 *bar0_virtual_buf;
+	/* synchronize access to the virtual buf */
+	struct mutex bar_mutex;
+	void __iomem *notify_addr;
+	u64 notify_offset;
+	__le32 pci_base_addr_0;
+	__le16 pci_cmd;
+	u8 bar0_virtual_buf_size;
+	u8 notify_bar;
+};
+
+static int
+virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
+			     loff_t pos, char __user *buf,
+			     size_t count, bool read)
+{
+	bool msix_enabled =
+		(virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
+	bool common;
+	u8 offset;
+	int ret;
+
+	common = pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled);
+	/* offset within the relevant configuration area */
+	offset = common ? pos : pos - VIRTIO_PCI_CONFIG_OFF(msix_enabled);
+	mutex_lock(&virtvdev->bar_mutex);
+	if (read) {
+		if (common)
+			ret = virtio_pci_admin_legacy_common_io_read(pdev, offset,
+					count, bar0_buf + pos);
+		else
+			ret = virtio_pci_admin_legacy_device_io_read(pdev, offset,
+					count, bar0_buf + pos);
+		if (ret)
+			goto out;
+		if (copy_to_user(buf, bar0_buf + pos, count))
+			ret = -EFAULT;
+	} else {
+		if (copy_from_user(bar0_buf + pos, buf, count)) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		if (common)
+			ret = virtio_pci_admin_legacy_common_io_write(pdev, offset,
+					count, bar0_buf + pos);
+		else
+			ret = virtio_pci_admin_legacy_device_io_write(pdev, offset,
+					count, bar0_buf + pos);
+	}
+out:
+	mutex_unlock(&virtvdev->bar_mutex);
+	return ret;
+}
+
+static int
+translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
+			    loff_t pos, char __user *buf,
+			    size_t count, bool read)
+{
+	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
+	struct pci_dev *pdev = core_device->pdev;
+	u16 queue_notify;
+	int ret;
+
+	if (!(le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO))
+		return -EIO;
+
+	if (pos + count > virtvdev->bar0_virtual_buf_size)
+		return -EINVAL;
+
+	ret = pm_runtime_resume_and_get(&pdev->dev);
+	if (ret) {
+		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
+		return -EIO;
+	}
+
+	switch (pos) {
+	case VIRTIO_PCI_QUEUE_NOTIFY:
+		if (count != sizeof(queue_notify)) {
+			ret = -EINVAL;
+			goto end;
+		}
+		if (read) {
+			ret = vfio_pci_core_ioread16(core_device, true, &queue_notify,
+						     virtvdev->notify_addr);
+			if (ret)
+				goto end;
+			if (copy_to_user(buf, &queue_notify,
+					 sizeof(queue_notify))) {
+				ret = -EFAULT;
+				goto end;
+			}
+		} else {
+			if (copy_from_user(&queue_notify, buf, count)) {
+				ret = -EFAULT;
+				goto end;
+			}
+			ret = vfio_pci_core_iowrite16(core_device, true, queue_notify,
+						      virtvdev->notify_addr);
+		}
+		break;
+	default:
+		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
+						   read);
+	}
+
+end:
+	pm_runtime_put(&pdev->dev);
+	return ret ? ret : count;
+}
+
+static bool range_intersect_range(loff_t range1_start, size_t count1,
+				  loff_t range2_start, size_t count2,
+				  loff_t *start_offset,
+				  size_t *intersect_count,
+				  size_t *register_offset)
+{
+	if (range1_start <= range2_start &&
+	    range1_start + count1 > range2_start) {
+		*start_offset = range2_start - range1_start;
+		*intersect_count = min_t(size_t, count2,
+					 range1_start + count1 - range2_start);
+		*register_offset = 0;
+		return true;
+	}
+
+	if (range1_start > range2_start &&
+	    range1_start < range2_start + count2) {
+		*start_offset = 0;
+		*intersect_count = min_t(size_t, count1,
+					 range2_start + count2 - range1_start);
+		*register_offset = range1_start - range2_start;
+		return true;
+	}
+
+	return false;
+}
+
+static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
+					char __user *buf, size_t count,
+					loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	size_t register_offset;
+	loff_t copy_offset;
+	size_t copy_count;
+	__le32 val32;
+	__le16 val16;
+	u8 val8;
+	int ret;
+
+	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
+	if (ret < 0)
+		return ret;
+
+	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
+			return -EFAULT;
+	}
+
+	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
+	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
+				   copy_count))
+			return -EFAULT;
+		val16 |= cpu_to_le16(PCI_COMMAND_IO);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
+				 copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
+				  &copy_offset, &copy_count, &register_offset)) {
+		/* Transional needs to have revision 0 */
+		val8 = 0;
+		if (copy_to_user(buf + copy_offset, &val8, copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
+				  &copy_offset, &copy_count, &register_offset)) {
+		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
+		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);
+
+		val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) | PCI_BASE_ADDRESS_SPACE_IO);
+		if (copy_to_user(buf + copy_offset, (void *)&val32 + register_offset, copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		/*
+		 * Transitional devices use the PCI subsystem device id as
+		 * virtio device id, same as legacy driver always did.
+		 */
+		val16 = cpu_to_le16(VIRTIO_ID_NET);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
+				 copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
+				 copy_count))
+			return -EFAULT;
+	}
+
+	return count;
+}
+
+static ssize_t
+virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
+		       size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
+		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
+
+	if (index == VFIO_PCI_BAR0_REGION_INDEX)
+		return translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
+
+	return vfio_pci_core_read(core_vdev, buf, count, ppos);
+}
+
+static ssize_t
+virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
+		size_t register_offset;
+		loff_t copy_offset;
+		size_t copy_count;
+
+		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+
+		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
+					  sizeof(virtvdev->pci_base_addr_0),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+	}
+
+	if (index == VFIO_PCI_BAR0_REGION_INDEX)
+		return translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf,
+						   count, false);
+
+	return vfio_pci_core_write(core_vdev, buf, count, ppos);
+}
+
+static int
+virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
+				   unsigned int cmd, unsigned long arg)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
+	void __user *uarg = (void __user *)arg;
+	struct vfio_region_info info = {};
+
+	if (copy_from_user(&info, uarg, minsz))
+		return -EFAULT;
+
+	if (info.argsz < minsz)
+		return -EINVAL;
+
+	switch (info.index) {
+	case VFIO_PCI_BAR0_REGION_INDEX:
+		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+		info.size = virtvdev->bar0_virtual_buf_size;
+		info.flags = VFIO_REGION_INFO_FLAG_READ |
+			     VFIO_REGION_INFO_FLAG_WRITE;
+		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
+	default:
+		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+	}
+}
+
+static long
+virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
+			     unsigned long arg)
+{
+	switch (cmd) {
+	case VFIO_DEVICE_GET_REGION_INFO:
+		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
+	default:
+		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+	}
+}
+
+static int
+virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
+{
+	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
+	int ret;
+
+	/*
+	 * Setup the BAR where the 'notify' exists to be used by vfio as well
+	 * This will let us mmap it only once and use it when needed.
+	 */
+	ret = vfio_pci_core_setup_barmap(core_device,
+					 virtvdev->notify_bar);
+	if (ret)
+		return ret;
+
+	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
+			virtvdev->notify_offset;
+	return 0;
+}
+
+static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
+	int ret;
+
+	ret = vfio_pci_core_enable(vdev);
+	if (ret)
+		return ret;
+
+	if (virtvdev->bar0_virtual_buf) {
+		/*
+		 * Upon close_device() the vfio_pci_core_disable() is called
+		 * and will close all the previous mmaps, so it seems that the
+		 * valid life cycle for the 'notify' addr is per open/close.
+		 */
+		ret = virtiovf_set_notify_addr(virtvdev);
+		if (ret) {
+			vfio_pci_core_disable(vdev);
+			return ret;
+		}
+	}
+
+	vfio_pci_core_finish_enable(vdev);
+	return 0;
+}
+
+static int virtiovf_get_device_config_size(unsigned short device)
+{
+	/* Network card */
+	return offsetofend(struct virtio_net_config, status);
+}
+
+static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
+{
+	u64 offset;
+	int ret;
+	u8 bar;
+
+	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
+				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
+				&bar, &offset);
+	if (ret)
+		return ret;
+
+	virtvdev->notify_bar = bar;
+	virtvdev->notify_offset = offset;
+	return 0;
+}
+
+static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev;
+	int ret;
+
+	ret = vfio_pci_core_init_dev(core_vdev);
+	if (ret)
+		return ret;
+
+	pdev = virtvdev->core_device.pdev;
+	ret = virtiovf_read_notify_info(virtvdev);
+	if (ret)
+		return ret;
+
+	/* Being ready with a buffer that supports MSIX */
+	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
+				virtiovf_get_device_config_size(pdev->device);
+	BUILD_BUG_ON(!is_power_of_2(virtvdev->bar0_virtual_buf_size));
+	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
+					     GFP_KERNEL);
+	if (!virtvdev->bar0_virtual_buf)
+		return -ENOMEM;
+	mutex_init(&virtvdev->bar_mutex);
+	return 0;
+}
+
+static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+
+	kfree(virtvdev->bar0_virtual_buf);
+	vfio_pci_core_release_dev(core_vdev);
+}
+
+static const struct vfio_device_ops virtiovf_vfio_pci_tran_ops = {
+	.name = "virtio-vfio-pci-trans",
+	.init = virtiovf_pci_init_device,
+	.release = virtiovf_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,
+	.close_device = vfio_pci_core_close_device,
+	.ioctl = virtiovf_vfio_pci_core_ioctl,
+	.device_feature = vfio_pci_core_ioctl_feature,
+	.read = virtiovf_pci_core_read,
+	.write = virtiovf_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
+};
+
+static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
+	.name = "virtio-vfio-pci",
+	.init = vfio_pci_core_init_dev,
+	.release = vfio_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,
+	.close_device = vfio_pci_core_close_device,
+	.ioctl = vfio_pci_core_ioctl,
+	.device_feature = vfio_pci_core_ioctl_feature,
+	.read = vfio_pci_core_read,
+	.write = vfio_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
+};
+
+static bool virtiovf_bar0_exists(struct pci_dev *pdev)
+{
+	struct resource *res = pdev->resource;
+
+	return res->flags;
+}
+
+static int virtiovf_pci_probe(struct pci_dev *pdev,
+			      const struct pci_device_id *id)
+{
+	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
+	struct virtiovf_pci_core_device *virtvdev;
+	int ret;
+
+	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
+	    !virtiovf_bar0_exists(pdev))
+		ops = &virtiovf_vfio_pci_tran_ops;
+
+	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
+				     &pdev->dev, ops);
+	if (IS_ERR(virtvdev))
+		return PTR_ERR(virtvdev);
+
+	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
+	ret = vfio_pci_core_register_device(&virtvdev->core_device);
+	if (ret)
+		goto out;
+	return 0;
+out:
+	vfio_put_device(&virtvdev->core_device.vdev);
+	return ret;
+}
+
+static void virtiovf_pci_remove(struct pci_dev *pdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
+
+	vfio_pci_core_unregister_device(&virtvdev->core_device);
+	vfio_put_device(&virtvdev->core_device.vdev);
+}
+
+static const struct pci_device_id virtiovf_pci_table[] = {
+	/* Only virtio-net is supported/tested so far */
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
+	{}
+};
+
+MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
+
+void virtiovf_pci_aer_reset_done(struct pci_dev *pdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
+
+	virtvdev->pci_cmd = 0;
+}
+
+static const struct pci_error_handlers virtiovf_err_handlers = {
+	.reset_done = virtiovf_pci_aer_reset_done,
+	.error_detected = vfio_pci_core_aer_err_detected,
+};
+
+static struct pci_driver virtiovf_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = virtiovf_pci_table,
+	.probe = virtiovf_pci_probe,
+	.remove = virtiovf_pci_remove,
+	.err_handler = &virtiovf_err_handlers,
+	.driver_managed_dma = true,
+};
+
+module_pci_driver(virtiovf_pci_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
+MODULE_DESCRIPTION(
+	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");