diff mbox series

[v2] vfio/pci: Support 8-byte PCI loads and stores

Message ID 20240422153508.2355844-1-gbayer@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series [v2] vfio/pci: Support 8-byte PCI loads and stores | expand

Commit Message

Gerd Bayer April 22, 2024, 3:35 p.m. UTC
From: Ben Segal <bpsegal@us.ibm.com>

Many PCI adapters can benefit or even require full 64bit read
and write access to their registers. In order to enable work on
user-space drivers for these devices add two new variations
vfio_pci_core_io{read|write}64 of the existing access methods
when the architecture supports 64-bit ioreads and iowrites.

Since these access methods are instantiated on 64bit architectures,
only, their use in vfio_pci_core_do_io_rw() is restricted by conditional
compiles to these architectures.

Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
---
Hi all,

we've successfully used this patch with a user-mode driver for a PCI
device that requires 64bit register read/writes on s390. A quick grep
showed that there are several other drivers for PCI devices in the kernel
that use readq/writeq and eventually could use this, too.
So we decided to propose this for general inclusion.

Thank you,
Gerd Bayer

Changes v1 -> v2:
- On non 64bit architecture use at most 32bit accesses in
  vfio_pci_core_do_io_rw and describe that in the commit message.
- Drop the run-time error on 32bit architectures.
- The #endif splitting the "else if" is not really fortunate, but I'm
  open to suggestions.

 drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++++++++++++++++++++++++++++
 include/linux/vfio_pci_core.h    |  3 +++
 2 files changed, 31 insertions(+)

Comments

Jason Gunthorpe April 22, 2024, 5:43 p.m. UTC | #1
On Mon, Apr 22, 2024 at 05:35:08PM +0200, Gerd Bayer wrote:
> From: Ben Segal <bpsegal@us.ibm.com>
> 
> Many PCI adapters can benefit or even require full 64bit read
> and write access to their registers. In order to enable work on
> user-space drivers for these devices add two new variations
> vfio_pci_core_io{read|write}64 of the existing access methods
> when the architecture supports 64-bit ioreads and iowrites.
> 
> Since these access methods are instantiated on 64bit architectures,
> only, their use in vfio_pci_core_do_io_rw() is restricted by conditional
> compiles to these architectures.
> 
> Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
> Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
> Hi all,
> 
> we've successfully used this patch with a user-mode driver for a PCI
> device that requires 64bit register read/writes on s390. A quick grep
> showed that there are several other drivers for PCI devices in the kernel
> that use readq/writeq and eventually could use this, too.
> So we decided to propose this for general inclusion.
> 
> Thank you,
> Gerd Bayer
> 
> Changes v1 -> v2:
> - On non 64bit architecture use at most 32bit accesses in
>   vfio_pci_core_do_io_rw and describe that in the commit message.
> - Drop the run-time error on 32bit architectures.
> - The #endif splitting the "else if" is not really fortunate, but I'm
>   open to suggestions.

Provide a iowrite64() that does back to back writes for 32 bit?

Jason
Alex Williamson April 22, 2024, 10:33 p.m. UTC | #2
On Mon, 22 Apr 2024 14:43:05 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:

> On Mon, Apr 22, 2024 at 05:35:08PM +0200, Gerd Bayer wrote:
> > From: Ben Segal <bpsegal@us.ibm.com>
> > 
> > Many PCI adapters can benefit or even require full 64bit read
> > and write access to their registers. In order to enable work on
> > user-space drivers for these devices add two new variations
> > vfio_pci_core_io{read|write}64 of the existing access methods
> > when the architecture supports 64-bit ioreads and iowrites.
> > 
> > Since these access methods are instantiated on 64bit architectures,
> > only, their use in vfio_pci_core_do_io_rw() is restricted by conditional
> > compiles to these architectures.
> > 
> > Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
> > Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
> > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > ---
> > Hi all,
> > 
> > we've successfully used this patch with a user-mode driver for a PCI
> > device that requires 64bit register read/writes on s390. A quick grep
> > showed that there are several other drivers for PCI devices in the kernel
> > that use readq/writeq and eventually could use this, too.
> > So we decided to propose this for general inclusion.
> > 
> > Thank you,
> > Gerd Bayer
> > 
> > Changes v1 -> v2:
> > - On non 64bit architecture use at most 32bit accesses in
> >   vfio_pci_core_do_io_rw and describe that in the commit message.
> > - Drop the run-time error on 32bit architectures.
> > - The #endif splitting the "else if" is not really fortunate, but I'm
> >   open to suggestions.  
> 
> Provide a iowrite64() that does back to back writes for 32 bit?

I was curious what this looked like.  If we want to repeat the 4 byte
access then I think we need to refactor all the read/write accesses
into macros to avoid duplicating code, which results in something like
[1] below.  But also once we refactor to macros, the #ifdef within the
function as originally proposed gets a lot more bearable too [2].

I'd probably just go with something like [2] unless you want to further
macro-ize these branches out of existence in the main function.  Feel
free to grab any of this you like, the VFIO_IORDWR macro should
probably be it's own patch.  Thanks,

Alex

[1]
 drivers/vfio/pci/vfio_pci_rdwr.c |  145 +++++++++++++++++--------------
 include/linux/vfio_pci_core.h    |    3 
 2 files changed, 87 insertions(+), 61 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 03b8f7ada1ac..84bd1d670086 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -89,6 +89,71 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);
 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
 VFIO_IOREAD(32)
+#ifdef ioread64
+VFIO_IOREAD(64)
+#endif
+
+#define VFIO_IORDWR(size)						\
+static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device *vdev,\
+				bool iswrite, bool test_mem,		\
+				void __iomem *io, char __user *buf,	\
+				loff_t off, size_t *filled)		\
+{									\
+	u##size val;							\
+	int ret;							\
+									\
+	if (iswrite) {							\
+		if (copy_from_user(&val, buf, sizeof(val)))		\
+			return -EFAULT;					\
+									\
+		ret = vfio_pci_core_iowrite##size(vdev, test_mem,	\
+						  val, io + off);	\
+		if (ret)						\
+			return ret;					\
+	} else {							\
+		ret = vfio_pci_core_ioread##size(vdev, test_mem,	\
+						 &val, io + off);	\
+		if (ret)						\
+			return ret;					\
+									\
+		if (copy_to_user(buf, &val, sizeof(val)))		\
+			return -EFAULT;					\
+	}								\
+									\
+	*filled = sizeof(val);						\
+	return 0;							\
+}									\
+
+VFIO_IORDWR(8)
+VFIO_IORDWR(16)
+VFIO_IORDWR(32)
+#if defined(ioread64) && defined(iowrite64)
+VFIO_IORDWR(64)
+#else
+static int vfio_pci_core_iordwr64(struct vfio_pci_core_device *vdev,
+				  bool iswrite, bool test_mem,
+				  void __iomem *io, char __user *buf,
+				  loff_t off, size_t *filled)
+{
+	int ret;
+	size_t cnt;
+
+	ret = vfio_pci_core_iordwr32(vdev, iswrite, test_mem,
+				     io, buf, off, &cnt);
+	if (ret)
+		return ret;
+
+	ret = vfio_pci_core_iordwr32(vdev, iswrite, test_mem,
+				     io, buf + cnt, off + cnt,
+				     filled);
+	if (ret)
+		return ret;
+
+	*filled += cnt;
+
+	return 0;
+}
+#endif
 
 /*
  * Read or write from an __iomem region (MMIO or I/O port) with an excluded
@@ -114,72 +179,30 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 		else
 			fillable = 0;
 
-		if (fillable >= 4 && !(off % 4)) {
-			u32 val;
-
-			if (iswrite) {
-				if (copy_from_user(&val, buf, 4))
-					return -EFAULT;
-
-				ret = vfio_pci_core_iowrite32(vdev, test_mem,
-							      val, io + off);
-				if (ret)
-					return ret;
-			} else {
-				ret = vfio_pci_core_ioread32(vdev, test_mem,
-							     &val, io + off);
-				if (ret)
-					return ret;
-
-				if (copy_to_user(buf, &val, 4))
-					return -EFAULT;
-			}
+		if (fillable >= 8 && !(off % 8)) {
+			ret = vfio_pci_core_iordwr64(vdev, iswrite, test_mem,
+						     io, buf, off, &filled);
+			if (ret)
+				return ret;
+
+		} else if (fillable >= 4 && !(off % 4)) {
+			ret = vfio_pci_core_iordwr32(vdev, iswrite, test_mem,
+						     io, buf, off, &filled);
+			if (ret)
+				return ret;
 
-			filled = 4;
 		} else if (fillable >= 2 && !(off % 2)) {
-			u16 val;
-
-			if (iswrite) {
-				if (copy_from_user(&val, buf, 2))
-					return -EFAULT;
-
-				ret = vfio_pci_core_iowrite16(vdev, test_mem,
-							      val, io + off);
-				if (ret)
-					return ret;
-			} else {
-				ret = vfio_pci_core_ioread16(vdev, test_mem,
-							     &val, io + off);
-				if (ret)
-					return ret;
-
-				if (copy_to_user(buf, &val, 2))
-					return -EFAULT;
-			}
+			ret = vfio_pci_core_iordwr16(vdev, iswrite, test_mem,
+						     io, buf, off, &filled);
+			if (ret)
+				return ret;
 
-			filled = 2;
 		} else if (fillable) {
-			u8 val;
-
-			if (iswrite) {
-				if (copy_from_user(&val, buf, 1))
-					return -EFAULT;
-
-				ret = vfio_pci_core_iowrite8(vdev, test_mem,
-							     val, io + off);
-				if (ret)
-					return ret;
-			} else {
-				ret = vfio_pci_core_ioread8(vdev, test_mem,
-							    &val, io + off);
-				if (ret)
-					return ret;
-
-				if (copy_to_user(buf, &val, 1))
-					return -EFAULT;
-			}
+			ret = vfio_pci_core_iordwr8(vdev, iswrite, test_mem,
+						    io, buf, off, &filled);
+			if (ret)
+				return ret;
 
-			filled = 1;
 		} else {
 			/* Fill reads with -1, drop writes */
 			filled = min(count, (size_t)(x_end - off));
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index a2c8b8bba711..f4cf5fd2350c 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -157,5 +157,8 @@ int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev,	\
 VFIO_IOREAD_DECLATION(8)
 VFIO_IOREAD_DECLATION(16)
 VFIO_IOREAD_DECLATION(32)
+#ifdef ioread64
+VFIO_IOREAD_DECLATION(64)
+#endif
 
 #endif /* VFIO_PCI_CORE_H */

--------------------------------------------------------------------------------------

[2]
 drivers/vfio/pci/vfio_pci_rdwr.c |  122 +++++++++++++++----------------
 include/linux/vfio_pci_core.h    |    3 
 2 files changed, 65 insertions(+), 60 deletions(-)

commit 05aa3342b4f9e249f3bb370e2fdf66e95de556b8
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Mon Apr 22 15:58:48 2024 -0600

    test

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 03b8f7ada1ac..22e889374e47 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -89,6 +89,47 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);
 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
 VFIO_IOREAD(32)
+#ifdef ioread64
+VFIO_IOREAD(64)
+#endif
+
+#define VFIO_IORDWR(size)						\
+static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device *vdev,\
+				bool iswrite, bool test_mem,		\
+				void __iomem *io, char __user *buf,	\
+				loff_t off, size_t *filled)		\
+{									\
+	u##size val;							\
+	int ret;							\
+									\
+	if (iswrite) {							\
+		if (copy_from_user(&val, buf, sizeof(val)))		\
+			return -EFAULT;					\
+									\
+		ret = vfio_pci_core_iowrite##size(vdev, test_mem,	\
+						  val, io + off);	\
+		if (ret)						\
+			return ret;					\
+	} else {							\
+		ret = vfio_pci_core_ioread##size(vdev, test_mem,	\
+						 &val, io + off);	\
+		if (ret)						\
+			return ret;					\
+									\
+		if (copy_to_user(buf, &val, sizeof(val)))		\
+			return -EFAULT;					\
+	}								\
+									\
+	*filled = sizeof(val);						\
+	return 0;							\
+}									\
+
+VFIO_IORDWR(8)
+VFIO_IORDWR(16)
+VFIO_IORDWR(32)
+#if defined(ioread64) && defined(iowrite64)
+VFIO_IORDWR(64)
+#endif
 
 /*
  * Read or write from an __iomem region (MMIO or I/O port) with an excluded
@@ -114,72 +155,33 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 		else
 			fillable = 0;
 
+#if defined(ioread64) && defined(iowrite64)
+		if (fillable >= 8 && !(off % 8)) {
+			ret = vfio_pci_core_iordwr64(vdev, iswrite, test_mem,
+						     io, buf, off, &filled);
+			if (ret)
+				return ret;
+
+		} else
+#endif
 		if (fillable >= 4 && !(off % 4)) {
-			u32 val;
-
-			if (iswrite) {
-				if (copy_from_user(&val, buf, 4))
-					return -EFAULT;
-
-				ret = vfio_pci_core_iowrite32(vdev, test_mem,
-							      val, io + off);
-				if (ret)
-					return ret;
-			} else {
-				ret = vfio_pci_core_ioread32(vdev, test_mem,
-							     &val, io + off);
-				if (ret)
-					return ret;
-
-				if (copy_to_user(buf, &val, 4))
-					return -EFAULT;
-			}
+			ret = vfio_pci_core_iordwr32(vdev, iswrite, test_mem,
+						     io, buf, off, &filled);
+			if (ret)
+				return ret;
 
-			filled = 4;
 		} else if (fillable >= 2 && !(off % 2)) {
-			u16 val;
-
-			if (iswrite) {
-				if (copy_from_user(&val, buf, 2))
-					return -EFAULT;
-
-				ret = vfio_pci_core_iowrite16(vdev, test_mem,
-							      val, io + off);
-				if (ret)
-					return ret;
-			} else {
-				ret = vfio_pci_core_ioread16(vdev, test_mem,
-							     &val, io + off);
-				if (ret)
-					return ret;
-
-				if (copy_to_user(buf, &val, 2))
-					return -EFAULT;
-			}
+			ret = vfio_pci_core_iordwr16(vdev, iswrite, test_mem,
+						     io, buf, off, &filled);
+			if (ret)
+				return ret;
 
-			filled = 2;
 		} else if (fillable) {
-			u8 val;
-
-			if (iswrite) {
-				if (copy_from_user(&val, buf, 1))
-					return -EFAULT;
-
-				ret = vfio_pci_core_iowrite8(vdev, test_mem,
-							     val, io + off);
-				if (ret)
-					return ret;
-			} else {
-				ret = vfio_pci_core_ioread8(vdev, test_mem,
-							    &val, io + off);
-				if (ret)
-					return ret;
-
-				if (copy_to_user(buf, &val, 1))
-					return -EFAULT;
-			}
+			ret = vfio_pci_core_iordwr8(vdev, iswrite, test_mem,
+						    io, buf, off, &filled);
+			if (ret)
+				return ret;
 
-			filled = 1;
 		} else {
 			/* Fill reads with -1, drop writes */
 			filled = min(count, (size_t)(x_end - off));
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index a2c8b8bba711..f4cf5fd2350c 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -157,5 +157,8 @@ int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev,	\
 VFIO_IOREAD_DECLATION(8)
 VFIO_IOREAD_DECLATION(16)
 VFIO_IOREAD_DECLATION(32)
+#ifdef ioread64
+VFIO_IOREAD_DECLATION(64)
+#endif
 
 #endif /* VFIO_PCI_CORE_H */
Alex Williamson April 22, 2024, 10:48 p.m. UTC | #3
On Mon, 22 Apr 2024 16:33:53 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:
> [2]
..
> @@ -114,72 +155,33 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>  		else
>  			fillable = 0;
>  
> +#if defined(ioread64) && defined(iowrite64)
> +		if (fillable >= 8 && !(off % 8)) {
> +			ret = vfio_pci_core_iordwr64(vdev, iswrite, test_mem,
> +						     io, buf, off, &filled);
> +			if (ret)
> +				return ret;
> +
> +		} else
> +#endif

This could also just be:

#ifdef vfio_pci_core_iordwr64

at this point.  BTW, all this is only compile tested.  Thanks,

Alex
Gerd Bayer April 23, 2024, 3:59 p.m. UTC | #4
On Mon, 2024-04-22 at 14:43 -0300, Jason Gunthorpe wrote:
> On Mon, Apr 22, 2024 at 05:35:08PM +0200, Gerd Bayer wrote:
> > From: Ben Segal <bpsegal@us.ibm.com>
> > 
> > Many PCI adapters can benefit or even require full 64bit read
> > and write access to their registers. In order to enable work on
> > user-space drivers for these devices add two new variations
> > vfio_pci_core_io{read|write}64 of the existing access methods
> > when the architecture supports 64-bit ioreads and iowrites.
> > 
> > Since these access methods are instantiated on 64bit architectures,
> > only, their use in vfio_pci_core_do_io_rw() is restricted by
> > conditional
> > compiles to these architectures.
> > 
> > Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
> > Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
> > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > ---
> > Hi all,
> > 
> > we've successfully used this patch with a user-mode driver for a
> > PCI
> > device that requires 64bit register read/writes on s390. A quick
> > grep
> > showed that there are several other drivers for PCI devices in the
> > kernel
> > that use readq/writeq and eventually could use this, too.
> > So we decided to propose this for general inclusion.
> > 
> > Thank you,
> > Gerd Bayer
> > 
> > Changes v1 -> v2:
> > - On non 64bit architecture use at most 32bit accesses in
> >   vfio_pci_core_do_io_rw and describe that in the commit message.
> > - Drop the run-time error on 32bit architectures.
> > - The #endif splitting the "else if" is not really fortunate, but
> > I'm
> >   open to suggestions.
> 
> Provide a iowrite64() that does back to back writes for 32 bit?

Hi Jason,

unfortunately, the nomenclature in vfio_pci_rdwr.c is not very clear...
vfio_io{read|write}64 are mapped to io{read|write}64 as defined in
include/asm-generic/io.h prior to my change already. OTOH, looks like
vfio_io{read|write}64 are consumed only by the
vfio_pci_core_io{read|write}64 functions. This however is an exported
symbol - that seems to be used only as vfio_pci_core_io{read|write}16,
so far.

vfio_pci_core_io{read|write}X is also used by vfio_pci_core_do_io_rw()
which does "bulk" reads/writes using the largest suitable access size.
I think there, we can live without 64bit accesses as the while loop
there will use 32bit read/writes back-to-back as applicable.

So I think 64bit accesses on 32bit architectures through VFIO are
somewhat uncharted territory - and I'm not sure that back-to-back 32bit
accesses are the right thing to do. If the device defined 64bit
registers, you could trigger side-effects in the wrong order (or not at
all).

Somewhat overwhelmed,
Gerd
Gerd Bayer April 23, 2024, 4:11 p.m. UTC | #5
On Mon, 2024-04-22 at 16:33 -0600, Alex Williamson wrote:
> On Mon, 22 Apr 2024 14:43:05 -0300
> Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> > On Mon, Apr 22, 2024 at 05:35:08PM +0200, Gerd Bayer wrote:
> > > From: Ben Segal <bpsegal@us.ibm.com>
> > > 
> > > Many PCI adapters can benefit or even require full 64bit read
> > > and write access to their registers. In order to enable work on
> > > user-space drivers for these devices add two new variations
> > > vfio_pci_core_io{read|write}64 of the existing access methods
> > > when the architecture supports 64-bit ioreads and iowrites.
> > > 
> > > Since these access methods are instantiated on 64bit
> > > architectures,
> > > only, their use in vfio_pci_core_do_io_rw() is restricted by
> > > conditional
> > > compiles to these architectures.
> > > 
> > > Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
> > > Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
> > > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > > ---
> > > Hi all,
> > > 
> > > we've successfully used this patch with a user-mode driver for a
> > > PCI
> > > device that requires 64bit register read/writes on s390. A quick
> > > grep
> > > showed that there are several other drivers for PCI devices in
> > > the kernel
> > > that use readq/writeq and eventually could use this, too.
> > > So we decided to propose this for general inclusion.
> > > 
> > > Thank you,
> > > Gerd Bayer
> > > 
> > > Changes v1 -> v2:
> > > - On non 64bit architecture use at most 32bit accesses in
> > >   vfio_pci_core_do_io_rw and describe that in the commit message.
> > > - Drop the run-time error on 32bit architectures.
> > > - The #endif splitting the "else if" is not really fortunate, but
> > > I'm
> > >   open to suggestions.  
> > 
> > Provide a iowrite64() that does back to back writes for 32 bit?
> 
> I was curious what this looked like.  If we want to repeat the 4 byte
> access then I think we need to refactor all the read/write accesses
> into macros to avoid duplicating code, which results in something
> like [1] below.  But also once we refactor to macros, the #ifdef
> within the function as originally proposed gets a lot more bearable
> too [2].
> 
> I'd probably just go with something like [2] unless you want to
> further macro-ize these branches out of existence in the main
> function. Feel free to grab any of this you like, the VFIO_IORDWR
> macro should probably be it's own patch.  Thanks,
> 
> Alex

Hi Alex,

thanks for your suggestions, I like your VFIO_IORDWR macro in [1].
As I just explained to Jason, I don't think that the back-to-back 32bit
accesses are a safe emulation of 64bit accesses in general, though. So
I'd rather leave that out.

However, I'm working on an idea - extending on the VFIO_IORDWR macro -
to convert the if - else if - chain into a switch/case construct, where
I can more easily #ifdef out the 64bit case if not available.

Now I "just" need to test this ;)

Thanks,
Gerd
Jason Gunthorpe April 23, 2024, 4:16 p.m. UTC | #6
On Tue, Apr 23, 2024 at 06:11:57PM +0200, Gerd Bayer wrote:
> On Mon, 2024-04-22 at 16:33 -0600, Alex Williamson wrote:
> > On Mon, 22 Apr 2024 14:43:05 -0300
> > Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > > On Mon, Apr 22, 2024 at 05:35:08PM +0200, Gerd Bayer wrote:
> > > > From: Ben Segal <bpsegal@us.ibm.com>
> > > > 
> > > > Many PCI adapters can benefit or even require full 64bit read
> > > > and write access to their registers. In order to enable work on
> > > > user-space drivers for these devices add two new variations
> > > > vfio_pci_core_io{read|write}64 of the existing access methods
> > > > when the architecture supports 64-bit ioreads and iowrites.
> > > > 
> > > > Since these access methods are instantiated on 64bit
> > > > architectures,
> > > > only, their use in vfio_pci_core_do_io_rw() is restricted by
> > > > conditional
> > > > compiles to these architectures.
> > > > 
> > > > Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
> > > > Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
> > > > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > > > ---
> > > > Hi all,
> > > > 
> > > > we've successfully used this patch with a user-mode driver for a
> > > > PCI
> > > > device that requires 64bit register read/writes on s390. A quick
> > > > grep
> > > > showed that there are several other drivers for PCI devices in
> > > > the kernel
> > > > that use readq/writeq and eventually could use this, too.
> > > > So we decided to propose this for general inclusion.
> > > > 
> > > > Thank you,
> > > > Gerd Bayer
> > > > 
> > > > Changes v1 -> v2:
> > > > - On non 64bit architecture use at most 32bit accesses in
> > > >   vfio_pci_core_do_io_rw and describe that in the commit message.
> > > > - Drop the run-time error on 32bit architectures.
> > > > - The #endif splitting the "else if" is not really fortunate, but
> > > > I'm
> > > >   open to suggestions.  
> > > 
> > > Provide a iowrite64() that does back to back writes for 32 bit?
> > 
> > I was curious what this looked like.  If we want to repeat the 4 byte
> > access then I think we need to refactor all the read/write accesses
> > into macros to avoid duplicating code, which results in something
> > like [1] below.  But also once we refactor to macros, the #ifdef
> > within the function as originally proposed gets a lot more bearable
> > too [2].
> > 
> > I'd probably just go with something like [2] unless you want to
> > further macro-ize these branches out of existence in the main
> > function. Feel free to grab any of this you like, the VFIO_IORDWR
> > macro should probably be it's own patch.  Thanks,
> > 
> > Alex
> 
> Hi Alex,
> 
> thanks for your suggestions, I like your VFIO_IORDWR macro in [1].
> As I just explained to Jason, I don't think that the back-to-back 32bit
> accesses are a safe emulation of 64bit accesses in general, though. So
> I'd rather leave that out.

It is though, it is exactly what the code does now.

This function is not 'do exactly XX byte store' it is actually 'memcpy
to io' with some occasional support for making contiguous TLPs in
common cases..

Jason
diff mbox series

Patch

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 03b8f7ada1ac..d83cb0bb7aa5 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -89,6 +89,9 @@  EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);
 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
 VFIO_IOREAD(32)
+#ifdef ioread64
+VFIO_IOREAD(64)
+#endif
 
 /*
  * Read or write from an __iomem region (MMIO or I/O port) with an excluded
@@ -114,6 +117,31 @@  ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 		else
 			fillable = 0;
 
+#if defined(ioread64) && defined(iowrite64)
+		if (fillable >= 8 && !(off % 8)) {
+			u64 val;
+
+			if (iswrite) {
+				if (copy_from_user(&val, buf, 8))
+					return -EFAULT;
+
+				ret = vfio_pci_core_iowrite64(vdev, test_mem,
+							 val, io + off);
+				if (ret)
+					return ret;
+			} else {
+				ret = vfio_pci_core_ioread64(vdev, test_mem,
+							&val, io + off);
+				if (ret)
+					return ret;
+
+				if (copy_to_user(buf, &val, 8))
+					return -EFAULT;
+			}
+
+			filled = 8;
+		} else
+#endif /* defined(ioread64) && defined(iowrite64) */
 		if (fillable >= 4 && !(off % 4)) {
 			u32 val;
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index a2c8b8bba711..f4cf5fd2350c 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -157,5 +157,8 @@  int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev,	\
 VFIO_IOREAD_DECLATION(8)
 VFIO_IOREAD_DECLATION(16)
 VFIO_IOREAD_DECLATION(32)
+#ifdef ioread64
+VFIO_IOREAD_DECLATION(64)
+#endif
 
 #endif /* VFIO_PCI_CORE_H */