Message ID | 20240318193019.123795-1-cassel@kernel.org
---|---
State | Superseded
Series | misc: pci_endpoint_test: Use memcpy_toio()/memcpy_fromio() for BAR tests
On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
> diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> index 705029ad8eb5..cb6c9ccf3a5f 100644
> --- a/drivers/misc/pci_endpoint_test.c
> +++ b/drivers/misc/pci_endpoint_test.c
> @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
> 	0xA5A5A5A5,
> };
> 
> +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
> +					enum pci_barno barno, int offset,
> +					void *write_buf, void *read_buf,
> +					int size)
> +{
> +	memset(write_buf, bar_test_pattern[barno], size);
> +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
> +
> +	/* Make sure that reads are performed after writes. */
> +	mb();
> +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);

Did you see actual bugs without the barrier? On normal PCI
semantics, a read will always force a write to be flushed first.

      Arnd
On Mon, Mar 18, 2024 at 09:02:21PM +0100, Arnd Bergmann wrote:
> On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
> > diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> > index 705029ad8eb5..cb6c9ccf3a5f 100644
> > --- a/drivers/misc/pci_endpoint_test.c
> > +++ b/drivers/misc/pci_endpoint_test.c
> > @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
> > 	0xA5A5A5A5,
> > };
> > 
> > +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
> > +					enum pci_barno barno, int offset,
> > +					void *write_buf, void *read_buf,
> > +					int size)
> > +{
> > +	memset(write_buf, bar_test_pattern[barno], size);
> > +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
> > +
> > +	/* Make sure that reads are performed after writes. */
> > +	mb();
> > +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
> 
> Did you see actual bugs without the barrier? On normal PCI
> semantics, a read will always force a write to be flushed first.
> 

Even for non-PCI cases, read/writes to the same region are not reordered, right?

- Mani
On Tue, Mar 19, 2024, at 05:31, Manivannan Sadhasivam wrote:
> On Mon, Mar 18, 2024 at 09:02:21PM +0100, Arnd Bergmann wrote:
>> On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
>> > diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
>> > index 705029ad8eb5..cb6c9ccf3a5f 100644
>> > --- a/drivers/misc/pci_endpoint_test.c
>> > +++ b/drivers/misc/pci_endpoint_test.c
>> > @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
>> > 	0xA5A5A5A5,
>> > };
>> > 
>> > +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
>> > +					enum pci_barno barno, int offset,
>> > +					void *write_buf, void *read_buf,
>> > +					int size)
>> > +{
>> > +	memset(write_buf, bar_test_pattern[barno], size);
>> > +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
>> > +
>> > +	/* Make sure that reads are performed after writes. */
>> > +	mb();
>> > +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
>>
>> Did you see actual bugs without the barrier? On normal PCI
>> semantics, a read will always force a write to be flushed first.
>>
>
> Even for non-PCI cases, read/writes to the same region are not reordered, right?

Correct. The only thing that is special about PCI here is that
writes are explicitly allowed to get posted in the first place,
on other buses the memcpy_toio() by itself might already require
non-posted behavior.

      Arnd
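As a standalone illustration of the ordering rule Arnd relies on (a non-posted read cannot pass a posted write), the classic read-back flush pattern looks roughly like this. The register name and variables are hypothetical and are not part of the patch under review:

	/*
	 * Illustrative sketch only: base, val and REG_CTRL are made up.
	 * The writel() may be posted (buffered) on its way to the device,
	 * but the readl() is non-posted and must not pass it, so by the
	 * time the read returns, the write has reached the device.
	 */
	writel(val, base + REG_CTRL);	/* posted MMIO write */
	(void)readl(base + REG_CTRL);	/* non-posted read flushes the posted write */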
Hello Arnd,

On Mon, Mar 18, 2024 at 09:02:21PM +0100, Arnd Bergmann wrote:
> On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
> > diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> > index 705029ad8eb5..cb6c9ccf3a5f 100644
> > --- a/drivers/misc/pci_endpoint_test.c
> > +++ b/drivers/misc/pci_endpoint_test.c
> > @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
> > 	0xA5A5A5A5,
> > };
> > 
> > +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
> > +					enum pci_barno barno, int offset,
> > +					void *write_buf, void *read_buf,
> > +					int size)
> > +{
> > +	memset(write_buf, bar_test_pattern[barno], size);
> > +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
> > +
> > +	/* Make sure that reads are performed after writes. */
> > +	mb();
> > +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
> 
> Did you see actual bugs without the barrier? On normal PCI
> semantics, a read will always force a write to be flushed first.

I'm aware that a Read Request must not pass a Posted Request under
normal PCI transaction ordering rules.
(As defined in PCIe 6.0, Table 2-42 Ordering Rules Summary)

I was more worried about the compiler or CPU reordering things:
https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L1876-L1878

I did try the patch without the barrier, and did not see any issues.


I did also see this comment:
https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790

Do you think that we need to perform any flushing after the memset(),
to ensure that the data written using memcpy_toio() is actually what
we expect it to be?


Kind regards,
Niklas
On Tue, Mar 19, 2024 at 05:40:33PM +0100, Niklas Cassel wrote:
> Hello Arnd,
> 
> On Mon, Mar 18, 2024 at 09:02:21PM +0100, Arnd Bergmann wrote:
> > On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
> > > diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> > > index 705029ad8eb5..cb6c9ccf3a5f 100644
> > > --- a/drivers/misc/pci_endpoint_test.c
> > > +++ b/drivers/misc/pci_endpoint_test.c
> > > @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
> > > 	0xA5A5A5A5,
> > > };
> > > 
> > > +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
> > > +					enum pci_barno barno, int offset,
> > > +					void *write_buf, void *read_buf,
> > > +					int size)
> > > +{
> > > +	memset(write_buf, bar_test_pattern[barno], size);
> > > +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
> > > +
> > > +	/* Make sure that reads are performed after writes. */
> > > +	mb();
> > > +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
> > 
> > Did you see actual bugs without the barrier? On normal PCI
> > semantics, a read will always force a write to be flushed first.
> 
> I'm aware that a Read Request must not pass a Posted Request under
> normal PCI transaction ordering rules.
> (As defined in PCIe 6.0, Table 2-42 Ordering Rules Summary)
> 
> I was more worried about the compiler or CPU reordering things:
> https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L1876-L1878
> 
> I did try the patch without the barrier, and did not see any issues.
> 

There shouldn't be an issue without the barrier.

> 
> I did also see this comment:
> https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790
> 
> Do you think that we need to perform any flushing after the memset(),
> to ensure that the data written using memcpy_toio() is actually what
> we expect it to be?
> 

The documentation recommends cache flushing only if the normal memory write and
MMIO access are dependent. But here you are just accessing the MMIO. So no
explicit ordering or cache flushing is required.

- Mani
On Tue, Mar 19, 2024 at 10:18:26PM +0530, Manivannan Sadhasivam wrote:
> > 
> > I did also see this comment:
> > https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790
> > 
> > Do you think that we need to perform any flushing after the memset(),
> > to ensure that the data written using memcpy_toio() is actually what
> > we expect it to be?
> > 
> 
> The documentation recommends cache flushing only if the normal memory write and
> MMIO access are dependent. But here you are just accessing the MMIO. So no
> explicit ordering or cache flushing is required.

What does dependent mean in this case then?

Since the data that we are writing to the device is the data that was
just written to memory using memset().

Kind regards,
Niklas
On Tue, Mar 19, 2024, at 17:53, Niklas Cassel wrote:
> On Tue, Mar 19, 2024 at 10:18:26PM +0530, Manivannan Sadhasivam wrote:
>> > 
>> > I did also see this comment:
>> > https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790
>> > 
>> > Do you think that we need to perform any flushing after the memset(),
>> > to ensure that the data written using memcpy_toio() is actually what
>> > we expect it to be?
>> > 
>>
>> The documentation recommends cache flushing only if the normal memory write and
>> MMIO access are dependent. But here you are just accessing the MMIO. So no
>> explicit ordering or cache flushing is required.
>
> What does dependent mean in this case then?
>
> Since the data that we are writing to the device is the data that was
> just written to memory using memset().

You need a barrier for the case where the memset() writes to
a buffer in RAM and then you write the address of that buffer
into a device register, which triggers a DMA read from the
buffer. Without a barrier, the side effect of the MMIO write
may come before the data in the RAM buffer is visible.

A memcpy_fromio() only involves a single master accessing
the memory (i.e. the CPU executing both the memset() and the
memcpy()), so there is no way this can go wrong.

      Arnd
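For contrast, here is a rough sketch of the case Arnd describes where a barrier is needed: the CPU fills a buffer in RAM and then pokes a device register that starts a DMA read from that buffer. All names here are hypothetical, and memory-barriers.txt notes that the non-relaxed writel() already implies this ordering, so the explicit wmb() is only required with the relaxed accessor:

	/*
	 * Illustrative sketch only: dma_buf, dma_addr, len, base and
	 * DOORBELL_REG are made up. The buffer contents must be visible
	 * to the device before the doorbell write that triggers the DMA
	 * read, hence the write barrier before the relaxed MMIO write.
	 */
	memset(dma_buf, 0xA5, len);	/* fill the DMA buffer in normal memory */
	wmb();				/* order the buffer fill before the MMIO write */
	writel_relaxed(lower_32_bits(dma_addr), base + DOORBELL_REG);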
Hello Arnd,

On Tue, Mar 19, 2024 at 08:55:19PM +0100, Arnd Bergmann wrote:
> On Tue, Mar 19, 2024, at 17:53, Niklas Cassel wrote:
> > On Tue, Mar 19, 2024 at 10:18:26PM +0530, Manivannan Sadhasivam wrote:
> >> > 
> >> > I did also see this comment:
> >> > https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790
> >> > 
> >> > Do you think that we need to perform any flushing after the memset(),
> >> > to ensure that the data written using memcpy_toio() is actually what
> >> > we expect it to be?
> >> > 
> >>
> >> The documentation recommends cache flushing only if the normal memory write and
> >> MMIO access are dependent. But here you are just accessing the MMIO. So no
> >> explicit ordering or cache flushing is required.
> >
> > What does dependent mean in this case then?
> >
> > Since the data that we are writing to the device is the data that was
> > just written to memory using memset().
> 
> You need a barrier for the case where the memset() writes to
> a buffer in RAM and then you write the address of that buffer
> into a device register, which triggers a DMA read from the
> buffer. Without a barrier, the side effect of the MMIO write
> may come before the data in the RAM buffer is visible.
> 
> A memcpy_fromio() only involves a single master accessing
> the memory (i.e. the CPU executing both the memset() and the
> memcpy()), so there is no way this can go wrong.

Thank you for the clarification.

Kind regards,
Niklas
diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 705029ad8eb5..cb6c9ccf3a5f 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
 	0xA5A5A5A5,
 };
 
+static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
+					enum pci_barno barno, int offset,
+					void *write_buf, void *read_buf,
+					int size)
+{
+	memset(write_buf, bar_test_pattern[barno], size);
+	memcpy_toio(test->bar[barno] + offset, write_buf, size);
+
+	/* Make sure that reads are performed after writes. */
+	mb();
+	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
+
+	return memcmp(write_buf, read_buf, size);
+}
+
 static bool pci_endpoint_test_bar(struct pci_endpoint_test *test,
 				  enum pci_barno barno)
 {
-	int j;
-	u32 val;
-	int size;
+	int j, bar_size, buf_size, iters, remain;
+	void *write_buf;
+	void *read_buf;
 	struct pci_dev *pdev = test->pdev;
+	struct device *dev = &pdev->dev;
 
 	if (!test->bar[barno])
 		return false;
 
-	size = pci_resource_len(pdev, barno);
+	bar_size = pci_resource_len(pdev, barno);
 
 	if (barno == test->test_reg_bar)
-		size = 0x4;
+		bar_size = 0x4;
 
-	for (j = 0; j < size; j += 4)
-		pci_endpoint_test_bar_writel(test, barno, j,
-					     bar_test_pattern[barno]);
+	buf_size = min(SZ_1M, bar_size);
 
-	for (j = 0; j < size; j += 4) {
-		val = pci_endpoint_test_bar_readl(test, barno, j);
-		if (val != bar_test_pattern[barno])
+	write_buf = kmalloc(buf_size, GFP_KERNEL);
+	if (!write_buf)
+		return false;
+
+	read_buf = kmalloc(buf_size, GFP_KERNEL);
+	if (!read_buf)
+		return false;
+
+	iters = bar_size / buf_size;
+	for (j = 0; j < iters; j++)
+		if (pci_endpoint_test_bar_memcmp(test, barno, buf_size * j,
+						 write_buf, read_buf, buf_size))
+			return false;
+
+	remain = bar_size % buf_size;
+	if (remain)
+		if (pci_endpoint_test_bar_memcmp(test, barno, buf_size * iters,
+						 write_buf, read_buf, remain))
 			return false;
-	}
 
 	return true;
 }
The current code uses writel()/readl(), which has an implicit memory
barrier for every single readl()/writel().

Additionally, reading 4 bytes at a time over the PCI bus is not really
optimal, considering that this code is running in an ioctl handler.

Use memcpy_toio()/memcpy_fromio() for BAR tests.

Before patch with a 4MB BAR:
$ time /usr/bin/pcitest -b 1
BAR1: OKAY
real 0m 1.56s

After patch with a 4MB BAR:
$ time /usr/bin/pcitest -b 1
BAR1: OKAY
real 0m 0.54s

Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
 drivers/misc/pci_endpoint_test.c | 52 ++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 12 deletions(-)