diff mbox series

pl330: peripheral read corruption on cyclic buffers

Message ID BLAPR22MB2243B9158BCC7665034808B4BCA92@BLAPR22MB2243.namprd22.prod.outlook.com (mailing list archive)
State New
Headers show
Series pl330: peripheral read corruption on cyclic buffers | expand

Commit Message

Chris Pringle July 23, 2024, 3:28 p.m. UTC
When trying to perform cyclic DMA reads using the pl330, I found that portions of the data read back from the peripheral are stale/incomplete at the point of reading. The pattern of corruption in the data looks like a coherency issue; in our case, with a single read of 64 bytes of PCM data, I often see the first 32 bytes of data is good/coherent, but the second 32 bytes of data is stale (whatever was in the buffer previously). The audio driver we're using (via soc-generic-dmaengine-pcm) uses a cyclic DMA buffer for fetching audio samples; at the point of reading, the pl330 driver works out how much residual data is in the circular buffer by comparing the buffer address with the current "destination address" register in the PL330 to see how far the PL330 has got through the current DMA transfer; from this the audio driver can derive how much data we can safely read from the cyclic buffer. If I wait around 1uS after reading this "destination address" register before reading the data from system memory, the data is always coherent/complete by that point (which offers a potential ugly and perhaps flaky workaround for our use-case). The issue can also be observed by determining how much data was in the buffer, writing a known test pattern over that data, waiting 250uS and then reading back that test data; I often see that sections of the buffer (in a similar pattern to those observed with the audio distortion) contain the DMA'd PCM data instead of our test pattern, suggesting that either the DMA isn't complete at the time we wrote our test data.

I'm pretty sure all the existing code in the sound subsystem is already managing coherency correctly, but just for completeness, I forced allocation of the buffer using dma_alloc_coherent (instead of iram) and manually called dma_sync_single_for_cpu on the impacted memory but that didn't address the issue. I also tried adding in full system barriers (i.e. "dsb sy") but that also made no difference. Reducing the max burst length to 1 exacerbated the issue under normal conditions, but masked the issue once tracing was added. There are several errata on channel 0 of the PL330 so I excluded that channel from any testing, which also made no difference. I checked the alignment and all looks okay (matching the restrictions laid out in the PL330 TRM).

I found that adding a RMB into the PL330 microcode (between the LOADP and the STORE) was sufficient to address this problem, but I am slightly puzzled as to why this is needed. According to the PL330 TRM, the RMB will ensure that any existing AXI reads are complete. In this case _ldst_peripheral is performing a LOADP (loading data from the peripheral into the DMA FIFO - so not AXI), followed by a STORE (storing data from the DMA FIFO into system memory). It is unclear to me why a RMB is required to ensure that all the data is written at the point we call STORE; the observed behaviour suggests that adding a RMB is doing more than just ensuring any AXI reads are complete as to my knowledge, the DMA channel is not performing any AXI reads. I also tried adding a WMB after the STORE but that didn't resolve the issue. Similarly adding a WMB in place of the RMB (between the LOADP and STORE) also didn't resolve the issue.

We are using Rockchip's kernel source tree and unfortunately testing our use-case on a mainline kernel is a non-trivial amount of work. However, I ported across the latest pl330 driver from the mainline kernel (no changes except for a function prototype) and that shows exactly the same behaviour, so I believe the issue is also on mainline. All the drivers involved are part of the Rockchip kernel source tree, and most of them are identical to what's on v5.10; the delta between these source trees don't look like they'd cause the types of issues I'm seeing.

Does anyone here have any thoughts on why adding a RMB between the LOADP and the STORE might be fixing this problem? Or conversely have any other thoughts on possible root-causes or suggested fixes. Patch with potential fix for comment below. This patch mirrors exactly what the mem-to-mem DMA does in _ldst_memtomem; it will likely need a bit more work to cater for the REV_R1P0 lock-up (like in _ldst_memtomem) but seeking comment on this problem/fix first.

------------

From: Chris Pringle <cpringle@brightsign.biz>
Subject: [PATCH] dma,pl330: fix peripheral DMA read corruption

Fixes corruption observed on DMA reads from a peripheral into a cyclic
buffer. Previously, when trying to read from the peripheral, portions of
the buffer would contain stale/non-coherent data for a period up to 1uS
after pl330_get_current_xferred_count indicated data was available.

This patch addresses this issue by adding memory barriers into the PL330
microcode that handles peripheral loads/stores.

Issue was observed on a RK3588 with HDMI-in PCM audio read via an I2S
peripheral.

Signed-off-by: Chris Pringle <cpringle@brightsign.biz>
---
drivers/dma/pl330.c | 2 ++
1 file changed, 2 insertions(+)

--
2.30.2

BrightSign considers your privacy to be very important. The emails you send to us will be protected and secured. Furthermore, we will only use your email and contact information for the reasons you sent them to us and for tracking how effectively we respond to your requests.
diff mbox series

Patch

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index dfbf514188f3..427f1c2aa4b3 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -1193,8 +1193,10 @@  static inline int _ldst_peripheral(struct pl330_dmac *pl330,
                               off += _emit_WFP(dry_run, &buf[off], cond, pxs->desc->peri);
                               off += _emit_load(dry_run, &buf[off], cond, pxs->desc->rqtype,
                                               pxs->desc->peri);
+                              off += _emit_RMB(dry_run, &buf[off]);
                               off += _emit_store(dry_run, &buf[off], cond, pxs->desc->rqtype,
                                               pxs->desc->peri);
+                              off += _emit_WMB(dry_run, &buf[off]);
               }

                return off;