Message ID | 1591485677-20533-2-git-send-email-yibin.gong@nxp.com (mailing list archive) |
---|---|
State | Not Applicable |
Series | add ecspi ERR009165 for i.mx6/7 soc family |
On Sun, Jun 07, 2020 at 07:21:05AM +0800, Robin Gong wrote:

> In case dma transfer failed and fallback to pio, tx_buf/rx_buf need to be
> taken care cache since they have already been maintained by spi.c

Is this needed as part of this series? This looks like an independent
fix and it seems better to get this in independently.

> Fixes: bcd8e7761ec9("spi: imx: fallback to PIO if dma setup failure")
> Signed-off-by: Robin Gong <yibin.gong@nxp.com>
> Reported-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
> Link: https://lore.kernel.org/linux-arm-kernel/5d246dd81607bb6e5cb9af86ad4e53f7a7a99c50.camel@ew.tq-group.com/

The Link is usually to the patch on the list.

> --- a/drivers/spi/spi-imx.c
> +++ b/drivers/spi/spi-imx.c
> @@ -1456,6 +1456,13 @@ static int spi_imx_pio_transfer(struct spi_device *spi,
>  		return -ETIMEDOUT;
>  	}
>
> +	if (transfer->rx_sg.sgl) {
> +		struct device *rx_dev = spi->controller->dma_rx->device->dev;
> +
> +		dma_sync_sg_for_device(rx_dev, transfer->rx_sg.sgl,
> +				       transfer->rx_sg.nents, DMA_TO_DEVICE);
> +	}
> +
>  	return transfer->len;
>  }

This is confusing - why are we DMA mapping to the device after doing a
PIO transfer?
On 2020/06/08 22:35 Mark Brown <broonie@kernel.org> wrote:
> On Sun, Jun 07, 2020 at 07:21:05AM +0800, Robin Gong wrote:
> > In case dma transfer failed and fallback to pio, tx_buf/rx_buf need to
> > be taken care cache since they have already been maintained by spi.c
>
> Is this needed as part of this series? This looks like an independent fix and it
> seems better to get this in independently.

But that's used to fix one patch [05/13]of the v8 patch set. To be honest, I'm also not sure how to handle it so that I merged both into first v9....For now, I think you are right, since 'fallback pio' patch could be independent this series. Will resend in v10.

> > Fixes: bcd8e7761ec9("spi: imx: fallback to PIO if dma setup failure")
> > Signed-off-by: Robin Gong <yibin.gong@nxp.com>
> > Reported-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
> > Link:
> > https://lore.kernel.org/linux-arm-kernel/5d246dd81607bb6e5cb9af86ad4e53f7a7a99c50.camel@ew.tq-group.com/
>
> The Link is usually to the patch on the list.

Okay, will remove it.

> > --- a/drivers/spi/spi-imx.c
> > +++ b/drivers/spi/spi-imx.c
> > @@ -1456,6 +1456,13 @@ static int spi_imx_pio_transfer(struct spi_device *spi,
> >  		return -ETIMEDOUT;
> >  	}
> >
> > +	if (transfer->rx_sg.sgl) {
> > +		struct device *rx_dev = spi->controller->dma_rx->device->dev;
> > +
> > +		dma_sync_sg_for_device(rx_dev, transfer->rx_sg.sgl,
> > +				       transfer->rx_sg.nents, DMA_TO_DEVICE);
> > +	}
> > +
> >  	return transfer->len;
> >  }
>
> This is confusing - why are we DMA mapping to the device after doing a PIO
> transfer?

'transfer->rx_sg.sgl' condition check that's the case fallback PIO after DMA transfer failed. But the spi core still think the buffer should be in 'device' while spi driver touch it by PIO(CPU), so sync it back to device to ensure all received data flush to DDR.
On Mon, Jun 08, 2020 at 03:08:45PM +0000, Robin Gong wrote:

> > > +	if (transfer->rx_sg.sgl) {
> > > +		struct device *rx_dev = spi->controller->dma_rx->device->dev;
> > > +
> > > +		dma_sync_sg_for_device(rx_dev, transfer->rx_sg.sgl,
> > > +				       transfer->rx_sg.nents, DMA_TO_DEVICE);
> > > +	}
> > > +

> > This is confusing - why are we DMA mapping to the device after doing a PIO
> > transfer?

> 'transfer->rx_sg.sgl' condition check that's the case fallback PIO after DMA transfer
> failed. But the spi core still think the buffer should be in 'device' while spi driver
> touch it by PIO(CPU), so sync it back to device to ensure all received data flush to DDR.

So we sync it back to the device so that we can then do another sync to
CPU? TBH I'm a bit surprised that there's a requirement that we
explicitly undo a sync and that a redundant double sync in the same
direction might be an issue but I've not had a need to care so I'm
perfectly prepared to believe there is.

At the very least this needs a comment.
On 2020-06-08 16:31, Mark Brown wrote: > On Mon, Jun 08, 2020 at 03:08:45PM +0000, Robin Gong wrote: > >>>> + if (transfer->rx_sg.sgl) { >>>> + struct device *rx_dev = spi->controller->dma_rx->device->dev; >>>> + >>>> + dma_sync_sg_for_device(rx_dev, transfer->rx_sg.sgl, >>>> + transfer->rx_sg.nents, DMA_TO_DEVICE); >>>> + } >>>> + > >>> This is confusing - why are we DMA mapping to the device after doing a PIO >>> transfer? > >> 'transfer->rx_sg.sgl' condition check that's the case fallback PIO after DMA transfer >> failed. But the spi core still think the buffer should be in 'device' while spi driver >> touch it by PIO(CPU), so sync it back to device to ensure all received data flush to DDR. > > So we sync it back to the device so that we can then do another sync to > CPU? TBH I'm a bit surprised that there's a requirement that we > explicitly undo a sync and that a redundant double sync in the same > direction might be an issue but I've not had a need to care so I'm > perfectly prepared to believe there is. > > At the very least this needs a comment. Yeah, something's off here - at the very least, syncing with DMA_TO_DEVICE on the Rx buffer that was mapped with DMA_FROM_DEVICE is clearly wrong. CONFIG_DMA_API_DEBUG should scream about that. If the device has written to the buffer at all since dma_map_sg() was called then you do need a dma_sync_sg_for_cpu() call before touching it from a CPU fallback path, but if nobody's going to touch it from that point until it's unmapped then there's no point syncing it again. The my_card_interrupt_handler() example in DMA-API_HOWTO.txt demonstrates this. Robin.
On 2020/06/08 23:32 Mark Brown <broonie@kernel.org> wrote: > On Mon, Jun 08, 2020 at 03:08:45PM +0000, Robin Gong wrote: > > > > > + if (transfer->rx_sg.sgl) { > > > > + struct device *rx_dev = spi->controller->dma_rx->device->dev; > > > > + > > > > + dma_sync_sg_for_device(rx_dev, transfer->rx_sg.sgl, > > > > + transfer->rx_sg.nents, DMA_TO_DEVICE); > > > > + } > > > > + > > > > This is confusing - why are we DMA mapping to the device after doing > > > a PIO transfer? > > > 'transfer->rx_sg.sgl' condition check that's the case fallback PIO > > after DMA transfer failed. But the spi core still think the buffer > > should be in 'device' while spi driver touch it by PIO(CPU), so sync it back to > device to ensure all received data flush to DDR. > > So we sync it back to the device so that we can then do another sync to CPU? Yes, spi.c will sync to CPU again in spi_unmap_buf() after transfer done finally. Otherwise, the fresh received data by CPU in this fallback case may be invalidated by spi.c, which led to the data corrupt on Matthias's side. > TBH I'm a bit surprised that there's a requirement that we explicitly undo a > sync and that a redundant double sync in the same direction might be an issue Considering DMA transfer may be failed(for example, sdma firmware may not be updated as ERR009165 depends on), we'd better fallback to PIO to ensure no any function break here. Thus should clean fresh rx data from cache into external memory as real 'device' received by DMA. Understood a bit confusing here, but that way could be avoided by any code changing in spi.c. Or make some code changes in spi.c to cancel spi_unmap_buf() in such fallback case? > but I've not had a need to care so I'm perfectly prepared to believe there is. > > At the very least this needs a comment. Okay, I'll add comment here in next.
On 2020/06/09 0:44 Robin Murphy <robin.murphy@arm.com> wrote: > On 2020-06-08 16:31, Mark Brown wrote: > > On Mon, Jun 08, 2020 at 03:08:45PM +0000, Robin Gong wrote: > > > >>>> + if (transfer->rx_sg.sgl) { > >>>> + struct device *rx_dev = spi->controller->dma_rx->device->dev; > >>>> + > >>>> + dma_sync_sg_for_device(rx_dev, transfer->rx_sg.sgl, > >>>> + transfer->rx_sg.nents, DMA_TO_DEVICE); > >>>> + } > >>>> + > > > >>> This is confusing - why are we DMA mapping to the device after doing > >>> a PIO transfer? > > > >> 'transfer->rx_sg.sgl' condition check that's the case fallback PIO > >> after DMA transfer failed. But the spi core still think the buffer > >> should be in 'device' while spi driver touch it by PIO(CPU), so sync it back to > device to ensure all received data flush to DDR. > > > > So we sync it back to the device so that we can then do another sync > > to CPU? TBH I'm a bit surprised that there's a requirement that we > > explicitly undo a sync and that a redundant double sync in the same > > direction might be an issue but I've not had a need to care so I'm > > perfectly prepared to believe there is. > > > > At the very least this needs a comment. > > Yeah, something's off here - at the very least, syncing with DMA_TO_DEVICE on > the Rx buffer that was mapped with DMA_FROM_DEVICE is clearly wrong. > CONFIG_DMA_API_DEBUG should scream about that. > > If the device has written to the buffer at all since dma_map_sg() was called > then you do need a dma_sync_sg_for_cpu() call before touching it from a CPU > fallback path, but if nobody's going to touch it from that point until it's > unmapped then there's no point syncing it again. The > my_card_interrupt_handler() example in DMA-API_HOWTO.txt demonstrates > this. 
Thanks for you post, but sorry, that's not spi-imx case now, because the rx data in device memory is not truly updated from 'device'/DMA, but from PIO, so that dma_sync_sg_for_cpu with DMA_FROM_DEVICE can't be used, otherwise the fresh data in cache will be invalidated. But you're right, kernel warning comes out if CONFIG_DMA_API_DEBUG enabled...
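The cache hazard described in this reply can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: `struct toy_buf` and all `toy_*` helpers are hypothetical names, not kernel APIs, and the model assumes a write-back, non-coherent cache in which an invalidate discards dirty lines without writing them back. It shows why data written by PIO (the CPU) can be lost if the buffer is later invalidated as though a device had filled it, and why cleaning the cache first avoids the loss.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUF_LEN 4

/* Toy model of one buffer behind a non-coherent write-back cache. */
struct toy_buf {
	unsigned char mem[BUF_LEN];   /* contents of DDR */
	unsigned char cache[BUF_LEN]; /* contents of the CPU cache lines */
	bool cached;                  /* cache currently holds the buffer */
	bool dirty;                   /* cache copy is newer than DDR */
};

/* CPU (PIO) store: lands in the cache only, so DDR is stale afterwards. */
static void toy_cpu_write(struct toy_buf *b, const unsigned char *src)
{
	memcpy(b->cache, src, BUF_LEN);
	b->cached = true;
	b->dirty = true;
}

/* Clean: write dirty cache contents back to DDR. */
static void toy_cache_clean(struct toy_buf *b)
{
	if (b->cached && b->dirty) {
		memcpy(b->mem, b->cache, BUF_LEN);
		b->dirty = false;
	}
}

/* Invalidate: discard the cache contents without writing them back. */
static void toy_cache_invalidate(struct toy_buf *b)
{
	b->cached = false;
	b->dirty = false;
}

/* CPU load: served from the cache if present, otherwise refilled from DDR. */
static unsigned char toy_cpu_read(struct toy_buf *b, int i)
{
	if (!b->cached) {
		memcpy(b->cache, b->mem, BUF_LEN);
		b->cached = true;
	}
	return b->cache[i];
}

/*
 * Fill the buffer by PIO, optionally clean it, then invalidate it the way
 * an unmap of a device-written buffer would in this model.
 * Returns byte 0 as the CPU sees it afterwards.
 */
static unsigned char toy_pio_then_unmap(bool clean_first)
{
	struct toy_buf b = { 0 };
	const unsigned char rx[BUF_LEN] = { 0xde, 0xad, 0xbe, 0xef };

	toy_cpu_write(&b, rx);
	if (clean_first)
		toy_cache_clean(&b);
	toy_cache_invalidate(&b);
	return toy_cpu_read(&b, 0);
}
```

In this model `toy_pio_then_unmap(false)` loses the received byte, while `toy_pio_then_unmap(true)` keeps it. The model says nothing about which DMA API call is the right way to achieve the clean in a real driver, which is the point under dispute in the thread.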
On 2020-06-09 06:21, Robin Gong wrote: > On 2020/06/09 0:44 Robin Murphy <robin.murphy@arm.com> wrote: >> On 2020-06-08 16:31, Mark Brown wrote: >>> On Mon, Jun 08, 2020 at 03:08:45PM +0000, Robin Gong wrote: >>> >>>>>> + if (transfer->rx_sg.sgl) { >>>>>> + struct device *rx_dev = spi->controller->dma_rx->device->dev; >>>>>> + >>>>>> + dma_sync_sg_for_device(rx_dev, transfer->rx_sg.sgl, >>>>>> + transfer->rx_sg.nents, DMA_TO_DEVICE); >>>>>> + } >>>>>> + >>> >>>>> This is confusing - why are we DMA mapping to the device after doing >>>>> a PIO transfer? >>> >>>> 'transfer->rx_sg.sgl' condition check that's the case fallback PIO >>>> after DMA transfer failed. But the spi core still think the buffer >>>> should be in 'device' while spi driver touch it by PIO(CPU), so sync it back to >> device to ensure all received data flush to DDR. >>> >>> So we sync it back to the device so that we can then do another sync >>> to CPU? TBH I'm a bit surprised that there's a requirement that we >>> explicitly undo a sync and that a redundant double sync in the same >>> direction might be an issue but I've not had a need to care so I'm >>> perfectly prepared to believe there is. >>> >>> At the very least this needs a comment. >> >> Yeah, something's off here - at the very least, syncing with DMA_TO_DEVICE on >> the Rx buffer that was mapped with DMA_FROM_DEVICE is clearly wrong. >> CONFIG_DMA_API_DEBUG should scream about that. >> >> If the device has written to the buffer at all since dma_map_sg() was called >> then you do need a dma_sync_sg_for_cpu() call before touching it from a CPU >> fallback path, but if nobody's going to touch it from that point until it's >> unmapped then there's no point syncing it again. The >> my_card_interrupt_handler() example in DMA-API_HOWTO.txt demonstrates >> this. 
> Thanks for you post, but sorry, that's not spi-imx case now, because the rx data in device memory is not truly updated from 'device'/DMA, but from PIO, so that dma_sync_sg_for_cpu with DMA_FROM_DEVICE can't be used, otherwise the fresh data in cache will be invalidated. > But you're right, kernel warning comes out if CONFIG_DMA_API_DEBUG enabled... Ah, I think I understand what's going on now. That's... really ugly :( Looking at the SPI core code, I think a better way to handle this would be to have your fallback path call spi_unmap_buf() directly (or perform the same actions, if exporting that to drivers is unacceptable), then make sure ->can_dma() returns false after that such that spi_unmap_msg() won't try to unmap it again. That's a lot more reasonable than trying to fake up a DMA_TO_DEVICE transfer in the middle of a DMA_FROM_DEVICE operation on the same buffer. Alternatively, is it feasible to initiate a dummy DMA request during probe, such that you can detect the failure condition and give up on the DMA channel early, and not have to deal with it during a real SPI transfer? Robin.
On Tue, 2020-06-09 at 11:00 +0100, Robin Murphy wrote: > On 2020-06-09 06:21, Robin Gong wrote: > > On 2020/06/09 0:44 Robin Murphy <robin.murphy@arm.com> wrote: > > > On 2020-06-08 16:31, Mark Brown wrote: > > > > On Mon, Jun 08, 2020 at 03:08:45PM +0000, Robin Gong wrote: > > > > > > > > > > > + if (transfer->rx_sg.sgl) { > > > > > > > + struct device *rx_dev = spi->controller- > > > > > > > >dma_rx->device->dev; > > > > > > > + > > > > > > > + dma_sync_sg_for_device(rx_dev, transfer- > > > > > > > >rx_sg.sgl, > > > > > > > + transfer->rx_sg.nents, > > > > > > > DMA_TO_DEVICE); > > > > > > > + } > > > > > > > + > > > > > > This is confusing - why are we DMA mapping to the device > > > > > > after doing > > > > > > a PIO transfer? > > > > > 'transfer->rx_sg.sgl' condition check that's the case > > > > > fallback PIO > > > > > after DMA transfer failed. But the spi core still think the > > > > > buffer > > > > > should be in 'device' while spi driver touch it by PIO(CPU), > > > > > so sync it back to > > > > > > device to ensure all received data flush to DDR. > > > > > > > > So we sync it back to the device so that we can then do another > > > > sync > > > > to CPU? TBH I'm a bit surprised that there's a requirement > > > > that we > > > > explicitly undo a sync and that a redundant double sync in the > > > > same > > > > direction might be an issue but I've not had a need to care so > > > > I'm > > > > perfectly prepared to believe there is. > > > > > > > > At the very least this needs a comment. > > > > > > Yeah, something's off here - at the very least, syncing with > > > DMA_TO_DEVICE on > > > the Rx buffer that was mapped with DMA_FROM_DEVICE is clearly > > > wrong. > > > CONFIG_DMA_API_DEBUG should scream about that. 
> > > > > > If the device has written to the buffer at all since dma_map_sg() > > > was called > > > then you do need a dma_sync_sg_for_cpu() call before touching it > > > from a CPU > > > fallback path, but if nobody's going to touch it from that point > > > until it's > > > unmapped then there's no point syncing it again. The > > > my_card_interrupt_handler() example in DMA-API_HOWTO.txt > > > demonstrates > > > this. > > > > Thanks for you post, but sorry, that's not spi-imx case now, > > because the rx data in device memory is not truly updated from > > 'device'/DMA, but from PIO, so that dma_sync_sg_for_cpu with > > DMA_FROM_DEVICE can't be used, otherwise the fresh data in cache > > will be invalidated. > > But you're right, kernel warning comes out if CONFIG_DMA_API_DEBUG > > enabled... > > Ah, I think I understand what's going on now. That's... really ugly > :( > > Looking at the SPI core code, I think a better way to handle this > would > be to have your fallback path call spi_unmap_buf() directly (or > perform > the same actions, if exporting that to drivers is unacceptable), > then > make sure ->can_dma() returns false after that such that > spi_unmap_msg() > won't try to unmap it again. That's a lot more reasonable than trying > to > fake up a DMA_TO_DEVICE transfer in the middle of a DMA_FROM_DEVICE > operation on the same buffer. > > Alternatively, is it feasible to initiate a dummy DMA request during > probe, such that you can detect the failure condition and give up on > the > DMA channel early, and not have to deal with it during a real SPI > transfer? > > Robin. Would this cover the transient DMA failure that is happening between SDMA registration and firmware load? This is exactly the case for which the PIO fallback is triggered for us: As soon as the SDMA driver is registered, the SPI driver can be probed as well, usually failing its first DMA transfer, as the SDMA firmware is not loaded yet. 
We would still like the SPI controller to use DMA as soon as it's actually available. I assume the actual issue is that the SDMA controller is considered registered before the firmware load has finished, but I have no idea how feasible it would be to change that (some comments in the code explain why this currently isn't the case). Matthias
On 2020/06/09 Robin Murphy <robin.murphy@arm.com> wrote: > On 2020-06-09 06:21, Robin Gong wrote: > > On 2020/06/09 0:44 Robin Murphy <robin.murphy@arm.com> wrote: > >> On 2020-06-08 16:31, Mark Brown wrote: > >>> On Mon, Jun 08, 2020 at 03:08:45PM +0000, Robin Gong wrote: > >>> > >>>>>> + if (transfer->rx_sg.sgl) { > >>>>>> + struct device *rx_dev = spi->controller->dma_rx->device->dev; > >>>>>> + > >>>>>> + dma_sync_sg_for_device(rx_dev, transfer->rx_sg.sgl, > >>>>>> + transfer->rx_sg.nents, DMA_TO_DEVICE); > >>>>>> + } > >>>>>> + > >>> > >>>>> This is confusing - why are we DMA mapping to the device after > >>>>> doing a PIO transfer? > >>> > >>>> 'transfer->rx_sg.sgl' condition check that's the case fallback PIO > >>>> after DMA transfer failed. But the spi core still think the buffer > >>>> should be in 'device' while spi driver touch it by PIO(CPU), so > >>>> sync it back to > >> device to ensure all received data flush to DDR. > >>> > >>> So we sync it back to the device so that we can then do another sync > >>> to CPU? TBH I'm a bit surprised that there's a requirement that we > >>> explicitly undo a sync and that a redundant double sync in the same > >>> direction might be an issue but I've not had a need to care so I'm > >>> perfectly prepared to believe there is. > >>> > >>> At the very least this needs a comment. > >> > >> Yeah, something's off here - at the very least, syncing with > >> DMA_TO_DEVICE on the Rx buffer that was mapped with > DMA_FROM_DEVICE is clearly wrong. > >> CONFIG_DMA_API_DEBUG should scream about that. > >> > >> If the device has written to the buffer at all since dma_map_sg() was > >> called then you do need a dma_sync_sg_for_cpu() call before touching > >> it from a CPU fallback path, but if nobody's going to touch it from > >> that point until it's unmapped then there's no point syncing it > >> again. The > >> my_card_interrupt_handler() example in DMA-API_HOWTO.txt > demonstrates > >> this. 
> > Thanks for you post, but sorry, that's not spi-imx case now, because the rx > data in device memory is not truly updated from 'device'/DMA, but from PIO, > so that dma_sync_sg_for_cpu with DMA_FROM_DEVICE can't be used, > otherwise the fresh data in cache will be invalidated. > > But you're right, kernel warning comes out if CONFIG_DMA_API_DEBUG > enabled... > > Ah, I think I understand what's going on now. That's... really ugly :( Yeah...The only reason is to avoid touch any spi core code...I'm trying to implement fallback at spi core so that can spi_unmap_buf directly if dma transfer error and no need such dma_sync_* in spi client driver. Not sure if Mark could accept it. Thanks for your below great thoughts :) > > Looking at the SPI core code, I think a better way to handle this would be to > have your fallback path call spi_unmap_buf() directly (or perform the same > actions, if exporting that to drivers is unacceptable), then make sure > ->can_dma() returns false after that such that spi_unmap_msg() won't try to > unmap it again. That's a lot more reasonable than trying to fake up a > DMA_TO_DEVICE transfer in the middle of a DMA_FROM_DEVICE operation on > the same buffer. > > Alternatively, is it feasible to initiate a dummy DMA request during probe, such > that you can detect the failure condition and give up on the DMA channel early, > and not have to deal with it during a real SPI transfer? > > Robin.
On Tue, Jun 09, 2020 at 12:09:09PM +0200, Matthias Schiffer wrote:

> I assume the actual issue is that the SDMA controller is considered
> registered before the firmware load has finished, but I have no idea
> how feasible it would be to change that (some comments in the code
> explain why this currently isn't the case).

Right, this is what's causing trouble (or at least the DMA driver not
doing PIO behind the driver I guess but that's pretty nasty).
On Tue, Jun 09, 2020 at 11:00:33AM +0100, Robin Murphy wrote:

> Ah, I think I understand what's going on now. That's... really ugly :(

> Looking at the SPI core code, I think a better way to handle this would be
> to have your fallback path call spi_unmap_buf() directly (or perform the
> same actions, if exporting that to drivers is unacceptable), then make sure
> ->can_dma() returns false after that such that spi_unmap_msg() won't try to
> unmap it again. That's a lot more reasonable than trying to fake up a
> DMA_TO_DEVICE transfer in the middle of a DMA_FROM_DEVICE operation on the
> same buffer.

Ideally the driver would be checking in can_dma() if the DMA controller
is able to perform transactions rather than letting things run as far as
trying to actually do the transfer, that's a whole lot cleaner and more
manageable than running into an error doing the transfer. I'm surprised
that there's no DMA API way to figure this out TBH.

We'll also need some handling for this changing at runtime, we're not
expecting this to be dynamic at all - we're expecting it to be a static
property of the controller/transfer combination, we didn't contemplate
this varying randomly at runtime. Instead of rechecking can_dma() we
ought to have a flag saying if we did the mapping (which the bodge Robin
suggests above could clear).
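The difference between rechecking can_dma() at unmap time and recording a "we did the mapping" flag can be sketched with a small userspace model. Everything here is an assumption made for illustration: `struct toy_ctlr`, `struct toy_xfer` and the `toy_*` functions are hypothetical stand-ins and do not reflect the real SPI core structures or their behaviour. The model only counts maps and unmaps so that an imbalance is visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transfer; not the real struct spi_transfer. */
struct toy_xfer {
	bool mapped;     /* set when the buffers were mapped */
	int map_balance; /* +1 per map, -1 per unmap; should end at 0 */
};

/* Hypothetical stand-in for a controller. */
struct toy_ctlr {
	bool dma_usable; /* what a can_dma() callback would report right now */
};

static void toy_map(struct toy_ctlr *c, struct toy_xfer *x)
{
	if (!c->dma_usable)
		return;
	x->map_balance++;
	x->mapped = true;
}

/* Unmap decision made by asking can_dma() again. */
static void toy_unmap_recheck(struct toy_ctlr *c, struct toy_xfer *x)
{
	if (!c->dma_usable)
		return;
	x->map_balance--;
	x->mapped = false;
}

/* Unmap decision made from the recorded flag. */
static void toy_unmap_flag(struct toy_xfer *x)
{
	if (!x->mapped)
		return;
	x->map_balance--;
	x->mapped = false;
}

/*
 * Map a transfer, let the DMA attempt fail so the driver falls back to PIO,
 * then run the final unmap. Returns the resulting map balance:
 * 0 is balanced, 1 means never unmapped, -1 means unmapped twice.
 */
static int toy_run(bool use_flag, bool driver_unmaps, bool can_dma_after)
{
	struct toy_ctlr c = { .dma_usable = true };
	struct toy_xfer x = { 0 };

	toy_map(&c, &x);
	c.dma_usable = can_dma_after; /* answer may change after the failure */
	if (driver_unmaps)
		toy_unmap_flag(&x);   /* early unmap in the fallback path */
	if (use_flag)
		toy_unmap_flag(&x);
	else
		toy_unmap_recheck(&c, &x);
	return x.map_balance;
}
```

In this model the recheck variant is only balanced when the driver's early unmap and the later can_dma() answer happen to agree, whereas the flag variant stays balanced in every combination because the early unmap clears the flag.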
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index b7a85e3..84aebee 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1456,6 +1456,13 @@ static int spi_imx_pio_transfer(struct spi_device *spi,
 		return -ETIMEDOUT;
 	}
 
+	if (transfer->rx_sg.sgl) {
+		struct device *rx_dev = spi->controller->dma_rx->device->dev;
+
+		dma_sync_sg_for_device(rx_dev, transfer->rx_sg.sgl,
+				       transfer->rx_sg.nents, DMA_TO_DEVICE);
+	}
+
 	return transfer->len;
 }
 
@@ -1521,10 +1528,15 @@ static int spi_imx_transfer(struct spi_device *spi,
 	 * firmware may not be updated as ERR009165 required.
 	 */
 	if (spi_imx->usedma) {
+		struct device *tx_dev = spi->controller->dma_tx->device->dev;
+
 		ret = spi_imx_dma_transfer(spi_imx, transfer);
 		if (ret != -EINVAL)
 			return ret;
 
+		dma_sync_sg_for_cpu(tx_dev, transfer->tx_sg.sgl,
+				    transfer->tx_sg.nents, DMA_FROM_DEVICE);
+
 		spi_imx->devtype_data->disable_dma(spi_imx);
 
 		spi_imx->usedma = false;
In case dma transfer failed and fallback to pio, tx_buf/rx_buf need to be
taken care cache since they have already been maintained by spi.c

Fixes: bcd8e7761ec9("spi: imx: fallback to PIO if dma setup failure")
Signed-off-by: Robin Gong <yibin.gong@nxp.com>
Reported-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://lore.kernel.org/linux-arm-kernel/5d246dd81607bb6e5cb9af86ad4e53f7a7a99c50.camel@ew.tq-group.com/
---
 drivers/spi/spi-imx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)