Message ID | 20170227120839.16545-3-vigneshr@ti.com (mailing list archive)
---|---
State | New, archived
Vignesh,

On 27.02.2017 13:08, Vignesh R wrote:
> Many SPI controller drivers use DMA to read/write from m25p80 compatible
> flashes. Therefore enable the bounce buffer support provided by the spi-nor
> framework to take care of handling vmalloc'd buffers, which may not be
> DMA'able.
>
> Signed-off-by: Vignesh R <vigneshr@ti.com>
> ---
>  drivers/mtd/devices/m25p80.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
> index c4df3b1bded0..d05acf22eadf 100644
> --- a/drivers/mtd/devices/m25p80.c
> +++ b/drivers/mtd/devices/m25p80.c
> @@ -241,6 +241,7 @@ static int m25p_probe(struct spi_device *spi)
>  	else
>  		flash_name = spi->modalias;
>
> +	nor->flags |= SNOR_F_USE_BOUNCE_BUFFER;

Isn't there a better way to detect whether a bounce buffer is needed or not?

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday 01 March 2017 03:11 AM, Richard Weinberger wrote:
> Vignesh,
>
> On 27.02.2017 13:08, Vignesh R wrote:
>> Many SPI controller drivers use DMA to read/write from m25p80 compatible
>> flashes. Therefore enable the bounce buffer support provided by the spi-nor
>> framework to take care of handling vmalloc'd buffers, which may not be
>> DMA'able.
>> [...]
>> +	nor->flags |= SNOR_F_USE_BOUNCE_BUFFER;
>
> Isn't there a better way to detect whether a bounce buffer is needed or not?

Yes, I can poke the spi->master struct to see if DMA channels are
populated and request SNOR_F_USE_BOUNCE_BUFFER accordingly:

-	nor->flags |= SNOR_F_USE_BOUNCE_BUFFER;
+	if (spi->master->dma_tx || spi->master->dma_rx)
+		nor->flags |= SNOR_F_USE_BOUNCE_BUFFER;
+
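For readers following along, the bounce-buffer pattern being debated can be modeled outside the kernel with a short userspace sketch. This is purely illustrative: `fake_dma_fill` and `dma_safe_read` are invented names, `malloc` stands in for `kmalloc`, and the `memcpy` is the extra copy the thread is worried about. It is not the actual spi-nor implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Model of the DMA engine: it can only write into the DMA-safe buffer. */
static void fake_dma_fill(unsigned char *dma_buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dma_buf[i] = (unsigned char)(i & 0xff);
}

/*
 * Bounce-buffer read: the transfer lands in a driver-owned, DMA-able
 * buffer first, then the data is copied into the caller's buffer, which
 * may be vmalloc'ed (i.e. not safe to hand to the DMA engine directly).
 */
static int dma_safe_read(unsigned char *dest, size_t len)
{
    unsigned char *bounce = malloc(len);   /* stands in for kmalloc() */

    if (!bounce)
        return -1;
    fake_dma_fill(bounce, len);            /* the "DMA" transfer */
    memcpy(dest, bounce, len);             /* the extra copy bouncing costs */
    free(bounce);
    return 0;
}
```

The rest of the thread is essentially about when that extra `memcpy` is justified and who should decide to perform it.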
On 01/03/2017 05:54, Vignesh R wrote:
> On Wednesday 01 March 2017 03:11 AM, Richard Weinberger wrote:
>> Vignesh,
>>
>> On 27.02.2017 13:08, Vignesh R wrote:
>>> Many SPI controller drivers use DMA to read/write from m25p80 compatible
>>> flashes. Therefore enable the bounce buffer support provided by the spi-nor
>>> framework to take care of handling vmalloc'd buffers, which may not be
>>> DMA'able.
>>> [...]
>>> +	nor->flags |= SNOR_F_USE_BOUNCE_BUFFER;
>>
>> Isn't there a better way to detect whether a bounce buffer is needed or not?

I agree with Richard: the bounce buffer should be enabled only if needed
by the SPI controller.

> Yes, I can poke the spi->master struct to see if DMA channels are
> populated and request SNOR_F_USE_BOUNCE_BUFFER accordingly:
>
> -	nor->flags |= SNOR_F_USE_BOUNCE_BUFFER;
> +	if (spi->master->dma_tx || spi->master->dma_rx)
> +		nor->flags |= SNOR_F_USE_BOUNCE_BUFFER;
> +

However I don't agree with this solution: master->dma_{tx|rx} can be set
for SPI controllers which already rely on spi_map_msg() to handle
vmalloc'ed memory during DMA transfers. Such SPI controllers don't need
the spi-nor bounce buffer.

spi_map_msg() can build a scatter-gather list from a vmalloc'ed buffer,
then map this sg list with dma_map_sg(). AFAIK, it is safe to do so for
architectures using PIPT caches, since the cache alias issue possible
with VIPT or VIVT caches never occurs with PIPT caches.

For instance, the drivers/spi/spi-atmel.c driver relies on spi_map_sg()
being called from the SPI sub-system to handle vmalloc'ed buffers, and
both master->dma_tx and master->dma_rx are set by this driver.

By the way, is there any case where the same physical page is actually
mapped at two different virtual addresses for the buffers allocated by
the MTD sub-system? For a long time now I have wondered whether the
cache alias issue is real or only theoretical, but I have no answer to
that question.

Then my next question: is spi_map_msg() enough in every case, even with
VIPT or VIVT caches?

Best regards,

Cyrille
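The spi_map_buf() behavior Cyrille describes, turning one virtually-contiguous vmalloc'ed buffer into a list of per-page scatter-gather entries, can be sketched in userspace. The model below is an illustration only: `PAGE_SIZE_MODEL` and `struct sg_entry` are invented stand-ins for the kernel's page size and `struct scatterlist`, and the real kernel code additionally looks up each page with `vmalloc_to_page()` before mapping it.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_MODEL 4096

struct sg_entry {                /* stands in for struct scatterlist */
    const unsigned char *addr;
    size_t len;
};

/*
 * Split [buf, buf+len) into page-bounded chunks, one sg entry each,
 * mimicking how a vmalloc'ed buffer must be handled: virtually
 * contiguous pages need not be physically contiguous, so each page
 * becomes its own scatter-gather segment.
 */
static size_t build_sg(const unsigned char *buf, size_t len,
                       struct sg_entry *sg, size_t max_ents)
{
    size_t n = 0;

    while (len && n < max_ents) {
        size_t offset = (uintptr_t)buf % PAGE_SIZE_MODEL;
        size_t chunk = PAGE_SIZE_MODEL - offset;

        if (chunk > len)
            chunk = len;
        sg[n].addr = buf;
        sg[n].len = chunk;
        n++;
        buf += chunk;
        len -= chunk;
    }
    return n;
}
```

Whether handing such a list to dma_map_sg() is safe is exactly the cache-architecture question raised above: chunking works regardless, but cache maintenance on the aliased mappings does not.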
On 01/03/2017 11:43, Cyrille Pitchen wrote:
> On 01/03/2017 05:54, Vignesh R wrote:
>> On Wednesday 01 March 2017 03:11 AM, Richard Weinberger wrote:
>>> [...]
>>> Isn't there a better way to detect whether a bounce buffer is needed or not?
>
> I agree with Richard: the bounce buffer should be enabled only if needed
> by the SPI controller.
>
>> Yes, I can poke the spi->master struct to see if DMA channels are
>> populated and request SNOR_F_USE_BOUNCE_BUFFER accordingly:
>> [...]
>
> However I don't agree with this solution: master->dma_{tx|rx} can be set
> for SPI controllers which already rely on spi_map_msg() to handle
> vmalloc'ed memory during DMA transfers.
> Such SPI controllers don't need the spi-nor bounce buffer.

I tried to push a patch handling the VIVT cache flush in the davinci SPI
driver, but this was NACKed. So, in general terms, the upper layers do
not know whether the SPI controller will correctly handle vmalloc'ed
buffers.

> spi_map_msg() can build a scatter-gather list from a vmalloc'ed buffer,
> then map this sg list with dma_map_sg(). AFAIK, it is safe to do so for
> architectures using PIPT caches, since the cache alias issue possible
> with VIPT or VIVT caches never occurs with PIPT caches.
>
> For instance, the drivers/spi/spi-atmel.c driver relies on spi_map_sg()
> being called from the SPI sub-system to handle vmalloc'ed buffers, and
> both master->dma_tx and master->dma_rx are set by this driver.
>
> By the way, is there any case where the same physical page is actually
> mapped at two different virtual addresses for the buffers allocated by
> the MTD sub-system? For a long time now I have wondered whether the
> cache alias issue is real or only theoretical, but I have no answer to
> that question.

I can reproduce the VIVT cache bug with 100% reliability, either using
spi-loopback-test (with patches) or mounting/reading/writing a UBIFS
volume. The physical address of the vmalloc'ed buffers is mapped at both
a vmap and a lowmem virtual address, and when doing DMA, only the lowmem
virtual address is used when invalidating/flushing the cache.

Thanks,
Frode

> Then my next question: is spi_map_msg() enough in every case, even with
> VIPT or VIVT caches?
>
> Best regards,
>
> Cyrille
On Wed, 1 Mar 2017 17:16:30 +0530
Vignesh R <vigneshr@ti.com> wrote:

> On Wednesday 01 March 2017 04:13 PM, Cyrille Pitchen wrote:
> > [...]
> > However I don't agree with this solution: master->dma_{tx|rx} can be set
> > for SPI controllers which already rely on spi_map_msg() to handle
> > vmalloc'ed memory during DMA transfers.
> > Such SPI controllers don't need the spi-nor bounce buffer.
> >
> > spi_map_msg() can build a scatter-gather list from a vmalloc'ed buffer,
> > then map this sg list with dma_map_sg(). [...]
> >
> > By the way, is there any case where the same physical page is actually
> > mapped at two different virtual addresses for the buffers allocated by
> > the MTD sub-system? [...]
>
> I have at least one piece of evidence of VIVT aliasing causing problems.
> Please see this thread on DMA issues with the davinci-spi driver:
> https://www.spinics.net/lists/arm-kernel/msg563420.html
> https://www.spinics.net/lists/arm-kernel/msg563445.html
>
> > Then my next question: is spi_map_msg() enough in every case, even with
> > VIPT or VIVT caches?
>
> Not really. I am debugging another issue with UBIFS on a DRA74 EVM (ARM
> Cortex-A15) wherein pages allocated by vmalloc are in the highmem region,
> are not addressable using 32-bit addresses, and are backed by LPAE.
> So, a 32-bit DMA cannot access these buffers at all.
> When dma_map_sg() is called to map these pages by spi_map_buf(), the
> physical address is just truncated to 32 bits in pfn_to_dma() (as part
> of the dma_map_sg() call). This results in random crashes as DMA starts
> accessing random memory during SPI read.
>
> IMO, there may be more undiscovered caveats with using dma_map_sg() for
> non-kmalloc'd buffers, and it's better that spi-nor starts handling these
> buffers instead of relying on spi_map_msg() and working around every
> time something pops up.

I agree that using a bounce buffer when the address is not in the
kmalloc range is the safest solution we have.

Now, if we are concerned about the perf penalty brought by the extra
copy, we should patch UBI/UBIFS to allocate small chunks using kmalloc
instead of allocating PEB/LEB buffers using vmalloc. Of course this
implies some rework, but at least we'll get rid of all the cache
invalidation problems.

Also note that not all DMA controllers support SG transfers, so having
UBI allocate X buffers of ubi->max_write_size using kmalloc instead of
one buffer of (X * ubi->max_write_size) size using vmalloc is probably
the way to go if we want to avoid the extra copy in all cases.

I started to work on the ubi_buffer concept a while ago (an object
containing an array of kmalloc-ed bufs, and some helpers to read/write
from/to these buffers), but I don't know if I still have this branch
somewhere.
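Boris's ubi_buffer idea (an array of independently kmalloc'ed chunks plus read/write helpers, replacing one big vmalloc'ed PEB/LEB buffer) was never posted, so the following is a purely hypothetical userspace sketch of what such an object might look like. All names here are invented, and plain malloc'ed arrays stand in for kmalloc'ed chunks.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a "ubi_buffer": several small, independently
 * allocated chunks presented as one logical buffer. Each chunk would be
 * kmalloc'ed (hence DMA-able) in the kernel; chunk_size would be
 * something like ubi->max_write_size.
 */
struct ubi_buffer_model {
    unsigned char **chunks;
    size_t nchunks;
    size_t chunk_size;
};

/* Copy len bytes from src into the logical buffer at offset off. */
static int ubuf_write(struct ubi_buffer_model *b, size_t off,
                      const unsigned char *src, size_t len)
{
    while (len) {
        size_t idx = off / b->chunk_size;   /* which chunk */
        size_t in = off % b->chunk_size;    /* offset inside it */
        size_t n = b->chunk_size - in;      /* room left in this chunk */

        if (idx >= b->nchunks)
            return -1;                      /* past the end */
        if (n > len)
            n = len;
        memcpy(b->chunks[idx] + in, src, n);
        off += n;
        src += n;
        len -= n;
    }
    return 0;
}
```

A matching `ubuf_read` helper would mirror the copy direction; the point of the design is that each chunk can be handed to DMA directly, avoiding both the vmalloc aliasing problem and the bounce-buffer copy.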
+ Mark

On 01/03/2017 12:46, Vignesh R wrote:
> On Wednesday 01 March 2017 04:13 PM, Cyrille Pitchen wrote:
>> [...]
>> By the way, is there any case where the same physical page is actually
>> mapped at two different virtual addresses for the buffers allocated by
>> the MTD sub-system? [...]
>
> I have at least one piece of evidence of VIVT aliasing causing problems.
> Please see this thread on DMA issues with the davinci-spi driver:
> https://www.spinics.net/lists/arm-kernel/msg563420.html
> https://www.spinics.net/lists/arm-kernel/msg563445.html
>
>> Then my next question: is spi_map_msg() enough in every case, even with
>> VIPT or VIVT caches?
>
> Not really. I am debugging another issue with UBIFS on a DRA74 EVM (ARM
> Cortex-A15) wherein pages allocated by vmalloc are in the highmem region,
> are not addressable using 32-bit addresses, and are backed by LPAE.
> So, a 32-bit DMA cannot access these buffers at all.
> When dma_map_sg() is called to map these pages by spi_map_buf(), the
> physical address is just truncated to 32 bits in pfn_to_dma() (as part
> of the dma_map_sg() call). This results in random crashes as DMA starts
> accessing random memory during SPI read.
>
> IMO, there may be more undiscovered caveats with using dma_map_sg() for
> non-kmalloc'd buffers, and it's better that spi-nor starts handling these
> buffers instead of relying on spi_map_msg() and working around every
> time something pops up.

Both Frode and you confirmed that the alias issue does occur at least
with VIVT caches, hence we can't rely on spi_map_msg() in that case.
So I agree with you: adding a bounce buffer in spi-nor seems to be a
good solution, at least till some rework is done in the UBIFS layer, as
proposed by Boris, to replace vmalloc'ed buffers with kmalloc'ed memory.

However, I still think testing spi->master->dma_{tx|rx} alone is not
accurate enough to decide whether or not the SNOR_F_USE_BOUNCE_BUFFER
flag should be set. I have doubts about forcing the use of the bounce
buffer for *all* SPI controllers doing DMA transfers. We already know
that the cache alias issue can't occur with PIPT caches or some VIPT
caches. So in those cases spi_map_buf() is enough; hence can we justify
the performance loss of the copy into the bounce buffer for people who
don't suffer from the issue?

Besides, some SPI controller drivers may already use their own bounce
buffer for other reasons. Then for those controllers, it would be one
more copy.

Then I don't know whether we should:
1 - extend the SPI sub-system API to tell us if the SPI controller can
    deal with non-kmalloc'd buffers for DMA transfers, or
2 - get the answer at a system-wide level.

Indeed, the question of cache aliases is clearly not specific to the MTD
or SPI sub-systems, but only the SPI controller driver can tell whether
it already uses its own bounce buffer.

Best regards,

Cyrille
On Wed, 1 Mar 2017 15:21:24 +0100
Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote:

> + Mark
>
> On 01/03/2017 12:46, Vignesh R wrote:
> > [...]
> > Not really. I am debugging another issue with UBIFS on a DRA74 EVM (ARM
> > Cortex-A15) wherein pages allocated by vmalloc are in the highmem region,
> > are not addressable using 32-bit addresses, and are backed by LPAE.
> > So, a 32-bit DMA cannot access these buffers at all.
> > [...]
> >
> > IMO, there may be more undiscovered caveats with using dma_map_sg() for
> > non-kmalloc'd buffers, and it's better that spi-nor starts handling these
> > buffers instead of relying on spi_map_msg() and working around every
> > time something pops up.
>
> Both Frode and you confirmed that the alias issue does occur at least
> with VIVT caches, hence we can't rely on spi_map_msg() in that case.
> So I agree with you: adding a bounce buffer in spi-nor seems to be a
> good solution, at least till some rework is done in the UBIFS layer, as
> proposed by Boris, to replace vmalloc'ed buffers with kmalloc'ed memory.

We should keep it even after reworking UBI/UBIFS, because UBI is just
one user of MTD, and other users might pass vmalloc-ed or kmap-ed bufs.
On 01/03/2017 15:28, Boris Brezillon wrote:
> On Wed, 1 Mar 2017 15:21:24 +0100
> Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote:
>
>> [...]
>> Both Frode and you confirmed that the alias issue does occur at least
>> with VIVT caches, hence we can't rely on spi_map_msg() in that case.
>> So I agree with you: adding a bounce buffer in spi-nor seems to be a
>> good solution, at least till some rework is done in the UBIFS layer, as
>> proposed by Boris, to replace vmalloc'ed buffers with kmalloc'ed memory.
>
> We should keep it even after reworking UBI/UBIFS, because UBI is just
> one user of MTD, and other users might pass vmalloc-ed or kmap-ed bufs.

I'm fine with that :)
On Wed, Mar 01, 2017 at 03:21:24PM +0100, Cyrille Pitchen wrote:

> Besides, some SPI controller drivers may already use their own bounce
> buffer for other reasons. Then for those controllers, it would be one
> more copy.

They probably shouldn't; there are a lot of legacy drivers that do all
sorts of fun stuff, but if people are having performance problems we
should probably be putting pressure on the driver users to improve
things. Anything that has a bounce buffer probably isn't going to say it
can do DMA, though...

> Then I don't know whether we should:
> 1 - extend the SPI sub-system API to tell us if the SPI controller can
>     deal with non-kmalloc'd buffers for DMA transfers, or

I don't think we can tell any better than anything else really. We could
tell if a driver has a bounce buffer or is PIO based, but that's not the
whole story, so it's not ideal.

> 2 - get the answer at a system-wide level.

That seems better. It'd be really good if we had a function we could
call that'd do the mappings, including any bounce buffering that's
needed, rather than having to think about it in the subsystems.
On Wed, 1 Mar 2017 17:16:30 +0530 Vignesh R <vigneshr@ti.com> wrote: > On Wednesday 01 March 2017 04:13 PM, Cyrille Pitchen wrote: > > Le 01/03/2017 à 05:54, Vignesh R a écrit : > >> > >> > >> On Wednesday 01 March 2017 03:11 AM, Richard Weinberger wrote: > >>> Vignesh, > >>> > >>> Am 27.02.2017 um 13:08 schrieb Vignesh R: > >>>> Many SPI controller drivers use DMA to read/write from m25p80 compatible > >>>> flashes. Therefore enable bounce buffers support provided by spi-nor > >>>> framework to take care of handling vmalloc'd buffers which may not be > >>>> DMA'able. > >>>> > >>>> Signed-off-by: Vignesh R <vigneshr@ti.com> > >>>> --- > >>>> drivers/mtd/devices/m25p80.c | 1 + > >>>> 1 file changed, 1 insertion(+) > >>>> > >>>> diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c > >>>> index c4df3b1bded0..d05acf22eadf 100644 > >>>> --- a/drivers/mtd/devices/m25p80.c > >>>> +++ b/drivers/mtd/devices/m25p80.c > >>>> @@ -241,6 +241,7 @@ static int m25p_probe(struct spi_device *spi) > >>>> else > >>>> flash_name = spi->modalias; > >>>> > >>>> + nor->flags |= SNOR_F_USE_BOUNCE_BUFFER; > >>> > >>> Isn't there a better way to detect whether a bounce buffer is needed or not? > >> > > > > I agree with Richard: the bounce buffer should be enabled only if needed > > by the SPI controller. > > > >> Yes, I can poke the spi->master struct to see of dma channels are > >> populated and request SNOR_F_USE_BOUNCE_BUFFER accordingly: > >> > >> - nor->flags |= SNOR_F_USE_BOUNCE_BUFFER; > >> + if (spi->master->dma_tx || spi->master->dma_rx) > >> + nor->flags |= SNOR_F_USE_BOUNCE_BUFFER; > >> + > >> > > > > However I don't agree with this solution: master->dma_{tx|rx} can be set > > for SPI controllers which already rely on spi_map_msg() to handle > > vmalloc'ed memory during DMA transfers. > > Such SPI controllers don't need the spi-nor bounce buffer. 
> > > > spi_map_msg() can build a scatter-gather list from vmalloc'ed buffer > > then map this sg list with dma_map_sg(). AFAIK, It is safe to do so for > > architectures using PIPT caches since the possible cache aliases issue > > present for VIPT or VIVT caches is always avoided for PIPT caches. > > > > For instance, the drivers/spi/spi-atmel.c driver relies on spi_map_sg() > > to be called from the SPI sub-system to handle vmalloc'ed buffers and > > both master->dma_tx and master->dma_rx are set by the this driver. > > > > > > By the way, Is there any case where the same physical page is actually > > mapped into two different virtual addresses for the buffers allocated by > > the MTD sub-system? Because for a long time now I wonder whether the > > cache aliases issue is a real or only theoretical issue but I have no > > answer to that question. > > > > I have atleast one evidence of VIVT aliasing causing problem. Please see > this thread on DMA issues with davinci-spi driver > https://www.spinics.net/lists/arm-kernel/msg563420.html > https://www.spinics.net/lists/arm-kernel/msg563445.html > > > Then my next question: is spi_map_msg() enough in every case, even with > > VIPT or VIVT caches? > > > > Not really, I am debugging another issue with UBIFS on DRA74 EVM (ARM > cortex-a15) wherein pages allocated by vmalloc are in highmem region > that are not addressable using 32 bit addresses and is backed by LPAE. > So, a 32 bit DMA cannot access these buffers at all. > When dma_map_sg() is called to map these pages by spi_map_buf() the > physical address is just truncated to 32 bit in pfn_to_dma() (as part of > dma_map_sg() call). This results in random crashes as DMA starts > accessing random memory during SPI read. > > IMO, there may be more undiscovered caveat with using dma_map_sg() for > non kmalloc'd buffers and its better that spi-nor starts handling these > buffers instead of relying on spi_map_msg() and working around every > time something pops up. 
Hm, I had a discussion with Cyrille on the spi-nor/m25p80 architecture, and I must admit I don't understand why the decision to use a bounce buffer is made at the m25p80 level. While it makes sense to do that for dedicated spi-flash controllers, which can decide by themselves if they need a bounce buffer for non-kmalloc-ed buffers, I don't understand why we have to do that for regular SPI controllers. I mean, the same problem can arise for other SPI device drivers if they decide to use vmalloc-ed or kmap-ed buffers, so this should be handled at the SPI level. If the SPI framework does not impose any restriction on the buffers it is passed, it should either take care of this itself, or ask SPI controller drivers to do so (which implies checking the buffer address directly inside the SPI controller driver and deciding to use a bounce buffer or PIO mode). Am I misunderstanding something here? -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 1 Mar 2017 17:16:30 +0530 Vignesh R <vigneshr@ti.com> wrote:
> [...]
Ok, I had a closer look at the SPI framework, and it seems there's a way to tell the core that a specific transfer cannot use DMA (->can_dma()). The first thing you should do is fix the spi-davinci driver:

1/ implement ->can_dma()
2/ patch davinci_spi_bufs() to take the decision to do DMA or not on a per-xfer basis and not on a per-device basis

Then we can start thinking about how to improve perfs by using a bounce buffer for large transfers, but I'm still not sure this should be done at the MTD level...
On 01/03/2017 17:55, Boris Brezillon wrote:
> [...]
>> IMO, there may be more undiscovered caveats with using dma_map_sg() for non-kmalloc'd buffers, and it's better that spi-nor starts handling these buffers instead of relying on spi_map_msg() and working around it every time something pops up.
>
> Ok, I had a closer look at the SPI framework, and it seems there's a way to tell the core that a specific transfer cannot use DMA (->can_dma()). The first thing you should do is fix the spi-davinci driver:
>
> 1/ implement ->can_dma()
> 2/ patch davinci_spi_bufs() to take the decision to do DMA or not on a per-xfer basis and not on a per-device basis
>
> Then we can start thinking about how to improve perfs by using a bounce buffer for large transfers, but I'm still not sure this should be done at the MTD level...

This has already been done, see http://lists.infradead.org/pipermail/linux-arm-kernel/2017-February/489761.html. I return false from can_dma() if the rx or tx buffer is a vmalloc'ed one. In that case the transfer goes back to PIO and you lose performance, but there is no data corruption.

Thanks,
Frode
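The per-transfer decision Frode describes boils down to a ->can_dma() hook along these lines. This is a sketch of the described approach, not the actual patch; DMA_MIN_BYTES is an assumed copybreak threshold.

```c
/* Illustrative ->can_dma() for spi-davinci: refuse DMA for buffers that
 * dma_map_sg() cannot safely handle on this platform (vmalloc'ed or
 * highmem memory); the core then falls back to PIO for that transfer. */
static bool davinci_spi_can_dma(struct spi_master *master,
				struct spi_device *spi,
				struct spi_transfer *xfer)
{
	if (xfer->rx_buf && !virt_addr_valid(xfer->rx_buf))
		return false;
	if (xfer->tx_buf && !virt_addr_valid(xfer->tx_buf))
		return false;

	/* DMA_MIN_BYTES: assumed driver-specific copybreak */
	return xfer->len >= DMA_MIN_BYTES;
}
```

The hook would be wired up in probe with `master->can_dma = davinci_spi_can_dma;` so the core consults it per transfer rather than per device.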
>>> [...]
>> Ok, I had a closer look at the SPI framework, and it seems there's a way to tell the core that a specific transfer cannot use DMA (->can_dma()). The first thing you should do is fix the spi-davinci driver:
>>
>> 1/ implement ->can_dma()
>> 2/ patch davinci_spi_bufs() to take the decision to do DMA or not on a per-xfer basis and not on a per-device basis

This would lead to poor perf, defeating the entire purpose of using DMA.

>> Then we can start thinking about how to improve perfs by using a bounce buffer for large transfers, but I'm still not sure this should be done at the MTD level...

If it's at the SPI level, then I guess each individual driver which cannot handle vmalloc'd buffers will have to implement bounce buffer logic. Or the SPI core can be extended in a way similar to this RFC. That is, the SPI master driver will set a flag to request that the SPI core use a bounce buffer for vmalloc'd buffers, and spi_map_buf() just uses the bounce buffer, based on the flag, in case the buf does not belong to the kmalloc region.

Mark, Cyrille, is that what you prefer?
On Thu, 2 Mar 2017 19:24:43 +0530 Vignesh R <vigneshr@ti.com> wrote:
> [...]
> This would lead to poor perf, defeating the entire purpose of using DMA.

Hm, that's not really true. For all cases where you have a DMA-able buffer it would still use DMA. For other cases (like the UBI+SPI-NOR case we're talking about here), yes, it will be slower, but slower is still better than buggy. So, in any case, I think the fixes pointed out by Frode are needed.

> >> Then we can start thinking about how to improve perfs by using a bounce buffer for large transfers, but I'm still not sure this should be done at the MTD level...
>
> If it's at the SPI level, then I guess each individual driver which cannot handle vmalloc'd buffers will have to implement bounce buffer logic.
Well, that's my opinion. The only one that can decide when to do PIO, when to use DMA, or when to use a bounce buffer+DMA is the SPI controller. If you move this logic to the SPI NOR layer, you'll have to guess what is the best approach, and I fear the decision will be wrong on some platforms (leading to perf degradation).

You're mentioning code duplication in each SPI controller; I agree, this is far from ideal, but what you're suggesting is not necessarily better. What if another SPI user starts passing vmalloc-ed buffers to the SPI controller? You'll have to duplicate the bounce-buffer logic in this user as well.

> Or the SPI core can be extended in a way similar to this RFC. That is, the SPI master driver will set a flag to request that the SPI core use a bounce buffer for vmalloc'd buffers, and spi_map_buf() just uses the bounce buffer, based on the flag, in case the buf does not belong to the kmalloc region.

That's a better approach IMHO. Note that the decision should not only be based on the buffer type, but also on the transfer length and/or whether the controller supports transferring non-physically-contiguous buffers. Maybe we should just extend ->can_dma() to let the core know if it should use a bounce buffer.

Regarding the bounce buffer allocation logic, I'm not sure how it should be done. The SPI user should be able to determine a max transfer len (at least this is the case for SPI NORs) and inform the SPI layer about this boundary so that the SPI core can allocate a bounce buffer of this size. But we also have limitations at the SPI master level (->max_transfer_size(), ->max_message_size()).
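The flag-based core extension being weighed here could be sketched as below. master->use_bounce_buf and the helper name are hypothetical; error paths and the rx copy-back after the transfer completes are elided.

```c
/* Rough sketch of letting the SPI core bounce: spi_map_buf() would
 * substitute a DMA-safe copy when the controller driver requested it
 * via a (hypothetical) use_bounce_buf flag. */
static void *spi_get_dma_safe_buf(struct spi_master *master,
				  void *buf, size_t len, bool is_tx)
{
	void *dma_buf;

	/* kmalloc'ed (lowmem, physically addressable) memory is already
	 * DMA-able: map it directly */
	if (!master->use_bounce_buf || virt_addr_valid(buf))
		return buf;

	dma_buf = kmalloc(len, GFP_KERNEL);
	if (!dma_buf)
		return NULL;	/* caller would fall back to PIO */

	if (is_tx)
		memcpy(dma_buf, buf, len);
	/* for rx, the core would memcpy() back to buf after the DMA */

	return dma_buf;
}
```

As noted above, a real implementation would also have to cap `len` against the user-advertised max transfer size so a preallocated bounce buffer can be used instead of an on-demand kmalloc().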
On 02/03/2017 15:29, Boris Brezillon wrote:
> [...]
> Hm, that's not really true. For all cases where you have a DMA-able buffer it would still use DMA. For other cases (like the UBI+SPI-NOR case we're talking about here), yes, it will be slower, but slower is still better than buggy. So, in any case, I think the fixes pointed out by Frode are needed.

Also, I think the UBIFS layer only uses vmalloc'ed buffers during mount/unmount and not for read/write, so the performance hit is not that big. In most cases the buffer is the size of the erase block, but I've seen a vmalloc'ed buffer of size only 11 bytes!
So, to optimize this, the best solution is probably to change how the UBIFS layer is using vmalloc'ed vs kmalloc'ed buffers, since vmalloc'ed buffers should only be used for large (> 128K) allocations.

Frode

> [...]
On Thu, 2 Mar 2017 16:03:17 +0100 Frode Isaksen <fisaksen@baylibre.com> wrote:
> [...]
> Also, I think the UBIFS layer only uses vmalloc'ed buffers during mount/unmount and not for read/write, so the performance hit is not that big.

It's a bit more complicated than that. You may have operations running in the background that are using those big vmalloc-ed buffers at runtime. To optimize things, we really need to split LEB/PEB buffers into multiple ->max_write_size (or ->min_io_size) kmalloc-ed buffers.

> In most cases the buffer is the size of the erase block, but I've seen a vmalloc'ed buffer of size only 11 bytes! So, to optimize this, the best solution is probably to change how the UBIFS layer is using vmalloc'ed vs kmalloc'ed buffers, since vmalloc'ed should only be used for large (> 128K) buffers.

Hm, the buffer itself is bigger than 11 bytes; it's just that the same buffer is used in different use cases, and sometimes we're only partially filling it.
On 02/03/2017 15:29, Boris Brezillon wrote:
> [...]
>> If it's at the SPI level, then I guess each individual driver which cannot handle vmalloc'd buffers will have to implement bounce buffer logic.
>
> Well, that's my opinion. The only one that can decide when to do PIO, when to use DMA, or when to use a bounce buffer+DMA is the SPI controller. If you move this logic to the SPI NOR layer, you'll have to guess what is the best approach, and I fear the decision will be wrong on some platforms (leading to perf degradation).

True. For instance, Atmel SAMA5* SoCs don't need this bounce buffer since their L1 data cache uses a PIPT scheme.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0433c/CHDFAHBD.html

"""
2.1.4. Data side memory system

Data Cache Unit

The Data Cache Unit (DCU) consists of the following sub-blocks: The Level 1 (L1) data cache controller, which generates the control signals for the associated embedded tag, data, and dirty memory (RAMs) and arbitrates between the different sources requesting access to the memory resources. The data cache is 4-way set associative and uses a Physically Indexed Physically Tagged (PIPT) scheme for lookup which enables unambiguous address management in the system.
"""

So for those SoCs, spi_map_msg() should be safe to handle vmalloc'ed buffers, since they don't have to worry about the cache aliases issue or address truncation. That's why I don't think setting SNOR_F_USE_BOUNCE_BUFFER in *all* cases in m25p80 is the right solution: it would not be fair to degrade the performance of some devices when it's not needed, hence not justified.

I still agree with the idea of patch 1, but about patch 2: if m25p80 users want to take advantage of this new spi-nor bounce buffer, we have to agree on a reliable mechanism that clearly tells whether or not SNOR_F_USE_BOUNCE_BUFFER is to be set from m25p80.

> You're mentioning code duplication in each SPI controller; I agree, this is far from ideal, but what you're suggesting is not necessarily better.
> [...]
On Thu, Mar 02, 2017 at 03:29:21PM +0100, Boris Brezillon wrote: > Vignesh R <vigneshr@ti.com> wrote: > > Or SPI core can be extended in a way similar to this RFC. That is, SPI > > master driver will set a flag to request SPI core to use of bounce > > buffer for vmalloc'd buffers. And spi_map_buf() just uses bounce buffer > > in case buf does not belong to kmalloc region based on the flag. > That's a better approach IMHO. Note that the decision should not only I don't understand how the driver is supposed to tell if it might need a bounce buffer due to where the memory is allocated and the caches used by the particular system it is used on? The suggestion to pass via scatterlists seems a bit more likely to work but even then I'm not clear that drivers doing PIO would play well. > be based on the buffer type, but also on the transfer length and/or > whether the controller supports transferring non physically contiguous > buffers. The reason most drivers only look at the transfer length when deciding that they can DMA is that most controllers are paired with DMA controllers that are sensibly implemented, the only factor they're selecting on is the copybreak for performance.
On Thu, 2 Mar 2017 17:00:41 +0000 Mark Brown <broonie@kernel.org> wrote:
> [...]
> I don't understand how the driver is supposed to tell if it might need a bounce buffer due to where the memory is allocated and the caches used by the particular system it is used on?

That's true, but if the SPI controller driver can't decide that, how could a SPI device driver guess? We could patch dma_map_sg() to create a bounce buffer when it's given a vmalloc-ed buffer and we are running on a system using VIVT or VIPT caches (it's already allocating bounce buffers when the peripheral device cannot access the memory region, so why not in this case). This still leaves 3 problems:

1/ for big transfers, dynamically allocating a bounce buffer on demand (and freeing it after the DMA operation) might fail, or might induce some latency, especially when the system is under high memory pressure. Allocating these bounce buffers once during the SPI device driver's ->probe() guarantees that the bounce buffer will always be available when needed, but OTOH, we don't know if it's really needed.

2/ only the SPI and/or DMA engine knows when using DMA with a bounce buffer is better than using PIO mode. The limit is probably different from the DMA vs PIO one (dma_min_len < dma_bounce_min_len). Thanks to ->can_dma() we can let drivers decide when preparing the buffer for a DMA transfer is needed.
3/ if the DMA engine does not support chaining DMA descriptors, and the vmalloc-ed buffer spans several non-contiguous pages, doing DMA is simply not possible. That one can probably be handled with the ->can_dma() hook too.

> The suggestion to pass via scatterlists seems a bit more likely to work but even then I'm not clear that drivers doing PIO would play well.

You mean that SPI device drivers would directly pass an sg list instead of a virtual pointer? Not sure that would help; we're just moving the decision one level up without providing more information to help decide what to do.

> The reason most drivers only look at the transfer length when deciding that they can DMA is that most controllers are paired with DMA controllers that are sensibly implemented, the only factor they're selecting on is the copybreak for performance.

Of course, the checks I mentioned (especially the physically contiguous one) are SPI controller and/or DMA engine dependent. Some of them might be irrelevant.
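One possible shape for the extended ->can_dma() mentioned above is a tri-state return instead of a bool, letting the controller driver pick between PIO, plain DMA, and bounce+DMA per transfer. This is an entirely hypothetical API sketch; DMA_MIN_BYTES and DMA_BOUNCE_MIN_BYTES are assumed thresholds.

```c
/* Hypothetical extension of ->can_dma(): the controller driver reports,
 * per transfer, how the core should perform it. */
enum spi_xfer_mode {
	SPI_XFER_PIO,		/* short transfer, or DMA impossible */
	SPI_XFER_DMA,		/* buffer is DMA-able as-is */
	SPI_XFER_DMA_BOUNCE,	/* DMA worthwhile, but via a bounce buffer */
};

static enum spi_xfer_mode example_can_dma(struct spi_master *master,
					  struct spi_device *spi,
					  struct spi_transfer *xfer)
{
	if (xfer->len < DMA_MIN_BYTES)		/* assumed DMA-vs-PIO copybreak */
		return SPI_XFER_PIO;

	/* vmalloc'ed/highmem buffers need bouncing, and bouncing only pays
	 * off above a (likely larger) threshold */
	if ((xfer->tx_buf && !virt_addr_valid(xfer->tx_buf)) ||
	    (xfer->rx_buf && !virt_addr_valid(xfer->rx_buf)))
		return xfer->len >= DMA_BOUNCE_MIN_BYTES ?
			SPI_XFER_DMA_BOUNCE : SPI_XFER_PIO;

	return SPI_XFER_DMA;
}
```

This mirrors the point in problem 2/ above: the DMA-vs-PIO threshold and the bounce-vs-PIO threshold need not be the same.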
On 02/03/2017 16:25, Boris Brezillon wrote:
> On Thu, 2 Mar 2017 16:03:17 +0100
> Frode Isaksen <fisaksen@baylibre.com> wrote:
>
>> On 02/03/2017 15:29, Boris Brezillon wrote:
>>> On Thu, 2 Mar 2017 19:24:43 +0530
>>> Vignesh R <vigneshr@ti.com> wrote:
>>>
>>>>>>> Not really, I am debugging another issue with UBIFS on DRA74 EVM (ARM
>>>>>>> cortex-a15) wherein pages allocated by vmalloc are in highmem region
>>>>>>> that are not addressable using 32 bit addresses and is backed by LPAE.
>>>>>>> So, a 32 bit DMA cannot access these buffers at all.
>>>>>>> When dma_map_sg() is called to map these pages by spi_map_buf() the
>>>>>>> physical address is just truncated to 32 bit in pfn_to_dma() (as part of
>>>>>>> dma_map_sg() call). This results in random crashes as DMA starts
>>>>>>> accessing random memory during SPI read.
>>>>>>>
>>>>>>> IMO, there may be more undiscovered caveats with using dma_map_sg() for
>>>>>>> non kmalloc'd buffers and it's better that spi-nor starts handling these
>>>>>>> buffers instead of relying on spi_map_msg() and working around every
>>>>>>> time something pops up.
>>>>>>>
>>>>>> Ok, I had a closer look at the SPI framework, and it seems there's a
>>>>>> way to tell the core that a specific transfer cannot use DMA
>>>>>> (->can_dma()). The first thing you should do is fix the spi-davinci
>>>>>> driver:
>>>>>>
>>>>>> 1/ implement ->can_dma()
>>>>>> 2/ patch davinci_spi_bufs() to take the decision to do DMA or not on a
>>>>>> per-xfer basis and not on a per-device basis
>>>>>>
>>>> This would lead to poor perf defeating the entire purpose of using DMA.
>>> Hm, that's not really true. For all cases where you have a DMA-able
>>> buffer it would still use DMA. For other cases (like the UBI+SPI-NOR
>>> case we're talking about here), yes, it will be slower, but slower is
>>> still better than buggy.
>>> So, in any case, I think the fixes pointed out by Frode are needed.
>> Also, I think the UBIFS layer only uses vmalloc'ed buffers during
>> mount/unmount and not for read/write, so the performance hit is not
>> that big.
> It's a bit more complicated than that. You may have operations running
> in background that are using those big vmalloc-ed buffers at runtime.
> To optimize things, we really need to split LEB/PEB buffers into
> multiple ->max_write_size (or ->min_io_size) kmalloc-ed buffers.
>
>> In most cases the buffer is the size of the erase block, but I've seen
>> a vmalloc'ed buffer of size only 11 bytes! So, to optimize this, the
>> best solution is probably to change how the UBIFS layer is using
>> vmalloc'ed vs kmalloc'ed buffers, since vmalloc'ed should only be used
>> for large (> 128K) buffers.
> Hm, the buffer itself is bigger than 11 bytes, it's just that the
> same buffer is used in different use cases, and sometimes we're only
> partially filling it.

There is at least one place in the UBIFS layer where a small buffer is
vmalloc'ed:

static int read_ltab(struct ubifs_info *c)
{
	int err;
	void *buf;

	buf = vmalloc(c->ltab_sz);
	if (!buf)
		return -ENOMEM;
	err = ubifs_leb_read(c, c->ltab_lnum, buf, c->ltab_offs,
			     c->ltab_sz, 1);
	if (err)
		goto out;
	err = unpack_ltab(c, buf);
out:
	vfree(buf);
	return err;
}

On my board, the buffer size is 11 bytes.

Frode
On Thu, Mar 02, 2017 at 08:49:00PM +0100, Boris Brezillon wrote:

> 1/ for big transfers, dynamically allocating a bounce buffer on demand
>    (and freeing it after the DMA operation) might fail, or might induce
>    some latency, especially when the system is under high mem pressure.
>    Allocating these bounce buffers once during the SPI device driver
>    ->probe() guarantees that the bounce buffer will always be available
>    when needed, but OTOH, we don't know if it's really needed.

Yeah, but then the bounces are going to slow things down anyway. This
does seem fixable if we do something with caching the buffer once we
create it, and if it's implemented elsewhere the same problem will
exist. We can't just allocate the maximum possible buffer size because
some devices have effectively unlimited transfer sizes, so you could
waste huge amounts of memory, especially in the common case where we
don't use vmalloc() at all.

> 2/ only the SPI and/or DMA engine know when using DMA with a bounce
>    buffer is better than using PIO mode. The limit is probably
>    different from the DMA vs PIO one (dma_min_len <
>    dma_bounce_min_len). Thanks to ->can_dma() we can let drivers decide
>    when preparing the buffer for a DMA transfer is needed.

I'm not so worried about that, the numbers are basically an educated
guess anyway. It's a concern though, yes.

> 3/ if the DMA engine does not support chaining DMA descriptors, and the
>    vmalloc-ed buffer spans several non-contiguous pages, doing DMA
>    is simply not possible. That one can probably be handled with the
>    ->can_dma() hook too.

Anything that doesn't support chaining is in trouble already (or should
be soon hopefully), but mostly the controllers will have no idea about
that as they're just asking their DMA controller to do things. We'd be
better off having the core query the capabilities of the DMA
controllers directly.
> > The suggestion to pass via
> > scatterlists seems a bit more likely to work but even then I'm not clear
> > that drivers doing PIO would play well.

> You mean that SPI device drivers would directly pass an sg list instead
> of a virtual pointer? Not sure that would help, we're just moving the
> decision one level up without providing more information to help decide
> what to do.

I think the idea was to ensure you only use one mapping type at once.
On Thursday 02 March 2017 07:59 PM, Boris Brezillon wrote:
> On Thu, 2 Mar 2017 19:24:43 +0530
> Vignesh R <vigneshr@ti.com> wrote:
>
>>>>> Not really, I am debugging another issue with UBIFS on DRA74 EVM (ARM
>>>>> cortex-a15) wherein pages allocated by vmalloc are in highmem region
>>>>> that are not addressable using 32 bit addresses and is backed by LPAE.
>>>>> So, a 32 bit DMA cannot access these buffers at all.
>>>>> When dma_map_sg() is called to map these pages by spi_map_buf() the
>>>>> physical address is just truncated to 32 bit in pfn_to_dma() (as part of
>>>>> dma_map_sg() call). This results in random crashes as DMA starts
>>>>> accessing random memory during SPI read.
>>>>>
>>>>> IMO, there may be more undiscovered caveats with using dma_map_sg() for
>>>>> non kmalloc'd buffers and it's better that spi-nor starts handling these
>>>>> buffers instead of relying on spi_map_msg() and working around every
>>>>> time something pops up.
>>>>>
>>>> Ok, I had a closer look at the SPI framework, and it seems there's a
>>>> way to tell the core that a specific transfer cannot use DMA
>>>> (->can_dma()). The first thing you should do is fix the spi-davinci
>>>> driver:
>>>>
>>>> 1/ implement ->can_dma()
>>>> 2/ patch davinci_spi_bufs() to take the decision to do DMA or not on a
>>>> per-xfer basis and not on a per-device basis
>>>>
>>
>> This would lead to poor perf defeating the entire purpose of using DMA.
>
> Hm, that's not really true. For all cases where you have a DMA-able
> buffer it would still use DMA. For other cases (like the UBI+SPI-NOR
> case we're talking about here), yes, it will be slower, but slower is
> still better than buggy.
> So, in any case, I think the fixes pointed out by Frode are needed.
>

Yes, but still a bounce buffer does help in perf improvement over PIO.

>>
>>>> Then we can start thinking about how to improve perfs by using a bounce
>>>> buffer for large transfers, but I'm still not sure this should be done
>>>> at the MTD level...
>> If it's at the SPI level, then I guess each individual driver which cannot
>> handle vmalloc'd buffers will have to implement bounce buffer logic.
>
> Well, that's my opinion. The only one that can decide when to do
> PIO, when to use DMA or when to use a bounce buffer+DMA is the SPI
> controller.
> If you move this logic to the SPI NOR layer, you'll have to guess what
> is the best approach, and I fear the decision will be wrong on some
> platforms (leading to perf degradation).
>
> You're mentioning code duplication in each SPI controller, I agree,
> this is far from ideal, but what you're suggesting is not necessarily
> better. What if another SPI user starts passing vmalloc-ed buffers to
> the SPI controller? You'll have to duplicate the bounce-buffer logic in
> this user as well.
>

Hmm... Yes, there are ways to bypass the SPI core.

>>
>> Or SPI core can be extended in a way similar to this RFC. That is, SPI
>> master driver will set a flag to request SPI core to use a bounce
>> buffer for vmalloc'd buffers. And spi_map_buf() just uses the bounce buffer
>> in case the buf does not belong to the kmalloc region, based on the flag.
>
> That's a better approach IMHO. Note that the decision should not only
> be based on the buffer type, but also on the transfer length and/or
> whether the controller supports transferring non physically contiguous
> buffers.
>
> Maybe we should just extend ->can_dma() to let the core know if it
> should use a bounce buffer.
>

Yes, this is definitely needed. ->can_dma() currently returns bool. We
need a better interface that returns different error codes for
restrictions on buffer length vs buffer type (I don't see any appropriate
error codes), or makes ->can_dma() return a flag asking for a bounce
buffer. SPI controller drivers may use cache_is_*() and virt_addr_valid()
to decide whether or not to request a bounce buffer.

> Regarding the bounce buffer allocation logic, I'm not sure how it
> should be done.
> The SPI user should be able to determine a max transfer
> len (at least this is the case for SPI NORs) and inform the SPI layer
> about this boundary so that the SPI core can allocate a bounce buffer
> of this size. But we also have limitations at the SPI master level
> (->max_transfer_size(), ->max_message_size()).
>

Again, I guess only the SPI controller can suggest the appropriate size
of the bounce buffer based on its internal constraints and the use cases
it is known to support.
On 3/6/2017 5:17 PM, Vignesh R wrote:
>
> On Thursday 02 March 2017 07:59 PM, Boris Brezillon wrote:
>> On Thu, 2 Mar 2017 19:24:43 +0530
>> Vignesh R <vigneshr@ti.com> wrote:

[...]

>>>
>>> If it's at the SPI level, then I guess each individual driver which cannot
>>> handle vmalloc'd buffers will have to implement bounce buffer logic.
>>
>> Well, that's my opinion. The only one that can decide when to do
>> PIO, when to use DMA or when to use a bounce buffer+DMA is the SPI
>> controller.
>> If you move this logic to the SPI NOR layer, you'll have to guess what
>> is the best approach, and I fear the decision will be wrong on some
>> platforms (leading to perf degradation).
>>
>> You're mentioning code duplication in each SPI controller, I agree,
>> this is far from ideal, but what you're suggesting is not necessarily
>> better. What if another SPI user starts passing vmalloc-ed buffers to
>> the SPI controller? You'll have to duplicate the bounce-buffer logic in
>> this user as well.
>>
>
> Hmm... Yes, there are ways to bypass the SPI core.
>
>>>
>>> Or SPI core can be extended in a way similar to this RFC. That is, SPI
>>> master driver will set a flag to request SPI core to use a bounce
>>> buffer for vmalloc'd buffers. And spi_map_buf() just uses the bounce buffer
>>> in case the buf does not belong to the kmalloc region, based on the flag.
>>
>> That's a better approach IMHO. Note that the decision should not only
>> be based on the buffer type, but also on the transfer length and/or
>> whether the controller supports transferring non physically contiguous
>> buffers.
>>
>> Maybe we should just extend ->can_dma() to let the core know if it
>> should use a bounce buffer.
>>
>
> Yes, this is definitely needed. ->can_dma() currently returns bool. We
> need a better interface that returns different error codes for
> restrictions on buffer length vs buffer type (I don't see any appropriate
> error codes), or makes ->can_dma() return a flag asking for a bounce
> buffer.
> SPI controller drivers may use cache_is_*() and virt_addr_valid() to
> decide whether or not to request a bounce buffer.
>
>> Regarding the bounce buffer allocation logic, I'm not sure how it
>> should be done. The SPI user should be able to determine a max transfer
>> len (at least this is the case for SPI NORs) and inform the SPI layer
>> about this boundary so that the SPI core can allocate a bounce buffer
>> of this size. But we also have limitations at the SPI master level
>> (->max_transfer_size(), ->max_message_size()).
>>
>
> Again, I guess only the SPI controller can suggest the appropriate size
> of the bounce buffer based on its internal constraints and the use cases
> it is known to support.
>

I will work on a patch series implementing bounce buffer support in the
SPI core and extending ->can_dma() to inform when a bounce buffer needs
to be used. This will make sure that SPI controller drivers that are not
affected by the cache aliasing problem or LPAE buffers don't see a
performance impact. Any comments appreciated!
diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
index c4df3b1bded0..d05acf22eadf 100644
--- a/drivers/mtd/devices/m25p80.c
+++ b/drivers/mtd/devices/m25p80.c
@@ -241,6 +241,7 @@ static int m25p_probe(struct spi_device *spi)
 	else
 		flash_name = spi->modalias;
 
+	nor->flags |= SNOR_F_USE_BOUNCE_BUFFER;
 	ret = spi_nor_scan(nor, flash_name, mode);
 	if (ret)
 		return ret;
Many SPI controller drivers use DMA to read/write from m25p80 compatible
flashes. Therefore enable the bounce buffer support provided by the
spi-nor framework to take care of handling vmalloc'd buffers which may
not be DMA'able.

Signed-off-by: Vignesh R <vigneshr@ti.com>
---
 drivers/mtd/devices/m25p80.c | 1 +
 1 file changed, 1 insertion(+)