Message ID | 20230603090643.6799-1-stefan.wahren@i2se.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ARM: mm: Fix ARCH_LOW_ADDRESS_LIMIT when CONFIG_ZONE_DMA | expand |
On Sat, Jun 3, 2023, at 11:06, Stefan Wahren wrote: > Configuring VMSPLIT_2G + LPAE on Raspberry Pi 4 leads to SWIOTLB > buffer allocation beyond platform dma_zone_size of SZ_1G, which > results in broken SD card boot. > > So fix this be setting ARCH_LOW_ADDRESS_LIMIT in CONFIG_ZONE_DMA > case. > > Suggested-by: Russell King <linux@armlinux.org.uk> > Fixes: e9faf9b0b07a ("ARM: add multi_v7_lpae_defconfig") > Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> This looks correct and required to me, but I think we also want the change to paging_init() that you tried earlier before this patch. Changing the low address limit by itself makes the swiotlb work correctly, but for performance we also want to avoid having to use the swiotlb at all when dealing with the zero page. Arnd
On Sat, Jun 03, 2023 at 02:33:49PM +0200, Arnd Bergmann wrote: > Changing the low address limit by itself makes the swiotlb work > correctly, but for performance we also want to avoid having to > use the swiotlb at all when dealing with the zero page. How often is the zero page used for DMA _to_ a device (it should never be used for DMA _from_ a device.) ? It doesn't sound that useful to me, since it can only be used for writing zeros.
On Sat, Jun 3, 2023, at 14:38, Russell King (Oracle) wrote: > On Sat, Jun 03, 2023 at 02:33:49PM +0200, Arnd Bergmann wrote: >> Changing the low address limit by itself makes the swiotlb work >> correctly, but for performance we also want to avoid having to >> use the swiotlb at all when dealing with the zero page. > > How often is the zero page used for DMA _to_ a device (it should never > be used for DMA _from_ a device.) ? It doesn't sound that useful to me, > since it can only be used for writing zeros. It's possible that this doesn't happen all that much at all, the bcm2835-dma driver is the only one that has a specific optimization for this case, introduced in commit bf75703d0912d ("dmaengine: bcm2835: Avoid accessing memory when copying zeroes"). If the ZERO page can only be mapped through swiotlb, that optimization is clearly counterproductive because it requires making a copy of the zero page but doesn't actually skip any transfers. I don't really understand what the spi driver is doing here, but I can see that it still contains the code that originally started the discussion about the change to the DMA driver, so I assume it's still used that way. Arnd
On Sat, Jun 03, 2023 at 02:59:53PM +0200, Arnd Bergmann wrote: > On Sat, Jun 3, 2023, at 14:38, Russell King (Oracle) wrote: > > On Sat, Jun 03, 2023 at 02:33:49PM +0200, Arnd Bergmann wrote: > >> Changing the low address limit by itself makes the swiotlb work > >> correctly, but for performance we also want to avoid having to > >> use the swiotlb at all when dealing with the zero page. > > > > How often is the zero page used for DMA _to_ a device (it should never > > be used for DMA _from_ a device.) ? It doesn't sound that useful to me, > > since it can only be used for writing zeros. > > It's possible that this doesn't happen all that much at all, the > bcm2835-dma driver is the only one that has a specific optimization > for this case, introduced in commit bf75703d0912d ("dmaengine: > bcm2835: Avoid accessing memory when copying zeroes"). > > If the ZERO page can only be mapped through swiotlb, that > optimization is clearly counterproductive because it requires > making a copy of the zero page but doesn't actually skip > any transfers. > > I don't really understand what the spi driver is doing here, > but I can see that it still contains the code that originally > started the discussion about the change to the DMA driver, > so I assume it's still used that way. With many SPI interfaces, if you want to read something, you need to have an active "write" stream (because when you clock the bus, bits need to be clocked out while bits are clocked in.) A common way to do that is to clock out zeros, and to DMA them from somewhere. Note that in the commit you pointed to: + /* non-lite channels can write zeroes w/o accessing memory */ + if (buf_addr == od->zero_page && !c->is_lite_channel) + info |= BCM2835_DMA_S_IGNORE; So, if it's not a "lite" channel, then even mapping the zero page is a waste, because according to the comment, it won't even access memory. Note also that the driver is also relying on dma_map_*() of zero-page _always_ producing the same result, which afaik the DMA API has never guaranteed. Also note that the buffer passed into bcm2835_dma_prep_dma_cyclic() must also have been dma-mapped (since it takes a dma_addr_t, not the virtual address of the buffer.) So this all looks rather dodgy to me.
Hi, Am 03.06.23 um 15:58 schrieb Lukas Wunner: > On Sat, Jun 03, 2023 at 02:59:53PM +0200, Arnd Bergmann wrote: >> On Sat, Jun 3, 2023, at 14:38, Russell King (Oracle) wrote: >>> On Sat, Jun 03, 2023 at 02:33:49PM +0200, Arnd Bergmann wrote: >>>> Changing the low address limit by itself makes the swiotlb work >>>> correctly, but for performance we also want to avoid having to >>>> use the swiotlb at all when dealing with the zero page. >>> >>> How often is the zero page used for DMA _to_ a device (it should never >>> be used for DMA _from_ a device.) ? It doesn't sound that useful to me, >>> since it can only be used for writing zeros. >> >> It's possible that this doesn't happen all that much at all, the >> bcm2835-dma driver is the only one that has a specific optimization >> for this case, introduced in commit bf75703d0912d ("dmaengine: >> bcm2835: Avoid accessing memory when copying zeroes"). >> >> If the ZERO page can only be mapped through swiotlb, that >> optimization is clearly counterproductive because it requires >> making a copy of the zero page but doesn't actually skip >> any transfers. >> >> I don't really understand what the spi driver is doing here, >> but I can see that it still contains the code that originally >> started the discussion about the change to the DMA driver, >> so I assume it's still used that way. > > SPI is a bidirectional bus, i.e. you receive as many bytes as you write. > > When you're receiving an Ethernet frame from an SPI-attached network chip, > you need to clock out bytes in order to clock in the Ethernet frame. > The content of those bytes you clock out doesn't matter. They can be > zeroes or anything else. > > The Raspberry Pi's BCM2835 DMA controller is capable of synthesizing zeroes > and writing them to the SPI controller's FIFO. This reduces traffic on the > memory bus because you don't need to copy data from memory to the FIFO. > > Unfortunately only a subset of the DMA channels has that capability. > If the SPI controller happens to get only one of the inferior "lite" > channels, it needs to fall back to copying data from memory. To avoid > having to map a page for that, it simply reuses the zero page. > > Hope that sheds some light on what the driver is doing there. i'm bit a lost in this old discussion. Arnd sent a Reviewed-By to the patch. Shall i send this to RMK's patch system? Or is the patch not sufficient enough? > > Thanks, > > Lukas
On 2023-09-06 11:31, Stefan Wahren wrote: > Hi, > > Am 03.06.23 um 15:58 schrieb Lukas Wunner: >> On Sat, Jun 03, 2023 at 02:59:53PM +0200, Arnd Bergmann wrote: >>> On Sat, Jun 3, 2023, at 14:38, Russell King (Oracle) wrote: >>>> On Sat, Jun 03, 2023 at 02:33:49PM +0200, Arnd Bergmann wrote: >>>>> Changing the low address limit by itself makes the swiotlb work >>>>> correctly, but for performance we also want to avoid having to >>>>> use the swiotlb at all when dealing with the zero page. >>>> >>>> How often is the zero page used for DMA _to_ a device (it should never >>>> be used for DMA _from_ a device.) ? It doesn't sound that useful to me, >>>> since it can only be used for writing zeros. >>> >>> It's possible that this doesn't happen all that much at all, the >>> bcm2835-dma driver is the only one that has a specific optimization >>> for this case, introduced in commit bf75703d0912d ("dmaengine: >>> bcm2835: Avoid accessing memory when copying zeroes"). >>> >>> If the ZERO page can only be mapped through swiotlb, that >>> optimization is clearly counterproductive because it requires >>> making a copy of the zero page but doesn't actually skip >>> any transfers. >>> >>> I don't really understand what the spi driver is doing here, >>> but I can see that it still contains the code that originally >>> started the discussion about the change to the DMA driver, >>> so I assume it's still used that way. >> >> SPI is a bidirectional bus, i.e. you receive as many bytes as you write. >> >> When you're receiving an Ethernet frame from an SPI-attached network >> chip, >> you need to clock out bytes in order to clock in the Ethernet frame. >> The content of those bytes you clock out doesn't matter. They can be >> zeroes or anything else. >> >> The Raspberry Pi's BCM2835 DMA controller is capable of synthesizing >> zeroes >> and writing them to the SPI controller's FIFO. This reduces traffic >> on the >> memory bus because you don't need to copy data from memory to the FIFO. >> >> Unfortunately only a subset of the DMA channels has that capability. >> If the SPI controller happens to get only one of the inferior "lite" >> channels, it needs to fall back to copying data from memory. To avoid >> having to map a page for that, it simply reuses the zero page. >> >> Hope that sheds some light on what the driver is doing there. > > i'm bit a lost in this old discussion. Arnd sent a Reviewed-By to the > patch. > > Shall i send this to RMK's patch system? I think so - I don't remember how we got into the SPI thing, but that looks somewhat bogus and broken, and doesn't matter to the correctness of this patch in itself. Thanks, Robin. > > Or is the patch not sufficient enough? > >> >> Thanks, >> >> Lukas
diff --git a/arch/arm/include/asm/dma.h b/arch/arm/include/asm/dma.h index c6aded1b069c..e2a1916013e7 100644 --- a/arch/arm/include/asm/dma.h +++ b/arch/arm/include/asm/dma.h @@ -12,6 +12,9 @@ extern phys_addr_t arm_dma_zone_size; \ arm_dma_zone_size && arm_dma_zone_size < (0x100000000ULL - PAGE_OFFSET) ? \ (PAGE_OFFSET + arm_dma_zone_size) : 0xffffffffUL; }) + +extern phys_addr_t arm_dma_limit; +#define ARCH_LOW_ADDRESS_LIMIT arm_dma_limit #endif #ifdef CONFIG_ISA_DMA_API
Configuring VMSPLIT_2G + LPAE on Raspberry Pi 4 leads to SWIOTLB buffer allocation beyond platform dma_zone_size of SZ_1G, which results in broken SD card boot. So fix this be setting ARCH_LOW_ADDRESS_LIMIT in CONFIG_ZONE_DMA case. Suggested-by: Russell King <linux@armlinux.org.uk> Fixes: e9faf9b0b07a ("ARM: add multi_v7_lpae_defconfig") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> --- arch/arm/include/asm/dma.h | 3 +++ 1 file changed, 3 insertions(+)