Message ID | 20200730154545.3965-6-Sergey.Semin@baikalelectronics.ru |
---|---|
State | Changes Requested |
Series | dmaengine: dw: Introduce non-mem peripherals optimizations |
On Thu, Jul 30, 2020 at 06:45:45PM +0300, Serge Semin wrote:
> DW DMA IP-core provides a way to synthesize the DMA controller with
> channels having different parameters like maximum burst-length,
> multi-block support, maximum data width, etc. Those parameters both
> explicitly and implicitly affect the channels performance. Since DMA slave
> devices might be very demanding to the DMA performance, let's provide a
> functionality for the slaves to be assigned with DW DMA channels, which
> performance according to the platform engineer fulfill their requirements.
> After this patch is applied it can be done by passing the mask of suitable
> DMA-channels either directly in the dw_dma_slave structure instance or as
> a fifth cell of the DMA DT-property. If mask is zero or not provided, then
> there is no limitation on the channels allocation.
>
> For instance Baikal-T1 SoC is equipped with a DW DMAC engine, which first
> two channels are synthesized with max burst length of 16, while the rest
> of the channels have been created with max-burst-len=4. It would seem that
> the first two channels must be faster than the others and should be more
> preferable for the time-critical DMA slave devices. In practice it turned
> out that the situation is quite the opposite. The channels with
> max-burst-len=4 demonstrated a better performance than the channels with
> max-burst-len=16 even when they both had been initialized with the same
> settings. The performance drop of the first two DMA-channels made them
> unsuitable for the DW APB SSI slave device. No matter what settings they
> are configured with, full-duplex SPI transfers occasionally experience the
> Rx FIFO overflow. It means that the DMA-engine doesn't keep up with
> incoming data pace even though the SPI-bus is enabled with speed of 25MHz
> while the DW DMA controller is clocked with 50MHz signal. There is no such
> problem has been noticed for the channels synthesized with
> max-burst-len=4.

...

> +	if (dws->channels && !(dws->channels & dwc->mask))

You can drop the first check if...

> +		return false;

...

> +	if (dma_spec->args_count >= 4)
> +		slave.channels = dma_spec->args[3];

...you apply sane default here or somewhere else.

...

> +		    fls(slave.channels) > dw->pdata->nr_channels))

Does it really make sense?

I think it can also be simplified to faster op, i.e.
BIT(nr_channels) < slave.channels
(but check for off-by-one errors)

...

> + * @channels: mask of the channels permitted for allocation (zero
> + *		value means any)

Perhaps on one line?
On Thu, Jul 30, 2020 at 07:41:46PM +0300, Andy Shevchenko wrote:
> On Thu, Jul 30, 2020 at 06:45:45PM +0300, Serge Semin wrote:
> > DW DMA IP-core provides a way to synthesize the DMA controller with
> > channels having different parameters like maximum burst-length,
> > multi-block support, maximum data width, etc. Those parameters both
> > explicitly and implicitly affect the channels performance. Since DMA slave
> > devices might be very demanding to the DMA performance, let's provide a
> > functionality for the slaves to be assigned with DW DMA channels, which
> > performance according to the platform engineer fulfill their requirements.
> > After this patch is applied it can be done by passing the mask of suitable
> > DMA-channels either directly in the dw_dma_slave structure instance or as
> > a fifth cell of the DMA DT-property. If mask is zero or not provided, then
> > there is no limitation on the channels allocation.
> >
> > For instance Baikal-T1 SoC is equipped with a DW DMAC engine, which first
> > two channels are synthesized with max burst length of 16, while the rest
> > of the channels have been created with max-burst-len=4. It would seem that
> > the first two channels must be faster than the others and should be more
> > preferable for the time-critical DMA slave devices. In practice it turned
> > out that the situation is quite the opposite. The channels with
> > max-burst-len=4 demonstrated a better performance than the channels with
> > max-burst-len=16 even when they both had been initialized with the same
> > settings. The performance drop of the first two DMA-channels made them
> > unsuitable for the DW APB SSI slave device. No matter what settings they
> > are configured with, full-duplex SPI transfers occasionally experience the
> > Rx FIFO overflow. It means that the DMA-engine doesn't keep up with
> > incoming data pace even though the SPI-bus is enabled with speed of 25MHz
> > while the DW DMA controller is clocked with 50MHz signal. There is no such
> > problem has been noticed for the channels synthesized with
> > max-burst-len=4.
>
> ...
>
> > +	if (dws->channels && !(dws->channels & dwc->mask))
>
> You can drop the first check if...

See below.

>
> > +		return false;
>
> ...
>
> > +	if (dma_spec->args_count >= 4)
> > +		slave.channels = dma_spec->args[3];
>
> ...you apply sane default here or somewhere else.

Alas I can't because dw_dma_slave structure is defined all over the kernel
drivers/spi/spi-dw-dma.c
drivers/spi/spi-pxa2xx-pci.c
drivers/tty/serial/8250/8250_lpss.c
These devices aren't always placed on the OF-based platforms. In that case
the corresponding DMA-channels won't be requested by means of the
dw_dma_of_xlate() method. So we have to preserve a default behavior if
dws->channels is zero.

>
> ...
>
> > +		    fls(slave.channels) > dw->pdata->nr_channels))
>
> Does it really make sense?

It does to prevent the clients to specify an invalid channels mask, which
can't have bits set higher than the number of channels the engine supports.

>
> I think it can also be simplified to faster op, i.e.
> BIT(nr_channels) < slave.channels
> (but check for off-by-one errors)

Makes sense. Thanks. I'll replace it with the next statement:
slave.channels >= BIT(dw->pdata->nr_channels)

>
> ...
>
> > + * @channels: mask of the channels permitted for allocation (zero
> > + *		value means any)
>
> Perhaps on one line?

I don't really care. If you insist on that, I'll make it a single line, but
it will be over 80 columns. 85 to be exact.

-Sergey

>
> --
> With Best Regards,
> Andy Shevchenko
diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index 3da0aea9fe25..5f7b9badb965 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -772,6 +772,10 @@ bool dw_dma_filter(struct dma_chan *chan, void *param)
 	if (dws->dma_dev != chan->device->dev)
 		return false;
 
+	/* permit channels in accordance with the channels mask */
+	if (dws->channels && !(dws->channels & dwc->mask))
+		return false;
+
 	/* We have to copy data since dws can be temporary storage */
 	memcpy(&dwc->dws, dws, sizeof(struct dw_dma_slave));
 
diff --git a/drivers/dma/dw/of.c b/drivers/dma/dw/of.c
index 1474b3817ef4..abdf22b269b5 100644
--- a/drivers/dma/dw/of.c
+++ b/drivers/dma/dw/of.c
@@ -22,18 +22,21 @@ static struct dma_chan *dw_dma_of_xlate(struct of_phandle_args *dma_spec,
 	};
 	dma_cap_mask_t cap;
 
-	if (dma_spec->args_count != 3)
+	if (dma_spec->args_count < 3 || dma_spec->args_count > 4)
 		return NULL;
 
 	slave.src_id = dma_spec->args[0];
 	slave.dst_id = dma_spec->args[0];
 	slave.m_master = dma_spec->args[1];
 	slave.p_master = dma_spec->args[2];
+	if (dma_spec->args_count >= 4)
+		slave.channels = dma_spec->args[3];
 
 	if (WARN_ON(slave.src_id >= DW_DMA_MAX_NR_REQUESTS ||
 		    slave.dst_id >= DW_DMA_MAX_NR_REQUESTS ||
 		    slave.m_master >= dw->pdata->nr_masters ||
-		    slave.p_master >= dw->pdata->nr_masters))
+		    slave.p_master >= dw->pdata->nr_masters ||
+		    fls(slave.channels) > dw->pdata->nr_channels))
 		return NULL;
 
 	dma_cap_zero(cap);
diff --git a/include/linux/platform_data/dma-dw.h b/include/linux/platform_data/dma-dw.h
index 4f681df85c27..3bc48451a70c 100644
--- a/include/linux/platform_data/dma-dw.h
+++ b/include/linux/platform_data/dma-dw.h
@@ -23,6 +23,8 @@
  * @dst_id: dst request line
  * @m_master: memory master for transfers on allocated channel
  * @p_master: peripheral master for transfers on allocated channel
+ * @channels: mask of the channels permitted for allocation (zero
+ *		value means any)
  * @hs_polarity:set active low polarity of handshake interface
  */
 struct dw_dma_slave {
@@ -31,6 +33,7 @@ struct dw_dma_slave {
 	u8 dst_id;
 	u8 m_master;
 	u8 p_master;
+	u8 channels;
 	bool hs_polarity;
 };
 
DW DMA IP-core provides a way to synthesize the DMA controller with channels having different parameters, like maximum burst length, multi-block support, maximum data width, etc. Those parameters both explicitly and implicitly affect the channels' performance. Since DMA slave devices might be very demanding of DMA performance, let's provide a way for the slaves to be assigned only those DW DMA channels whose performance, according to the platform engineer, fulfills their requirements. After this patch is applied, this can be done by passing the mask of suitable DMA channels either directly in the dw_dma_slave structure instance or as a fifth cell of the DMA DT property. If the mask is zero or not provided, then there is no limitation on channel allocation.

For instance, the Baikal-T1 SoC is equipped with a DW DMAC engine whose first two channels are synthesized with a max burst length of 16, while the rest of the channels have been created with max-burst-len=4. It would seem that the first two channels must be faster than the others and should be preferable for time-critical DMA slave devices. In practice it turned out that the situation is quite the opposite. The channels with max-burst-len=4 demonstrated better performance than the channels with max-burst-len=16, even when both had been initialized with the same settings. The performance drop of the first two DMA channels made them unsuitable for the DW APB SSI slave device. No matter what settings they are configured with, full-duplex SPI transfers occasionally experience an Rx FIFO overflow. This means that the DMA engine doesn't keep up with the incoming data rate, even though the SPI bus runs at 25MHz while the DW DMA controller is clocked with a 50MHz signal. No such problem has been noticed for the channels synthesized with max-burst-len=4.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
---
 drivers/dma/dw/core.c                | 4 ++++
 drivers/dma/dw/of.c                  | 7 +++++--
 include/linux/platform_data/dma-dw.h | 3 +++
 3 files changed, 12 insertions(+), 2 deletions(-)