diff mbox series

[v4,2/4] spi: Split spi message into max_dma_len size chunks

Message ID 20190411164235.49771-3-noralf@tronnes.org (mailing list archive)
State New, archived
Headers show
Series Chunk splitting of spi transfers | expand

Commit Message

Noralf Trønnes April 11, 2019, 4:42 p.m. UTC
From: Meghana Madhyastha <meghana.madhyastha@gmail.com>

Some drivers like spi_bcm2835 have a max size on DMA transfers. Work
around this by splitting up the transfer if necessary.

->max_transfer_size is MAX_INT if the driver doesn't set it, so this change
will only affect drivers that set the value.

Signed-off-by: Meghana Madhyastha <meghana.madhyastha@gmail.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/spi/spi.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Lukas Wunner April 11, 2019, 6:18 p.m. UTC | #1
On Thu, Apr 11, 2019 at 06:42:33PM +0200, Noralf Trønnes wrote:
> @@ -1299,6 +1299,11 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
>  
>  	trace_spi_message_start(ctlr->cur_msg);
>  
> +	ret = spi_split_transfers_maxsize(ctlr, ctlr->cur_msg, ctlr->max_dma_len,
> +					  GFP_KERNEL | GFP_DMA);

Um, shouldn't this be ifdef'ed to CONFIG_HAS_DMA?


> Some drivers like spi_bcm2835 have a max size on DMA transfers. Work
> around this by splitting up the transfer if necessary.

This feels like a very expensive solution to the problem:  Large transfers
are split into multiple smaller transfers (requiring lots of overhead to
allocate and populate the structures) and the split transfers seem to be
cached for later reuse.

I'm wondering, for a frame buffer, doesn't the buffer (and hence the SPI
message) change on every transmission?  In other words, does a frame
buffer benefit from the caching?  If not, then the caching seems to incur
unnecessary overhead.

Even the normal case when no transfers exceed the limit is affected by
additional overhead, namely an iteration over the transfer list to
check each transfers' length. :-(

Note that spi_map_buf() already splits every transfer's sglist into
segments that are smaller than ctlr->max_dma_len.  Now all that needs
to be done is to amend spi-bcm2835.c to iterate over the sglist
and transmit it in portions which do not exceed 65535.  Addressing the
problem at this lower level would drastically reduce the overhead
compared to the approach in the present patch and hence appears to be
more recommendable.

Thanks,

Lukas
Noralf Trønnes April 11, 2019, 8:39 p.m. UTC | #2
Den 11.04.2019 18.42, skrev Noralf Trønnes:
> From: Meghana Madhyastha <meghana.madhyastha@gmail.com>
> 
> Some drivers like spi_bcm2835 have a max size on DMA transfers. Work
> around this by splitting up the transfer if necessary.
> 
> ->max_transfer_size is MAX_INT if the driver doesn't set it, so this change
> will only affect drivers that set the value.
> 
> Signed-off-by: Meghana Madhyastha <meghana.madhyastha@gmail.com>
> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> ---
>  drivers/spi/spi.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 05875e63be43..22bc658032b3 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -1299,6 +1299,11 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
>  
>  	trace_spi_message_start(ctlr->cur_msg);
>  
> +	ret = spi_split_transfers_maxsize(ctlr, ctlr->cur_msg, ctlr->max_dma_len,
> +					  GFP_KERNEL | GFP_DMA);
> +	if (ret)
> +		goto out;
> +
>  	if (ctlr->prepare_message) {
>  		ret = ctlr->prepare_message(ctlr, ctlr->cur_msg);
>  		if (ret) {
> 

This doesn't work.

I started to wonder why it was necessary to lift the upper bound in
spi-bcm2835, so I put the test back and it triggered. Adding some
printk's show that the split transfer are mapped correctly, but when
unmap happens there's only the one big transfer:

[   70.935524] __spi_map_msg: msg=ae711dd4
[   70.935572]     xfer=95c5acd8, xfer->len=65532
[   70.955800]     xfer=aee3f198, xfer->len=65532
[   70.956034]     xfer=7fc3464e, xfer->len=22536
[   70.982802] __spi_unmap_msg: msg=ae711dd4
[   70.982889]     xfer=d5b5dbbf, xfer->len=153600

I need to study this more tomorrow to find out why this is, unless
someone knows off hand what the problem is.

Noralf.
Noralf Trønnes April 11, 2019, 9:02 p.m. UTC | #3
Den 11.04.2019 20.18, skrev Lukas Wunner:
> On Thu, Apr 11, 2019 at 06:42:33PM +0200, Noralf Trønnes wrote:
>> @@ -1299,6 +1299,11 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
>>  
>>  	trace_spi_message_start(ctlr->cur_msg);
>>  
>> +	ret = spi_split_transfers_maxsize(ctlr, ctlr->cur_msg, ctlr->max_dma_len,
>> +					  GFP_KERNEL | GFP_DMA);
> 
> Um, shouldn't this be ifdef'ed to CONFIG_HAS_DMA?
> 

Maybe, I don't know. Mark didn't mentioned it when he commented on a
previous version of this. Some hate ifdef's and want to avoid them, some
don't.

> 
>> Some drivers like spi_bcm2835 have a max size on DMA transfers. Work
>> around this by splitting up the transfer if necessary.
> 
> This feels like a very expensive solution to the problem:  Large transfers
> are split into multiple smaller transfers (requiring lots of overhead to
> allocate and populate the structures) and the split transfers seem to be
> cached for later reuse.
> 

Only drivers that set ->max_dma_len will be split, and only when the
length is bigger. I can't see any caching with a quick glance, where do
you see it?

> I'm wondering, for a frame buffer, doesn't the buffer (and hence the SPI
> message) change on every transmission?  In other words, does a frame
> buffer benefit from the caching?  If not, then the caching seems to incur
> unnecessary overhead.
> 
> Even the normal case when no transfers exceed the limit is affected by
> additional overhead, namely an iteration over the transfer list to
> check each transfers' length. :-(
> 
> Note that spi_map_buf() already splits every transfer's sglist into
> segments that are smaller than ctlr->max_dma_len.  Now all that needs
> to be done is to amend spi-bcm2835.c to iterate over the sglist
> and transmit it in portions which do not exceed 65535.  Addressing the
> problem at this lower level would drastically reduce the overhead
> compared to the approach in the present patch and hence appears to be
> more recommendable.
> 

In a previous version of this I suggested to Meghana to put this in the
driver, but Mark wanted it in the core.

Noralf.

> Thanks,
> 
> Lukas
>
Mark Brown April 12, 2019, 9:08 a.m. UTC | #4
On Thu, Apr 11, 2019 at 06:42:33PM +0200, Noralf Trønnes wrote:

> +++ b/drivers/spi/spi.c
> @@ -1299,6 +1299,11 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
>  
>  	trace_spi_message_start(ctlr->cur_msg);
>  
> +	ret = spi_split_transfers_maxsize(ctlr, ctlr->cur_msg, ctlr->max_dma_len,
> +					  GFP_KERNEL | GFP_DMA);
> +	if (ret)
> +		goto out;
> +

We should do any message maipulation in __spi_validate() with all the
other mainpulation, the message pump should just be pushing messages
out.
Mark Brown April 12, 2019, 9:47 a.m. UTC | #5
On Thu, Apr 11, 2019 at 11:02:26PM +0200, Noralf Trønnes wrote:
> Den 11.04.2019 20.18, skrev Lukas Wunner:
> > On Thu, Apr 11, 2019 at 06:42:33PM +0200, Noralf Trønnes wrote:
> >> @@ -1299,6 +1299,11 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)

> >>  	trace_spi_message_start(ctlr->cur_msg);

> >> +	ret = spi_split_transfers_maxsize(ctlr, ctlr->cur_msg, ctlr->max_dma_len,
> >> +					  GFP_KERNEL | GFP_DMA);

> > Um, shouldn't this be ifdef'ed to CONFIG_HAS_DMA?

> Maybe, I don't know. Mark didn't mentioned it when he commented on a
> previous version of this. Some hate ifdef's and want to avoid them, some
> don't.

I *think* we managed to fix all the architectures to at least stub out
the DMA interfaces, it's such a pointless thing to have conditional -
it really only makes a difference for build coverage.

> >> Some drivers like spi_bcm2835 have a max size on DMA transfers. Work
> >> around this by splitting up the transfer if necessary.

> > This feels like a very expensive solution to the problem:  Large transfers
> > are split into multiple smaller transfers (requiring lots of overhead to
> > allocate and populate the structures) and the split transfers seem to be
> > cached for later reuse.

> Only drivers that set ->max_dma_len will be split, and only when the
> length is bigger. I can't see any caching with a quick glance, where do
> you see it?

It *is* more expensive to post-process the transfers and have to do
allocations rather than having them set up as we want them on the way
in.

> > Even the normal case when no transfers exceed the limit is affected by
> > additional overhead, namely an iteration over the transfer list to
> > check each transfers' length. :-(

We already have the iterations in the message validation, if the code is
integrated there the overhead will be reduced.

> > Note that spi_map_buf() already splits every transfer's sglist into
> > segments that are smaller than ctlr->max_dma_len.  Now all that needs
> > to be done is to amend spi-bcm2835.c to iterate over the sglist
> > and transmit it in portions which do not exceed 65535.  Addressing the
> > problem at this lower level would drastically reduce the overhead
> > compared to the approach in the present patch and hence appears to be
> > more recommendable.

> In a previous version of this I suggested to Meghana to put this in the
> driver, but Mark wanted it in the core.

If we want to do this at a lower level the DMA code could hide this
limitation from the upper layers; presumably the SPI driver isn't the
only thing that might run into this.
Martin Sperl April 12, 2019, 10:03 a.m. UTC | #6
> On 12.04.2019, at 11:47, Mark Brown <broonie@kernel.org> wrote:
> 
> 
>> In a previous version of this I suggested to Meghana to put this in the
>> driver, but Mark wanted it in the core.
> 
> If we want to do this at a lower level the DMA code could hide this
> limitation from the upper layers; presumably the SPI driver isn't the
> only thing that might run into this.

For clarification:
There is a register of the SPI controller where you have to configure the
number of bytes that it will request via DMA (primarily support transfers
that are not a multiple of 4 - the data is transferred by DMA as words).
So it is not really related to the general DMA implementation but to the
DMA (request) support of the SPI controller.

Martin
Mark Brown April 12, 2019, 10:22 a.m. UTC | #7
On Fri, Apr 12, 2019 at 12:03:35PM +0200, kernel@martin.sperl.org wrote:
> > On 12.04.2019, at 11:47, Mark Brown <broonie@kernel.org> wrote:

> >> In a previous version of this I suggested to Meghana to put this in the
> >> driver, but Mark wanted it in the core.

> > If we want to do this at a lower level the DMA code could hide this
> > limitation from the upper layers; presumably the SPI driver isn't the
> > only thing that might run into this.

> For clarification:
> There is a register of the SPI controller where you have to configure the
> number of bytes that it will request via DMA (primarily support transfers
> that are not a multiple of 4 - the data is transferred by DMA as words).
> So it is not really related to the general DMA implementation but to the
> DMA (request) support of the SPI controller.

Ah, I see.  That's unfortunate :/
Lukas Wunner April 12, 2019, 10:46 a.m. UTC | #8
On Thu, Apr 11, 2019 at 11:02:26PM +0200, Noralf Trønnes wrote:
> Den 11.04.2019 20.18, skrev Lukas Wunner:
> > On Thu, Apr 11, 2019 at 06:42:33PM +0200, Noralf Trønnes wrote:
> >> @@ -1299,6 +1299,11 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
> >>  
> >>  	trace_spi_message_start(ctlr->cur_msg);
> >>  
> >> +	ret = spi_split_transfers_maxsize(ctlr, ctlr->cur_msg, ctlr->max_dma_len,
> >> +					  GFP_KERNEL | GFP_DMA);
> > 
> > Um, shouldn't this be ifdef'ed to CONFIG_HAS_DMA?
> 
> Maybe, I don't know. Mark didn't mentioned it when he commented on a
> previous version of this. Some hate ifdef's and want to avoid them, some
> don't.

The above call is clearly only necessary if DMA is present and
all the DMA-specific functionality in spi.c is ifdef'ed to
CONFIG_HAS_DMA, so the above should likewise only be compiled
in if DMA is present.

Regardless whether this is achieved with an #ifdef or an empty
inline stub for the non-DMA case.


> > > Some drivers like spi_bcm2835 have a max size on DMA transfers. Work
> > > around this by splitting up the transfer if necessary.
> > 
> > This feels like a very expensive solution to the problem:  Large transfers
> > are split into multiple smaller transfers (requiring lots of overhead to
> > allocate and populate the structures) and the split transfers seem to be
> > cached for later reuse.
> > 
> 
> Only drivers that set ->max_dma_len will be split, and only when the
> length is bigger.

The issue on the BCM2385 is that the (non-Lite) DMA channels have a
max length of 1 GByte but the SPI controller has a 65535 byte limit.

There are three other drivers which set max_dma_len, I'm not sure if
they suffer from the same issue as the BCM2835, but this patch causes
additional overhead for them so the driver maintainers / main authors
should at least be cc'ed.


> I can't see any caching with a quick glance, where do you see it?

Misunderstanding of the code on my part, sorry.  I thought
spi_res_release() wouldn't be called until the spi_message is freed,
in which case the split transfers would be kept around (cached) for
retransmissions of the same message.  But I notice now that
spi_res_release() is already called upon finishing the first
transmission of the spi_message, hence the split transfers are gone
and need to be reconstructed on retransmission.


> > Note that spi_map_buf() already splits every transfer's sglist into
> > segments that are smaller than ctlr->max_dma_len.  Now all that needs
> > to be done is to amend spi-bcm2835.c to iterate over the sglist
> > and transmit it in portions which do not exceed 65535.  Addressing the
> > problem at this lower level would drastically reduce the overhead
> > compared to the approach in the present patch and hence appears to be
> > more recommendable.
> 
> In a previous version of this I suggested to Meghana to put this in the
> driver, but Mark wanted it in the core.

Do you have a link to these comments of Mark?  The first version of
this patchset that I have here is v2 of March 2018 and it already
uses spi_split_transfers_maxsize().  I've never seen a version which
splits the sglist in spi-bcm2835.c (instead of splitting the transfers
in spi.c, which, again, is significantly more expensive).

Thanks,

Lukas
Mark Brown April 12, 2019, 11:09 a.m. UTC | #9
On Fri, Apr 12, 2019 at 12:54:48PM +0200, Lukas Wunner wrote:
> On Fri, Apr 12, 2019 at 10:47:21AM +0100, Mark Brown wrote:

> > I *think* we managed to fix all the architectures to at least stub out
> > the DMA interfaces, it's such a pointless thing to have conditional -
> > it really only makes a difference for build coverage.

> My point was that the call to spi_split_transfers_maxsize() shouldn't
> be called on non-DMA-capable platforms in the first place (because it's
> DMA-specific).

It's not a DMA specific problem - there can be SPI controller
limitations on transfer sizes too.
Lukas Wunner April 12, 2019, 11:16 a.m. UTC | #10
On Fri, Apr 12, 2019 at 12:09:34PM +0100, Mark Brown wrote:
> On Fri, Apr 12, 2019 at 12:54:48PM +0200, Lukas Wunner wrote:
> > On Fri, Apr 12, 2019 at 10:47:21AM +0100, Mark Brown wrote:
> > > I *think* we managed to fix all the architectures to at least stub out
> > > the DMA interfaces, it's such a pointless thing to have conditional -
> > > it really only makes a difference for build coverage.
> 
> > My point was that the call to spi_split_transfers_maxsize() shouldn't
> > be called on non-DMA-capable platforms in the first place (because it's
> > DMA-specific).
> 
> It's not a DMA specific problem - there can be SPI controller
> limitations on transfer sizes too.

The call does pass in ctlr->max_dma_len though, so is clearly motivated
by a DMA limitation.
Mark Brown April 12, 2019, 11:28 a.m. UTC | #11
On Fri, Apr 12, 2019 at 01:16:15PM +0200, Lukas Wunner wrote:
> On Fri, Apr 12, 2019 at 12:09:34PM +0100, Mark Brown wrote:
> > On Fri, Apr 12, 2019 at 12:54:48PM +0200, Lukas Wunner wrote:

> > > My point was that the call to spi_split_transfers_maxsize() shouldn't
> > > be called on non-DMA-capable platforms in the first place (because it's
> > > DMA-specific).

> > It's not a DMA specific problem - there can be SPI controller
> > limitations on transfer sizes too.

> The call does pass in ctlr->max_dma_len though, so is clearly motivated
> by a DMA limitation.

Yeah, that bit is but we can have other limitations.
Martin Sperl April 12, 2019, 11:34 a.m. UTC | #12
> On 12.04.2019, at 13:16, Lukas Wunner <lukas@wunner.de> wrote:
> 
> On Fri, Apr 12, 2019 at 12:09:34PM +0100, Mark Brown wrote:
>> On Fri, Apr 12, 2019 at 12:54:48PM +0200, Lukas Wunner wrote:
>>> On Fri, Apr 12, 2019 at 10:47:21AM +0100, Mark Brown wrote:
>>>> I *think* we managed to fix all the architectures to at least stub out
>>>> the DMA interfaces, it's such a pointless thing to have conditional -
>>>> it really only makes a difference for build coverage.
>> 
>>> My point was that the call to spi_split_transfers_maxsize() shouldn't
>>> be called on non-DMA-capable platforms in the first place (because it's
>>> DMA-specific).
>> 
>> It's not a DMA specific problem - there can be SPI controller
>> limitations on transfer sizes too.
> 
> The call does pass in ctlr->max_dma_len though, so is clearly motivated
> by a DMA limitation.
The limitation is in this register: BCM2835_SPI_DLEN

Where the bcm2835-SDK (Brcm_Android_ICS_Graphics_Stack.tar.gz)defines:
bit 0-15 for DLEN

See transformed data at:
https://github.com/msperl/rpi-registers/blob/master/md/Region_SPI.md

Or the Arm Periperial Document on page 156 with regards to DLEN register.

This is where the DMA length limit comes from - so it is NOT
really DMA block related but SPI block related.

More specifically to DMA requests are only triggered when this counter
is > 0.

That is unless the Documentation has another errata and more bits are
actually allowed.

Martin
Mark Brown April 12, 2019, 11:46 a.m. UTC | #13
On Fri, Apr 12, 2019 at 12:46:44PM +0200, Lukas Wunner wrote:
> On Thu, Apr 11, 2019 at 11:02:26PM +0200, Noralf Trønnes wrote:
> > Den 11.04.2019 20.18, skrev Lukas Wunner:

> > > Note that spi_map_buf() already splits every transfer's sglist into
> > > segments that are smaller than ctlr->max_dma_len.  Now all that needs
> > > to be done is to amend spi-bcm2835.c to iterate over the sglist
> > > and transmit it in portions which do not exceed 65535.  Addressing the
> > > problem at this lower level would drastically reduce the overhead
> > > compared to the approach in the present patch and hence appears to be
> > > more recommendable.

> > In a previous version of this I suggested to Meghana to put this in the
> > driver, but Mark wanted it in the core.

> Do you have a link to these comments of Mark?  The first version of
> this patchset that I have here is v2 of March 2018 and it already
> uses spi_split_transfers_maxsize().  I've never seen a version which
> splits the sglist in spi-bcm2835.c (instead of splitting the transfers
> in spi.c, which, again, is significantly more expensive).

The basic theory is that we shouldn't be open coding the same handling
in multiple drivers, they should just be able to declare their
capabilities and have the core handle things as far as possible.  Since
we're already iterating over the whole message I'd be surprised if there
were a particularly big hit in cases where we don't need to split things
out, though obviously that's not been tested yet.

There have also been discussions in the past about pre-cooking some of
the messages so that if the client is sending the same thing a lot (or
things that vary only by data) we can do a lot of the validation one
time which would help.
diff mbox series

Patch

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 05875e63be43..22bc658032b3 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1299,6 +1299,11 @@  static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 
 	trace_spi_message_start(ctlr->cur_msg);
 
+	ret = spi_split_transfers_maxsize(ctlr, ctlr->cur_msg, ctlr->max_dma_len,
+					  GFP_KERNEL | GFP_DMA);
+	if (ret)
+		goto out;
+
 	if (ctlr->prepare_message) {
 		ret = ctlr->prepare_message(ctlr, ctlr->cur_msg);
 		if (ret) {