spi: mediatek: Manually set dma_ops for spi_master device

Message ID	20170124132519.13271-1-djkurtz@chromium.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-mediatek-bounces+patchwork-linux-mediatek=patchwork.kernel.org@lists.infradead.org> From: Daniel Kurtz <djkurtz@chromium.org> To: Leilk Liu <leilk.liu@mediatek.com> Subject: [PATCH] spi: mediatek: Manually set dma_ops for spi_master device Date: Tue, 24 Jan 2017 21:25:19 +0800 Message-Id: <20170124132519.13271-1-djkurtz@chromium.org> Precedence: list Cc: open list <linux-kernel@vger.kernel.org>, "open list:SPI SUBSYSTEM" <linux-spi@vger.kernel.org>, Matthias Brugger <matthias.bgg@gmail.com>, Mark Brown <broonie@kernel.org>, "moderated list:ARM/Mediatek SoC support" <linux-mediatek@lists.infradead.org>, groeck@chromium.org, dtor@chromium.org, "moderated list:ARM/Mediatek SoC support" <linux-arm-kernel@lists.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "Linux-mediatek" <linux-mediatek-bounces@lists.infradead.org> Errors-To: linux-mediatek-bounces+patchwork-linux-mediatek=patchwork.kernel.org@lists.infradead.org

Daniel Kurtz Jan. 24, 2017, 1:25 p.m. UTC

Back before commit 1dccb598df54 ("arm64: simplify dma_get_ops"), for arm64,
devices that do not manually set a dma_ops are automatically configured to
use swiotlb_dma_ops, since this was hard-coded as the global "dma_ops" in
arm64_dma_init().

Now, the global "dma_ops" has been removed, and all devices much have
their dma_ops explicitly set by a call to arch_setup_dma_ops().  If
arch_setup_dma_ops() is not called, the device is assigned dummy_dma_ops,
and thus calls to map_sg for such a device will fail (return 0).

Mediatek SPI uses DMA but does not use a dma channel.  Support for this
was added by commit c37f45b5f1cd ("spi: support spi without dma channel to
use can_dma()"), which uses the master_spi dev to DMA map buffers.

The master_spi device is not a platform, rather it is created
in spi_alloc_device(), and its dma_ops are never set.

Therefore, when the mediatek SPI driver when it does DMA (for large SPI
transactions > 32 bytes), SPI will use spi_map_buf()->dma_map_sg() to map
the buffer for use in DMA.  But dma_map_sg()->dma_map_sg_attrs() returns 0,
because ops->map_sg is dummy_dma_ops->__dummy_map_sg, and hence
spi_map_buf() returns -ENOMEM (-12).

The solution in this patch is a bit of a hack.  To install dma_ops for
its spi_master, we call of_dma_configure() during probe and pass in the
of_node of the spi-mt65xx platform device.

However, by default, of_dma_configure() will set the coherent_dma_mask
to 0xffffffff.  In fact, using a non-zero dma_mask doesn't actually work
and causes frequent SPI transaction errors.  To work around this, we
also explicitly set coherent_dma_mask to 0.

Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
---

I don't know the right place to configure the dma_ops for spi_master.
It feels like this should actually be done in the spi core, as this might be
needed by other drivers.

Alternatively, perhaps we should be using master->dev.parent to dma map bufs
in the "no dma channel case".

And I really don't know why we needed to set the coherent_dma_mask to 0 to
avoid SPI transaction errors.

Any advice is welcome.

 drivers/spi/spi-mt65xx.c | 5 +++++
 1 file changed, 5 insertions(+)

Robin Murphy Jan. 24, 2017, 3:14 p.m. UTC | #1

Hi Dan,

On 24/01/17 13:25, Daniel Kurtz wrote:
> Back before commit 1dccb598df54 ("arm64: simplify dma_get_ops"), for arm64,
> devices that do not manually set a dma_ops are automatically configured to
> use swiotlb_dma_ops, since this was hard-coded as the global "dma_ops" in
> arm64_dma_init().
> 
> Now, the global "dma_ops" has been removed, and all devices much have
> their dma_ops explicitly set by a call to arch_setup_dma_ops().  If
> arch_setup_dma_ops() is not called, the device is assigned dummy_dma_ops,
> and thus calls to map_sg for such a device will fail (return 0).
> 
> Mediatek SPI uses DMA but does not use a dma channel.  Support for this
> was added by commit c37f45b5f1cd ("spi: support spi without dma channel to
> use can_dma()"), which uses the master_spi dev to DMA map buffers.
> 
> The master_spi device is not a platform, rather it is created
> in spi_alloc_device(), and its dma_ops are never set.

Ugh, this isn't dwc3-usb all over again, is it?

> Therefore, when the mediatek SPI driver when it does DMA (for large SPI
> transactions > 32 bytes), SPI will use spi_map_buf()->dma_map_sg() to map
> the buffer for use in DMA.  But dma_map_sg()->dma_map_sg_attrs() returns 0,
> because ops->map_sg is dummy_dma_ops->__dummy_map_sg, and hence
> spi_map_buf() returns -ENOMEM (-12).
> 
> The solution in this patch is a bit of a hack.  To install dma_ops for
> its spi_master, we call of_dma_configure() during probe and pass in the
> of_node of the spi-mt65xx platform device.
> 
> However, by default, of_dma_configure() will set the coherent_dma_mask
> to 0xffffffff.  In fact, using a non-zero dma_mask doesn't actually work
> and causes frequent SPI transaction errors.  To work around this, we
> also explicitly set coherent_dma_mask to 0.

That sound like it could do with more investigation before piling hacks
on top of hacks. In theory, that would force dma_alloc_*() to always
fail, whilst allowing dma_map_*() to still work - I imagine the actual
SPI block is just a plain AXI master, meaning there oughtn't to be any
fundamental difference between coherent and streaming DMA, but there can
be differences in whatever functionality those API uses represent in the
driver/SPI code (e.g. data buffers vs. control descriptors).

> Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
> ---
> 
> I don't know the right place to configure the dma_ops for spi_master.
> It feels like this should actually be done in the spi core, as this might be
> needed by other drivers.
> 
> Alternatively, perhaps we should be using master->dev.parent to dma map bufs
> in the "no dma channel case".

AFAICS from a quick scan of the driver, that would be the correct thing
to do - in general for a platform device, the guy described in the DT
should be the guy being passed to DMA API calls.

> And I really don't know why we needed to set the coherent_dma_mask to 0 to
> avoid SPI transaction errors.

Following what I mentioned above, "git grep dma_alloc drivers/spi" makes
it seem like coherent DMA isn't used anywhere relevant, which is rather
puzzling. Unless of course somewhere along the line somebody's done the
dev->dma_mask = &dev->dma_coherent_mask hack, with the result that
writing one mask hits both, making *all* DMA fail and big transfers fall
back to PIO.

And as a backup review comment in case any of the above musings go
nowhere; at the very least please use dma_set_coherent_mask() rather
than open-coding it ;)

Robin.

> Any advice is welcome.
> 
>  drivers/spi/spi-mt65xx.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/spi/spi-mt65xx.c b/drivers/spi/spi-mt65xx.c
> index 8763eff13d3d..e768835bb67f 100644
> --- a/drivers/spi/spi-mt65xx.c
> +++ b/drivers/spi/spi-mt65xx.c
> @@ -20,6 +20,7 @@
>  #include <linux/ioport.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
> +#include <linux/of_device.h>
>  #include <linux/of_gpio.h>
>  #include <linux/platform_device.h>
>  #include <linux/platform_data/spi-mt65xx.h>
> @@ -575,6 +576,10 @@ static int mtk_spi_probe(struct platform_device *pdev)
>  		goto err_put_master;
>  	}
>  
> +	/* Call of_dma_configure to set up spi_master's dma_ops */
> +	of_dma_configure(&master->dev, master->dev.of_node);
> +	/* But explicitly set the coherent_dma_mask to 0 */
> +	master->dev.coherent_dma_mask = 0;
>  	if (!pdev->dev.dma_mask)
>  		pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
>  
>

Mark Brown Jan. 24, 2017, 3:48 p.m. UTC | #2

On Tue, Jan 24, 2017 at 09:25:19PM +0800, Daniel Kurtz wrote:

> The master_spi device is not a platform, rather it is created
> in spi_alloc_device(), and its dma_ops are never set.

Which is good because it's not a physical device and doesn't do DMA.

> Alternatively, perhaps we should be using master->dev.parent to dma map bufs
> in the "no dma channel case".

Yes, exactly.

Daniel Kurtz Jan. 25, 2017, 10:24 a.m. UTC | #3

Hi Robin,

On Tue, Jan 24, 2017 at 11:14 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> Hi Dan,

[snip...]

>> And I really don't know why we needed to set the coherent_dma_mask to 0 to
>> avoid SPI transaction errors.
>
> Following what I mentioned above, "git grep dma_alloc drivers/spi" makes
> it seem like coherent DMA isn't used anywhere relevant, which is rather
> puzzling. Unless of course somewhere along the line somebody's done the
> dev->dma_mask = &dev->dma_coherent_mask hack, with the result that
> writing one mask hits both, making *all* DMA fail and big transfers fall
> back to PIO.

You mean this last line?

>> @@ -575,6 +576,10 @@ static int mtk_spi_probe(struct platform_device *pdev)
>>               goto err_put_master;
>>       }
>>
>> +     /* Call of_dma_configure to set up spi_master's dma_ops */
>> +     of_dma_configure(&master->dev, master->dev.of_node);
>> +     /* But explicitly set the coherent_dma_mask to 0 */
>> +     master->dev.coherent_dma_mask = 0;
>>       if (!pdev->dev.dma_mask)
>>               pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
                                  ^^^^^

As predicted, setting the dma_mask = 0 causes "all DMA to fail".  In
particular, dma_capable() always returns false, so
swiotlb_map_sg_attrs() takes the map_single() path, instead of just
assigning:

   sg->dma_address = dev_addr;

swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
                     enum dma_data_direction dir, unsigned long attrs)
{
        struct scatterlist *sg;
        int i;

        BUG_ON(dir == DMA_NONE);

        for_each_sg(sgl, sg, nelems, i) {
                phys_addr_t paddr = sg_phys(sg);
                dma_addr_t dev_addr = phys_to_dma(hwdev, paddr);

                if (swiotlb_force == SWIOTLB_FORCE ||
                    !dma_capable(hwdev, dev_addr, sg->length)) {
                        phys_addr_t map = map_single(hwdev, sg_phys(sg),
                                                     sg->length, dir, attrs);
                        ...
                        }
                        sg->dma_address = phys_to_dma(hwdev, map);
                } else
                        sg->dma_address = dev_addr;
                sg_dma_len(sg) = sg->length;
        }
        return nelems;
}

So, I think this means that the only reason the MTK SPI driver ever
worked before was that it was tested on an older kernel, so the
spi_master was defaulting to swiotlb_dma_ops with a 0 dma_mask, and
therefore it was using SWIOTLB bounce buffers (via 'map_single'), and
not actually ever doing real DMA.

I'm in a bit over my head here, any suggestions on how to fix this?

-Dan

Mark Brown Jan. 25, 2017, 12:34 p.m. UTC | #4

On Wed, Jan 25, 2017 at 06:24:12PM +0800, Daniel Kurtz wrote:

> I'm in a bit over my head here, any suggestions on how to fix this?

As both Robin and I said the driver should be using the platform device
which physically exists rather than the virtual master device.

Robin Murphy Jan. 25, 2017, 5:59 p.m. UTC | #5

On 25/01/17 10:24, Daniel Kurtz wrote:
> Hi Robin,
> 
> On Tue, Jan 24, 2017 at 11:14 PM, Robin Murphy <robin.murphy@arm.com> wrote:
>> Hi Dan,
> 
> [snip...]
> 
>>> And I really don't know why we needed to set the coherent_dma_mask to 0 to
>>> avoid SPI transaction errors.
>>
>> Following what I mentioned above, "git grep dma_alloc drivers/spi" makes
>> it seem like coherent DMA isn't used anywhere relevant, which is rather
>> puzzling. Unless of course somewhere along the line somebody's done the
>> dev->dma_mask = &dev->dma_coherent_mask hack, with the result that
>> writing one mask hits both, making *all* DMA fail and big transfers fall
>> back to PIO.
> 
> You mean this last line?
> 
>>> @@ -575,6 +576,10 @@ static int mtk_spi_probe(struct platform_device *pdev)
>>>               goto err_put_master;
>>>       }
>>>
>>> +     /* Call of_dma_configure to set up spi_master's dma_ops */
>>> +     of_dma_configure(&master->dev, master->dev.of_node);
>>> +     /* But explicitly set the coherent_dma_mask to 0 */
>>> +     master->dev.coherent_dma_mask = 0;
>>>       if (!pdev->dev.dma_mask)
>>>               pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
>                                   ^^^^^

Ha, I totally failed to spot that!

> As predicted, setting the dma_mask = 0 causes "all DMA to fail".  In
> particular, dma_capable() always returns false, so
> swiotlb_map_sg_attrs() takes the map_single() path, instead of just
> assigning:
> 
>    sg->dma_address = dev_addr;
> 
> swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
>                      enum dma_data_direction dir, unsigned long attrs)
> {
>         struct scatterlist *sg;
>         int i;
> 
>         BUG_ON(dir == DMA_NONE);
> 
>         for_each_sg(sgl, sg, nelems, i) {
>                 phys_addr_t paddr = sg_phys(sg);
>                 dma_addr_t dev_addr = phys_to_dma(hwdev, paddr);
> 
>                 if (swiotlb_force == SWIOTLB_FORCE ||
>                     !dma_capable(hwdev, dev_addr, sg->length)) {
>                         phys_addr_t map = map_single(hwdev, sg_phys(sg),
>                                                      sg->length, dir, attrs);
>                         ...
>                         }
>                         sg->dma_address = phys_to_dma(hwdev, map);
>                 } else
>                         sg->dma_address = dev_addr;
>                 sg_dma_len(sg) = sg->length;
>         }
>         return nelems;
> }
> 
> So, I think this means that the only reason the MTK SPI driver ever
> worked before was that it was tested on an older kernel, so the
> spi_master was defaulting to swiotlb_dma_ops with a 0 dma_mask, and
> therefore it was using SWIOTLB bounce buffers (via 'map_single'), and
> not actually ever doing real DMA.

Well, it's still "real DMA" if the device is able to slurp data out of
the bounce buffer. That suggests there might be some mismatch between
the default DMA mask it's getting given and the actual hardware
capability (i.e. the bounce buffer happens to fall somewhere accessible,
but other addresses may not be) - is crazy 33-bit mode involved here?

Robin.

> I'm in a bit over my head here, any suggestions on how to fix this?
> 
> -Dan
>

Daniel Kurtz Jan. 26, 2017, 4:29 p.m. UTC | #6

Hi Robin,

On Thu, Jan 26, 2017 at 1:59 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 25/01/17 10:24, Daniel Kurtz wrote:
> > Hi Robin,
> >
> > On Tue, Jan 24, 2017 at 11:14 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> >> Hi Dan,
> >
> > [snip...]
> >
> >>> And I really don't know why we needed to set the coherent_dma_mask to 0 to
> >>> avoid SPI transaction errors.
> >>
> >> Following what I mentioned above, "git grep dma_alloc drivers/spi" makes
> >> it seem like coherent DMA isn't used anywhere relevant, which is rather
> >> puzzling. Unless of course somewhere along the line somebody's done the
> >> dev->dma_mask = &dev->dma_coherent_mask hack, with the result that
> >> writing one mask hits both, making *all* DMA fail and big transfers fall
> >> back to PIO.
> >
> > You mean this last line?
> >
> >>> @@ -575,6 +576,10 @@ static int mtk_spi_probe(struct platform_device *pdev)
> >>>               goto err_put_master;
> >>>       }
> >>>
> >>> +     /* Call of_dma_configure to set up spi_master's dma_ops */
> >>> +     of_dma_configure(&master->dev, master->dev.of_node);
> >>> +     /* But explicitly set the coherent_dma_mask to 0 */
> >>> +     master->dev.coherent_dma_mask = 0;
> >>>       if (!pdev->dev.dma_mask)
> >>>               pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
> >                                   ^^^^^
>
> Ha, I totally failed to spot that!
>
> > As predicted, setting the dma_mask = 0 causes "all DMA to fail".  In
> > particular, dma_capable() always returns false, so
> > swiotlb_map_sg_attrs() takes the map_single() path, instead of just
> > assigning:
> >
> >    sg->dma_address = dev_addr;
> >
> > swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
> >                      enum dma_data_direction dir, unsigned long attrs)
> > {
> >         struct scatterlist *sg;
> >         int i;
> >
> >         BUG_ON(dir == DMA_NONE);
> >
> >         for_each_sg(sgl, sg, nelems, i) {
> >                 phys_addr_t paddr = sg_phys(sg);
> >                 dma_addr_t dev_addr = phys_to_dma(hwdev, paddr);
> >
> >                 if (swiotlb_force == SWIOTLB_FORCE ||
> >                     !dma_capable(hwdev, dev_addr, sg->length)) {
> >                         phys_addr_t map = map_single(hwdev, sg_phys(sg),
> >                                                      sg->length, dir, attrs);
> >                         ...
> >                         }
> >                         sg->dma_address = phys_to_dma(hwdev, map);
> >                 } else
> >                         sg->dma_address = dev_addr;
> >                 sg_dma_len(sg) = sg->length;
> >         }
> >         return nelems;
> > }
> >
> > So, I think this means that the only reason the MTK SPI driver ever
> > worked before was that it was tested on an older kernel, so the
> > spi_master was defaulting to swiotlb_dma_ops with a 0 dma_mask, and
> > therefore it was using SWIOTLB bounce buffers (via 'map_single'), and
> > not actually ever doing real DMA.
>
> Well, it's still "real DMA" if the device is able to slurp data out of
> the bounce buffer. That suggests there might be some mismatch between
> the default DMA mask it's getting given and the actual hardware
> capability (i.e. the bounce buffer happens to fall somewhere accessible,
> but other addresses may not be) - is crazy 33-bit mode involved here?

AFAICT, the Mediatek SPI does not have a 33-bit mode.  The Mediatek
SPI DMA registers are only 32-bits wide, so the default 0xffffffff
dma_mask is correct.  For larger addresses, swiotlb properly falls
back to using a bounce buffer.

However, I did discover what the real problem was.  The Mediatek SPI
DMA does not like un-aligned addresses.
Teaching the driver to use the FIFO even for large buffers, and
forcing un-aligned buffers to use the FIFO makes the SPI very
reliable:

https://patchwork.kernel.org/patch/9539735/

Thanks for your help!

-Dan

Dmitry Torokhov Jan. 26, 2017, 4:35 p.m. UTC | #7

On Thu, Jan 26, 2017 at 8:29 AM, Daniel Kurtz <djkurtz@chromium.org> wrote:
> Hi Robin,
>
> On Thu, Jan 26, 2017 at 1:59 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 25/01/17 10:24, Daniel Kurtz wrote:
>> > Hi Robin,
>> >
>> > On Tue, Jan 24, 2017 at 11:14 PM, Robin Murphy <robin.murphy@arm.com> wrote:
>> >> Hi Dan,
>> >
>> > [snip...]
>> >
>> >>> And I really don't know why we needed to set the coherent_dma_mask to 0 to
>> >>> avoid SPI transaction errors.
>> >>
>> >> Following what I mentioned above, "git grep dma_alloc drivers/spi" makes
>> >> it seem like coherent DMA isn't used anywhere relevant, which is rather
>> >> puzzling. Unless of course somewhere along the line somebody's done the
>> >> dev->dma_mask = &dev->dma_coherent_mask hack, with the result that
>> >> writing one mask hits both, making *all* DMA fail and big transfers fall
>> >> back to PIO.
>> >
>> > You mean this last line?
>> >
>> >>> @@ -575,6 +576,10 @@ static int mtk_spi_probe(struct platform_device *pdev)
>> >>>               goto err_put_master;
>> >>>       }
>> >>>
>> >>> +     /* Call of_dma_configure to set up spi_master's dma_ops */
>> >>> +     of_dma_configure(&master->dev, master->dev.of_node);
>> >>> +     /* But explicitly set the coherent_dma_mask to 0 */
>> >>> +     master->dev.coherent_dma_mask = 0;
>> >>>       if (!pdev->dev.dma_mask)
>> >>>               pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
>> >                                   ^^^^^
>>
>> Ha, I totally failed to spot that!
>>
>> > As predicted, setting the dma_mask = 0 causes "all DMA to fail".  In
>> > particular, dma_capable() always returns false, so
>> > swiotlb_map_sg_attrs() takes the map_single() path, instead of just
>> > assigning:
>> >
>> >    sg->dma_address = dev_addr;
>> >
>> > swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
>> >                      enum dma_data_direction dir, unsigned long attrs)
>> > {
>> >         struct scatterlist *sg;
>> >         int i;
>> >
>> >         BUG_ON(dir == DMA_NONE);
>> >
>> >         for_each_sg(sgl, sg, nelems, i) {
>> >                 phys_addr_t paddr = sg_phys(sg);
>> >                 dma_addr_t dev_addr = phys_to_dma(hwdev, paddr);
>> >
>> >                 if (swiotlb_force == SWIOTLB_FORCE ||
>> >                     !dma_capable(hwdev, dev_addr, sg->length)) {
>> >                         phys_addr_t map = map_single(hwdev, sg_phys(sg),
>> >                                                      sg->length, dir, attrs);
>> >                         ...
>> >                         }
>> >                         sg->dma_address = phys_to_dma(hwdev, map);
>> >                 } else
>> >                         sg->dma_address = dev_addr;
>> >                 sg_dma_len(sg) = sg->length;
>> >         }
>> >         return nelems;
>> > }
>> >
>> > So, I think this means that the only reason the MTK SPI driver ever
>> > worked before was that it was tested on an older kernel, so the
>> > spi_master was defaulting to swiotlb_dma_ops with a 0 dma_mask, and
>> > therefore it was using SWIOTLB bounce buffers (via 'map_single'), and
>> > not actually ever doing real DMA.
>>
>> Well, it's still "real DMA" if the device is able to slurp data out of
>> the bounce buffer. That suggests there might be some mismatch between
>> the default DMA mask it's getting given and the actual hardware
>> capability (i.e. the bounce buffer happens to fall somewhere accessible,
>> but other addresses may not be) - is crazy 33-bit mode involved here?
>
> AFAICT, the Mediatek SPI does not have a 33-bit mode.  The Mediatek
> SPI DMA registers are only 32-bits wide, so the default 0xffffffff
> dma_mask is correct.  For larger addresses, swiotlb properly falls
> back to using a bounce buffer.
>
> However, I did discover what the real problem was.  The Mediatek SPI
> DMA does not like un-aligned addresses.

Nobody doing DMA likes unaligned buffers. In fact, memory used for DMA
must be cacheline aligned, or you going to have trouble. What
component supplies unaligned buffers to SPI core for transfers?

Thanks,
Dmitry

Daniel Kurtz Jan. 26, 2017, 4:52 p.m. UTC | #8

On Fri, Jan 27, 2017 at 12:35 AM, Dmitry Torokhov <dtor@chromium.org> wrote:
> On Thu, Jan 26, 2017 at 8:29 AM, Daniel Kurtz <djkurtz@chromium.org> wrote:
>> Hi Robin,
>>
>> On Thu, Jan 26, 2017 at 1:59 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>>
>>> On 25/01/17 10:24, Daniel Kurtz wrote:
>>> > Hi Robin,
>>> >
>>> > On Tue, Jan 24, 2017 at 11:14 PM, Robin Murphy <robin.murphy@arm.com> wrote:
>>> >> Hi Dan,
>>> >
>>> > [snip...]
>>> >
>>> >>> And I really don't know why we needed to set the coherent_dma_mask to 0 to
>>> >>> avoid SPI transaction errors.
>>> >>
>>> >> Following what I mentioned above, "git grep dma_alloc drivers/spi" makes
>>> >> it seem like coherent DMA isn't used anywhere relevant, which is rather
>>> >> puzzling. Unless of course somewhere along the line somebody's done the
>>> >> dev->dma_mask = &dev->dma_coherent_mask hack, with the result that
>>> >> writing one mask hits both, making *all* DMA fail and big transfers fall
>>> >> back to PIO.
>>> >
>>> > You mean this last line?
>>> >
>>> >>> @@ -575,6 +576,10 @@ static int mtk_spi_probe(struct platform_device *pdev)
>>> >>>               goto err_put_master;
>>> >>>       }
>>> >>>
>>> >>> +     /* Call of_dma_configure to set up spi_master's dma_ops */
>>> >>> +     of_dma_configure(&master->dev, master->dev.of_node);
>>> >>> +     /* But explicitly set the coherent_dma_mask to 0 */
>>> >>> +     master->dev.coherent_dma_mask = 0;
>>> >>>       if (!pdev->dev.dma_mask)
>>> >>>               pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
>>> >                                   ^^^^^
>>>
>>> Ha, I totally failed to spot that!
>>>
>>> > As predicted, setting the dma_mask = 0 causes "all DMA to fail".  In
>>> > particular, dma_capable() always returns false, so
>>> > swiotlb_map_sg_attrs() takes the map_single() path, instead of just
>>> > assigning:
>>> >
>>> >    sg->dma_address = dev_addr;
>>> >
>>> > swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
>>> >                      enum dma_data_direction dir, unsigned long attrs)
>>> > {
>>> >         struct scatterlist *sg;
>>> >         int i;
>>> >
>>> >         BUG_ON(dir == DMA_NONE);
>>> >
>>> >         for_each_sg(sgl, sg, nelems, i) {
>>> >                 phys_addr_t paddr = sg_phys(sg);
>>> >                 dma_addr_t dev_addr = phys_to_dma(hwdev, paddr);
>>> >
>>> >                 if (swiotlb_force == SWIOTLB_FORCE ||
>>> >                     !dma_capable(hwdev, dev_addr, sg->length)) {
>>> >                         phys_addr_t map = map_single(hwdev, sg_phys(sg),
>>> >                                                      sg->length, dir, attrs);
>>> >                         ...
>>> >                         }
>>> >                         sg->dma_address = phys_to_dma(hwdev, map);
>>> >                 } else
>>> >                         sg->dma_address = dev_addr;
>>> >                 sg_dma_len(sg) = sg->length;
>>> >         }
>>> >         return nelems;
>>> > }
>>> >
>>> > So, I think this means that the only reason the MTK SPI driver ever
>>> > worked before was that it was tested on an older kernel, so the
>>> > spi_master was defaulting to swiotlb_dma_ops with a 0 dma_mask, and
>>> > therefore it was using SWIOTLB bounce buffers (via 'map_single'), and
>>> > not actually ever doing real DMA.
>>>
>>> Well, it's still "real DMA" if the device is able to slurp data out of
>>> the bounce buffer. That suggests there might be some mismatch between
>>> the default DMA mask it's getting given and the actual hardware
>>> capability (i.e. the bounce buffer happens to fall somewhere accessible,
>>> but other addresses may not be) - is crazy 33-bit mode involved here?
>>
>> AFAICT, the Mediatek SPI does not have a 33-bit mode.  The Mediatek
>> SPI DMA registers are only 32-bits wide, so the default 0xffffffff
>> dma_mask is correct.  For larger addresses, swiotlb properly falls
>> back to using a bounce buffer.
>>
>> However, I did discover what the real problem was.  The Mediatek SPI
>> DMA does not like un-aligned addresses.
>
> Nobody doing DMA likes unaligned buffers. In fact, memory used for DMA
> must be cacheline aligned, or you going to have trouble. What
> component supplies unaligned buffers to SPI core for transfers?

cros-ec-spi

>
> Thanks,
> Dmitry

Mark Brown Jan. 26, 2017, 5:07 p.m. UTC | #9

On Fri, Jan 27, 2017 at 12:52:15AM +0800, Daniel Kurtz wrote:
> On Fri, Jan 27, 2017 at 12:35 AM, Dmitry Torokhov <dtor@chromium.org> wrote:

> > Nobody doing DMA likes unaligned buffers. In fact, memory used for DMA
> > must be cacheline aligned, or you going to have trouble. What
> > component supplies unaligned buffers to SPI core for transfers?

> cros-ec-spi

That needs fixing then - I keep thinking we should add a warning to the
core for this, especially now we do DMA mapping in the core.

Dmitry Torokhov Jan. 26, 2017, 5:26 p.m. UTC | #10

On Thu, Jan 26, 2017 at 9:07 AM, Mark Brown <broonie@kernel.org> wrote:
> On Fri, Jan 27, 2017 at 12:52:15AM +0800, Daniel Kurtz wrote:
>> On Fri, Jan 27, 2017 at 12:35 AM, Dmitry Torokhov <dtor@chromium.org> wrote:
>
>> > Nobody doing DMA likes unaligned buffers. In fact, memory used for DMA
>> > must be cacheline aligned, or you going to have trouble. What
>> > component supplies unaligned buffers to SPI core for transfers?
>
>> cros-ec-spi
>
> That needs fixing then - I keep thinking we should add a warning to the
> core for this, especially now we do DMA mapping in the core.

OTOH the full buffer used by cros ec *is* DMA-safe (allocated with
kmalloc), it is that cors ec uses it at offset while transferring data
piece by piece. Do we want to support this?

Mark Brown Jan. 27, 2017, 1:36 p.m. UTC | #11

On Thu, Jan 26, 2017 at 09:26:13AM -0800, Dmitry Torokhov wrote:

> OTOH the full buffer used by cros ec *is* DMA-safe (allocated with
> kmalloc), it is that cors ec uses it at offset while transferring data
> piece by piece. Do we want to support this?

I'm not sure the complexity is worth it as a general feature, I worry
about the cost of worrying about cutting the transaction up to include a
PIO portion.  That said that's just a gut feeling, if there were a
sufficiently simple/nice implementation it'd definitely avoid bugs.

To me part of DMA safety is the alignment restrictions.

spi: mediatek: Manually set dma_ops for spi_master device

Commit Message

Comments

Patch