diff mbox series

ARM: dts: integrator: Fix DMA ranges

Message ID 20220923113822.1445491-1-linus.walleij@linaro.org (mailing list archive)
State Superseded
Headers show
Series ARM: dts: integrator: Fix DMA ranges | expand

Commit Message

Linus Walleij Sept. 23, 2022, 11:38 a.m. UTC
A recent change affecting the behaviour of phys_to_dma() to
actually require the device tree ranges to work unmasked a
bug in the Integrator DMA ranges.

The PL110 uses the CMA allocator to obtain coherent allocations
from a dedicated 1MB video memory, leading to the following
call chain:

drm_gem_cma_create()
  dma_alloc_attrs()
    dma_alloc_from_dev_coherent()
      __dma_alloc_from_coherent()
        dma_get_device_base()
          phys_to_dma()
            translate_phys_to_dma()

phys_to_dma() by way of translate_phys_to_dma() will nowadays not
provide 1:1 mappings unless the ranges are properly defined in
the device tree and reflected into the dev->dma_range_map.

There is a bug in the device trees because the DMA ranges are
incorrectly specified, and the patch uncovers this bug.

Solution:

- Fix the LB (logic bus) ranges to be 1-to-1 like they should
  have always been.
- Provide a 1:1 dma-ranges attribute to the PL110.
- Mark the PL110 display controller as DMA coherent.

This makes the DMA ranges work right and makes the PL110
framebuffer work again.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Fixes: ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
SoC folks: please apply this directly for fixes if it seems
to be the right thing to do.
---
 arch/arm/boot/dts/integratorap-im-pd1.dts | 2 ++
 arch/arm/boot/dts/integratorap.dts        | 8 ++++----
 2 files changed, 6 insertions(+), 4 deletions(-)

Comments

Arnd Bergmann Sept. 23, 2022, 11:58 a.m. UTC | #1
On Fri, Sep 23, 2022, at 1:38 PM, Linus Walleij wrote:
> A recent change affecting the behaviour of phys_to_dma() to
> actually require the device tree ranges to work unmasked a
> bug in the Integrator DMA ranges.
>
> The PL110 uses the CMA allocator to obtain coherent allocations
> from a dedicated 1MB video memory, leading to the following
> call chain:
>
> drm_gem_cma_create()
>   dma_alloc_attrs()
>     dma_alloc_from_dev_coherent()
>       __dma_alloc_from_coherent()
>         dma_get_device_base()
>           phys_to_dma()
>             translate_phys_to_dma()
>
> phys_to_dma() by way of translate_phys_to_dma() will nowadays not
> provide 1:1 mappings unless the ranges are properly defined in
> the device tree and reflected into the dev->dma_range_map.

I don't understand this yet, what did the kernel previously
do to that allowed the correct DMA mapping when a wrong
address was set in the DT?

> There is a bug in the device trees because the DMA ranges are
> incorrectly specified, and the patch uncovers this bug.
>
> Solution:
>
> - Fix the LB (logic bus) ranges to be 1-to-1 like they should
>   have always been.
> - Provide a 1:1 dma-ranges attribute to the PL110.
> - Mark the PL110 display controller as DMA coherent.

Are you sure the actually coherent? I'm not aware of any other
ARM9 based SoC with cache-coherent DMA. What is the DMA master
that accesses

> diff --git a/arch/arm/boot/dts/integratorap-im-pd1.dts 
> b/arch/arm/boot/dts/integratorap-im-pd1.dts
> index 31724753d3f3..ecccbd1777a3 100644
> --- a/arch/arm/boot/dts/integratorap-im-pd1.dts
> +++ b/arch/arm/boot/dts/integratorap-im-pd1.dts
> @@ -248,6 +248,8 @@ display@1000000 {
>  		/* 640x480 16bpp @ 25.175MHz is 36827428 bytes/s */
>  		max-memory-bandwidth = <40000000>;
>  		memory-region = <&impd1_ram>;
> +		dma-ranges;
> +		dma-coherent;
> 
>  		port@0 {
>  			#address-cells = <1>;

Which device is the actual DMA master here? The "dma-coherent"
property sets the pl110 as coherent, but the dma-ranges property
would refer to the port, right?

> diff --git a/arch/arm/boot/dts/integratorap.dts 
> b/arch/arm/boot/dts/integratorap.dts
> index c983435ed492..9148287fa0a9 100644
> --- a/arch/arm/boot/dts/integratorap.dts
> +++ b/arch/arm/boot/dts/integratorap.dts
> @@ -262,7 +262,7 @@ bus@c0000000 {
>  		lm0: bus@c0000000 {
>  			compatible = "simple-bus";
>  			ranges = <0x00000000 0xc0000000 0x10000000>;
> -			dma-ranges = <0x00000000 0x80000000 0x10000000>;
> +			dma-ranges = <0x00000000 0xc0000000 0x10000000>;

In your description, you say that you set a 1:1 map, but this is
not what you seem to be setting here. Instead you map DMA address
zero to point to the beginning of RAM.

     Arnd
Linus Walleij Sept. 23, 2022, 12:56 p.m. UTC | #2
On Fri, Sep 23, 2022 at 1:58 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Fri, Sep 23, 2022, at 1:38 PM, Linus Walleij wrote:

> > A recent change affecting the behaviour of phys_to_dma() to
> > actually require the device tree ranges to work unmasked a
> > bug in the Integrator DMA ranges.
> >
> > The PL110 uses the CMA allocator to obtain coherent allocations
> > from a dedicated 1MB video memory, leading to the following
> > call chain:
> >
> > drm_gem_cma_create()
> >   dma_alloc_attrs()
> >     dma_alloc_from_dev_coherent()
> >       __dma_alloc_from_coherent()
> >         dma_get_device_base()
> >           phys_to_dma()
> >             translate_phys_to_dma()
> >
> > phys_to_dma() by way of translate_phys_to_dma() will nowadays not
> > provide 1:1 mappings unless the ranges are properly defined in
> > the device tree and reflected into the dev->dma_range_map.
>
> I don't understand this yet, what did the kernel previously
> do to that allowed the correct DMA mapping when a wrong
> address was set in the DT?

Ah this goes way back. I think I need a different fixes tag.
I hadn't tested this platform in a while.

What happens currently in translate_phys_to_dma() is that it returns
DMA_MAPPING_ERROR (0xffffffff) because of the buggy device tree
and that causes the problem.

Before the dma direct rework we selected ARCH_HAS_PHYS_TO_DMA
and arch/arm/include/asm/dma-direct.h did this:

static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
{
        if (dev)
                pfn -= dev->dma_pfn_offset;
        return (dma_addr_t)__pfn_to_bus(pfn);
}
(...)
static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
{
        unsigned int offset = paddr & ~PAGE_MASK;
        return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
}

As dma_pfn_offset was 0 this resulted in a 1:1 map.

The actual switchover from this code happens in
commit af6f23b88e9508f1cb8d0af008bb175019428f73
"ARM/dma-mapping: use the generic versions of dma_to_phys/phys_to_dma
by default"

I'm not sure which commit is the appropriate for Fixes: really.

> > There is a bug in the device trees because the DMA ranges are
> > incorrectly specified, and the patch uncovers this bug.
> >
> > Solution:
> >
> > - Fix the LB (logic bus) ranges to be 1-to-1 like they should
> >   have always been.
> > - Provide a 1:1 dma-ranges attribute to the PL110.
> > - Mark the PL110 display controller as DMA coherent.
>
> Are you sure the actually coherent? I'm not aware of any other
> ARM9 based SoC with cache-coherent DMA. What is the DMA master
> that accesses

No scratch that, this oneliner isn't even needed.

I got my head wrong around what coherent means in this
context. Captain hindsight says we should have named
the dt flag dma-cache-coherent instead of just
dma-coherent.

Odd thing: this flag has no device tree bindings I could
find. Maybe it's in the really old DT docs?

> Which device is the actual DMA master here? The "dma-coherent"
> property sets the pl110 as coherent, but the dma-ranges property
> would refer to the port, right?

Yeah just the latter should be set. Mea culpa.

> > diff --git a/arch/arm/boot/dts/integratorap.dts
> > b/arch/arm/boot/dts/integratorap.dts
> > index c983435ed492..9148287fa0a9 100644
> > --- a/arch/arm/boot/dts/integratorap.dts
> > +++ b/arch/arm/boot/dts/integratorap.dts
> > @@ -262,7 +262,7 @@ bus@c0000000 {
> >               lm0: bus@c0000000 {
> >                       compatible = "simple-bus";
> >                       ranges = <0x00000000 0xc0000000 0x10000000>;
> > -                     dma-ranges = <0x00000000 0x80000000 0x10000000>;
> > +                     dma-ranges = <0x00000000 0xc0000000 0x10000000>;
>
> In your description, you say that you set a 1:1 map, but this is
> not what you seem to be setting here. Instead you map DMA address
> zero to point to the beginning of RAM.

This is in this context (after my changes):

        reserved-memory {
                #address-cells = <1>;
                #size-cells = <1>;
                ranges;

                impd1_ram: vram@c2000000 {
                        /* 1 MB of designated video RAM on the IM-PD1 */
                        compatible = "shared-dma-pool";
                        reg = <0xc2000000 0x00100000>;
                        no-map;
                };
        };

So reserved-memory has to be in the root of the device tree
because we don't support putting reserved memory anywhere
else (would be nice to fix this!)

Here the special memory appears in the CPU address map.

        bus@c0000000 {
                compatible = "arm,integrator-ap-lm";
                #address-cells = <1>;
                #size-cells = <1>;
                ranges = <0xc0000000 0xc0000000 0x40000000>;
                dma-ranges;
(...)
                lm0: bus@c0000000 {
                        compatible = "simple-bus";
                        ranges = <0x00000000 0xc0000000 0x10000000>;
                        dma-ranges = <0x00000000 0xc0000000 0x10000000>;
                        reg = <0xc0000000 0x10000000>;
                        #address-cells = <1>;
                        #size-cells = <1>;

                        display@1000000 {
                                compatible = "arm,pl110", "arm,primecell";
                                reg = <0x01000000 0x1000>;
(...)
                                memory-region = <&impd1_ram>;
                                dma-ranges;
(...)

Here we find the reg properly at physical address 0xc1000000 thanks to
the ranges: 0xc0000000 + 0x00000000 + 0x01000000 = 0xc1000000.

But the special memory region is in the root of the device tree, outside of the
translation.

Now dma-ranges will assume that when we translate the memory region
it is in the address space of the display controller, but it isn't, because the
phandle goes to something in the root of the device tree.

0xc2000000 + 0x00000000 + 0x00000000 = 0xc2000000

Whenever I do ranges my head start spinning :/

If you have a better way to accomodate the DMA ranges I am happy to
comply!

We can fix the problem that reserved memory can only be in the
root as well I suppose but that is no easy regression fix.

Yours,
Linus Walleij
Arnd Bergmann Sept. 23, 2022, 1:20 p.m. UTC | #3
On Fri, Sep 23, 2022, at 2:56 PM, Linus Walleij wrote:
> On Fri, Sep 23, 2022 at 1:58 PM Arnd Bergmann <arnd@arndb.de> wrote:
>> On Fri, Sep 23, 2022, at 1:38 PM, Linus Walleij wrote:
>
>> > A recent change affecting the behaviour of phys_to_dma() to
>> > actually require the device tree ranges to work unmasked a
>> > bug in the Integrator DMA ranges.
>> >
>> > The PL110 uses the CMA allocator to obtain coherent allocations
>> > from a dedicated 1MB video memory, leading to the following
>> > call chain:
>> >
>> > drm_gem_cma_create()
>> >   dma_alloc_attrs()
>> >     dma_alloc_from_dev_coherent()
>> >       __dma_alloc_from_coherent()
>> >         dma_get_device_base()
>> >           phys_to_dma()
>> >             translate_phys_to_dma()
>> >
>> > phys_to_dma() by way of translate_phys_to_dma() will nowadays not
>> > provide 1:1 mappings unless the ranges are properly defined in
>> > the device tree and reflected into the dev->dma_range_map.
>>
>> I don't understand this yet, what did the kernel previously
>> do to that allowed the correct DMA mapping when a wrong
>> address was set in the DT?
>
> Ah this goes way back. I think I need a different fixes tag.
> I hadn't tested this platform in a while.
>
> What happens currently in translate_phys_to_dma() is that it returns
> DMA_MAPPING_ERROR (0xffffffff) because of the buggy device tree
> and that causes the problem.
>
> Before the dma direct rework we selected ARCH_HAS_PHYS_TO_DMA
> and arch/arm/include/asm/dma-direct.h did this:
>
> static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
> {
>         if (dev)
>                 pfn -= dev->dma_pfn_offset;
>         return (dma_addr_t)__pfn_to_bus(pfn);
> }
> (...)
> static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
> {
>         unsigned int offset = paddr & ~PAGE_MASK;
>         return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
> }
>
> As dma_pfn_offset was 0 this resulted in a 1:1 map.

Ah, I see. So the offset was 0 because the dma-ranges
did not actually point to RAM then, right?

> Odd thing: this flag has no device tree bindings I could
> find. Maybe it's in the really old DT docs?

No idea. I do remember that in the old times, whether DMA was
coherent or not was just a compile-time option, so ppc was
usually assumed to be coherent while Arm was not, but then
we needed something to identify machines that were different.

> Here the special memory appears in the CPU address map.
>
>         bus@c0000000 {
>                 compatible = "arm,integrator-ap-lm";
>                 #address-cells = <1>;
>                 #size-cells = <1>;
>                 ranges = <0xc0000000 0xc0000000 0x40000000>;
>                 dma-ranges;
> (...)
>                 lm0: bus@c0000000 {
>                         compatible = "simple-bus";
>                         ranges = <0x00000000 0xc0000000 0x10000000>;
>                         dma-ranges = <0x00000000 0xc0000000 0x10000000>;
>                         reg = <0xc0000000 0x10000000>;
>                         #address-cells = <1>;
>                         #size-cells = <1>;
>
>                         display@1000000 {
>                                 compatible = "arm,pl110", "arm,primecell";
>                                 reg = <0x01000000 0x1000>;
> (...)
>                                 memory-region = <&impd1_ram>;
>                                 dma-ranges;
> (...)
>
> Here we find the reg properly at physical address 0xc1000000 thanks to
> the ranges: 0xc0000000 + 0x00000000 + 0x01000000 = 0xc1000000.
>
> But the special memory region is in the root of the device tree,
> outside of the translation.
>
> Now dma-ranges will assume that when we translate the memory region
> it is in the address space of the display controller, but it isn't,
> because the phandle goes to something in the root of the device tree.
>
> 0xc2000000 + 0x00000000 + 0x00000000 = 0xc2000000
>
> Whenever I do ranges my head start spinning :/
>
> If you have a better way to accomodate the DMA ranges I am happy to
> comply!

I'm still not sure I understand it. Is that reserved memory
area in RAM or on the logic bus? The 'ranges' property makes
it appear that all of 0xc0000000-0xcfffffff is the logic bus,
but then the vram itself is also in that address range, which
would indicate that it is behind that bus. However, using the
'reserved-memory' property indicates that this is actually
just normal RAM that is carved out from what the OS manages.

      Arnd
Linus Walleij Sept. 23, 2022, 1:54 p.m. UTC | #4
On Fri, Sep 23, 2022 at 3:21 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Fri, Sep 23, 2022, at 2:56 PM, Linus Walleij wrote:

> > Here the special memory appears in the CPU address map.
> >
> >         bus@c0000000 {
> >                 compatible = "arm,integrator-ap-lm";
> >                 #address-cells = <1>;
> >                 #size-cells = <1>;
> >                 ranges = <0xc0000000 0xc0000000 0x40000000>;
> >                 dma-ranges;
> > (...)
> >                 lm0: bus@c0000000 {
> >                         compatible = "simple-bus";
> >                         ranges = <0x00000000 0xc0000000 0x10000000>;
> >                         dma-ranges = <0x00000000 0xc0000000 0x10000000>;
> >                         reg = <0xc0000000 0x10000000>;
> >                         #address-cells = <1>;
> >                         #size-cells = <1>;
> >
> >                         display@1000000 {
> >                                 compatible = "arm,pl110", "arm,primecell";
> >                                 reg = <0x01000000 0x1000>;
> > (...)
> >                                 memory-region = <&impd1_ram>;
> >                                 dma-ranges;
> > (...)
> >
> > Here we find the reg properly at physical address 0xc1000000 thanks to
> > the ranges: 0xc0000000 + 0x00000000 + 0x01000000 = 0xc1000000.
> >
> > But the special memory region is in the root of the device tree,
> > outside of the translation.
> >
> > Now dma-ranges will assume that when we translate the memory region
> > it is in the address space of the display controller, but it isn't,
> > because the phandle goes to something in the root of the device tree.
> >
> > 0xc2000000 + 0x00000000 + 0x00000000 = 0xc2000000
> >
> > Whenever I do ranges my head start spinning :/
> >
> > If you have a better way to accomodate the DMA ranges I am happy to
> > comply!
>
> I'm still not sure I understand it. Is that reserved memory
> area in RAM or on the logic bus?

It's on the logic bus slot "lm0", so the display controller is
at 0xc1000000 and right below it at 0xc2000000 is 1MB
of dedicated graphics memory.

> The 'ranges' property makes
> it appear that all of 0xc0000000-0xcfffffff is the logic bus,

Yep it is.

> but then the vram itself is also in that address range, which
> would indicate that it is behind that bus.

Yep it is.

>  However, using the
> 'reserved-memory' property indicates that this is actually
> just normal RAM that is carved out from what the OS manages.

The binding document
Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml
unfortunately does not mention anywhere that the reserved memory
must be a subset of the system main RAM "carveout" or anything like that,
it says that  the memory should be "designed for special usage by various
device drivers", and "excluded from normal usage" which it is, hence the
misunderstanding.

We have used this for Vexpress like forever, but I did
the change from some custom vexpress memory property
to shared-dma-pool so I guess that's my fault.
Incidentally this also works very well.

Is there some other mechanism we should be using for dedicated
framebuffer memory on some odd address?

Yours,
Linus Walleij
Arnd Bergmann Sept. 23, 2022, 6:45 p.m. UTC | #5
On Fri, Sep 23, 2022, at 3:54 PM, Linus Walleij wrote:
> On Fri, Sep 23, 2022 at 3:21 PM Arnd Bergmann <arnd@arndb.de> wrote:
>> On Fri, Sep 23, 2022, at 2:56 PM, Linus Walleij wrote:
>
>> but then the vram itself is also in that address range, which
>> would indicate that it is behind that bus.
>
> Yep it is.

Ok.

>>  However, using the
>> 'reserved-memory' property indicates that this is actually
>> just normal RAM that is carved out from what the OS manages.
>
> The binding document
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml
> unfortunately does not mention anywhere that the reserved memory
> must be a subset of the system main RAM "carveout" or anything like that,
> it says that  the memory should be "designed for special usage by various
> device drivers", and "excluded from normal usage" which it is, hence the
> misunderstanding.
>
> We have used this for Vexpress like forever, but I did
> the change from some custom vexpress memory property
> to shared-dma-pool so I guess that's my fault.
> Incidentally this also works very well.
>
> Is there some other mechanism we should be using for dedicated
> framebuffer memory on some odd address?

I think the correct way to do this would have been to have
a 'reg' property in the device itself that points to the VRAM,
instead of describing it as memory.

This in turn would mean that the driver would use some
variant of ioremap_wc() to map the framebuffer into kernel
space, rather than using dma_alloc_coherent().

I guess the author of the driver copied the code from another
fbdev driver that was meant for an Arm SoC with a shared
memory framebuffer rather than a dedicated VRAM...

It's probably too late to change now, and the new dma-ranges
property is at least consistent with the usage.

Can you update the changelog text based on the above and
send the patch again with the dma-coherent property removed?

     Arnd
diff mbox series

Patch

diff --git a/arch/arm/boot/dts/integratorap-im-pd1.dts b/arch/arm/boot/dts/integratorap-im-pd1.dts
index 31724753d3f3..ecccbd1777a3 100644
--- a/arch/arm/boot/dts/integratorap-im-pd1.dts
+++ b/arch/arm/boot/dts/integratorap-im-pd1.dts
@@ -248,6 +248,8 @@  display@1000000 {
 		/* 640x480 16bpp @ 25.175MHz is 36827428 bytes/s */
 		max-memory-bandwidth = <40000000>;
 		memory-region = <&impd1_ram>;
+		dma-ranges;
+		dma-coherent;
 
 		port@0 {
 			#address-cells = <1>;
diff --git a/arch/arm/boot/dts/integratorap.dts b/arch/arm/boot/dts/integratorap.dts
index c983435ed492..9148287fa0a9 100644
--- a/arch/arm/boot/dts/integratorap.dts
+++ b/arch/arm/boot/dts/integratorap.dts
@@ -262,7 +262,7 @@  bus@c0000000 {
 		lm0: bus@c0000000 {
 			compatible = "simple-bus";
 			ranges = <0x00000000 0xc0000000 0x10000000>;
-			dma-ranges = <0x00000000 0x80000000 0x10000000>;
+			dma-ranges = <0x00000000 0xc0000000 0x10000000>;
 			reg = <0xc0000000 0x10000000>;
 			#address-cells = <1>;
 			#size-cells = <1>;
@@ -270,7 +270,7 @@  lm0: bus@c0000000 {
 		lm1: bus@d0000000 {
 			compatible = "simple-bus";
 			ranges = <0x00000000 0xd0000000 0x10000000>;
-			dma-ranges = <0x00000000 0x80000000 0x10000000>;
+			dma-ranges = <0x00000000 0xd0000000 0x10000000>;
 			reg = <0xd0000000 0x10000000>;
 			#address-cells = <1>;
 			#size-cells = <1>;
@@ -278,7 +278,7 @@  lm1: bus@d0000000 {
 		lm2: bus@e0000000 {
 			compatible = "simple-bus";
 			ranges = <0x00000000 0xe0000000 0x10000000>;
-			dma-ranges = <0x00000000 0x80000000 0x10000000>;
+			dma-ranges = <0x00000000 0xe0000000 0x10000000>;
 			reg = <0xe0000000 0x10000000>;
 			#address-cells = <1>;
 			#size-cells = <1>;
@@ -286,7 +286,7 @@  lm2: bus@e0000000 {
 		lm3: bus@f0000000 {
 			compatible = "simple-bus";
 			ranges = <0x00000000 0xf0000000 0x10000000>;
-			dma-ranges = <0x00000000 0x80000000 0x10000000>;
+			dma-ranges = <0x00000000 0xf0000000 0x10000000>;
 			reg = <0xf0000000 0x10000000>;
 			#address-cells = <1>;
 			#size-cells = <1>;