Message ID | 1484779180-1344-7-git-send-email-roy.pledge@nxp.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, Jan 18, 2017 at 05:39:36PM -0500, Roy Pledge wrote: > From: Claudiu Manoil <claudiu.manoil@nxp.com> > > Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> > Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> > Signed-off-by: Roy Pledge <roy.pledge@nxp.com> > --- > drivers/soc/fsl/qbman/qman_ccsr.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c > index 43feaa9..67ae073 100644 > --- a/drivers/soc/fsl/qbman/qman_ccsr.c > +++ b/drivers/soc/fsl/qbman/qman_ccsr.c > @@ -446,8 +446,14 @@ static int zero_priv_mem(struct device *dev, struct device_node *node, > return -ENOMEM; > > memset(tmpp, 0, sz); > +#ifdef CONFIG_PPC > flush_dcache_range((unsigned long)tmpp, > (unsigned long)tmpp + sz); > +#elif defined(CONFIG_ARM) > + __cpuc_flush_dcache_area(tmpp, sz); Please do not fiddle about under the covers, there be dragons there. It looks to me like you're trying to use __cpuc_flush_dcache_area() on an area that's been ioremap()'d, which is a waste of CPU cycles. ioremap()'d areas are mapped as "device" memory type, which means the region isn't even cached. So I don't think this is necessary.
On 01/18/2017 05:12 PM, Russell King - ARM Linux wrote: > On Wed, Jan 18, 2017 at 05:39:36PM -0500, Roy Pledge wrote: >> From: Claudiu Manoil <claudiu.manoil@nxp.com> >> >> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> >> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> >> Signed-off-by: Roy Pledge <roy.pledge@nxp.com> >> --- >> drivers/soc/fsl/qbman/qman_ccsr.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c >> index 43feaa9..67ae073 100644 >> --- a/drivers/soc/fsl/qbman/qman_ccsr.c >> +++ b/drivers/soc/fsl/qbman/qman_ccsr.c >> @@ -446,8 +446,14 @@ static int zero_priv_mem(struct device *dev, struct device_node *node, >> return -ENOMEM; >> >> memset(tmpp, 0, sz); >> +#ifdef CONFIG_PPC >> flush_dcache_range((unsigned long)tmpp, >> (unsigned long)tmpp + sz); >> +#elif defined(CONFIG_ARM) >> + __cpuc_flush_dcache_area(tmpp, sz); > > Please do not fiddle about under the covers, there be dragons there. > > It looks to me like you're trying to use __cpuc_flush_dcache_area() > on an area that's been ioremap()'d, which is a waste of CPU cycles. > ioremap()'d areas are mapped as "device" memory type, which means > the region isn't even cached. So I don't think this is necessary. It is mapped cachable, via memremap() (see patch 1/10). This is RAM that the device uses, non-coherently, for its own purposes (but requires software to clear it first). Is there a non-"under the covers" way to say "flush this region" without the arch second-guessing whether it really needs to be flushed? -Scott
On 1/18/2017 6:36 PM, Scott Wood wrote: > On 01/18/2017 05:12 PM, Russell King - ARM Linux wrote: >> On Wed, Jan 18, 2017 at 05:39:36PM -0500, Roy Pledge wrote: >>> From: Claudiu Manoil <claudiu.manoil@nxp.com> >>> >>> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> >>> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> >>> Signed-off-by: Roy Pledge <roy.pledge@nxp.com> >>> --- >>> drivers/soc/fsl/qbman/qman_ccsr.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c >>> index 43feaa9..67ae073 100644 >>> --- a/drivers/soc/fsl/qbman/qman_ccsr.c >>> +++ b/drivers/soc/fsl/qbman/qman_ccsr.c >>> @@ -446,8 +446,14 @@ static int zero_priv_mem(struct device *dev, struct device_node *node, >>> return -ENOMEM; >>> >>> memset(tmpp, 0, sz); >>> +#ifdef CONFIG_PPC >>> flush_dcache_range((unsigned long)tmpp, >>> (unsigned long)tmpp + sz); >>> +#elif defined(CONFIG_ARM) >>> + __cpuc_flush_dcache_area(tmpp, sz); >> Please do not fiddle about under the covers, there be dragons there. >> >> It looks to me like you're trying to use __cpuc_flush_dcache_area() >> on an area that's been ioremap()'d, which is a waste of CPU cycles. >> ioremap()'d areas are mapped as "device" memory type, which means >> the region isn't even cached. So I don't think this is necessary. > It is mapped cachable, via memremap() (see patch 1/10). This is RAM > that the device uses, non-coherently, for its own purposes (but requires > software to clear it first). > > Is there a non-"under the covers" way to say "flush this region" without > the arch second-guessing whether it really needs to be flushed? Any advice on how to resolve this? I looked into doing a non-cacheable mapping of the memory so that the flush wouldn't be required, but the ioremap code prevents mapping normal memory in this way. The QMan device requires this memory to be zeroed at startup.
Because the device does non-coherent reads and writes to the memory, we must ensure that any cache in the CPU cluster(s) is flushed in order to prevent a future castout from overwriting data. This happened before on PPC platforms and made for some very unfun debug sessions trying to understand what was causing the failure. Roy > -Scott > >
On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: > > Is there a non-"under the covers" way to say "flush this region" without > > the arch second-guessing whether it really needs to be flushed? > Any advice on how to resolve this? I looked into trying to do a > non-cacheable mapping of the > memory so that the flush wouldn't be required but the ioremap code > prevents mapping normal > memory in this way. The QMan device requiresthis memory to be zeroed at > startup. Because the device > does non coherent reads and writes to the memory we must ensure that any > cache in the CPU cluster(s) > is flushed in order to prevent a future castout from overwriting data. > This happened before on PPC > platforms and made for some very unfun debug session trying to > understand what was causing the failure. If this is normal RAM, you should be able to just write zeroes, and then do a dma_map_single() for initialization. Are there any other requirements what to do with the memory later, is it used for communication at all, or just required to be zero? Arnd
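Arnd's zero-then-map suggestion could be sketched roughly as follows. This is a hypothetical helper (the name and signature are invented for illustration), and it assumes vaddr is an address the streaming DMA API accepts, e.g. lowmem; as the rest of the thread notes, that assumption does not hold for this driver's memremap()'d reservation:

```c
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/string.h>

/* Sketch only: zero the device's private RAM, then let the DMA API
 * perform whatever cache maintenance the platform actually needs. */
static int qbman_init_priv_mem(struct device *dev, void *vaddr, size_t sz,
			       dma_addr_t *handle)
{
	memset(vaddr, 0, sz);	/* CPU writes the zeroes... */

	/* ...and the streaming map cleans/flushes caches as required */
	*handle = dma_map_single(dev, vaddr, sz, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, *handle))
		return -EIO;

	/* must eventually be paired with dma_unmap_single() at teardown */
	return 0;
}
```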
On 01/25/2017 03:20 PM, Arnd Bergmann wrote: > On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: >>> Is there a non-"under the covers" way to say "flush this region" without >>> the arch second-guessing whether it really needs to be flushed? >> Any advice on how to resolve this? I looked into trying to do a >> non-cacheable mapping of the >> memory so that the flush wouldn't be required but the ioremap code >> prevents mapping normal >> memory in this way. The QMan device requiresthis memory to be zeroed at >> startup. Because the device >> does non coherent reads and writes to the memory we must ensure that any >> cache in the CPU cluster(s) >> is flushed in order to prevent a future castout from overwriting data. >> This happened before on PPC >> platforms and made for some very unfun debug session trying to >> understand what was causing the failure. > > If this is normal RAM, you should be able to just write zeroes, and then > do a dma_map_single() for initialization. The DMA API on PPC currently has no way of knowing that the device accesses this memory incoherently. If that were somehow fixed, we still couldn't use dma_map_single() as it doesn't accept virtual addresses that come from memremap() or similar dynamic mappings. We'd have to convert the physical address to an array of struct pages and pass each one to dma_map_page(). And even if we did all that, there would still be other manual cache manipulation left in this driver, to deal with its cacheable register interface. > Are there any other requirements > what to do with the memory later, is it used for communication at all, > or just required to be zero? It just needs to be zeroed before the device begins using it for internal storage. -Scott
On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote: > On 01/25/2017 03:20 PM, Arnd Bergmann wrote: >> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: >> If this is normal RAM, you should be able to just write zeroes, and then >> do a dma_map_single() for initialization. > > The DMA API on PPC currently has no way of knowing that the device > accesses this memory incoherently. Ah, this is because PPC doesn't use the 'dma-coherent' property on devices but just assumes that coherency is a global property of the platform, right? > If that were somehow fixed, we still couldn't use dma_map_single() as it > doesn't accept virtual addresses that come from memremap() or similar > dynamic mappings. We'd have to convert the physical address to an array > of struct pages and pass each one to dma_map_page(). Sorry for my ignorance, but where does the memory come from to start with? Is this in the normal linearly mapped RAM, in a carveout outside of the linear mapping but the same memory, or in some on-chip buffer? > And even if we did all that, there would still be other manual cache > manipulation left in this driver, to deal with its cacheable register > interface. I thought we had concluded that "cacheable register" is something that cannot work reliably on ARM at all when this came up before. Any updates on that? >> Are there any other requirements >> what to do with the memory later, is it used for communication at all, >> or just required to be zero? > > It just needs to be zeroed before the device begins using it for > internal storage. Ok. Arnd
On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote: > On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote: > > > > On 01/25/2017 03:20 PM, Arnd Bergmann wrote: > > > > > > On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: > > > > > > > > If this is normal RAM, you should be able to just write zeroes, and then > > > do a dma_map_single() for initialization. > > The DMA API on PPC currently has no way of knowing that the device > > accesses this memory incoherently. > Ah, this is because PPC doesn't use the 'dma-coherent' property on devices > but just assumes that coherency is a global property of the platform,right? Right. > > If that were somehow fixed, we still couldn't use dma_map_single() as it > > doesn't accept virtual addresses that come from memremap() or similar > > dynamic mappings. We'd have to convert the physical address to an array > > of struct pages and pass each one to dma_map_page(). > Sorry for my ignorance, but where does the memory come from to start > with? Is this in the normal linearly mapped RAM, in a carveout outside > of the linear mapping but the same memory, or in some on-chip buffer? It's RAM that comes from the device tree reserved memory mechanism (drivers/of/of_reserved_mem.c). On a 32-bit kernel it is not guaranteed (or likely) to be lowmem. > > And even if we did all that, there would still be other manual cache > > manipulation left in this driver, to deal with its cacheable register > > interface. > I thought we had concluded that "cacheable register" is something > that cannot work reliably on ARM at all when this came up before. > Any updates on that? I'm not familiar with the details there... My understanding is that the hardware people at NXP are convinced that it can work on these specific chips due to implementation details. -Scott
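For readers unfamiliar with the reserved-memory mechanism Scott mentions, a reservation of this kind looks roughly like the node below. The fsl,qman-fqd compatible follows the QMan device tree binding, but the size and alignment values here are purely illustrative, not taken from any real board file:

```dts
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	/* Dynamically placed carveout for QMan's private frame-queue storage */
	qman_fqd: qman-fqd {
		compatible = "fsl,qman-fqd";
		size = <0 0x400000>;		/* multi-megabyte chunk */
		alignment = <0 0x400000>;	/* naturally aligned */
	};
};
```

On a 32-bit kernel nothing forces the allocator to place this in lowmem, which is why the driver reaches for memremap() rather than relying on the linear mapping.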
On Wed, Jan 25, 2017 at 10:20:09PM +0100, Arnd Bergmann wrote: > On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: > > > Is there a non-"under the covers" way to say "flush this region" without > > > the arch second-guessing whether it really needs to be flushed? > > Any advice on how to resolve this? I looked into trying to do a > > non-cacheable mapping of the > > memory so that the flush wouldn't be required but the ioremap code > > prevents mapping normal > > memory in this way. The QMan device requiresthis memory to be zeroed at > > startup. Because the device > > does non coherent reads and writes to the memory we must ensure that any > > cache in the CPU cluster(s) > > is flushed in order to prevent a future castout from overwriting data. > > This happened before on PPC > > platforms and made for some very unfun debug session trying to > > understand what was causing the failure. > > If this is normal RAM, you should be able to just write zeroes, and then > do a dma_map_single() for initialization. Are there any other requirements > what to do with the memory later, is it used for communication at all, > or just required to be zero? Please don't encourage incorrect DMA API usage. dma_map_single() must always be paired with an unmap at some point, otherwise it's a memory leak if DMA API debugging is enabled. So, if you wish to suggest using dma_map_single(), please also suggest where to use dma_unmap_single() too. Thanks.
On Wed, Jan 18, 2017 at 11:36:18PM +0000, Scott Wood wrote: > On 01/18/2017 05:12 PM, Russell King - ARM Linux wrote: > > Please do not fiddle about under the covers, there be dragons there. > > > > It looks to me like you're trying to use __cpuc_flush_dcache_area() > > on an area that's been ioremap()'d, which is a waste of CPU cycles. > > ioremap()'d areas are mapped as "device" memory type, which means > > the region isn't even cached. So I don't think this is necessary. > > It is mapped cachable, via memremap() (see patch 1/10). This is RAM > that the device uses, non-coherently, for its own purposes (but requires > software to clear it first). Ok. > Is there a non-"under the covers" way to say "flush this region" without > the arch second-guessing whether it really needs to be flushed? memremap() doesn't have any associated cache flushing functionality defined for it - it's a relatively new interface added to the kernel. There are some interfaces defined for memory that was vmalloc'd, but due to the way kernel/memremap.c works, you can't guarantee that the pointer you end up with came from vmalloc'd space, so I don't think suggesting to use those would be correct (as they may have assumptions that they'll only be called with vmalloc'd addresses.) As pointed out later in the thread, using dma_map_single() on the returned address is not permissible either. The only thing I can say is that it needs a proposal for a new arch- independent interface.
On Fri, Jan 27, 2017 at 05:41:10PM +0100, Arnd Bergmann wrote: > On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote: > > And even if we did all that, there would still be other manual cache > > manipulation left in this driver, to deal with its cacheable register > > interface. > > I thought we had concluded that "cacheable register" is something > that cannot work reliably on ARM at all when this came up before. > Any updates on that? There's a big comment in arch/arm/include/asm/io.h which explains everything. Nothing there is likely to change.
On 28/01/17 02:34, Scott Wood wrote: > On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote: >> On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote: >>> >>> On 01/25/2017 03:20 PM, Arnd Bergmann wrote: >>>> >>>> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: >>> >>>> >>>> If this is normal RAM, you should be able to just write zeroes, and then >>>> do a dma_map_single() for initialization. >>> The DMA API on PPC currently has no way of knowing that the device >>> accesses this memory incoherently. >> Ah, this is because PPC doesn't use the 'dma-coherent' property on devices >> but just assumes that coherency is a global property of the platform, right? > > Right. > >>> If that were somehow fixed, we still couldn't use dma_map_single() as it >>> doesn't accept virtual addresses that come from memremap() or similar >>> dynamic mappings. We'd have to convert the physical address to an array >>> of struct pages and pass each one to dma_map_page(). >> Sorry for my ignorance, but where does the memory come from to start >> with? Is this in the normal linearly mapped RAM, in a carveout outside >> of the linear mapping but the same memory, or in some on-chip buffer? > > It's RAM that comes from the device tree reserved memory mechanism > (drivers/of/of_reserved_mem.c). On a 32-bit kernel it is not guaranteed (or > likely) to be lowmem. Wouldn't dma_declare_coherent_memory() be the appropriate tool for that job, then (modulo the PPC issue)? On ARM that should result in dma_alloc_coherent() giving back a non-cacheable mapping if the device is non-coherent, wherein a dma_wmb() after writing the data from the CPU side should be enough to ensure it is published to the device. Robin. >>>> And even if we did all that, there would still be other manual cache >>>> manipulation left in this driver, to deal with its cacheable register >>>> interface.
>> I thought we had concluded that "cacheable register" is something >> that cannot work reliably on ARM at all when this came up before. >> Any updates on that? > > I'm not familiar with the details there... My understanding is that the > hardware people at NXP are convinced that it can work on these specific chips > due to implementation details. > > -Scott > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
On 1/30/2017 10:31 AM, Robin Murphy wrote: > On 28/01/17 02:34, Scott Wood wrote: >> On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote: >>> On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote: >>>> On 01/25/2017 03:20 PM, Arnd Bergmann wrote: >>>>> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: >>>>> If this is normal RAM, you should be able to just write zeroes, and then >>>>> do a dma_map_single() for initialization. >>>> The DMA API on PPC currently has no way of knowing that the device >>>> accesses this memory incoherently. >>> Ah, this is because PPC doesn't use the 'dma-coherent' property on devices >>> but just assumes that coherency is a global property of the platform,right? >> Right. >> >>>> If that were somehow fixed, we still couldn't use dma_map_single() as it >>>> doesn't accept virtual addresses that come from memremap() or similar >>>> dynamic mappings. We'd have to convert the physical address to an array >>>> of struct pages and pass each one to dma_map_page(). >>> Sorry for my ignorance, but where does the memory come from to start >>> with? Is this in the normal linearly mapped RAM, in a carveout outside >>> of the linear mapping but the same memory, or in some on-chip buffer? >> It's RAM that comes from the device tree reserved memory mechanism >> (drivers/of/of_reserved_mem.c). On a 32-bit kernel it is not guaranteed (or >> likely) to be lowmem. > Wouldn't dma_declare_coherent_memory() be the appropriate tool for that > job, then (modulo the PPC issue)? On ARM that should result in > dma_alloc_coherent() giving back a non-cacheable mapping if the device > is non-coherent, wherein a dmb_wmb() after writing the data from the CPU > side should be enough to ensure it is published to the device. I think there is some confusion here (and it may very well be mine). My understanding is that the dma_declare_coherent_memory() API sets up a region of memory that will be managed by dma_alloc_coherent() and friends. 
This is useful if the driver needs to manage a region of on-device memory, but that isn't what this specific region is used for. The memory being initialized here is a big chunk (possibly multiple megabytes) of RAM that only the QBMan device will access, with one exception - at initialization the device expects software to zero the memory before starting to use the device. Since the CPUs will never access this region again, the device does non-coherent/non-shareable accesses for performance reasons; since the QBMan device is the only user there is no need to maintain coherency with core-side caches. We used the of_reserved_mem mechanism as that seemed to be the most reliable way to guarantee that we would get a properly aligned contiguous allocation, and it's been working well. The downside is that the contents of that memory are undefined, so we had to map it, zero it and flush the cache in order to get the RAM into the desired state and make sure we don't get hit by a random castout in the future. Would it make sense to add an option in the of_reserved_mem system to fill the memory? I haven't looked at the feasibility of that but it seems like a generic solution that could be useful to others. We could add the fill value to the device tree so you could initialize to any pattern. - Roy > > Robin. > >>>> And even if we did all that, there would still be other manual cache >>>> manipulation left in this driver, to deal with its cacheable register >>>> interface. >>> I thought we had concluded that "cacheable register" is something >>> that cannot work reliably on ARM at all when this came up before. >>> Any updates on that? >> I'm not familiar with the details there... My understanding is that the >> hardware people at NXP are convinced that it can work on these specific chips >> due to implementation details.
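For reference, the map/zero/flush sequence Roy describes is the approach under debate, pieced together from the patch hunk quoted at the top of the thread (a simplified sketch, not the final code):

```c
#include <linux/io.h>
#include <linux/string.h>

static int zero_priv_mem(phys_addr_t addr, size_t sz)
{
	/* Map the reserved RAM cacheable; ioremap() refuses normal memory */
	void *tmpp = memremap(addr, sz, MEMREMAP_WB);

	if (!tmpp)
		return -ENOMEM;

	memset(tmpp, 0, sz);
#ifdef CONFIG_PPC
	flush_dcache_range((unsigned long)tmpp, (unsigned long)tmpp + sz);
#elif defined(CONFIG_ARM)
	/* The arch-private call Russell objects to earlier in the thread */
	__cpuc_flush_dcache_area(tmpp, sz);
#endif
	memunmap(tmpp);
	return 0;
}
```

The flush is what guarantees the zeroes actually reach RAM before the device, which reads the region non-coherently, starts using it; without it a later castout of stale cache lines could overwrite the device's data.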
On Monday, January 30, 2017 3:19:33 PM CET Russell King - ARM Linux wrote: > On Fri, Jan 27, 2017 at 05:41:10PM +0100, Arnd Bergmann wrote: > > On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote: > > > And even if we did all that, there would still be other manual cache > > > manipulation left in this driver, to deal with its cacheable register > > > interface. > > > > I thought we had concluded that "cacheable register" is something > > that cannot work reliably on ARM at all when this came up before. > > Any updates on that? > > There's a big comment in arch/arm/include/asm/io.h which explains > everything. Nothing there is likely to change. Ah right, so since there is no interface to map as "device writeback", the driver would likely end up with an MMIO register in a "normal writeback", with these properties (among others listed there): * - writes can be repeated (in certain cases) with no side effects * - writes can be merged before accessing the target * - unaligned accesses can be supported * - ordering is not guaranteed without explicit dependencies or barrier * instructions * - writes may be delayed before they hit the endpoint memory I think the ordering is the most critical here, as IIRC the driver expects a whole cache line to be written into the MMIO register in a single bus cycle specifically when we issue the flush from the driver, but we can actually have partial writebacks at any time. Arnd
On Monday, January 30, 2017 3:04:28 PM CET Russell King - ARM Linux wrote: > On Wed, Jan 25, 2017 at 10:20:09PM +0100, Arnd Bergmann wrote: > > On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: > > > > Is there a non-"under the covers" way to say "flush this region" without > > > > the arch second-guessing whether it really needs to be flushed? > > > Any advice on how to resolve this? I looked into trying to do a > > > non-cacheable mapping of the > > > memory so that the flush wouldn't be required but the ioremap code > > > prevents mapping normal > > > memory in this way. The QMan device requiresthis memory to be zeroed at > > > startup. Because the device > > > does non coherent reads and writes to the memory we must ensure that any > > > cache in the CPU cluster(s) > > > is flushed in order to prevent a future castout from overwriting data. > > > This happened before on PPC > > > platforms and made for some very unfun debug session trying to > > > understand what was causing the failure. > > > > If this is normal RAM, you should be able to just write zeroes, and then > > do a dma_map_single() for initialization. Are there any other requirements > > what to do with the memory later, is it used for communication at all, > > or just required to be zero? > > Please don't encourage incorrect DMA API usage. > > dma_map_single() must always be paired with an unmap at some point, > otherwise it's a memory leak if DMA API debugging is enabled. So, > if you wish to suggest using dma_map_single(), please also suggest > where to use dma_unmap_single() too. The mapping obviously has to exist the whole time the device is in use, and we should not call dma_unmap_single() until the device is released by the driver. Unfortunately it seems we can't use this interface though, as the device is already configured to a specific reserved memory range, and we can't just use get_free_pages() here. Arnd
On 30/01/17 19:04, Roy Pledge wrote: > On 1/30/2017 10:31 AM, Robin Murphy wrote: >> On 28/01/17 02:34, Scott Wood wrote: >>> On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote: >>>> On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote: >>>>> On 01/25/2017 03:20 PM, Arnd Bergmann wrote: >>>>>> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: >>>>>> If this is normal RAM, you should be able to just write zeroes, and then >>>>>> do a dma_map_single() for initialization. >>>>> The DMA API on PPC currently has no way of knowing that the device >>>>> accesses this memory incoherently. >>>> Ah, this is because PPC doesn't use the 'dma-coherent' property on devices >>>> but just assumes that coherency is a global property of the platform,right? >>> Right. >>> >>>>> If that were somehow fixed, we still couldn't use dma_map_single() as it >>>>> doesn't accept virtual addresses that come from memremap() or similar >>>>> dynamic mappings. We'd have to convert the physical address to an array >>>>> of struct pages and pass each one to dma_map_page(). >>>> Sorry for my ignorance, but where does the memory come from to start >>>> with? Is this in the normal linearly mapped RAM, in a carveout outside >>>> of the linear mapping but the same memory, or in some on-chip buffer? >>> It's RAM that comes from the device tree reserved memory mechanism >>> (drivers/of/of_reserved_mem.c). On a 32-bit kernel it is not guaranteed (or >>> likely) to be lowmem. >> Wouldn't dma_declare_coherent_memory() be the appropriate tool for that >> job, then (modulo the PPC issue)? On ARM that should result in >> dma_alloc_coherent() giving back a non-cacheable mapping if the device >> is non-coherent, wherein a dmb_wmb() after writing the data from the CPU >> side should be enough to ensure it is published to the device. > I think there is some confusion here (and it may very well be mine). 
> > My understanding is that the dma_declare_coherent_memory() API sets up a > region of memory that will be managed by dma_alloc_coherent() and > friends. This is useful if the driver needs to manage a region of on > device memory but that isn't what this specific region is used for. It's a bit more general than that - dma_alloc_coherent() can essentially be considered "give me some memory for this device to use". We already have use-cases where such buffers are only ever accessed by the device (e.g. some display controllers, and future NVMe devices), hence DMA_ATTR_NO_KERNEL_MAPPING on ARM to save the vmalloc space. A DMA allocation also inherently guarantees appropriate alignment, regardless of whether you're using a per-device reservation or just regular CMA, and will also zero the underlying memory (and for a non-coherent device perform whatever cache maintenance is necessary, if the clearing isn't already done via a non-cacheable mapping). All you need to do in the driver is allocate your buffer and hand the resulting address off to the device at probe (after optionally checking for a reservation in DT and declaring it), then free it at remove, which also ends up far more self-documenting (IMO) than a bunch of open-coded remapping and #ifdef'ed architecture-private cache shenanigans. > The memory that is trying to be initialized here is a big chunk > (possible multiple megabytes) of RAM that only the QBMan device will > access with one exception - at initialization the device expects > software to zero the memory before starting to use the device. Since the > CPUs will never access this region again the device does > non-coherent/non-shareable accesses for performance reasons since QBMan > device is the only user - no need to maintain coherency with core side > caches. > > We used the of_reserve memory as that seemed to be the best reliable way > to guarantee that we would get a properly aligned contiguous allocation > and it's been working well. 
The downside is that the contents of that > memory is undefined so we had to map it, zero it and flush the cache in > order to get the RAM into the desired state and make sure we don't get > hit by a random castout in the future. > > Would it make sense to add an option in the of_reserved_mem system to > fill the memory? I haven't looked at the feasibility of that but it > seems like a generic solution that could be useful to others. We could > add the fill value to the device tree so you could initialize to any > pattern. In short, said generic solution is right there already, only the PPC arch code might need tweaking to accommodate it :) Robin. > > - Roy > >> >> Robin. >> >>>>> And even if we did all that, there would still be other manual cache >>>>> manipulation left in this driver, to deal with its cacheable register >>>>> interface. >>>> I thought we had concluded that "cacheable register" is something >>>> that cannot work reliably on ARM at all when this came up before. >>>> Any updates on that? >>> I'm not familiar with the details there... My understanding is that the >>> hardware people at NXP are convinced that it can work on these specific chips >>> due to implementation details. >>> >>> -Scott
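Robin's suggested flow might be sketched as below, under the assumption that the PPC coherency issue were addressed (the helper name is invented; the calls are real kernel APIs of this era, with error handling trimmed):

```c
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/of_reserved_mem.h>

static void *qbman_alloc_priv_mem(struct device *dev, size_t sz,
				  dma_addr_t *paddr)
{
	/* Attach the device-tree reservation to this device, if one exists */
	if (of_reserved_mem_device_init(dev))
		dev_dbg(dev, "no dedicated reservation, using default pool\n");

	/*
	 * dma_alloc_coherent() zeroes the buffer and, for a non-coherent
	 * device, ensures the device observes the zeroes (via a
	 * non-cacheable mapping or explicit maintenance) -- no open-coded
	 * flush_dcache_range()/__cpuc_flush_dcache_area() needed.
	 */
	return dma_alloc_coherent(dev, sz, paddr, GFP_KERNEL);
}
```

The buffer would then be handed to the device at probe and released with dma_free_coherent() at remove.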
On Wed, 2017-02-01 at 13:03 +0000, Robin Murphy wrote: > On 30/01/17 19:04, Roy Pledge wrote: > > > > On 1/30/2017 10:31 AM, Robin Murphy wrote: > > > > > > On 28/01/17 02:34, Scott Wood wrote: > > > > > > > > On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote: > > > > > > > > > > On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> > > > > > wrote: > > > > > > > > > > > > On 01/25/2017 03:20 PM, Arnd Bergmann wrote: > > > > > > > > > > > > > > On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote: > > > > > > > If this is normal RAM, you should be able to just write zeroes, > > > > > > > and then > > > > > > > do a dma_map_single() for initialization. > > > > > > The DMA API on PPC currently has no way of knowing that the device > > > > > > accesses this memory incoherently. > > > > > Ah, this is because PPC doesn't use the 'dma-coherent' property on > > > > > devices > > > > > but just assumes that coherency is a global property of the > > > > > platform,right? > > > > Right. > > > > > > > > > > > > > > > > > > > > > If that were somehow fixed, we still couldn't use dma_map_single() > > > > > > as it > > > > > > doesn't accept virtual addresses that come from memremap() or > > > > > > similar > > > > > > dynamic mappings. We'd have to convert the physical address to an > > > > > > array > > > > > > of struct pages and pass each one to dma_map_page(). > > > > > Sorry for my ignorance, but where does the memory come from to start > > > > > with? Is this in the normal linearly mapped RAM, in a carveout > > > > > outside > > > > > of the linear mapping but the same memory, or in some on-chip > > > > > buffer? > > > > It's RAM that comes from the device tree reserved memory mechanism > > > > (drivers/of/of_reserved_mem.c). On a 32-bit kernel it is not > > > > guaranteed (or > > > > likely) to be lowmem. > > > Wouldn't dma_declare_coherent_memory() be the appropriate tool for that > > > job, then (modulo the PPC issue)? 
On ARM that should result in > > > dma_alloc_coherent() giving back a non-cacheable mapping if the device > > > is non-coherent, wherein a dma_wmb() after writing the data from the CPU > > > side should be enough to ensure it is published to the device. > > I think there is some confusion here (and it may very well be mine). > > > > My understanding is that the dma_declare_coherent_memory() API sets up a > > region of memory that will be managed by dma_alloc_coherent() and > > friends. This is useful if the driver needs to manage a region of on > > device memory but that isn't what this specific region is used for. > It's a bit more general than that - dma_alloc_coherent() can essentially > be considered "give me some memory for this device to use". We already > have use-cases where such buffers are only ever accessed by the device > (e.g. some display controllers, and future NVMe devices), hence > DMA_ATTR_NO_KERNEL_MAPPING on ARM to save the vmalloc space. That doesn't deal with the fact that on PPC the DMA API will assume that DMA is coherent -- and in fact providing non-cacheable memory is difficult on PPC because the memory is covered by large-page cacheable mappings and mixing cacheable and non-cacheable mappings is strictly forbidden (not just in terms of coherence -- I've seen mysterious machine checks generated). The idea behind the DMA API is that the platform knows better than the driver how DMA needs to be handled, and usually that's correct -- but sometimes, with integrated SoC devices whose programming model was designed from the perspective of the entire platform, it isn't. > A DMA allocation also inherently guarantees appropriate alignment, > regardless of whether you're using a per-device reservation or just > regular CMA, When "appropriate alignment" is many megabytes, you're more likely to waste memory when you try to guarantee alignment on a secondary allocation than when you're doing the aligned allocation directly from the main memory pool. 
And how exactly does the DMA API know what alignment this particular allocation needs? dma_alloc_coherent() doesn't take alignment as a parameter. > and will also zero the underlying memory (and for a > non-coherent device perform whatever cache maintenance is necessary, if > the clearing isn't already done via a non-cacheable mapping). > > All you need to do in the driver is allocate your buffer and hand the > resulting address off to the device at probe (after optionally checking > for a reservation in DT and declaring it), then free it at remove, which > also ends up far more self-documenting (IMO) It might be more self-documenting but as I pointed out earlier in this thread it doesn't *work* without PPC arch work, and due to the mapping issues mentioned above fixing the PPC arch (and in particular, this subarch) to handle this would be difficult. > than a bunch of open-coded remapping and #ifdef'ed architecture-private > cache shenanigans. The only reason for the ifdefs is because arches can't agree on what to call the function that actually does an unconditional cache flush. And as I also pointed out earlier, this is not the only place where this driver needs to do cache flushing. -Scott
On Wed, Feb 01, 2017 at 01:47:46PM +0100, Arnd Bergmann wrote: > Ah right, so since there is no interface to map as "device writeback", > the driver would likely end up with an MMIO register in a > "normal writeback", with these properties (among others listed there): It's not about there being no _interface_, there's no support in the architecture for it. The use of the "device" memory type does not allow for any form of cache hints to be specified. > * - writes can be repeated (in certain cases) with no side effects > * - writes can be merged before accessing the target > * - unaligned accesses can be supported > * - ordering is not guaranteed without explicit dependencies or barrier > * instructions > * - writes may be delayed before they hit the endpoint memory > > I think the ordering is the most critical here, as IIRC the driver > expects a whole cache line to be written into the MMIO register > in a single bus cycle specifically when we issue the flush from the > driver, but we can actually have partial writebacks at any time. I don't think that's achievable, unless your data bus is as wide as a cache line (iow, for a 32-byte cache line, your data bus all the way to the peripheral would need to be 256 bits wide to write the cache line in a single bus cycle.)
On 2/1/2017 6:17 PM, Russell King - ARM Linux wrote: > On Wed, Feb 01, 2017 at 01:47:46PM +0100, Arnd Bergmann wrote: >> Ah right, so since there is no interface to map as "device writeback", >> the driver would likely end up with an MMIO register in a >> "normal writeback", with these properties (among others listed there): > It's not about there being no _interface_, there's no support in the > architecture for it. The use of the "device" memory type does not > allow for any form of cache hints to be specified. > >> * - writes can be repeated (in certain cases) with no side effects >> * - writes can be merged before accessing the target >> * - unaligned accesses can be supported >> * - ordering is not guaranteed without explicit dependencies or barrier >> * instructions >> * - writes may be delayed before they hit the endpoint memory >> >> I think the ordering is the most critical here, as IIRC the driver >> expects a whole cache line to be written into the MMIO register >> in a single bus cycle specifically when we issue the flush from the >> driver, but we can actually have partial writebacks at any time. > I don't think that's achievable, unless your data bus is as wide as > a cache line (iow, for a 32-byte cache line, your data bus all the > way to the peripheral would need to be 256 bits wide to write the > cache line in a single bus cycle.) > The device doesn't expect a whole cache line to be written at once. The protocol protects against cast outs/write backs by using an alternating valid bit that is set after a memory barrier is executed. The driver ensures that the valid bit is flipped last and the ARM architecture guarantees that the device sees the writes in the proper order due to the barrier. In this case portal memory is on device memory so the writes from the CPU are backed by the device - there is no read transaction from the device for any portal memory. 
I also want to point out that the memory we've been discussing for the flush has nothing to do with the software portal mechanism; the memory here is for queue storage when queues grow in length. All this patch is trying to do is zero memory obtained from the of_reserved_mem system and ensure it isn't in the cache once the device starts using it. I will try looking at the dma_alloc_coherent() suggestions from Robin but I believe when we tried that before it couldn't deal with the multiple-megabyte allocations due to fragmentation.
On 2/1/2017 8:03 AM, Robin Murphy wrote: > >> My understanding is that the dma_declare_coherent_memory() API sets up a >> region of memory that will be managed by dma_alloc_coherent() and >> friends. This is useful if the driver needs to manage a region of on >> device memory but that isn't what this specific region is used for. > It's a bit more general than that - dma_alloc_coherent() can essentially > be considered "give me some memory for this device to use". We already > have use-cases where such buffers are only ever accessed by the device > (e.g. some display controllers, and future NVMe devices), hence > DMA_ATTR_NO_KERNEL_MAPPING on ARM to save the vmalloc space. > > A DMA allocation also inherently guarantees appropriate alignment, > regardless of whether you're using a per-device reservation or just > regular CMA, and will also zero the underlying memory (and for a > non-coherent device perform whatever cache maintenance is necessary, if > the clearing isn't already done via a non-cacheable mapping). > > All you need to do in the driver is allocate your buffer and hand the > resulting address off to the device at probe (after optionally checking > for a reservation in DT and declaring it), then free it at remove, which > also ends up far more self-documenting (IMO) than a bunch of open-coded > remapping and #ifdef'ed architecture-private cache shenanigans. I found some time to work on this but the dma_declare_coherent_memory() API does not allow me to pass normal memory for the device to use. Depending on which flag is used I get one of two behaviors. If I pass DMA_MEMORY_MAP the path attempts to do a memremap() which fails with the message "memremap attempted on ram". If I use DMA_MEMORY_IO then it attempts to use ioremap() on the memory which fails for a similar reason (Not allowed to map ram). Using dma_alloc_coherent() fails in our basic use case because it is unable to obtain 8 megabytes of RAM. 
Why don't I simplify this and use dma_sync_single_for_device() to flush the cache? This seems to me to be portable and the simplest solution to this problem. The device reports non-coherent when queried using is_device_dma_coherent() and my very quick walk through the code seems to indicate that the call will do the right thing. Roy
On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote: > Why don't I simplify this and use dma_sync_single_for_device() to flush > the cache? ... which can only be used on direct-mapped memory.
On 2/6/2017 5:37 PM, Russell King - ARM Linux wrote: > On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote: >> Why don't I simplify this and use dma_sync_single_for_device() to flush >> the cache? > ... which can only be used on direct-mapped memory. > Then I don't see (or understand) any other option. There is no way I can add a large area of physical memory carved out by of_reserved_mem to a DMA allocator since both ioremap and memremap throw an error if I try to do so. I appreciate the suggestions from people to resolve this but so far there isn't a workable solution to the problem. As far as I can tell other users of dma_declare_coherent_memory() are using it for device-backed memory, not DDR. It sounds like a solution on the surface but when you dig into the details it just doesn't work. Please let me know if I'm missing something, I really want to close this issue so we can get upstream support for this device on ARM. Roy
On 07/02/17 16:44, Roy Pledge wrote: > On 2/6/2017 5:37 PM, Russell King - ARM Linux wrote: >> On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote: >>> Why don't I simplify this and use dma_sync_single_for_device() to flush >>> the cache? >> ... which can only be used on direct-mapped memory. >> > Then I don't see another (or understand) any other option. There is no > way I can add a large area of physical memory carved out by > of_reserve_mem to a DMA allocator since both ioremap and memremap throw > an error if I try to do so. I appreciate the suggestions from people to > resolve this but so far there isn't a workable solution to the problem. > As far as I can tell other users of dma_declare_coherent_memory() are > using it for device backed memory, not DDR. It sounds like a solution > on the surface but when you dig into the details it just doesn't work. > Please let me know if I'm missing something, I really want to close this > issue so we can get upstream support for this device on ARM. Apologies, having found time to take a proper look, it transpires I was somewhat misremembering things. You're right that dma_declare_coherent_memory() is for resources owned by the device, and a DT reservation doesn't really count as such. What I had mixed up is that a reserved-memory node has its own way of getting tied into CMA automatically: ensure it has the "shared-dma-pool" compatible, point a memory-region property at it from your device's node, then have the driver call of_reserved_mem_device_init() - at that point, your DMA allocations should be coming out of the given region so you should be free to just grab the whole lot in one go. I've just confirmed that experimentally by bodging a reserved-memory node into my Juno DT for the ARM HDLCD - another non-coherent device which needs ~8MB contiguous buffers - and it works as expected. Robin. > > > Roy >
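In device-tree terms, the recipe Robin outlines might look something like the sketch below. Node names, the label, the unit addresses, and the size are illustrative only, not taken from the actual qman binding:

```
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	/* hypothetical pool for the QMan private memory */
	qman_fqd: qman-fqd@9ff000000 {
		compatible = "shared-dma-pool";
		reg = <0x9 0xff000000 0x0 0x800000>; /* 8 MiB */
	};
};

qman: qman@1880000 {
	/* ties this device's DMA allocations to the pool above */
	memory-region = <&qman_fqd>;
};
```

The driver would then call of_reserved_mem_device_init() early in probe, after which dma_alloc_coherent() on that device should be satisfied from the reserved region.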
On 2/7/2017 1:26 PM, Robin Murphy wrote: > On 07/02/17 16:44, Roy Pledge wrote: >> On 2/6/2017 5:37 PM, Russell King - ARM Linux wrote: >>> On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote: >>>> Why don't I simplify this and use dma_sync_single_for_device() to flush >>>> the cache? >>> ... which can only be used on direct-mapped memory. >>> >> Then I don't see another (or understand) any other option. There is no >> way I can add a large area of physical memory carved out by >> of_reserve_mem to a DMA allocator since both ioremap and memremap throw >> an error if I try to do so. I appreciate the suggestions from people to >> resolve this but so far there isn't a workable solution to the problem. >> As far as I can tell other users of dma_declare_coherent_memory() are >> using it for device backed memory, not DDR. It sounds like a solution >> on the surface but when you dig into the details it just doesn't work. >> Please let me know if I'm missing something, I really want to close this >> issue so we can get upstream support for this device on ARM. > Apologies, having found time to take a proper look, it transpires I was > somewhat misremembering things. You're right that > dma_declare_coherent_memory() is for resources owned by the device, and > a DT reservation doesn't really count as such. What I had mixed up is > that a reserved-memory node has its own way of getting tied into CMA > automatically: ensure it has the "shared-dma-pool" compatible, point a > memory-region property at it from your device's node, then have the > driver call of_reserved_mem_device_init() - at that point, your DMA > allocations should be coming out of the given region so you should be > free to just grab the whole lot in one go. > > I've just confirmed that experimentally by bodging a reserved-memory > node into my Juno DT for the ARM HDLCD - another non-coherent device > which needs ~8MB contiguous buffers - and it works as expected. > > Robin. 
This makes sense to me now, but I run into an issue when I try this on my development system. I added 'compatible = "shared-dma-pool" into my device tree and the set the memory-region property for my device and I see this now at boottime: [ 0.000000] Reserved memory: created DMA memory pool at 0x00000009ff000000, size 8 MiB [ 0.000000] OF: reserved mem: initialized node qman-fqd, compatible id shared-dma-pool but when I call of_reserved_mem_device_init() during my device probe routine it fails as follows: [ 0.738814] memremap attempted on ram 0x00000009ff000000 size: 0x800000 [ 0.745485] ------------[ cut here ]------------ [ 0.750132] WARNING: CPU: 0 PID: 1 at kernel/memremap.c:111 memremap+0x100/0x144 <....cut out a bunch of register dumps....> [ 0.973780] [<ffff00000816a748>] memremap+0x100/0x144 [ 0.978859] [<ffff00000853c1d0>] dma_init_coherent_memory+0x60/0x194 [ 0.985249] [<ffff00000853c7b0>] rmem_dma_device_init+0x64/0x94 [ 0.991202] [<ffff000008775c88>] of_reserved_mem_device_init_by_idx+0xf8/0x198 [ 0.998468] [<ffff00000849b1ac>] fsl_qman_probe+0xdc/0x804 [ 1.003983] [<ffff000008526538>] platform_drv_probe+0x50/0xbc [ 1.009760] [<ffff00000852489c>] driver_probe_device+0x220/0x2b8 [ 1.015799] [<ffff0000085249d8>] __driver_attach+0xa4/0xa8 [ 1.021313] [<ffff000008522908>] bus_for_each_dev+0x58/0x98 [ 1.026915] [<ffff0000085241d8>] driver_attach+0x20/0x28 [ 1.032254] [<ffff000008523d98>] bus_add_driver+0x1c8/0x22c [ 1.037856] [<ffff0000085252d8>] driver_register+0x60/0xf4 [ 1.043371] [<ffff00000852647c>] __platform_driver_register+0x48/0x50 [ 1.049849] [<ffff000008d5800c>] fsl_qman_driver_init+0x18/0x20 [ 1.055802] [<ffff000008083148>] do_one_initcall+0x38/0x118 [ 1.061405] [<ffff000008d20cec>] kernel_init_freeable+0x1a0/0x240 [ 1.067533] [<ffff0000088f8298>] kernel_init+0x10/0xfc [ 1.072697] [<ffff000008082ec0>] ret_from_fork+0x10/0x50 [ 1.078055] Reserved memory: failed to init DMA memory pool at 0x00000009ff000000, size 8 MiB Is there another attribute I 
need to set in the device tree? It seems to be trying to set up the pool but failing to do the mapping, similar to the issues I faced earlier. Thanks again Roy > >> >> Roy >> >
On 2/13/2017 4:26 PM, Roy Pledge wrote: > On 2/7/2017 1:26 PM, Robin Murphy wrote: >> On 07/02/17 16:44, Roy Pledge wrote: >>> On 2/6/2017 5:37 PM, Russell King - ARM Linux wrote: >>>> On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote: >>>>> Why don't I simplify this and use dma_sync_single_for_device() to flush >>>>> the cache? >>>> ... which can only be used on direct-mapped memory. >>>> >>> Then I don't see another (or understand) any other option. There is no >>> way I can add a large area of physical memory carved out by >>> of_reserve_mem to a DMA allocator since both ioremap and memremap throw >>> an error if I try to do so. I appreciate the suggestions from people to >>> resolve this but so far there isn't a workable solution to the problem. >>> As far as I can tell other users of dma_declare_coherent_memory() are >>> using it for device backed memory, not DDR. It sounds like a solution >>> on the surface but when you dig into the details it just doesn't work. >>> Please let me know if I'm missing something, I really want to close this >>> issue so we can get upstream support for this device on ARM. >> Apologies, having found time to take a proper look, it transpires I was >> somewhat misremembering things. You're right that >> dma_declare_coherent_memory() is for resources owned by the device, and >> a DT reservation doesn't really count as such. What I had mixed up is >> that a reserved-memory node has its own way of getting tied into CMA >> automatically: ensure it has the "shared-dma-pool" compatible, point a >> memory-region property at it from your device's node, then have the >> driver call of_reserved_mem_device_init() - at that point, your DMA >> allocations should be coming out of the given region so you should be >> free to just grab the whole lot in one go. 
>> >> I've just confirmed that experimentally by bodging a reserved-memory >> node into my Juno DT for the ARM HDLCD - another non-coherent device >> which needs ~8MB contiguous buffers - and it works as expected. >> >> Robin. > This makes sense to me now, but I run into an issue when I try this on > my development system. > > I added 'compatible = "shared-dma-pool" into my device tree and the set > the memory-region property for my device and I see this now at boottime: > > [ 0.000000] Reserved memory: created DMA memory pool at > 0x00000009ff000000, size 8 MiB > [ 0.000000] OF: reserved mem: initialized node qman-fqd, compatible > id shared-dma-pool > > but when I call of_reserved_mem_device_init() during my device probe > routine it fails as follows: > > [ 0.738814] memremap attempted on ram 0x00000009ff000000 size: 0x800000 > [ 0.745485] ------------[ cut here ]------------ > [ 0.750132] WARNING: CPU: 0 PID: 1 at kernel/memremap.c:111 > memremap+0x100/0x144 > > <....cut out a bunch of register dumps....> > > [ 0.973780] [<ffff00000816a748>] memremap+0x100/0x144 > [ 0.978859] [<ffff00000853c1d0>] dma_init_coherent_memory+0x60/0x194 > [ 0.985249] [<ffff00000853c7b0>] rmem_dma_device_init+0x64/0x94 > [ 0.991202] [<ffff000008775c88>] > of_reserved_mem_device_init_by_idx+0xf8/0x198 > [ 0.998468] [<ffff00000849b1ac>] fsl_qman_probe+0xdc/0x804 > [ 1.003983] [<ffff000008526538>] platform_drv_probe+0x50/0xbc > [ 1.009760] [<ffff00000852489c>] driver_probe_device+0x220/0x2b8 > [ 1.015799] [<ffff0000085249d8>] __driver_attach+0xa4/0xa8 > [ 1.021313] [<ffff000008522908>] bus_for_each_dev+0x58/0x98 > [ 1.026915] [<ffff0000085241d8>] driver_attach+0x20/0x28 > [ 1.032254] [<ffff000008523d98>] bus_add_driver+0x1c8/0x22c > [ 1.037856] [<ffff0000085252d8>] driver_register+0x60/0xf4 > [ 1.043371] [<ffff00000852647c>] __platform_driver_register+0x48/0x50 > [ 1.049849] [<ffff000008d5800c>] fsl_qman_driver_init+0x18/0x20 > [ 1.055802] [<ffff000008083148>] do_one_initcall+0x38/0x118 
> [ 1.061405] [<ffff000008d20cec>] kernel_init_freeable+0x1a0/0x240 > [ 1.067533] [<ffff0000088f8298>] kernel_init+0x10/0xfc > [ 1.072697] [<ffff000008082ec0>] ret_from_fork+0x10/0x50 > [ 1.078055] Reserved memory: failed to init DMA memory pool at > 0x00000009ff000000, size 8 MiB > > Is there another attribute I need to set in the device tree? It seems to > be trying to setup the pool but failing to do the mapping similar to the > issues I faced earlier. > > Thanks again > > Roy Just wanted to follow up on this - turns out the no-map; attribute is what I was missing. I now have the suggested mechanism working for ARM. However when I went to test on PowerPC the region wasn't being enabled properly. It turns out the Kconfig in arch/powerpc doesn't define HAVE_GENERIC_DMA_COHERENT so this mechanism isn't available. However I added it and everything worked on my PPC board as it did on ARM. Does anyone know why HAVE_GENERIC_DMA_COHERENT isn't defined for PowerPC? I did a quick search but didn't see previous discussion on this topic. > >>> Roy >>> >
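Concretely, the reservation Roy arrives at would presumably look like the sketch below, sized to match the 8 MiB pool in the log above (the node name and unit address are guesses):

```
qman-fqd@9ff000000 {
	compatible = "shared-dma-pool";
	reg = <0x9 0xff000000 0x0 0x800000>;
	/* keep the region out of the kernel's linear map, so the
	 * coherent-pool init can remap it itself instead of hitting
	 * the "memremap attempted on ram" warning */
	no-map;
};
```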
On Thu, 2017-03-16 at 00:43 +0000, Roy Pledge wrote: > Just wanted to follow up on this - turns out the no-map; attribute is > what I was missing. I now have the suggested mechanism working for ARM. > However when I went to test on PowerPC the region wasn't being enabled > properly. It turns out the Kconfig in arch/powerpc doesn't define > HAVE_GENERIC_DMA_COHERENT so this mechanism isn't available. However I > added it and everything worked on my PPC board as it did on ARM. > > Does anyone know why HAVE_GENERIC_DMA_COHERENT isn't defined for > PowerPC? I did a quick search but didn't see previous discussion on this > topic. Probably because nothing on PPC needed it. In any case, the problems with doing this on PPC run deeper than that, as I pointed out in: https://www.spinics.net/lists/arm-kernel/msg560020.html -Scott
On 3/16/2017 4:08 PM, Scott Wood wrote: > On Thu, 2017-03-16 at 00:43 +0000, Roy Pledge wrote: >> Just wanted to follow up on this - turns out the no-map; attribute is >> what I was missing. I now have the suggested mechanism working for ARM. >> However when I went to test on PowerPC the region wasn't being enabled >> properly. It turns out the Kconfig in arch/powerpc doesn't define >> HAVE_GENERIC_DMA_COHERENT so this mechanism isn't available. However I >> added it and everything worked on my PPC board as it did on ARM. >> >> Does anyone know why HAVE_GENERIC_DMA_COHERENT isn't defined for >> PowerPC? I did a quick search but didn't see previous discussion on this >> topic. > Probably because nothing on PPC needed it. > > In any case, the problems with doing this on PPC run deeper than that, as I > pointed out in: > https://www.spinics.net/lists/arm-kernel/msg560020.html I just pushed out an RFC patchset that uses dma_zalloc_coherent() to set up the memory mappings as Robin suggested. This is working in our regressions for both PPC and ARM but I only pushed the PPC changes for now to keep the patchset small. Scott, I'm not clear on why you don't think this will work for PPC. I agree there are issues with flush/invalidate in the software portal area but this patch just deals with private memory allocations. If you don't think this approach is valid on PPC then I'm not sure how else to implement this in a platform generic way. I'm looking forward to your feedback as well as Robin's and others. Thanks - Roy > -Scott > >
diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c index 43feaa9..67ae073 100644 --- a/drivers/soc/fsl/qbman/qman_ccsr.c +++ b/drivers/soc/fsl/qbman/qman_ccsr.c @@ -446,8 +446,14 @@ static int zero_priv_mem(struct device *dev, struct device_node *node, return -ENOMEM; memset(tmpp, 0, sz); +#ifdef CONFIG_PPC flush_dcache_range((unsigned long)tmpp, (unsigned long)tmpp + sz); +#elif defined(CONFIG_ARM) + __cpuc_flush_dcache_area(tmpp, sz); +#elif defined(CONFIG_ARM64) + __flush_dcache_area(tmpp, sz); +#endif memunmap(tmpp); return 0;