Message ID | 1415896047.1787.4.camel@linaro.org (mailing list archive) |
---|---|
State | New, archived |
On Thursday 13 November 2014 16:27:27 Jon Medhurst wrote:
> 32-bit ARM kernels may have a 64-bit dma_addr_t but have no
> implementation of the compiler helper for 64-bit unsigned division,
> therefore the use of the modulo operator in pl330_prep_dma_memcpy causes
> the link error "undefined reference to `__aeabi_uldivmod'"
>
> As the burst value is always a power of two we can fix the problem, and
> make the code more efficient, by replacing "% burst" with "& (burst-1)".
>
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Signed-off-by: Jon Medhurst <tixy@linaro.org>
>

Just saw the same thing and was going to send a different patch, but
yours is better.

Acked-by: Arnd Bergmann <arnd@arndb.de>
On Thu, Nov 13, 2014 at 04:27:27PM +0000, Jon Medhurst (Tixy) wrote:
> 32-bit ARM kernels may have a 64-bit dma_addr_t but have no
> implementation of the compiler helper for 64-bit unsigned division,
> therefore the use of the modulo operator in pl330_prep_dma_memcpy causes
> the link error "undefined reference to `__aeabi_uldivmod'"
>
> As the burst value is always a power of two we can fix the problem, and
> make the code more efficient, by replacing "% burst" with "& (burst-1)".
>
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Signed-off-by: Jon Medhurst <tixy@linaro.org>
> ---
>
> Vinod. I haven't added a 'Fixes:' line because I was unsure if the patch
> in linux-next is part of a stable branch or if the SHA1 might change
> before hitting mainline. If it is stable then the line should be...
>
> Fixes: 63369d0a96dc ("dmaengine: pl330: Align DMA memcpy operations to MFIFO width")

I have applied this for now but...

While at it, and also related to Fixes: typically the fixes branch won't be
rebased before it's sent to Linus and merged. But since this is introduced by
a patch which has been sent, should I just fold it in and not cause this
regression in the first place...?
On Thu, 2014-11-13 at 16:27 +0000, Jon Medhurst (Tixy) wrote:
> 32-bit ARM kernels may have a 64-bit dma_addr_t but have no
> implementation of the compiler helper for 64-bit unsigned division,
> therefore the use of the modulo operator in pl330_prep_dma_memcpy causes
> the link error "undefined reference to `__aeabi_uldivmod'"
>
> As the burst value is always a power of two we can fix the problem, and
> make the code more efficient, by replacing "% burst" with "& (burst-1)".
>
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Signed-off-by: Jon Medhurst <tixy@linaro.org>
> ---
>
> Vinod. I haven't added a 'Fixes:' line because I was unsure if the patch
> in linux-next is part of a stable branch or if the SHA1 might change
> before hitting mainline. If it is stable then the line should be...
>
> Fixes: 63369d0a96dc ("dmaengine: pl330: Align DMA memcpy operations to MFIFO width")
>
>
>  drivers/dma/pl330.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 38c9617..52c4c62 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -2464,11 +2464,8 @@ pl330_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dst,
>  	 * parameters because our DMA programming algorithm doesn't cope with
>  	 * transfers which straddle an entry in the DMA device's MFIFO.
>  	 */
> -	while (burst > 1) {
> -		if (!((src | dst | len) % burst))
> -			break;
> +	while ((src | dst | len) & (burst - 1))
>  		burst /= 2;
> -	}

Maybe something like:

	div = ffs(src | dst | len);
	if (burst > 1 && div)
		burst >>= div;

?

dunno if dma_addr_t src or dst can ever be a 64 bit value
for AMBA or not.

If so, the ffs would need to be different. Maybe:

	if (sizeof(dma_addr_t) == sizeof(u64))
		div = __ffs64(src | dst | len);
	else
		div = ffs(src | dst | len);
	if (burst > 1 && div)
		burst >>= div;
On Thu, 2014-11-13 at 22:31 +0530, Vinod Koul wrote:
> On Thu, Nov 13, 2014 at 04:27:27PM +0000, Jon Medhurst (Tixy) wrote:
> > 32-bit ARM kernels may have a 64-bit dma_addr_t but have no
> > implementation of the compiler helper for 64-bit unsigned division,
> > therefore the use of the modulo operator in pl330_prep_dma_memcpy causes
> > the link error "undefined reference to `__aeabi_uldivmod'"
> >
> > As the burst value is always a power of two we can fix the problem, and
> > make the code more efficient, by replacing "% burst" with "& (burst-1)".
> >
> > Reported-by: kbuild test robot <fengguang.wu@intel.com>
> > Signed-off-by: Jon Medhurst <tixy@linaro.org>
> > ---
> >
> > Vinod. I haven't added a 'Fixes:' line because I was unsure if the patch
> > in linux-next is part of a stable branch or if the SHA1 might change
> > before hitting mainline. If it is stable then the line should be...
> >
> > Fixes: 63369d0a96dc ("dmaengine: pl330: Align DMA memcpy operations to MFIFO width")
> I have applied this for now but...
>
> While at it, and also related to Fixes: typically the fixes branch won't be
> rebased before it's sent to Linus and merged. But since this is introduced by
> a patch which has been sent, should I just fold it in and not cause this
> regression in the first place...?

I have no objection to folding it in, but then doesn't that remove
credit for Fengguang Wu's test system for finding and reporting errors?
On Thu, 2014-11-13 at 09:02 -0800, Joe Perches wrote:
> On Thu, 2014-11-13 at 16:27 +0000, Jon Medhurst (Tixy) wrote:
> > 32-bit ARM kernels may have a 64-bit dma_addr_t but have no
> > implementation of the compiler helper for 64-bit unsigned division,
> > therefore the use of the modulo operator in pl330_prep_dma_memcpy causes
> > the link error "undefined reference to `__aeabi_uldivmod'"
> >
> > As the burst value is always a power of two we can fix the problem, and
> > make the code more efficient, by replacing "% burst" with "& (burst-1)".
> >
> > Reported-by: kbuild test robot <fengguang.wu@intel.com>
> > Signed-off-by: Jon Medhurst <tixy@linaro.org>
> > ---
> >
> > Vinod. I haven't added a 'Fixes:' line because I was unsure if the patch
> > in linux-next is part of a stable branch or if the SHA1 might change
> > before hitting mainline. If it is stable then the line should be...
> >
> > Fixes: 63369d0a96dc ("dmaengine: pl330: Align DMA memcpy operations to MFIFO width")
> >
> >
> >  drivers/dma/pl330.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> > index 38c9617..52c4c62 100644
> > --- a/drivers/dma/pl330.c
> > +++ b/drivers/dma/pl330.c
> > @@ -2464,11 +2464,8 @@ pl330_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dst,
> >  	 * parameters because our DMA programming algorithm doesn't cope with
> >  	 * transfers which straddle an entry in the DMA device's MFIFO.
> >  	 */
> > -	while (burst > 1) {
> > -		if (!((src | dst | len) % burst))
> > -			break;
> > +	while ((src | dst | len) & (burst - 1))
> >  		burst /= 2;
> > -	}
>
> Maybe something like:
>
> 	div = ffs(src | dst | len);
> 	if (burst > 1 && div)
> 		burst >>= div;

That doesn't work, the code is trying to limit burst to make it a factor
of src, dst and len, so it would need to be something like

	div = ffs(src | dst | len);
	if (div)
		burst = min(burst, 1 << div);

There are many ways to code the limiting of the burst width, but as it
starts out as the data bus width the DMA can handle (maximum 16 bytes)
then at most we'll be going round the existing while loop 4 times so I
don't think it's that much overhead, and probably less code size than
using ffs.

And as the driver has been broken for the unaligned memcpy case since
the day it was added then I can't see that anyone is actually using it
that way anyway, so all existing users (if any) must already be doing
bus aligned copies and the current while loop will iterate zero times.

That's probably enough bikeshedding from me :-)

> ?
>
> dunno if dma_addr_t src or dst can ever be a 64 bit value
> for AMBA or not.

The pl330 TRM I have and the current Linux driver explicitly have 32-bit
addresses, so you would need an IOMMU to access addresses above 4GB.
On Thu, 2014-11-13 at 18:19 +0000, Jon Medhurst (Tixy) wrote:
> There are many ways to code the limiting of the burst width, but as it
> starts out as the data bus width the DMA can handle (maximum 16 bytes)
> then at most we'll be going round the existing while loop 4 times so I
> don't think it's that much overhead, and probably less code size than
> using ffs.

For arm, isn't ffs just a few instructions with no loops?

> And as the driver has been broken for the unaligned memcpy case since
> the day it was added then I can't see that anyone is actually using it
> that way anyway, so all existing users (if any) must already be doing
> bus aligned copies and the current while loop will iterate zero times.

That's probably right, I just don't like reading while loops
where ffs/fls might be suitable.

> That's probably enough bikeshedding from me :-)

;) Me too.

cheers, Joe
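For comparison, a loop-free version of the burst limiting discussed in this
sub-thread might look like the user-space sketch below. It is an illustration
under stated assumptions, not proposed driver code: the function names are
hypothetical, addresses are modelled as uint64_t, and burst is assumed to be a
nonzero power of two. Because ffs() reports a 1-based bit position, the lowest
set bit of x is 1 << (ffs(x) - 1); the sketch isolates that bit directly with
x & (~x + 1) so that it does not depend on any particular ffs variant.

```c
#include <assert.h>
#include <stdint.h>

/* Loop form from the patch, lifted into a user-space function (hypothetical name). */
static unsigned int limit_burst_loop(uint64_t src, uint64_t dst, uint64_t len,
				     unsigned int burst)
{
	while ((src | dst | len) & (burst - 1))
		burst /= 2;
	return burst;
}

/* Loop-free form: clamp burst to the lowest set bit of (src | dst | len). */
static unsigned int limit_burst_lowbit(uint64_t src, uint64_t dst, uint64_t len,
				       unsigned int burst)
{
	uint64_t bits = src | dst | len;
	uint64_t align;

	if (!bits)
		return burst;		/* zero is aligned to every burst size */

	/* Lowest set bit: the largest power of two that divides bits. */
	align = bits & (~bits + 1);
	return align < burst ? (unsigned int)align : burst;
}

/* Assert that both forms agree for every power-of-two burst up to 16 bytes. */
static void check_same_result(uint64_t src, uint64_t dst, uint64_t len)
{
	for (unsigned int burst = 1; burst <= 16; burst *= 2)
		assert(limit_burst_loop(src, dst, len, burst) ==
		       limit_burst_lowbit(src, dst, len, burst));
}
```

With a 16-byte starting burst, both functions return 4 for src = 0x1004,
dst = 0x2000, len = 64, and leave the burst at 16 when all three values are
16-byte aligned. Whether the loop-free form is actually smaller or faster in
the driver is not something this sketch measures.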
On Thu, Nov 13, 2014 at 05:11:28PM +0000, Jon Medhurst (Tixy) wrote:
> On Thu, 2014-11-13 at 22:31 +0530, Vinod Koul wrote:
> > On Thu, Nov 13, 2014 at 04:27:27PM +0000, Jon Medhurst (Tixy) wrote:
> > > 32-bit ARM kernels may have a 64-bit dma_addr_t but have no
> > > implementation of the compiler helper for 64-bit unsigned division,
> > > therefore the use of the modulo operator in pl330_prep_dma_memcpy causes
> > > the link error "undefined reference to `__aeabi_uldivmod'"
> > >
> > > As the burst value is always a power of two we can fix the problem, and
> > > make the code more efficient, by replacing "% burst" with "& (burst-1)".
> > >
> > > Reported-by: kbuild test robot <fengguang.wu@intel.com>
> > > Signed-off-by: Jon Medhurst <tixy@linaro.org>
> > > ---
> > >
> > > Vinod. I haven't added a 'Fixes:' line because I was unsure if the patch
> > > in linux-next is part of a stable branch or if the SHA1 might change
> > > before hitting mainline. If it is stable then the line should be...
> > >
> > > Fixes: 63369d0a96dc ("dmaengine: pl330: Align DMA memcpy operations to MFIFO width")
> > I have applied this for now but...
> >
> > While at it, and also related to Fixes: typically the fixes branch won't be
> > rebased before it's sent to Linus and merged. But since this is introduced by
> > a patch which has been sent, should I just fold it in and not cause this
> > regression in the first place...?
>
> I have no objection to folding it in, but then doesn't that remove
> credit for Fengguang Wu's test system for finding and reporting errors?

I added an entry for that and retained the credit to him.
diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 38c9617..52c4c62 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -2464,11 +2464,8 @@ pl330_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dst,
 	 * parameters because our DMA programming algorithm doesn't cope with
 	 * transfers which straddle an entry in the DMA device's MFIFO.
 	 */
-	while (burst > 1) {
-		if (!((src | dst | len) % burst))
-			break;
+	while ((src | dst | len) & (burst - 1))
 		burst /= 2;
-	}
 
 	desc->rqcfg.brst_size = 0;
 	while (burst != (1 << desc->rqcfg.brst_size))
32-bit ARM kernels may have a 64-bit dma_addr_t but have no
implementation of the compiler helper for 64-bit unsigned division,
therefore the use of the modulo operator in pl330_prep_dma_memcpy causes
the link error "undefined reference to `__aeabi_uldivmod'"

As the burst value is always a power of two we can fix the problem, and
make the code more efficient, by replacing "% burst" with "& (burst-1)".

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
---

Vinod. I haven't added a 'Fixes:' line because I was unsure if the patch
in linux-next is part of a stable branch or if the SHA1 might change
before hitting mainline. If it is stable then the line should be...

Fixes: 63369d0a96dc ("dmaengine: pl330: Align DMA memcpy operations to MFIFO width")

 drivers/dma/pl330.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
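As an illustration of the substitution described in the commit message, here
is a minimal user-space sketch, not the driver code. It assumes a 64-bit
address type (the hypothetical dma_addr64_t stands in for dma_addr_t) and a
power-of-two starting burst of at most 16 bytes; the function names are
invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for dma_addr_t on a configuration where it is 64 bits wide. */
typedef uint64_t dma_addr64_t;

/* Pre-patch form: a 64-bit modulo, which needs a division helper on 32-bit ARM. */
static unsigned int limit_burst_mod(dma_addr64_t src, dma_addr64_t dst,
				    size_t len, unsigned int burst)
{
	while (burst > 1) {
		if (!((src | dst | len) % burst))
			break;
		burst /= 2;
	}
	return burst;
}

/*
 * Patched form: for a power-of-two burst, "x % burst" equals
 * "x & (burst - 1)". The loop stops at burst == 1 because the mask is then 0.
 */
static unsigned int limit_burst_mask(dma_addr64_t src, dma_addr64_t dst,
				     size_t len, unsigned int burst)
{
	while ((src | dst | len) & (burst - 1))
		burst /= 2;
	return burst;
}

/* Assert that both forms agree for every power-of-two burst up to 16 bytes. */
static void check_equivalence(dma_addr64_t src, dma_addr64_t dst, size_t len)
{
	for (unsigned int burst = 1; burst <= 16; burst *= 2)
		assert(limit_burst_mod(src, dst, len, burst) ==
		       limit_burst_mask(src, dst, len, burst));
}
```

For example, limit_burst_mask(0x1004, 0x2000, 64, 16) reduces the burst to 4,
because the combined value 0x3044 is a multiple of 4 but not of 8. The two
forms only agree when burst is a power of two, which is the property the
commit message relies on.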