Message ID | 20240925132420.821473-2-jvetter@kalrayinc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Consolidate IO memcpy functions | expand |
On Wed, Sep 25, 2024, at 13:24, Julian Vetter wrote: > Various architectures have almost the same implementations for > __memcpy_{to,from}io and __memset_io functions. So, consolidate them > into the existing lib/iomap_copy.c. > > Reviewed-by: Yann Sionneau <ysionneau@kalrayinc.com> > Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> > --- > Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> You have a duplicated signoff here. > +#ifndef __memcpy_fromio > +void __memcpy_fromio(void *to, const volatile void __iomem *from, > size_t count); > +#endif > + > +#ifndef __memcpy_toio > +void __memcpy_toio(volatile void __iomem *to, const void *from, size_t > count); > +#endif > + > +#ifndef __memset_io > +void __memset_io(volatile void __iomem *dst, int c, size_t count); > +#endif I'm not entirely sure about the purpose of the #ifdef here, since nothing ever overrides the double-underscore versions, both before and after your patches. Unless I'm missing something here, I think a more logical sequence would be: 1. add the definitions in this file without the underscores, as memcpy_fromio/memcpy_toio/memset_io, with the #ifdef for that name that is always set at this point 2. replace the default implementation in asm-generic/io.h with extern prototypes, remove the #define from those 3. convert the other architectures, removing both the implementations and the prototypes. Arnd
On 26.09.24 09:14, Arnd Bergmann wrote: > On Wed, Sep 25, 2024, at 13:24, Julian Vetter wrote: >> Various architectures have almost the same implementations for >> __memcpy_{to,from}io and __memset_io functions. So, consolidate them >> into the existing lib/iomap_copy.c. >> >> Reviewed-by: Yann Sionneau <ysionneau@kalrayinc.com> >> Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> >> --- >> Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> > > You have a duplicated signoff here. Yes, thank you. I will remove it in the next patch revision. > > >> +#ifndef __memcpy_fromio >> +void __memcpy_fromio(void *to, const volatile void __iomem *from, >> size_t count); >> +#endif >> + >> +#ifndef __memcpy_toio >> +void __memcpy_toio(volatile void __iomem *to, const void *from, size_t >> count); >> +#endif >> + >> +#ifndef __memset_io >> +void __memset_io(volatile void __iomem *dst, int c, size_t count); >> +#endif > > I'm not entirely sure about the purpose of the #ifdef here, since > nothing ever overrides the double-underscore versions, both before > and after your patches. > > Unless I'm missing something here, I think a more logical > sequence would be: > > 1. add the definitions in this file without the underscores, by: "...in this file..." you mean the 'lib/iomap_copy.c' file, right? But what if an architecture does not select 'CONFIG_HAS_IOMEM'. Then 'iomap_copy.c' is not compiled and we don't have an implementation, right? I tried to compile with ARCH=um, with some MTD chip driver, like the robot did and it indeed fails, because um has 'NO_IOMEM' set. and the driver uses memcpy_fromio. I mean it's a strange combination, because apparently we try to use IO memory? Is this an invalid combination? But shouldn't the driver then 'depends on HAS_IOMEM'? > as memcpy_fromio/memcpy_toio/memset_io, with the #ifdef > for that name that is always set at this point > Right. I will remove it in my next patch revision. > 2. replace the default implementation in asm-generic/io.h > with extern prototypes, remove the #define from those > Yes, I have done this now. > 3. convert the other architectures, removing both the > implementations and the prototypes. > I have removed the prototypes and have aligned the function arguments in m68k, alpha, parisc, and sh, which all have their own implementation, but had slightly different function arguments. Btw, I have not removed their implementations because some of them seem to have optimized implementations (e.g., alpha and m68k), that I didn't want to touch. But you're right others (e.g., sh) just do byte wise accesses and have a comment "This needs to be optimized." Maybe I should remove these and let them use the new version?! > Arnd > > > >
On Fri, Sep 27, 2024, at 08:19, Julian Vetter wrote: > On 26.09.24 09:14, Arnd Bergmann wrote: >>> +#ifndef __memcpy_fromio >>> +void __memcpy_fromio(void *to, const volatile void __iomem *from, >>> size_t count); >>> +#endif >>> + >>> +#ifndef __memcpy_toio >>> +void __memcpy_toio(volatile void __iomem *to, const void *from, size_t >>> count); >>> +#endif >>> + >>> +#ifndef __memset_io >>> +void __memset_io(volatile void __iomem *dst, int c, size_t count); >>> +#endif >> >> I'm not entirely sure about the purpose of the #ifdef here, since >> nothing ever overrides the double-underscore versions, both before >> and after your patches. >> >> Unless I'm missing something here, I think a more logical >> sequence would be: >> >> 1. add the definitions in this file without the underscores, > > by: "...in this file..." you mean the 'lib/iomap_copy.c' file, right? Yes > But what if an architecture does not select 'CONFIG_HAS_IOMEM'. Then > 'iomap_copy.c' is not compiled and we don't have an implementation, > right? > I tried to compile with ARCH=um, with some MTD chip driver, like > the robot did and it indeed fails, because um has 'NO_IOMEM' set. and > the driver uses memcpy_fromio. I mean it's a strange combination, > because apparently we try to use IO memory? Is this an invalid > combination? But shouldn't the driver then 'depends on HAS_IOMEM'? Yes, I think that would be the best way to do it. Alternatively, arch/um could provide a dummy implementation of these. >> 3. convert the other architectures, removing both the >> implementations and the prototypes. >> > > I have removed the prototypes and have aligned the function arguments in > m68k, alpha, parisc, and sh, which all have their own implementation, > but had slightly different function arguments. Sorry for being unclear, I meant only the architectures that you are already touching. > Btw, I have not removed > their implementations because some of them seem to have optimized > implementations (e.g., alpha and m68k), that I didn't want to touch. But > you're right others (e.g., sh) just do byte wise accesses and have a > comment "This needs to be optimized." Maybe I should remove these and > let them use the new version?! Ideally we should end up with only one copy, but I'd leave the rest for a future cleanup. In particular, alpha probably still needs a custom function. Arnd
From: Julian Vetter > Sent: 25 September 2024 14:24 > > Various architectures have almost the same implementations for > __memcpy_{to,from}io and __memset_io functions. So, consolidate them > into the existing lib/iomap_copy.c. > > Reviewed-by: Yann Sionneau <ysionneau@kalrayinc.com> > Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> > --- > Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> > --- > Changes for v6: > - Included linux/aslign.h > - Replaced compile time check by ifdef to remove compiler warning > --- > include/asm-generic/io.h | 12 +++++ > lib/iomap_copy.c | 109 +++++++++++++++++++++++++++++++++++++++ > 2 files changed, 121 insertions(+) > > diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h > index 80de699bf6af..9b8e0449da28 100644 > --- a/include/asm-generic/io.h > +++ b/include/asm-generic/io.h > @@ -102,6 +102,18 @@ static inline void log_post_read_mmio(u64 val, u8 width, const volatile void __i > > #endif /* CONFIG_TRACE_MMIO_ACCESS */ > > +#ifndef __memcpy_fromio > +void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count); > +#endif > + > +#ifndef __memcpy_toio > +void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count); > +#endif > + > +#ifndef __memset_io > +void __memset_io(volatile void __iomem *dst, int c, size_t count); > +#endif > + > /* > * __raw_{read,write}{b,w,l,q}() access memory in native endianness. > * > diff --git a/lib/iomap_copy.c b/lib/iomap_copy.c > index 2fd5712fb7c0..c2cee6410151 100644 > --- a/lib/iomap_copy.c > +++ b/lib/iomap_copy.c > @@ -3,9 +3,15 @@ > * Copyright 2006 PathScale, Inc. All Rights Reserved. > */ > > +#include <asm/unaligned.h> > + > +#include <linux/align.h> > #include <linux/export.h> > +#include <linux/types.h> > #include <linux/io.h> > > +#define NATIVE_STORE_SIZE (BITS_PER_LONG/8) (sizeof (long)) > + > /** > * __iowrite32_copy - copy data to MMIO space, in 32-bit units > * @to: destination, in MMIO space (must be 32-bit aligned) > @@ -76,3 +82,106 @@ void __iowrite64_copy(void __iomem *to, const void *from, size_t count) > } > EXPORT_SYMBOL_GPL(__iowrite64_copy); > #endif > + > + > +#ifndef __memcpy_fromio > +void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count) > +{ > + while (count && !IS_ALIGNED((unsigned long)from, NATIVE_STORE_SIZE)) { > + *(u8 *)to = __raw_readb(from); > + from++; > + to++; > + count--; > + } > + > + while (count >= NATIVE_STORE_SIZE) { > +#ifdef CONFIG_64BIT > + put_unaligned(__raw_readq(from), (uintptr_t *)to); > +#else > + put_unaligned(__raw_readl(from), (uintptr_t *)to); > +#endif That looks horrid to me. You seem to be mixing several different types and tests. NATIVE_STORE_SIZE is based on the 'long ' type (indirectly and by assumption). CONFIG_64BiIT (probably) implies LP64. readl() reads 4 bytes and readq() 8 (for both 32bit and 64bit kernels) uintptr is an unsigned integer large enough to hold a pointer. The sizes might all happen to match, but there is no need to rely on all of them. I might be best to just use 'sizeof (long)' except that you might get a compile error on some 32bit archs for the: long val = sizeof (val) == 8) ? readq(from) : readl(from); put_unaligned(val, (long *)to); (due to there being no declaration readq()) so it might need a #if somewhere. OTOH there might always be an 'extern' for readq(). If you are using the __raw_readx() functions don't you need the synchronisation barriers top and bottom? Also if put_unaligned() is non-trivial the code will be horrid. An initial test for ((to | from) & (sizeof (long) - 1) == 0) for an aligned copy may be worthwhile. There is the question of whether the code is allowed to do full word reads - valid if the io area behaves like memory. In which case you don't want to do byte transfers for alignment and tail transfers - just read the full word that contains the data. PCIe reads can be horribly slow (writes are 'posted' so much better). I'm not sure how long they take into a 'normal' target, but back to back reads into our fpga are about 128 clocks apart on its internal 125Mhz clock - the host cpu will stall for the entire period. So you definitely want to use the largest register possible. (Or try very hard to never do non-dma reads in either direction.) David > + > + from += NATIVE_STORE_SIZE; > + to += NATIVE_STORE_SIZE; > + count -= NATIVE_STORE_SIZE; > + } > + > + while (count) { > + *(u8 *)to = __raw_readb(from); > + from++; > + to++; > + count--; > + } > +} > +EXPORT_SYMBOL(__memcpy_fromio); > +#endif > + > +#ifndef __memcpy_toio > +void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count) > +{ > + while (count && !IS_ALIGNED((unsigned long)to, NATIVE_STORE_SIZE)) { > + __raw_writeb(*(u8 *)from, to); > + from++; > + to++; > + count--; > + } > + > + while (count >= NATIVE_STORE_SIZE) { > +#ifdef CONFIG_64BIT > + __raw_writeq(get_unaligned((uintptr_t *)from), to); > +#else > + __raw_writel(get_unaligned((uintptr_t *)from), to); > +#endif > + > + from += NATIVE_STORE_SIZE; > + to += NATIVE_STORE_SIZE; > + count -= NATIVE_STORE_SIZE; > + } > + > + while (count) { > + __raw_writeb(*(u8 *)from, to); > + from++; > + to++; > + count--; > + } > +} > +EXPORT_SYMBOL(__memcpy_toio); > +#endif > + > +#ifndef __memset_io > +void __memset_io(volatile void __iomem *dst, int c, size_t count) > +{ > + uintptr_t qc = (u8)c; > + > + qc |= qc << 8; > + qc |= qc << 16; > + > +#ifdef CONFIG_64BIT > + qc |= qc << 32; > +#endif > + > + while (count && !IS_ALIGNED((unsigned long)dst, NATIVE_STORE_SIZE)) { > + __raw_writeb(c, dst); > + dst++; > + count--; > + } > + > + while (count >= NATIVE_STORE_SIZE) { > +#ifdef CONFIG_64BIT > + __raw_writeq(qc, dst); > +#else > + __raw_writel(qc, dst); > +#endif > + > + dst += NATIVE_STORE_SIZE; > + count -= NATIVE_STORE_SIZE; > + } > + > + while (count) { > + __raw_writeb(c, dst); > + dst++; > + count--; > + } > +} > +EXPORT_SYMBOL(__memset_io); > +#endif > -- > 2.34.1 > > > > > - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index 80de699bf6af..9b8e0449da28 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -102,6 +102,18 @@ static inline void log_post_read_mmio(u64 val, u8 width, const volatile void __i #endif /* CONFIG_TRACE_MMIO_ACCESS */ +#ifndef __memcpy_fromio +void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count); +#endif + +#ifndef __memcpy_toio +void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count); +#endif + +#ifndef __memset_io +void __memset_io(volatile void __iomem *dst, int c, size_t count); +#endif + /* * __raw_{read,write}{b,w,l,q}() access memory in native endianness. * diff --git a/lib/iomap_copy.c b/lib/iomap_copy.c index 2fd5712fb7c0..c2cee6410151 100644 --- a/lib/iomap_copy.c +++ b/lib/iomap_copy.c @@ -3,9 +3,15 @@ * Copyright 2006 PathScale, Inc. All Rights Reserved. */ +#include <asm/unaligned.h> + +#include <linux/align.h> #include <linux/export.h> +#include <linux/types.h> #include <linux/io.h> +#define NATIVE_STORE_SIZE (BITS_PER_LONG/8) + /** * __iowrite32_copy - copy data to MMIO space, in 32-bit units * @to: destination, in MMIO space (must be 32-bit aligned) @@ -76,3 +82,106 @@ void __iowrite64_copy(void __iomem *to, const void *from, size_t count) } EXPORT_SYMBOL_GPL(__iowrite64_copy); #endif + + +#ifndef __memcpy_fromio +void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count) +{ + while (count && !IS_ALIGNED((unsigned long)from, NATIVE_STORE_SIZE)) { + *(u8 *)to = __raw_readb(from); + from++; + to++; + count--; + } + + while (count >= NATIVE_STORE_SIZE) { +#ifdef CONFIG_64BIT + put_unaligned(__raw_readq(from), (uintptr_t *)to); +#else + put_unaligned(__raw_readl(from), (uintptr_t *)to); +#endif + + from += NATIVE_STORE_SIZE; + to += NATIVE_STORE_SIZE; + count -= NATIVE_STORE_SIZE; + } + + while (count) { + *(u8 *)to = __raw_readb(from); + from++; + to++; + count--; + } +} +EXPORT_SYMBOL(__memcpy_fromio); +#endif + +#ifndef __memcpy_toio +void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count) +{ + while (count && !IS_ALIGNED((unsigned long)to, NATIVE_STORE_SIZE)) { + __raw_writeb(*(u8 *)from, to); + from++; + to++; + count--; + } + + while (count >= NATIVE_STORE_SIZE) { +#ifdef CONFIG_64BIT + __raw_writeq(get_unaligned((uintptr_t *)from), to); +#else + __raw_writel(get_unaligned((uintptr_t *)from), to); +#endif + + from += NATIVE_STORE_SIZE; + to += NATIVE_STORE_SIZE; + count -= NATIVE_STORE_SIZE; + } + + while (count) { + __raw_writeb(*(u8 *)from, to); + from++; + to++; + count--; + } +} +EXPORT_SYMBOL(__memcpy_toio); +#endif + +#ifndef __memset_io +void __memset_io(volatile void __iomem *dst, int c, size_t count) +{ + uintptr_t qc = (u8)c; + + qc |= qc << 8; + qc |= qc << 16; + +#ifdef CONFIG_64BIT + qc |= qc << 32; +#endif + + while (count && !IS_ALIGNED((unsigned long)dst, NATIVE_STORE_SIZE)) { + __raw_writeb(c, dst); + dst++; + count--; + } + + while (count >= NATIVE_STORE_SIZE) { +#ifdef CONFIG_64BIT + __raw_writeq(qc, dst); +#else + __raw_writel(qc, dst); +#endif + + dst += NATIVE_STORE_SIZE; + count -= NATIVE_STORE_SIZE; + } + + while (count) { + __raw_writeb(c, dst); + dst++; + count--; + } +} +EXPORT_SYMBOL(__memset_io); +#endif