Message ID | 1413993983-17310-1-git-send-email-mathieu.poirier@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Oct 22, 2014 at 10:06:23AM -0600, mathieu.poirier@linaro.org wrote: > @@ -306,10 +324,13 @@ extern void _memset_io(volatile void __iomem *, int, size_t); > __raw_readw(c)); __r; }) > #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \ > __raw_readl(c)); __r; }) > +#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64) \ > + __raw_readq(c)); __r; }) > > #define writeb_relaxed(v,c) __raw_writeb(v,c) > #define writew_relaxed(v,c) __raw_writew((__force u16) cpu_to_le16(v),c) > #define writel_relaxed(v,c) __raw_writel((__force u32) cpu_to_le32(v),c) > +#define writeq_relaxed(v,c) __raw_writeq((__force u64) cpu_to_le64(v),c) You should only define these if we have the corresponding __raw_ versions too.
On 22 October 2014 18:11, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Wed, Oct 22, 2014 at 10:06:23AM -0600, mathieu.poirier@linaro.org wrote: >> @@ -306,10 +324,13 @@ extern void _memset_io(volatile void __iomem *, int, size_t); >> __raw_readw(c)); __r; }) >> #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \ >> __raw_readl(c)); __r; }) >> +#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64) \ >> + __raw_readq(c)); __r; }) >> >> #define writeb_relaxed(v,c) __raw_writeb(v,c) >> #define writew_relaxed(v,c) __raw_writew((__force u16) cpu_to_le16(v),c) >> #define writel_relaxed(v,c) __raw_writel((__force u32) cpu_to_le32(v),c) >> +#define writeq_relaxed(v,c) __raw_writeq((__force u64) cpu_to_le64(v),c) > > You should only define these if we have the corresponding __raw_ versions > too. I had this conversation with a colleague who reviewed the work. If the architecture is < 5 the __raw_ versions aren't included and the compiler won't complain until someone tries to use the macros. We achieve the same result - the macros aren't accessible when the architecture doesn't support it - while saving an #if condition in the file. I'm not strongly opinionated on this - I can enclose the macros in an #if statement. > > -- > FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up > according to speedtest.net.
On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: > +#if __LINUX_ARM_ARCH__ >= 5 My old ARMv5 book does not list LDRD/STRD. It looks like they only come with ARMv5TE. Are there any processors prior to this supported by the kernel? > +static inline void __raw_writeq(u64 val, volatile void __iomem *addr) > +{ > + asm volatile("strd %1, %0" > + : "+Qo" (*(volatile u64 __force *)addr) > + : "r" (val)); > +} > + > +static inline u64 __raw_readq(const volatile void __iomem *addr) > +{ > + u64 val; > + asm volatile("ldrd %1, %0" > + : "+Qo" (*(volatile u64 __force *)addr), > + "=r" (val)); > + return val; > +} > +#endif I'm curious why you need these. Do you have a device that needs a 64-bit single access or you are trying to read two consecutive registers?
On Wed, Oct 22, 2014 at 06:22:09PM +0200, Mathieu Poirier wrote: > I had this conversation with a colleague who reviewed the work. If > the architecture is < 5 the __raw_ versions aren't included and the > compiler won't complain until someone tries to use the macros. We > achieve the same result - the macros aren't accessible when the > architecture doesn't support it - while saving an #if condition in the > file. > > I'm not strongly opinionated on this - I can enclose the macros in an > #if statement. What you're missing is that some driver may test for their presence by doing: #ifndef readq_relaxed ... do something else which would now break as a detection method, as the macro is always defined no matter whether it's present or not.
On 22 October 2014 19:19, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Wed, Oct 22, 2014 at 06:22:09PM +0200, Mathieu Poirier wrote: >> I had this conversation with a colleague who reviewed the work. If >> the architecture is < 5 the __raw_ versions aren't included and the >> compiler won't complain until someone tries to use the macros. We >> achieve the same result - the macros aren't accessible when the >> architecture doesn't support it - while saving an #if condition in the >> file. >> >> I'm not strongly opinionated on this - I can enclose the macros in an >> #if statement. > > What you're missing is that some driver may test for their presence by > doing: > > #ifndef readq_relaxed > ... do something else > > which would now break as a detection method, as the macro is always > defined no matter whether it's present or not. Good point - thanks for showing me the light. > > -- > FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up > according to speedtest.net.
On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote: > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: >> +#if __LINUX_ARM_ARCH__ >= 5 > > My old ARMv5 book does not list LDRD/STRD. It looks like they only come > with ARMv5TE. Are there any processors prior to this supported by the > kernel? > >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr) >> +{ >> + asm volatile("strd %1, %0" >> + : "+Qo" (*(volatile u64 __force *)addr) >> + : "r" (val)); >> +} >> + >> +static inline u64 __raw_readq(const volatile void __iomem *addr) >> +{ >> + u64 val; >> + asm volatile("ldrd %1, %0" >> + : "+Qo" (*(volatile u64 __force *)addr), >> + "=r" (val)); >> + return val; >> +} >> +#endif > > I'm curious why you need these. Do you have a device that needs a 64-bit > single access or you are trying to read two consecutive registers? The fundamental data size of Coresight STM32 for ARMv7 is implementation defined and can be 32 or 64bit. As such stimulus ports can support transaction sizes of up to 64 bit. Mathieu > > -- > Catalin
On Wed, 22 Oct 2014, Catalin Marinas wrote: > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: > > +#if __LINUX_ARM_ARCH__ >= 5 > > My old ARMv5 book does not list LDRD/STRD. It looks like they only come > with ARMv5TE. Are there any processors prior to this supported by the > kernel? We still supports ARMv4 targets. As far as I know, all the ARMv5 targets we support are also TE capable. Nicolas
On Thu, Oct 23, 2014 at 03:47:32PM -0400, Nicolas Pitre wrote: > On Wed, 22 Oct 2014, Catalin Marinas wrote: > > > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: > > > +#if __LINUX_ARM_ARCH__ >= 5 > > > > My old ARMv5 book does not list LDRD/STRD. It looks like they only come > > with ARMv5TE. Are there any processors prior to this supported by the > > kernel? > > We still supports ARMv4 targets. > > As far as I know, all the ARMv5 targets we support are also TE capable. Not quite. We have ARM1020, which according to our proc-*.S files is only ARMv5T, not ARMv5TE.
On Thu, 23 Oct 2014, Russell King - ARM Linux wrote: > On Thu, Oct 23, 2014 at 03:47:32PM -0400, Nicolas Pitre wrote: > > On Wed, 22 Oct 2014, Catalin Marinas wrote: > > > > > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: > > > > +#if __LINUX_ARM_ARCH__ >= 5 > > > > > > My old ARMv5 book does not list LDRD/STRD. It looks like they only come > > > with ARMv5TE. Are there any processors prior to this supported by the > > > kernel? > > > > We still supports ARMv4 targets. > > > > As far as I know, all the ARMv5 targets we support are also TE capable. > > Not quite. We have ARM1020, which according to our proc-*.S files is > only ARMv5T, not ARMv5TE. Oh well. Never saw such a beast in the field though. Maybe to be on the very safe side, given that no ARMV5TE is likely to need 64-bit IO accessors at this point, this could simply be __LINUX_ARM_ARCH__ >= 6 instead. Nicolas
On Thu, Oct 23, 2014 at 08:47:32PM +0100, Nicolas Pitre wrote: > On Wed, 22 Oct 2014, Catalin Marinas wrote: > > > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: > > > +#if __LINUX_ARM_ARCH__ >= 5 > > > > My old ARMv5 book does not list LDRD/STRD. It looks like they only come > > with ARMv5TE. Are there any processors prior to this supported by the > > kernel? > > We still supports ARMv4 targets. Yes. What I meant is non-TE ARMv5 targets.
On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote: > On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: > >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr) > >> +{ > >> + asm volatile("strd %1, %0" > >> + : "+Qo" (*(volatile u64 __force *)addr) > >> + : "r" (val)); > >> +} > >> + > >> +static inline u64 __raw_readq(const volatile void __iomem *addr) > >> +{ > >> + u64 val; > >> + asm volatile("ldrd %1, %0" > >> + : "+Qo" (*(volatile u64 __force *)addr), > >> + "=r" (val)); > >> + return val; > >> +} > >> +#endif > > > > I'm curious why you need these. Do you have a device that needs a 64-bit > > single access or you are trying to read two consecutive registers? > > The fundamental data size of Coresight STM32 for ARMv7 is > implementation defined and can be 32 or 64bit. As such stimulus ports > can support transaction sizes of up to 64 bit. The STM programmer's model spec recommends something else (though I find the "3.6 Data sizes" chapter a bit confusing): To ensure that code is portable between processor micro-architectures and system implementations, ARM recommends that only the native data size of the machine is used, and smaller sizes. For the 32-bit ARMv7 architecture, only 8, 16, and 32-bit transfers are recommended. For an ARMv8 processor that supports the AArch64 Execution state, it is recommended that the fundamental data size of 64-bits is implemented. Which means that you should not use readq/writeq on a 32-bit system.
On Thursday 23 October 2014 16:15:19 Nicolas Pitre wrote: > On Thu, 23 Oct 2014, Russell King - ARM Linux wrote: > > > On Thu, Oct 23, 2014 at 03:47:32PM -0400, Nicolas Pitre wrote: > > > On Wed, 22 Oct 2014, Catalin Marinas wrote: > > > > > > > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: > > > > > +#if __LINUX_ARM_ARCH__ >= 5 > > > > > > > > My old ARMv5 book does not list LDRD/STRD. It looks like they only come > > > > with ARMv5TE. Are there any processors prior to this supported by the > > > > kernel? > > > > > > We still supports ARMv4 targets. > > > > > > As far as I know, all the ARMv5 targets we support are also TE capable. > > > > Not quite. We have ARM1020, which according to our proc-*.S files is > > only ARMv5T, not ARMv5TE. Does this actually work when we are building with -march=armv5te? The Makefile contains this line: arch-$(CONFIG_CPU_32v5) =-D__LINUX_ARM_ARCH__=5 $(call cc-option,-march=armv5te,-march=armv4t) which looks like it would break for ARM1020. On a related note, I also wonder about this part: tune-$(CONFIG_CPU_ARM946E) =$(call cc-option,-mtune=arm9e,-mtune=arm9tdmi) tune-$(CONFIG_CPU_ARM920T) =-mtune=arm9tdmi tune-$(CONFIG_CPU_ARM922T) =-mtune=arm9tdmi tune-$(CONFIG_CPU_ARM925T) =-mtune=arm9tdmi tune-$(CONFIG_CPU_ARM926T) =-mtune=arm9tdmi I stumbled over this a while ago and couldn't figure it out. Does ARM926T actually exist, or is that a mistake that should actually be ARM926E? If this is always ARM926E, shouldn't we build with -mtune=arm9e as we do for ARM946E? > Oh well. Never saw such a beast in the field though. The only ARM10 implementation aside from integrator/realview that I'm aware of is an ARM1026E based Conexant/Ikanos DSL modem SoC (CX94xxx), and that is of course ARMv5TE. > Maybe to be on the very safe side, given that no ARMV5TE is likely to > need 64-bit IO accessors at this point, this could simply be > __LINUX_ARM_ARCH__ >= 6 instead. Which drivers need that support anyway? We definitely need to ensure that we don't try to build them on architectures without this support when CONFIG_COMPILE_TEST is set. Arnd
On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote: > On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote: >> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote: >> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: >> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr) >> >> +{ >> >> + asm volatile("strd %1, %0" >> >> + : "+Qo" (*(volatile u64 __force *)addr) >> >> + : "r" (val)); >> >> +} >> >> + >> >> +static inline u64 __raw_readq(const volatile void __iomem *addr) >> >> +{ >> >> + u64 val; >> >> + asm volatile("ldrd %1, %0" >> >> + : "+Qo" (*(volatile u64 __force *)addr), >> >> + "=r" (val)); >> >> + return val; >> >> +} >> >> +#endif >> > >> > I'm curious why you need these. Do you have a device that needs a 64-bit >> > single access or you are trying to read two consecutive registers? >> >> The fundamental data size of Coresight STM32 for ARMv7 is >> implementation defined and can be 32 or 64bit. As such stimulus ports >> can support transaction sizes of up to 64 bit. > > The STM programmer's model spec recommends something else (though I find > the "3.6 Data sizes" chapter a bit confusing): > > To ensure that code is portable between processor micro-architectures > and system implementations, ARM recommends that only the native data > size of the machine is used, and smaller sizes. For the 32-bit ARMv7 > architecture, only 8, 16, and 32-bit transfers are recommended. For an > ARMv8 processor that supports the AArch64 Execution state, it is > recommended that the fundamental data size of 64-bits is implemented. > > Which means that you should not use readq/writeq on a 32-bit system. Not quite. ARM documentation IHI0054B (ARM System Trace Macrocell: Programmers' Model Architecture Specification) stipulate that "For systems with an ARMv7 processor, ARM recommends configuration 1 or configuration 2.", where configuration 2 has a fundamental size of 64 bit. Mathieu > > -- > Catalin
On Fri, Oct 24, 2014 at 04:05:13PM +0100, Mathieu Poirier wrote: > On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote: > >> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote: > >> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: > >> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr) > >> >> +{ > >> >> + asm volatile("strd %1, %0" > >> >> + : "+Qo" (*(volatile u64 __force *)addr) > >> >> + : "r" (val)); > >> >> +} > >> >> + > >> >> +static inline u64 __raw_readq(const volatile void __iomem *addr) > >> >> +{ > >> >> + u64 val; > >> >> + asm volatile("ldrd %1, %0" > >> >> + : "+Qo" (*(volatile u64 __force *)addr), > >> >> + "=r" (val)); > >> >> + return val; > >> >> +} > >> >> +#endif > >> > > >> > I'm curious why you need these. Do you have a device that needs a 64-bit > >> > single access or you are trying to read two consecutive registers? > >> > >> The fundamental data size of Coresight STM32 for ARMv7 is > >> implementation defined and can be 32 or 64bit. As such stimulus ports > >> can support transaction sizes of up to 64 bit. > > > > The STM programmer's model spec recommends something else (though I find > > the "3.6 Data sizes" chapter a bit confusing): > > > > To ensure that code is portable between processor micro-architectures > > and system implementations, ARM recommends that only the native data > > size of the machine is used, and smaller sizes. For the 32-bit ARMv7 > > architecture, only 8, 16, and 32-bit transfers are recommended. For an > > ARMv8 processor that supports the AArch64 Execution state, it is > > recommended that the fundamental data size of 64-bits is implemented. > > > > Which means that you should not use readq/writeq on a 32-bit system. > > Not quite. ARM documentation IHI0054B (ARM System Trace Macrocell: > Programmers' Model Architecture Specification) stipulate that "For > systems with an ARMv7 processor, ARM recommends configuration 1 or > configuration 2.", where configuration 2 has a fundamental size of 64 > bit. As I said, it's confusing. Anyway, you can go ahead and add the readq/writeq for ARMv6 and later, though it won't be guaranteed to have a 64-bit access, it depends on the bus. BTW, do you need to define the non-relaxed accessors as well? That would be readq/writeq.
On 24 October 2014 10:16, Catalin Marinas <catalin.marinas@arm.com> wrote: > On Fri, Oct 24, 2014 at 04:05:13PM +0100, Mathieu Poirier wrote: >> On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote: >> > On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote: >> >> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote: >> >> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: >> >> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr) >> >> >> +{ >> >> >> + asm volatile("strd %1, %0" >> >> >> + : "+Qo" (*(volatile u64 __force *)addr) >> >> >> + : "r" (val)); >> >> >> +} >> >> >> + >> >> >> +static inline u64 __raw_readq(const volatile void __iomem *addr) >> >> >> +{ >> >> >> + u64 val; >> >> >> + asm volatile("ldrd %1, %0" >> >> >> + : "+Qo" (*(volatile u64 __force *)addr), >> >> >> + "=r" (val)); >> >> >> + return val; >> >> >> +} >> >> >> +#endif >> >> > >> >> > I'm curious why you need these. Do you have a device that needs a 64-bit >> >> > single access or you are trying to read two consecutive registers? >> >> >> >> The fundamental data size of Coresight STM32 for ARMv7 is >> >> implementation defined and can be 32 or 64bit. As such stimulus ports >> >> can support transaction sizes of up to 64 bit. >> > >> > The STM programmer's model spec recommends something else (though I find >> > the "3.6 Data sizes" chapter a bit confusing): >> > >> > To ensure that code is portable between processor micro-architectures >> > and system implementations, ARM recommends that only the native data >> > size of the machine is used, and smaller sizes. For the 32-bit ARMv7 >> > architecture, only 8, 16, and 32-bit transfers are recommended. For an >> > ARMv8 processor that supports the AArch64 Execution state, it is >> > recommended that the fundamental data size of 64-bits is implemented. >> > >> > Which means that you should not use readq/writeq on a 32-bit system. >> >> Not quite. ARM documentation IHI0054B (ARM System Trace Macrocell: >> Programmers' Model Architecture Specification) stipulate that "For >> systems with an ARMv7 processor, ARM recommends configuration 1 or >> configuration 2.", where configuration 2 has a fundamental size of 64 >> bit. > > As I said, it's confusing. Anyway, you can go ahead and add the > readq/writeq for ARMv6 and later, though it won't be guaranteed to have > a 64-bit access, it depends on the bus. Agreed. > > BTW, do you need to define the non-relaxed accessors as well? That would > be readq/writeq. No other master in the system is consuming this data and the channels return '0' on read. As such I didn't plan on implementing the non-relaxed accessors. Would you like me to do so for completeness? > > -- > Catalin
On Fri, Oct 24, 2014 at 05:16:34PM +0100, Catalin Marinas wrote: > On Fri, Oct 24, 2014 at 04:05:13PM +0100, Mathieu Poirier wrote: > > On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote: > > >> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote: > > >> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: > > >> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr) > > >> >> +{ > > >> >> + asm volatile("strd %1, %0" > > >> >> + : "+Qo" (*(volatile u64 __force *)addr) > > >> >> + : "r" (val)); > > >> >> +} > > >> >> + > > >> >> +static inline u64 __raw_readq(const volatile void __iomem *addr) > > >> >> +{ > > >> >> + u64 val; > > >> >> + asm volatile("ldrd %1, %0" > > >> >> + : "+Qo" (*(volatile u64 __force *)addr), > > >> >> + "=r" (val)); > > >> >> + return val; > > >> >> +} > > >> >> +#endif > > >> > > > >> > I'm curious why you need these. Do you have a device that needs a 64-bit > > >> > single access or you are trying to read two consecutive registers? > > >> > > >> The fundamental data size of Coresight STM32 for ARMv7 is > > >> implementation defined and can be 32 or 64bit. As such stimulus ports > > >> can support transaction sizes of up to 64 bit. > > > > > > The STM programmer's model spec recommends something else (though I find > > > the "3.6 Data sizes" chapter a bit confusing): > > > > > > To ensure that code is portable between processor micro-architectures > > > and system implementations, ARM recommends that only the native data > > > size of the machine is used, and smaller sizes. For the 32-bit ARMv7 > > > architecture, only 8, 16, and 32-bit transfers are recommended. For an > > > ARMv8 processor that supports the AArch64 Execution state, it is > > > recommended that the fundamental data size of 64-bits is implemented. > > > > > > Which means that you should not use readq/writeq on a 32-bit system. > > > > Not quite. ARM documentation IHI0054B (ARM System Trace Macrocell: > > Programmers' Model Architecture Specification) stipulate that "For > > systems with an ARMv7 processor, ARM recommends configuration 1 or > > configuration 2.", where configuration 2 has a fundamental size of 64 > > bit. > > As I said, it's confusing. Anyway, you can go ahead and add the > readq/writeq for ARMv6 and later, though it won't be guaranteed to have > a 64-bit access, it depends on the bus. I'm really not comfortable with this... we don't make any guarantees for 32-bit CPUs that a double-word access will be single-copy atomic for MMIO. That means it could be subjected to things like reordering and merging, which I think means that it depends on both the bus *and* the endpoint as to whether or not this will work. Worse still, the endpoint could decide to return a SLVERR, which would appear as an external abort. Is it not possible to use 32-bit MMIO accesses for this driver? Will
On 27 October 2014 09:54, Will Deacon <will.deacon@arm.com> wrote: > On Fri, Oct 24, 2014 at 05:16:34PM +0100, Catalin Marinas wrote: >> On Fri, Oct 24, 2014 at 04:05:13PM +0100, Mathieu Poirier wrote: >> > On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote: >> > > On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote: >> > >> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote: >> > >> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote: >> > >> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr) >> > >> >> +{ >> > >> >> + asm volatile("strd %1, %0" >> > >> >> + : "+Qo" (*(volatile u64 __force *)addr) >> > >> >> + : "r" (val)); >> > >> >> +} >> > >> >> + >> > >> >> +static inline u64 __raw_readq(const volatile void __iomem *addr) >> > >> >> +{ >> > >> >> + u64 val; >> > >> >> + asm volatile("ldrd %1, %0" >> > >> >> + : "+Qo" (*(volatile u64 __force *)addr), >> > >> >> + "=r" (val)); >> > >> >> + return val; >> > >> >> +} >> > >> >> +#endif >> > >> > >> > >> > I'm curious why you need these. Do you have a device that needs a 64-bit >> > >> > single access or you are trying to read two consecutive registers? >> > >> >> > >> The fundamental data size of Coresight STM32 for ARMv7 is >> > >> implementation defined and can be 32 or 64bit. As such stimulus ports >> > >> can support transaction sizes of up to 64 bit. >> > > >> > > The STM programmer's model spec recommends something else (though I find >> > > the "3.6 Data sizes" chapter a bit confusing): >> > > >> > > To ensure that code is portable between processor micro-architectures >> > > and system implementations, ARM recommends that only the native data >> > > size of the machine is used, and smaller sizes. For the 32-bit ARMv7 >> > > architecture, only 8, 16, and 32-bit transfers are recommended. For an >> > > ARMv8 processor that supports the AArch64 Execution state, it is >> > > recommended that the fundamental data size of 64-bits is implemented. >> > > >> > > Which means that you should not use readq/writeq on a 32-bit system. >> > >> > Not quite. ARM documentation IHI0054B (ARM System Trace Macrocell: >> > Programmers' Model Architecture Specification) stipulate that "For >> > systems with an ARMv7 processor, ARM recommends configuration 1 or >> > configuration 2.", where configuration 2 has a fundamental size of 64 >> > bit. >> >> As I said, it's confusing. Anyway, you can go ahead and add the >> readq/writeq for ARMv6 and later, though it won't be guaranteed to have >> a 64-bit access, it depends on the bus. > > I'm really not comfortable with this... we don't make any guarantees for > 32-bit CPUs that a double-word access will be single-copy atomic for MMIO. > That means it could be subjected to things like reordering and merging, > which I think means that it depends on both the bus *and* the endpoint as to > whether or not this will work. Worse still, the endpoint could decide to > return a SLVERR, which would appear as an external abort. I agree on all of the point you bring up. The person using these should know their architecture and the target endpoint support this kind of access. If they don't then a problem will show up pretty quickly. > > Is it not possible to use 32-bit MMIO accesses for this driver? Sure it is but we wouldn't be using the HW to it's full capability. Another solution is to move the accessors to the driver itself where nobody else in the 32 bit world will have access to them. Russell, what you're opinion on that? > > Will
On Mon, Oct 27, 2014 at 10:14:41PM +0000, Mathieu Poirier wrote: > On 27 October 2014 09:54, Will Deacon <will.deacon@arm.com> wrote: > > On Fri, Oct 24, 2014 at 05:16:34PM +0100, Catalin Marinas wrote: > >> As I said, it's confusing. Anyway, you can go ahead and add the > >> readq/writeq for ARMv6 and later, though it won't be guaranteed to have > >> a 64-bit access, it depends on the bus. > > > > I'm really not comfortable with this... we don't make any guarantees for > > 32-bit CPUs that a double-word access will be single-copy atomic for MMIO. > > That means it could be subjected to things like reordering and merging, > > which I think means that it depends on both the bus *and* the endpoint as to > > whether or not this will work. Worse still, the endpoint could decide to > > return a SLVERR, which would appear as an external abort. > > I agree on all of the point you bring up. The person using these > should know their architecture and the target endpoint support this > kind of access. If they don't then a problem will show up pretty > quickly. That goes against the I/O abstractions provided by the kernel to allow for portable device drivers. readq/writeq *must* have some portable semantics and I don't think that we should implement them on a best-effort basis in io.h. > > > > Is it not possible to use 32-bit MMIO accesses for this driver? > > Sure it is but we wouldn't be using the HW to it's full capability. > Another solution is to move the accessors to the driver itself where > nobody else in the 32 bit world will have access to them. Russell, > what you're opinion on that? FWIW, I'd much prefer that, but I'd be interested to know how much of a a couple of {read,write}l_relaxed operations really cost you by comparison. Will
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index 1805674..861e52c 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -118,6 +118,24 @@ static inline u32 __raw_readl(const volatile void __iomem *addr) return val; } +#if __LINUX_ARM_ARCH__ >= 5 +static inline void __raw_writeq(u64 val, volatile void __iomem *addr) +{ + asm volatile("strd %1, %0" + : "+Qo" (*(volatile u64 __force *)addr) + : "r" (val)); +} + +static inline u64 __raw_readq(const volatile void __iomem *addr) +{ + u64 val; + asm volatile("ldrd %1, %0" + : "+Qo" (*(volatile u64 __force *)addr), + "=r" (val)); + return val; +} +#endif + /* * Architecture ioremap implementation. */ @@ -306,10 +324,13 @@ extern void _memset_io(volatile void __iomem *, int, size_t); __raw_readw(c)); __r; }) #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \ __raw_readl(c)); __r; }) +#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64) \ + __raw_readq(c)); __r; }) #define writeb_relaxed(v,c) __raw_writeb(v,c) #define writew_relaxed(v,c) __raw_writew((__force u16) cpu_to_le16(v),c) #define writel_relaxed(v,c) __raw_writel((__force u32) cpu_to_le32(v),c) +#define writeq_relaxed(v,c) __raw_writeq((__force u64) cpu_to_le64(v),c) #define readb(c) ({ u8 __v = readb_relaxed(c); __iormb(); __v; }) #define readw(c) ({ u16 __v = readw_relaxed(c); __iormb(); __v; })