Message ID | 1522425494-2916-2-git-send-email-okaya@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Hi,

On Fri, Mar 30, 2018 at 11:58:13AM -0400, Sinan Kaya wrote:
> The default implementation of mapping readX() to __raw_readX() is wrong.
> readX() has stronger ordering semantics. Compiler is allowed to reorder
> __raw_readX().

Could you please specify what the compiler is potentially reordering
__raw_readX() against, and why this would be wrong?

e.g. do we care about prior normal memory accesses, subsequent normal
memory accesses, and/or other IO accesses?

I assume that the asm-generic __raw_{read,write}X() implementations are
all ordered w.r.t. each other (at least for a specific device).

Thanks,
Mark.

> In the absence of a read barrier or when using a strongly ordered
> architecture, readX() should at least have a compiler barrier in
> it to prevent the compiler from clobbering the execution order.
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  include/asm-generic/io.h | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
> index e8c2078..2554f15 100644
> --- a/include/asm-generic/io.h
> +++ b/include/asm-generic/io.h
> @@ -110,7 +110,12 @@ static inline void __raw_writeq(u64 value, volatile void __iomem *addr)
>  #define readb readb
>  static inline u8 readb(const volatile void __iomem *addr)
>  {
> -	return __raw_readb(addr);
> +	u8 val;
> +
> +	val = __raw_readb(addr);
> +	barrier();
> +
> +	return val;
>  }
>  #endif
>
> @@ -118,7 +123,12 @@ static inline u8 readb(const volatile void __iomem *addr)
>  #define readw readw
>  static inline u16 readw(const volatile void __iomem *addr)
>  {
> -	return __le16_to_cpu(__raw_readw(addr));
> +	u16 val;
> +
> +	val = __le16_to_cpu(__raw_readw(addr));
> +	barrier();
> +
> +	return val;
>  }
>  #endif
>
> @@ -126,7 +136,12 @@ static inline u16 readw(const volatile void __iomem *addr)
>  #define readl readl
>  static inline u32 readl(const volatile void __iomem *addr)
>  {
> -	return __le32_to_cpu(__raw_readl(addr));
> +	u32 val;
> +
> +	val = __le32_to_cpu(__raw_readl(addr));
> +	barrier();
> +
> +	return val;
>  }
>  #endif
>
> @@ -135,7 +150,12 @@ static inline u32 readl(const volatile void __iomem *addr)
>  #define readq readq
>  static inline u64 readq(const volatile void __iomem *addr)
>  {
> -	return __le64_to_cpu(__raw_readq(addr));
> +	u64 val;
> +
> +	val = __le64_to_cpu(__raw_readq(addr));
> +	barrier();
> +
> +	return val;
>  }
>  #endif
>  #endif /* CONFIG_64BIT */
> --
> 2.7.4
On Tue, Apr 3, 2018 at 12:49 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> On Fri, Mar 30, 2018 at 11:58:13AM -0400, Sinan Kaya wrote:
>> The default implementation of mapping readX() to __raw_readX() is wrong.
>> readX() has stronger ordering semantics. Compiler is allowed to reorder
>> __raw_readX().
>
> Could you please specify what the compiler is potentially reordering
> __raw_readX() against, and why this would be wrong?
>
> e.g. do we care about prior normal memory accesses, subsequent normal
> memory accesses, and/or other IO accesses?
>
> I assume that the asm-generic __raw_{read,write}X() implementations are
> all ordered w.r.t. each other (at least for a specific device).

I think that is correct: the compiler won't reorder those because of the
'volatile' pointer dereference, but it can reorder access to a normal
pointer against a __raw_readl()/__raw_writel(), which breaks the scenario
of using writel to trigger a DMA, or using a readl to see if a DMA has
completed.

The question is whether we should use a stronger barrier such
as rmb() and wmb() here rather than a simple compiler barrier.

I would assume that on complex architectures with write buffers and
out-of-order prefetching, those are required, while on architectures
without those features, the barriers are cheap.

      Arnd
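The DMA scenario described in this reply can be sketched in plain C. This is a minimal user-space illustration under stated assumptions, not the kernel implementation: `struct demo_desc`, `demo_raw_writel()`, `demo_raw_readl()`, `demo_writel()`, `demo_readl()`, `demo_start_dma()` and `demo_dma_result()` are hypothetical names, ordinary variables stand in for device registers, and `barrier()` is defined locally as a GCC/Clang-style compiler barrier. It constrains only the compiler; whether a hardware barrier such as rmb()/wmb() is also needed is exactly the open question raised here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compiler-only barrier, similar in spirit to the kernel's barrier(). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical DMA descriptor living in normal (non-volatile) memory. */
struct demo_desc {
	uint32_t addr;
	uint32_t len;
};

/* Raw accessors: volatile keeps them ordered only against other volatile accesses. */
static inline void demo_raw_writel(uint32_t value, volatile uint32_t *reg)
{
	*reg = value;
}

static inline uint32_t demo_raw_readl(const volatile uint32_t *reg)
{
	return *reg;
}

/* Earlier normal stores may not be sunk below the register write by the compiler. */
static inline void demo_writel(uint32_t value, volatile uint32_t *reg)
{
	barrier();
	demo_raw_writel(value, reg);
}

/* Later normal loads may not be hoisted above the register read by the compiler. */
static inline uint32_t demo_readl(const volatile uint32_t *reg)
{
	uint32_t val = demo_raw_readl(reg);

	barrier();
	return val;
}

/* Fill the descriptor in normal memory, then ring the doorbell to trigger the DMA. */
static void demo_start_dma(struct demo_desc *desc, volatile uint32_t *doorbell,
			   uint32_t addr, uint32_t len)
{
	assert(desc != NULL && doorbell != NULL);
	desc->addr = addr;
	desc->len = len;
	demo_writel(1, doorbell);
}

/* Check the completion bit first, and only then read the DMA buffer. */
static int demo_dma_result(const volatile uint32_t *status, const uint32_t *buf,
			   uint32_t *out)
{
	if (!(demo_readl(status) & 1))
		return -1;	/* not complete yet */
	*out = buf[0];
	return 0;
}
```

With the raw accessors alone, the compiler would be free to move the `desc->len` store after the doorbell write, or the `buf[0]` load before the status read, because only the register accesses are volatile.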
On 4/3/2018 7:13 AM, Arnd Bergmann wrote:
> On Tue, Apr 3, 2018 at 12:49 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>> Hi,
>>
>> On Fri, Mar 30, 2018 at 11:58:13AM -0400, Sinan Kaya wrote:
>>> The default implementation of mapping readX() to __raw_readX() is wrong.
>>> readX() has stronger ordering semantics. Compiler is allowed to reorder
>>> __raw_readX().
>>
>> Could you please specify what the compiler is potentially reordering
>> __raw_readX() against, and why this would be wrong?
>>
>> e.g. do we care about prior normal memory accesses, subsequent normal
>> memory accesses, and/or other IO accesses?
>>
>> I assume that the asm-generic __raw_{read,write}X() implementations are
>> all ordered w.r.t. each other (at least for a specific device).
>
> I think that is correct: the compiler won't reorder those because of the
> 'volatile' pointer dereference, but it can reorder access to a normal
> pointer against a __raw_readl()/__raw_writel(), which breaks the scenario
> of using writel to trigger a DMA, or using a readl to see if a DMA has
> completed.

Yes, we are worried about memory update vs. IO update ordering here.
That was the reason why barrier() was introduced in this patch. I'll try to
clarify that better in the commit text.

>
> The question is whether we should use a stronger barrier such
> as rmb() and wmb() here rather than a simple compiler barrier.
>
> I would assume that on complex architectures with write buffers and
> out-of-order prefetching, those are required, while on architectures
> without those features, the barriers are cheap.

That's my reasoning too. I'm trying to follow the x86 example here where there
is a compiler barrier in writeX() and readX() family of functions.

>
>       Arnd
>
On Tue, Apr 3, 2018 at 2:44 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 4/3/2018 7:13 AM, Arnd Bergmann wrote:
>> On Tue, Apr 3, 2018 at 12:49 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>>> Hi,
>>>
>>> On Fri, Mar 30, 2018 at 11:58:13AM -0400, Sinan Kaya wrote:
>>>> The default implementation of mapping readX() to __raw_readX() is wrong.
>>>> readX() has stronger ordering semantics. Compiler is allowed to reorder
>>>> __raw_readX().
>>>
>>> Could you please specify what the compiler is potentially reordering
>>> __raw_readX() against, and why this would be wrong?
>>>
>>> e.g. do we care about prior normal memory accesses, subsequent normal
>>> memory accesses, and/or other IO accesses?
>>>
>>> I assume that the asm-generic __raw_{read,write}X() implementations are
>>> all ordered w.r.t. each other (at least for a specific device).
>>
>> I think that is correct: the compiler won't reorder those because of the
>> 'volatile' pointer dereference, but it can reorder access to a normal
>> pointer against a __raw_readl()/__raw_writel(), which breaks the scenario
>> of using writel to trigger a DMA, or using a readl to see if a DMA has
>> completed.
>
> Yes, we are worried about memory update vs. IO update ordering here.
> That was the reason why barrier() was introduced in this patch. I'll try to
> clarify that better in the commit text.
>
>>
>> The question is whether we should use a stronger barrier such
>> as rmb() and wmb() here rather than a simple compiler barrier.
>>
>> I would assume that on complex architectures with write buffers and
>> out-of-order prefetching, those are required, while on architectures
>> without those features, the barriers are cheap.
>
> That's my reasoning too. I'm trying to follow the x86 example here where there
> is a compiler barrier in writeX() and readX() family of functions.

I think x86 is the special case here because it implicitly guarantees
the strict ordering in the hardware, as long as the compiler gets it
right. For the asm-generic version, it may be better to play safe and
do the safest version, requiring architectures to override that barrier
if they want to be faster.

We could use the same macros that riscv has, using __io_br(),
__io_ar(), __io_bw() and __io_aw() for before/after read/write.

      Arnd
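The override scheme suggested in this reply might look roughly like the sketch below. It is a user-space illustration under stated assumptions, not the riscv or asm-generic code: the `demo_io_br()`, `demo_io_ar()`, `demo_io_bw()` and `demo_io_aw()` macros are hypothetical stand-ins for the `__io_br()`, `__io_ar()`, `__io_bw()` and `__io_aw()` hooks named in the thread, `sketch_readl()` and `sketch_writel()` are invented names, and the GCC/Clang builtin `__atomic_thread_fence()` stands in for a conservative hardware barrier. Byte swapping is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical before/after hooks. Each one defaults to a full fence (the
 * "safest version"); an architecture that wants something cheaper would
 * define the macro before this point and the default would be skipped.
 */
#ifndef demo_io_br
#define demo_io_br()	__atomic_thread_fence(__ATOMIC_SEQ_CST)	/* before read */
#endif
#ifndef demo_io_ar
#define demo_io_ar()	__atomic_thread_fence(__ATOMIC_SEQ_CST)	/* after read */
#endif
#ifndef demo_io_bw
#define demo_io_bw()	__atomic_thread_fence(__ATOMIC_SEQ_CST)	/* before write */
#endif
#ifndef demo_io_aw
#define demo_io_aw()	__atomic_thread_fence(__ATOMIC_SEQ_CST)	/* after write */
#endif

/* Ordered read: raw volatile load wrapped in the before/after read hooks. */
static inline uint32_t sketch_readl(const volatile uint32_t *addr)
{
	uint32_t val;

	assert(addr != NULL);
	demo_io_br();
	val = *addr;
	demo_io_ar();
	return val;
}

/* Ordered write: raw volatile store wrapped in the before/after write hooks. */
static inline void sketch_writel(uint32_t value, volatile uint32_t *addr)
{
	assert(addr != NULL);
	demo_io_bw();
	*addr = value;
	demo_io_aw();
}
```

Under this arrangement a strongly ordered architecture could, for example, define the hooks as plain compiler barriers, while everyone else would get the conservative default without writing any code.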
On 4/3/2018 8:56 AM, Arnd Bergmann wrote:
> On Tue, Apr 3, 2018 at 2:44 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> On 4/3/2018 7:13 AM, Arnd Bergmann wrote:
>>> On Tue, Apr 3, 2018 at 12:49 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>>>> Hi,
>>>>
>>>> On Fri, Mar 30, 2018 at 11:58:13AM -0400, Sinan Kaya wrote:
>>>>> The default implementation of mapping readX() to __raw_readX() is wrong.
>>>>> readX() has stronger ordering semantics. Compiler is allowed to reorder
>>>>> __raw_readX().
>>>>
>>>> Could you please specify what the compiler is potentially reordering
>>>> __raw_readX() against, and why this would be wrong?
>>>>
>>>> e.g. do we care about prior normal memory accesses, subsequent normal
>>>> memory accesses, and/or other IO accesses?
>>>>
>>>> I assume that the asm-generic __raw_{read,write}X() implementations are
>>>> all ordered w.r.t. each other (at least for a specific device).
>>>
>>> I think that is correct: the compiler won't reorder those because of the
>>> 'volatile' pointer dereference, but it can reorder access to a normal
>>> pointer against a __raw_readl()/__raw_writel(), which breaks the scenario
>>> of using writel to trigger a DMA, or using a readl to see if a DMA has
>>> completed.
>>
>> Yes, we are worried about memory update vs. IO update ordering here.
>> That was the reason why barrier() was introduced in this patch. I'll try to
>> clarify that better in the commit text.
>>
>>>
>>> The question is whether we should use a stronger barrier such
>>> as rmb() and wmb() here rather than a simple compiler barrier.
>>>
>>> I would assume that on complex architectures with write buffers and
>>> out-of-order prefetching, those are required, while on architectures
>>> without those features, the barriers are cheap.
>>
>> That's my reasoning too. I'm trying to follow the x86 example here where there
>> is a compiler barrier in writeX() and readX() family of functions.
>
> I think x86 is the special case here because it implicitly guarantees
> the strict ordering in the hardware, as long as the compiler gets it
> right. For the asm-generic version, it may be better to play safe and
> do the safest version, requiring architectures to override that barrier
> if they want to be faster.
>
> We could use the same macros that riscv has, using __io_br(),
> __io_ar(), __io_bw() and __io_aw() for before/after read/write.

Sure, let me take a stab at it.

>
>       Arnd
>
On Tue, 03 Apr 2018 05:56:18 PDT (-0700), Arnd Bergmann wrote:
> On Tue, Apr 3, 2018 at 2:44 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> On 4/3/2018 7:13 AM, Arnd Bergmann wrote:
>>> On Tue, Apr 3, 2018 at 12:49 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>>>> Hi,
>>>>
>>>> On Fri, Mar 30, 2018 at 11:58:13AM -0400, Sinan Kaya wrote:
>>>>> The default implementation of mapping readX() to __raw_readX() is wrong.
>>>>> readX() has stronger ordering semantics. Compiler is allowed to reorder
>>>>> __raw_readX().
>>>>
>>>> Could you please specify what the compiler is potentially reordering
>>>> __raw_readX() against, and why this would be wrong?
>>>>
>>>> e.g. do we care about prior normal memory accesses, subsequent normal
>>>> memory accesses, and/or other IO accesses?
>>>>
>>>> I assume that the asm-generic __raw_{read,write}X() implementations are
>>>> all ordered w.r.t. each other (at least for a specific device).
>>>
>>> I think that is correct: the compiler won't reorder those because of the
>>> 'volatile' pointer dereference, but it can reorder access to a normal
>>> pointer against a __raw_readl()/__raw_writel(), which breaks the scenario
>>> of using writel to trigger a DMA, or using a readl to see if a DMA has
>>> completed.
>>
>> Yes, we are worried about memory update vs. IO update ordering here.
>> That was the reason why barrier() was introduced in this patch. I'll try to
>> clarify that better in the commit text.
>>
>>>
>>> The question is whether we should use a stronger barrier such
>>> as rmb() and wmb() here rather than a simple compiler barrier.
>>>
>>> I would assume that on complex architectures with write buffers and
>>> out-of-order prefetching, those are required, while on architectures
>>> without those features, the barriers are cheap.
>>
>> That's my reasoning too. I'm trying to follow the x86 example here where there
>> is a compiler barrier in writeX() and readX() family of functions.
>
> I think x86 is the special case here because it implicitly guarantees
> the strict ordering in the hardware, as long as the compiler gets it
> right. For the asm-generic version, it may be better to play safe and
> do the safest version, requiring architectures to override that barrier
> if they want to be faster.
>
> We could use the same macros that riscv has, using __io_br(),
> __io_ar(), __io_bw() and __io_aw() for before/after read/write.

FWIW, when I wrote this I wasn't sure what the RISC-V memory model was going to
be so I just picked something generic. In other words, it's already a generic
interface, just one that we're the only users of :).
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index e8c2078..2554f15 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -110,7 +110,12 @@ static inline void __raw_writeq(u64 value, volatile void __iomem *addr)
 #define readb readb
 static inline u8 readb(const volatile void __iomem *addr)
 {
-	return __raw_readb(addr);
+	u8 val;
+
+	val = __raw_readb(addr);
+	barrier();
+
+	return val;
 }
 #endif
 
@@ -118,7 +123,12 @@ static inline u8 readb(const volatile void __iomem *addr)
 #define readw readw
 static inline u16 readw(const volatile void __iomem *addr)
 {
-	return __le16_to_cpu(__raw_readw(addr));
+	u16 val;
+
+	val = __le16_to_cpu(__raw_readw(addr));
+	barrier();
+
+	return val;
 }
 #endif
 
@@ -126,7 +136,12 @@ static inline u16 readw(const volatile void __iomem *addr)
 #define readl readl
 static inline u32 readl(const volatile void __iomem *addr)
 {
-	return __le32_to_cpu(__raw_readl(addr));
+	u32 val;
+
+	val = __le32_to_cpu(__raw_readl(addr));
+	barrier();
+
+	return val;
 }
 #endif
 
@@ -135,7 +150,12 @@ static inline u32 readl(const volatile void __iomem *addr)
 #define readq readq
 static inline u64 readq(const volatile void __iomem *addr)
 {
-	return __le64_to_cpu(__raw_readq(addr));
+	u64 val;
+
+	val = __le64_to_cpu(__raw_readq(addr));
+	barrier();
+
+	return val;
 }
 #endif
 #endif /* CONFIG_64BIT */
The default implementation of mapping readX() to __raw_readX() is wrong.
readX() has stronger ordering semantics. The compiler is allowed to reorder
__raw_readX().

In the absence of a read barrier or when using a strongly ordered
architecture, readX() should at least have a compiler barrier in
it to prevent the compiler from clobbering the execution order.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 include/asm-generic/io.h | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)