Message ID | 20180321163745.12286-2-logang@deltatee.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Headers | show |
On Wed, Mar 21, 2018 at 10:37:36AM -0600, Logan Gunthorpe wrote: > The semantics of the iowriteXXbe() functions are to write a > value in CPU endianess to an IO register that is known by the > caller to be in Big Endian. The mmio_writeXXbe() macro, which > is called by iowriteXXbe(), should therefore use cpu_to_beXX() > instead of beXX_to_cpu(). > > Seeing both beXX_to_cpu() and cpu_to_beXX() are both functionally > implemented as either null operations or swabXX operations there > was no noticable bug here. But it is confusing for both developers > and code analysis tools alike. > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > Cc: Philippe Ombredanne <pombredanne@nexb.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Kate Stewart <kstewart@linuxfoundation.org> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> > --- > lib/iomap.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/lib/iomap.c b/lib/iomap.c > index 541d926da95e..be120c13d6cc 100644 > --- a/lib/iomap.c > +++ b/lib/iomap.c > @@ -106,8 +106,8 @@ EXPORT_SYMBOL(ioread32be); > #endif > > #ifndef mmio_write16be > -#define mmio_write16be(val,port) __raw_writew(be16_to_cpu(val),port) > -#define mmio_write32be(val,port) __raw_writel(be32_to_cpu(val),port) > +#define mmio_write16be(val,port) __raw_writew(cpu_to_be16(val),port) > +#define mmio_write32be(val,port) __raw_writel(cpu_to_be32(val),port) > #endif > > void iowrite8(u8 val, void __iomem *addr) > -- > 2.11.0 > LGTM, feel free to add my Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> -- Luc
On Wed, Mar 21, 2018 at 5:37 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > The semantics of the iowriteXXbe() functions are to write a > value in CPU endianess to an IO register that is known by the > caller to be in Big Endian. The mmio_writeXXbe() macro, which > is called by iowriteXXbe(), should therefore use cpu_to_beXX() > instead of beXX_to_cpu(). > > Seeing both beXX_to_cpu() and cpu_to_beXX() are both functionally > implemented as either null operations or swabXX operations there > was no noticable bug here. But it is confusing for both developers > and code analysis tools alike. > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Your patch is a clear improvement of what we had before, but I notice that we have a weird asymmetry between big-endian and little-endian accessors before and after this patch: void iowrite32(u32 val, void __iomem *addr) { IO_COND(addr, outl(val,port), writel(val, addr)); } void iowrite32be(u32 val, void __iomem *addr) { IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr)); } The little-endian iowrite32() when applied to mmio registers uses a 32-bit wide atomic store to a little-endian register with barriers to order against both spinlocks and DMA. The big-endian iowrite32be() on the same pointer uses a nonatomic store with no barriers whatsoever and the opposite endianess. On most architectures, this is not important: - For x86, the stores are aways atomic and no additional barriers are needed, so the two are the same - For ARM (both 32 and 64-bit), powerpc and many others, we don't use the generic iowrite() and just fall back to writel() or writel(swab32()). However, shouldn't we just use the writel(swab32()) logic here as well for the common case rather than risking missing barriers? Arnd
On 26/03/18 04:53 AM, Arnd Bergmann wrote: > On most architectures, this is not important: > - For x86, the stores are aways atomic and no additional barriers > are needed, so the two are the same > - For ARM (both 32 and 64-bit), powerpc and many others, we don't > use the generic iowrite() and just fall back to writel() or > writel(swab32()). > > However, shouldn't we just use the writel(swab32()) logic here as well > for the common case rather than risking missing barriers? Hmm, I don't know... it's complicated? Doing a bit of digging shows that the existing code was written during a time when writel() did not include extra barriers over __raw_writel() in any of the common arches. The commit logs don't seem to provide any guidance as to why this it was done this way, but I'd assume it was done to avoid a double swab() call on BE arches. Seeing writel() is typically implemented as: __raw_writel(__cpu_to_le32(value), addr); Then on BE arches, writel(swab32()) would become: __raw_writel(swab32(swab32(value)), addr) Which seems undesirable. Logan
On Mon, Mar 26, 2018 at 6:21 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > > > On 26/03/18 04:53 AM, Arnd Bergmann wrote: >> On most architectures, this is not important: >> - For x86, the stores are aways atomic and no additional barriers >> are needed, so the two are the same >> - For ARM (both 32 and 64-bit), powerpc and many others, we don't >> use the generic iowrite() and just fall back to writel() or >> writel(swab32()). >> >> However, shouldn't we just use the writel(swab32()) logic here as well >> for the common case rather than risking missing barriers? > > Hmm, I don't know... it's complicated? > > Doing a bit of digging shows that the existing code was written during a > time when writel() did not include extra barriers over __raw_writel() in > any of the common arches. > > The commit logs don't seem to provide any guidance as to why this it was > done this way, but I'd assume it was done to avoid a double swab() call > on BE arches. Seeing writel() is typically implemented as: > > __raw_writel(__cpu_to_le32(value), addr); > > Then on BE arches, writel(swab32()) would become: > > __raw_writel(swab32(swab32(value)), addr) > > Which seems undesirable. I wouldn't expect it to matter: the byte swap is almost always much cheaper compared to the actual bus access for the MMIO, and I would also guess that modern compilers can eliminate the double byte swap on architectures where writel() is an inline function. Most of the important architectures use ARCH_USE_BUILTIN_BSWAP, which guarantees that. Arnd
On 26/03/18 01:50 PM, Arnd Bergmann wrote: > I wouldn't expect it to matter: the byte swap is almost always much > cheaper compared to the actual bus access for the MMIO, and I > would also guess that modern compilers can eliminate the double > byte swap on architectures where writel() is an inline function. Most of > the important architectures use ARCH_USE_BUILTIN_BSWAP, which > guarantees that. Fair enough. Sometime this week I'll update my patch set to change that. Thanks, Logan
diff --git a/lib/iomap.c b/lib/iomap.c index 541d926da95e..be120c13d6cc 100644 --- a/lib/iomap.c +++ b/lib/iomap.c @@ -106,8 +106,8 @@ EXPORT_SYMBOL(ioread32be); #endif #ifndef mmio_write16be -#define mmio_write16be(val,port) __raw_writew(be16_to_cpu(val),port) -#define mmio_write32be(val,port) __raw_writel(be32_to_cpu(val),port) +#define mmio_write16be(val,port) __raw_writew(cpu_to_be16(val),port) +#define mmio_write32be(val,port) __raw_writel(cpu_to_be32(val),port) #endif void iowrite8(u8 val, void __iomem *addr)
The semantics of the iowriteXXbe() functions are to write a value in CPU endianess to an IO register that is known by the caller to be in Big Endian. The mmio_writeXXbe() macro, which is called by iowriteXXbe(), should therefore use cpu_to_beXX() instead of beXX_to_cpu(). Seeing both beXX_to_cpu() and cpu_to_beXX() are both functionally implemented as either null operations or swabXX operations there was no noticable bug here. But it is confusing for both developers and code analysis tools alike. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> --- lib/iomap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)