Message ID | 20220429135108.2781579-2-schnelle@linux.ibm.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [RFC,v2,01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary | expand |
On Fri, 29 Apr 2022, Niklas Schnelle wrote: > We introduce a new HAS_IOPORT Kconfig option to indicate support for > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation > of the I/O accessor functions inb()/outb() and friends on architectures > which can not meaningfully support legacy I/O spaces such as s390 or > where such support is optional. The "depends on" relations on HAS_IOPORT > in drivers as well as ifdefs for HAS_IOPORT specific sections will be > added in subsequent patches on a per subsystem basis. [...] > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > index de3b32a507d2..4c55df08d6f1 100644 > --- a/arch/mips/Kconfig > +++ b/arch/mips/Kconfig > @@ -47,6 +47,7 @@ config MIPS > select GENERIC_SMP_IDLE_THREAD > select GENERIC_TIME_VSYSCALL > select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT > + select HAS_IOPORT > select HAVE_ARCH_COMPILER_H > select HAVE_ARCH_JUMP_LABEL > select HAVE_ARCH_KGDB if MIPS_FP_SUPPORT NAK, not all MIPS systems have the port I/O space, and we have it already handled via the NO_IOPORT_MAP option. We'll need to have HAS_IOPORT set to !NO_IOPORT_MAP (or vice versa) for the MIPS architecture. Maciej
On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote: > We introduce a new HAS_IOPORT Kconfig option to indicate support for > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation > of the I/O accessor functions inb()/outb() and friends on architectures > which can not meaningfully support legacy I/O spaces such as s390 or > where such support is optional. So you plan to drop inb()/outb() on architectures where I/O port space is optional? So even platforms that have I/O port space may not be able to use it? This feels like a lot of work where the main benefit is to keep Kconfig from offering drivers that aren't of interest on s390. Granted, there may be issues where inb()/outb() does the wrong thing such as dereferencing null pointers when I/O port space isn't implemented. I think that's a defect in inb()/outb() and could be fixed there. Bjorn
On Wed, May 4, 2022 at 11:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote: > > We introduce a new HAS_IOPORT Kconfig option to indicate support for > > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation > > of the I/O accessor functions inb()/outb() and friends on architectures > > which can not meaningfully support legacy I/O spaces such as s390 or > > where such support is optional. > > So you plan to drop inb()/outb() on architectures where I/O port space > is optional? So even platforms that have I/O port space may not be > able to use it? > > This feels like a lot of work where the main benefit is to keep > Kconfig from offering drivers that aren't of interest on s390. > > Granted, there may be issues where inb()/outb() does the wrong thing > such as dereferencing null pointers when I/O port space isn't > implemented. I think that's a defect in inb()/outb() and could be > fixed there. The current implementation in asm-generic/io.h implements inb()/outb() using readb()/writeb() with a fixed architecture specific offset. There are three possible things that can happen here: a) there is a host bridge driver that maps its I/O ports to this window, and everything works b) the address range is reserved and accessible but no host bridge driver has mapped its registers there, so an access causes a page fault c) the architecture does not define an offset, and accessing low I/O ports ends up as a NULL pointer dereference The main goal is to avoid c), which is what happens on s390, but can also happen elsewhere. Catching b) would be nice as well, but is much harder to do from generic code as you'd need an architecture specific inline asm statement to insert a ex_table fixup, or a runtime conditional on each access. Arnd
On Wed, 2022-05-04 at 23:31 +0200, Arnd Bergmann wrote: > On Wed, May 4, 2022 at 11:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote: > > > We introduce a new HAS_IOPORT Kconfig option to indicate support for > > > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation > > > of the I/O accessor functions inb()/outb() and friends on architectures > > > which can not meaningfully support legacy I/O spaces such as s390 or > > > where such support is optional. > > > > So you plan to drop inb()/outb() on architectures where I/O port space > > is optional? So even platforms that have I/O port space may not be > > able to use it? > > > > This feels like a lot of work where the main benefit is to keep > > Kconfig from offering drivers that aren't of interest on s390. > > > > Granted, there may be issues where inb()/outb() does the wrong thing > > such as dereferencing null pointers when I/O port space isn't > > implemented. I think that's a defect in inb()/outb() and could be > > fixed there. > > The current implementation in asm-generic/io.h implements inb()/outb() > using readb()/writeb() with a fixed architecture specific offset. > > There are three possible things that can happen here: > > a) there is a host bridge driver that maps its I/O ports to this window, > and everything works > b) the address range is reserved and accessible but no host bridge > driver has mapped its registers there, so an access causes a > page fault > c) the architecture does not define an offset, and accessing low I/O > ports ends up as a NULL pointer dereference > > The main goal is to avoid c), which is what happens on s390, but > can also happen elsewhere. Catching b) would be nice as well, > but is much harder to do from generic code as you'd need an > architecture specific inline asm statement to insert a ex_table > fixup, or a runtime conditional on each access. > > Arnd Yes and to add to this, we did try a local solution in inb()/outb() before. This added a warning when they are used and we know at compile time that we're dealing with case c). This approach was nacked by Linus though as we were turning a compile time known broken case into a runtime one: https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/ I do agree with this assesment and think this is the rightâ„¢ approach but it is more churn as can be seen by the size of this series. I think longer term it could be valuable though especially if more platforms phase out I/O port support like POWER9 for which this also allows filtering what drivers will never work.
On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote: > On Wed, May 4, 2022 at 11:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote: > > > We introduce a new HAS_IOPORT Kconfig option to indicate support for > > > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation > > > of the I/O accessor functions inb()/outb() and friends on architectures > > > which can not meaningfully support legacy I/O spaces such as s390 or > > > where such support is optional. > > > > So you plan to drop inb()/outb() on architectures where I/O port space > > is optional? So even platforms that have I/O port space may not be > > able to use it? > > > > This feels like a lot of work where the main benefit is to keep > > Kconfig from offering drivers that aren't of interest on s390. > > > > Granted, there may be issues where inb()/outb() does the wrong thing > > such as dereferencing null pointers when I/O port space isn't > > implemented. I think that's a defect in inb()/outb() and could be > > fixed there. > > The current implementation in asm-generic/io.h implements inb()/outb() > using readb()/writeb() with a fixed architecture specific offset. > > There are three possible things that can happen here: > > a) there is a host bridge driver that maps its I/O ports to this window, > and everything works > b) the address range is reserved and accessible but no host bridge > driver has mapped its registers there, so an access causes a > page fault > c) the architecture does not define an offset, and accessing low I/O > ports ends up as a NULL pointer dereference > > The main goal is to avoid c), which is what happens on s390, but > can also happen elsewhere. Catching b) would be nice as well, > but is much harder to do from generic code as you'd need an > architecture specific inline asm statement to insert a ex_table > fixup, or a runtime conditional on each access. Or s390 could implement its own inb(). I'm hearing that generic powerpc kernels have to run both on machines that have I/O port space and those that don't. That makes me think s390 could do something similar. Bjorn
On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote: > > > > The main goal is to avoid c), which is what happens on s390, but > > can also happen elsewhere. Catching b) would be nice as well, > > but is much harder to do from generic code as you'd need an > > architecture specific inline asm statement to insert a ex_table > > fixup, or a runtime conditional on each access. > > Or s390 could implement its own inb(). > > I'm hearing that generic powerpc kernels have to run both on machines > that have I/O port space and those that don't. That makes me think > s390 could do something similar. No, this is actually the current situation, and it makes absolutely no sense. s390 has no way of implementing inb()/outb() because there are no instructions for it and it cannot tunnel them through a virtual address mapping like on most of the other architectures. (it has special instructions for accessing memory space, which is not the same as a pointer dereference here). The existing implementation gets flagged as a NULL pointer dereference by a compiler warning because it effectively is. powerpc kernels generally map the I/O space into a section of the physical address space, where it gets mapped into a fixed virtual address and accessed through pointer dereference. This works on any powerpc CPU as long as it is implemented in the PCI host bridge in the usual way. The only difference between powerpc and arm here is that there are fewer implementations, so one can make assumptions about which PCI host bridge is used based on a CPU core. Arnd
On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote: > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote: > > > > > > The main goal is to avoid c), which is what happens on s390, but > > > can also happen elsewhere. Catching b) would be nice as well, > > > but is much harder to do from generic code as you'd need an > > > architecture specific inline asm statement to insert a ex_table > > > fixup, or a runtime conditional on each access. > > > > Or s390 could implement its own inb(). > > > > I'm hearing that generic powerpc kernels have to run both on machines > > that have I/O port space and those that don't. That makes me think > > s390 could do something similar. > > No, this is actually the current situation, and it makes absolutely no > sense. s390 has no way of implementing inb()/outb() because there > are no instructions for it and it cannot tunnel them through a virtual > address mapping like on most of the other architectures. (it has special > instructions for accessing memory space, which is not the same as > a pointer dereference here). > > The existing implementation gets flagged as a NULL pointer dereference > by a compiler warning because it effectively is. I think s390 currently uses the inb() in asm-generic/io.h, i.e., "__raw_readb(PCI_IOBASE + addr)". I understand that's a NULL pointer dereference because the default PCI_IOBASE is 0. I mooted a s390 inb() implementation like "return ~0" because that's what happens on most arches when there's no device to respond to the inb(). The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter drivers that use I/O ports in some cases but not others. But maybe it's the most practical way. Bjorn
On Thu, 5 May 2022, Bjorn Helgaas wrote: > On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote: > > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote: > > > > > > > > The main goal is to avoid c), which is what happens on s390, but > > > > can also happen elsewhere. Catching b) would be nice as well, > > > > but is much harder to do from generic code as you'd need an > > > > architecture specific inline asm statement to insert a ex_table > > > > fixup, or a runtime conditional on each access. > > > > > > Or s390 could implement its own inb(). > > > > > > I'm hearing that generic powerpc kernels have to run both on machines > > > that have I/O port space and those that don't. That makes me think > > > s390 could do something similar. > > > > No, this is actually the current situation, and it makes absolutely no > > sense. s390 has no way of implementing inb()/outb() because there > > are no instructions for it and it cannot tunnel them through a virtual > > address mapping like on most of the other architectures. (it has special > > instructions for accessing memory space, which is not the same as > > a pointer dereference here). > > > > The existing implementation gets flagged as a NULL pointer dereference > > by a compiler warning because it effectively is. > > I think s390 currently uses the inb() in asm-generic/io.h, i.e., > "__raw_readb(PCI_IOBASE + addr)". I understand that's a NULL pointer > dereference because the default PCI_IOBASE is 0. > > I mooted a s390 inb() implementation like "return ~0" because that's > what happens on most arches when there's no device to respond to the > inb(). > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter > drivers that use I/O ports in some cases but not others. But maybe > it's the most practical way. > Do you mean, "the most practical way to avoid a compiler warning on s390"? What about "#pragma GCC diagnostic ignored"?
On Thu, 2022-05-05 at 14:53 -0500, Bjorn Helgaas wrote: > On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote: > > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote: > > > > The main goal is to avoid c), which is what happens on s390, but > > > > can also happen elsewhere. Catching b) would be nice as well, > > > > but is much harder to do from generic code as you'd need an > > > > architecture specific inline asm statement to insert a ex_table > > > > fixup, or a runtime conditional on each access. > > > > > > Or s390 could implement its own inb(). > > > > > > I'm hearing that generic powerpc kernels have to run both on machines > > > that have I/O port space and those that don't. That makes me think > > > s390 could do something similar. > > > > No, this is actually the current situation, and it makes absolutely no > > sense. s390 has no way of implementing inb()/outb() because there > > are no instructions for it and it cannot tunnel them through a virtual > > address mapping like on most of the other architectures. (it has special > > instructions for accessing memory space, which is not the same as > > a pointer dereference here). > > > > The existing implementation gets flagged as a NULL pointer dereference > > by a compiler warning because it effectively is. > > I think s390 currently uses the inb() in asm-generic/io.h, i.e., > "__raw_readb(PCI_IOBASE + addr)". I understand that's a NULL pointer > dereference because the default PCI_IOBASE is 0. > > I mooted a s390 inb() implementation like "return ~0" because that's > what happens on most arches when there's no device to respond to the > inb(). > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter > drivers that use I/O ports in some cases but not others. But maybe > it's the most practical way. > > Bjorn I fear such stubs are kind of equivalent to my previous patch doing the same in asm-generic/io.h that was pulled and then unpulled by Linus. Maybe it would be slightly different if instead of a warning outX() would just be a NOP and inX() just returned ~0 but we're in essence pretending that we have these functions when we know they are nonsense. Another argument I see is that as shown by POWER9 we might start to see more platforms that just can't do I/O port access. E.g. I would also be surprised if Apple's M1 has I/O port access. Sooner or later I expect distributions on some platforms to only support such systems. For example on ppc a server distribution might only support IBM POWER without I/O port support before too long. Then having HAS_IOPORT allows to get rid of drivers that won't work anyway. There are also reports of probing a driver with I/O ports causing a system crash on systems without I/O port support. For example in this answer by John Garry (added so he may supply more information): https://lore.kernel.org/lkml/db043b76-880d-5fad-69cf-96abcd9cd34f@huawei.com/
On Thu, 5 May 2022, Arnd Bergmann wrote: > > I'm hearing that generic powerpc kernels have to run both on machines > > that have I/O port space and those that don't. That makes me think > > s390 could do something similar. > > No, this is actually the current situation, and it makes absolutely no > sense. s390 has no way of implementing inb()/outb() because there > are no instructions for it and it cannot tunnel them through a virtual > address mapping like on most of the other architectures. (it has special > instructions for accessing memory space, which is not the same as > a pointer dereference here). I think I'm missing something here. IIUC we're talking about a PCI/PCIe bus used with s390 hardware, right? (It has to be PCI/PCIe, because other than x86/IA-64 host buses there are only PCI/PCIe and EISA/ISA buses out there that define I/O access cycles and EISA/ISA have long been obsoleted except perhaps from some niche use.) If this is PCI/PCIe indeed, then an I/O access is just a different bit pattern put on the bus/in the TLP in the address phase. So what is there inherent to the s390 architecture that prevents that different bit pattern from being used? If anything, I could imagine the same limitation as with current POWER9 implementations, that is whatever glue is used to wire PCI/PCIe to the rest of the system does not implement a way to use said bit pattern (which has nothing to do with the POWER9 processor instruction set). But that has nothing to do with the presence or absence of any specific processor instructions. It's just a limitation of bus glue. So I guess it's just that all PCI/PCIe glue logic implementations for s390 have such a limitation, right? Maciej
On 06/05/2022 10:38, Niklas Schnelle wrote: > Another argument I see is that as shown by POWER9 we might start to see > more platforms that just can't do I/O port access. E.g. I would also be > surprised if Apple's M1 has I/O port access. Sooner or later I expect > distributions on some platforms to only support such systems. For > example on ppc a server distribution might only support IBM POWER > without I/O port support before too long. Then having HAS_IOPORT allows > to get rid of drivers that won't work anyway. > > There are also reports of probing a driver with I/O ports causing a > system crash on systems without I/O port support. For example in this > answer by John Garry (added so he may supply more information): > > https://lore.kernel.org/lkml/db043b76-880d-5fad-69cf-96abcd9cd34f@huawei.com/ > > . That issue is that drivers like hwmon f71805f use inb/outb accessors with hardcoded IO port addresses to probe the driver. On archs like arm64 or powerpc - which do not natively support inb et al - this may crash the system when no PCI IO space is mapped [0]. Indeed, when PCI IO space is mapped, it is preferable these those drivers still would not access these ports. So this series from Niklas could be used as a basis to solve that problem, in that we could also introduce something like HARDCODED_IOPORT [1] to stop those drivers being built at all for arm64. [0] https://lore.kernel.org/linux-input/20210112055129.7840-1-song.bao.hua@hisilicon.com/T/#mf86445470160c44ac110e9d200b09245169dc5b6 [1] https://lore.kernel.org/lkml/CAK8P3a3HHeP+Gw_k2P7Qtig0OmErf0HN30G22+qHic_uZTh11Q@mail.gmail.com/ Thanks, John
On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote: > > On Thu, 5 May 2022, Bjorn Helgaas wrote: > > > On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote: > > > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote: > > > > > The main goal is to avoid c), which is what happens on s390, but > > > > > can also happen elsewhere. Catching b) would be nice as well, > > > > > but is much harder to do from generic code as you'd need an > > > > > architecture specific inline asm statement to insert a ex_table > > > > > fixup, or a runtime conditional on each access. > > > > > > > > Or s390 could implement its own inb(). > > > > > > > > I'm hearing that generic powerpc kernels have to run both on machines > > > > that have I/O port space and those that don't. That makes me think > > > > s390 could do something similar. > > > > > > No, this is actually the current situation, and it makes absolutely no > > > sense. s390 has no way of implementing inb()/outb() because there > > > are no instructions for it and it cannot tunnel them through a virtual > > > address mapping like on most of the other architectures. (it has special > > > instructions for accessing memory space, which is not the same as > > > a pointer dereference here). > > > > > > The existing implementation gets flagged as a NULL pointer dereference > > > by a compiler warning because it effectively is. > > > > I think s390 currently uses the inb() in asm-generic/io.h, i.e., > > "__raw_readb(PCI_IOBASE + addr)". I understand that's a NULL pointer > > dereference because the default PCI_IOBASE is 0. > > > > I mooted a s390 inb() implementation like "return ~0" because that's > > what happens on most arches when there's no device to respond to the > > inb(). > > > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter > > drivers that use I/O ports in some cases but not others. But maybe > > it's the most practical way. > > > > Do you mean, "the most practical way to avoid a compiler warning on s390"? > What about "#pragma GCC diagnostic ignored"? This actually happens with clang. Apart from that, I think this would also fall under the same argument as the original patch Linus unpulled. We would just paint over someting that we know at compile time won't work: https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/
On Fri, May 6, 2022 at 12:20 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote: > On Thu, 5 May 2022, Arnd Bergmann wrote: > I think I'm missing something here. IIUC we're talking about a PCI/PCIe > bus used with s390 hardware, right? > > (It has to be PCI/PCIe, because other than x86/IA-64 host buses there are > only PCI/PCIe and EISA/ISA buses out there that define I/O access cycles > and EISA/ISA have long been obsoleted except perhaps from some niche use.) > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > pattern put on the bus/in the TLP in the address phase. So what is there > inherent to the s390 architecture that prevents that different bit pattern > from being used? The hardware design for PCI on s390 is very different from any other architecture, and more abstract. Rather than implementing MMIO register access as pointer dereference, this is a separate CPU instruction that takes a device/bar plus offset as arguments rather than a pointer, and Linux encodes this back into a fake __iomem token. > If anything, I could imagine the same limitation as with current POWER9 > implementations, that is whatever glue is used to wire PCI/PCIe to the > rest of the system does not implement a way to use said bit pattern (which > has nothing to do with the POWER9 processor instruction set). > > But that has nothing to do with the presence or absence of any specific > processor instructions. It's just a limitation of bus glue. So I guess > it's just that all PCI/PCIe glue logic implementations for s390 have such > a limitation, right? There are separate instructions for PCI memory and config space, but no instructions for I/O space, or for non-PCI MMIO that it could be mapped into. Arnd
On Fri, 6 May 2022, Arnd Bergmann wrote: > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > pattern put on the bus/in the TLP in the address phase. So what is there > > inherent to the s390 architecture that prevents that different bit pattern > > from being used? > > The hardware design for PCI on s390 is very different from any other > architecture, and more abstract. Rather than implementing MMIO register > access as pointer dereference, this is a separate CPU instruction that > takes a device/bar plus offset as arguments rather than a pointer, and > Linux encodes this back into a fake __iomem token. OK, that seems to me like a reasonable and quite a clean design (on the hardware side). So what happens if the instruction is given an I/O rather than memory BAR as the relevant argument? Is the address space indicator bit (bit #0) simply ignored or what? > > But that has nothing to do with the presence or absence of any specific > > processor instructions. It's just a limitation of bus glue. So I guess > > it's just that all PCI/PCIe glue logic implementations for s390 have such > > a limitation, right? > > There are separate instructions for PCI memory and config space, but > no instructions for I/O space, or for non-PCI MMIO that it could be mapped > into. The PCI configuration space was retrofitted into x86 systems (and is accessed in an awkward manner with them), but with a new design such a clean approach is most welcome IMHO. Thank you for your explanation. Maciej
On Fri, 2022-05-06 at 13:33 +0200, Arnd Bergmann wrote: > On Fri, May 6, 2022 at 12:20 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote: > > On Thu, 5 May 2022, Arnd Bergmann wrote: > > I think I'm missing something here. IIUC we're talking about a PCI/PCIe > > bus used with s390 hardware, right? > > > > (It has to be PCI/PCIe, because other than x86/IA-64 host buses there are > > only PCI/PCIe and EISA/ISA buses out there that define I/O access cycles > > and EISA/ISA have long been obsoleted except perhaps from some niche use.) > > > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > pattern put on the bus/in the TLP in the address phase. So what is there > > inherent to the s390 architecture that prevents that different bit pattern > > from being used? > > The hardware design for PCI on s390 is very different from any other > architecture, and more abstract. Rather than implementing MMIO register > access as pointer dereference, this is a separate CPU instruction that > takes a device/bar plus offset as arguments rather than a pointer, and > Linux encodes this back into a fake __iomem token. Correct, we have since gained new PCI load/store instructions that actually do take a virtual address that gets address translated to a "physical address" for the PCI BARs. The "physical address" we get from the platform (again via special instructions). I put "physical address" in quotes because while they are conceptually physical addresses and they are translated to by MMU translation tables, accessing their virtual mapping with any instruction other then the special PCI load/store instructions will cause addressing exceptions. So we still don't have real MMIO but something that looks a lot more like MMIO than we used to. > > > If anything, I could imagine the same limitation as with current POWER9 > > implementations, that is whatever glue is used to wire PCI/PCIe to the > > rest of the system does not implement a way to use said bit pattern (which > > has nothing to do with the POWER9 processor instruction set). > > > > But that has nothing to do with the presence or absence of any specific > > processor instructions. It's just a limitation of bus glue. So I guess > > it's just that all PCI/PCIe glue logic implementations for s390 have such > > a limitation, right? > > There are separate instructions for PCI memory and config space, but > no instructions for I/O space, or for non-PCI MMIO that it could be mapped > into. > > Arnd The config space is still accessed with the old style PCI load/store instructions via a special extra BAR. But yes overall on s390x we can only access PCI(e) devices via special instructions not via real MMIO and also the OS has no direct access to the registers of the PHB which are only accessible to firmware. Maybe as a bit of further background it's also important to note that on s390x all Operating Systems run inside a hypervisor. On the lowest level any OS can run this is a non-paging machine level hypervisor. For PCI that means that we always have a kind of pass-through access though without paging and hardware support for the memory partitioning this is of course relatively simple.
From: Maciej W. Rozycki > Sent: 06 May 2022 13:27 > > On Fri, 6 May 2022, Arnd Bergmann wrote: > > > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > > pattern put on the bus/in the TLP in the address phase. So what is there > > > inherent to the s390 architecture that prevents that different bit pattern > > > from being used? > > > > The hardware design for PCI on s390 is very different from any other > > architecture, and more abstract. Rather than implementing MMIO register > > access as pointer dereference, this is a separate CPU instruction that > > takes a device/bar plus offset as arguments rather than a pointer, and > > Linux encodes this back into a fake __iomem token. > > OK, that seems to me like a reasonable and quite a clean design (on the > hardware side). > > So what happens if the instruction is given an I/O rather than memory BAR > as the relevant argument? Is the address space indicator bit (bit #0) > simply ignored or what? You don't understand something... For a memory BAR pci_ioremap() returns a token that can be used with readl(). On most architectures readl() is just a memory access. This all fails on an I/O BAR. For an I/O BAR you want a similar pair of functions. On x86 the access function would need to use the 'in/out' instructions but there is no requirement for the token to be the native io port number. On many non-x86 architectures the access function would be a simple memory access - but a specific system (eg many ppc) may never actually allow such mappings to succeed. You might also want a third PCI mapping function that can map a memory BAR or an I/O BAR - with a suitable access function. On x86 this would need something like ioread8() for accesses. Except that function uses small integers for io port numbers (which is inherently dangerous). Nothing except the architecture specific implementation of the function to access an io BAR would ever go near a function called inb(). The same is really true for other bus type - including ISA and EISA. (Ignoring the horrid of probing ISI bus devices - hopefully they are in the ACPI tables??_ If a driver is probed on a ISA bus there ought to be functions equivalent to pci_ioremap() (for both memory and IO addresses) that return tokens appropriate for the specific bus. That is all a different load of churn. > > > But that has nothing to do with the presence or absence of any specific > > > processor instructions. It's just a limitation of bus glue. So I guess > > > it's just that all PCI/PCIe glue logic implementations for s390 have such > > > a limitation, right? > > > > There are separate instructions for PCI memory and config space, but > > no instructions for I/O space, or for non-PCI MMIO that it could be mapped > > into. > > The PCI configuration space was retrofitted into x86 systems (and is > accessed in an awkward manner with them), but with a new design such a > clean approach is most welcome IMHO. Thank you for your explanation. Actually I think x86 was the initial system for PCI. The PCI config space 'mess' is all about trying to make something that wouldn't break existing memory maps. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, May 6, 2022 at 2:27 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote: > > On Fri, 6 May 2022, Arnd Bergmann wrote: > > > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > > pattern put on the bus/in the TLP in the address phase. So what is there > > > inherent to the s390 architecture that prevents that different bit pattern > > > from being used? > > > > The hardware design for PCI on s390 is very different from any other > > architecture, and more abstract. Rather than implementing MMIO register > > access as pointer dereference, this is a separate CPU instruction that > > takes a device/bar plus offset as arguments rather than a pointer, and > > Linux encodes this back into a fake __iomem token. > > OK, that seems to me like a reasonable and quite a clean design (on the > hardware side). > > So what happens if the instruction is given an I/O rather than memory BAR > as the relevant argument? Is the address space indicator bit (bit #0) > simply ignored or what? Not sure. My best guess is that it would actually work as you'd expect, but is deliberately left out of the architecture specification so they don't have to to validate the correctness. Note that only a small number of PCIe cards are actually supported by IBM, and I think the firmware only passes devices to the OS if they are whitelisted. Arnd
On Fri, 2022-05-06 at 13:27 +0100, Maciej W. Rozycki wrote: > On Fri, 6 May 2022, Arnd Bergmann wrote: > > > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > > pattern put on the bus/in the TLP in the address phase. So what is there > > > inherent to the s390 architecture that prevents that different bit pattern > > > from being used? > > > > The hardware design for PCI on s390 is very different from any other > > architecture, and more abstract. Rather than implementing MMIO register > > access as pointer dereference, this is a separate CPU instruction that > > takes a device/bar plus offset as arguments rather than a pointer, and > > Linux encodes this back into a fake __iomem token. > > OK, that seems to me like a reasonable and quite a clean design (on the > hardware side). > > So what happens if the instruction is given an I/O rather than memory BAR > as the relevant argument? Is the address space indicator bit (bit #0) > simply ignored or what? See my answer to Arnd for some more background but there simply isn't a way to formulate an I/O access. In the old style PCI instructions the BAR number and the function handle are put in a register before the access. BAR number 15 is used to access config space. If there is no BAR for that number the instruction fails with a non-zero CC. > > > > But that has nothing to do with the presence or absence of any specific > > > processor instructions. It's just a limitation of bus glue. So I guess > > > it's just that all PCI/PCIe glue logic implementations for s390 have such > > > a limitation, right? > > > > There are separate instructions for PCI memory and config space, but > > no instructions for I/O space, or for non-PCI MMIO that it could be mapped > > into. > > The PCI configuration space was retrofitted into x86 systems (and is > accessed in an awkward manner with them), but with a new design such a > clean approach is most welcome IMHO. Thank you for your explanation. > > Maciej Well our design is a retrofit too considering s390x is a direct decendent of IBM System/360 which one could argue to have been the first ISA. But yes as PCI support was only added with PCIe and with a machine level hypervisor already in place we do get shielded a lot from the gritty details.
On Fri, May 6, 2022 at 2:56 PM David Laight <David.Laight@aculab.com> wrote: > From: Maciej W. Rozycki > > Sent: 06 May 2022 13:27 > > On Fri, 6 May 2022, Arnd Bergmann wrote: > > > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > > > pattern put on the bus/in the TLP in the address phase. So what is there > > > > inherent to the s390 architecture that prevents that different bit pattern > > > > from being used? > > > > > > The hardware design for PCI on s390 is very different from any other > > > architecture, and more abstract. Rather than implementing MMIO register > > > access as pointer dereference, this is a separate CPU instruction that > > > takes a device/bar plus offset as arguments rather than a pointer, and > > > Linux encodes this back into a fake __iomem token. > > > > OK, that seems to me like a reasonable and quite a clean design (on the > > hardware side). > > > > So what happens if the instruction is given an I/O rather than memory BAR > > as the relevant argument? Is the address space indicator bit (bit #0) > > simply ignored or what? > > You don't understand something... > > For a memory BAR pci_ioremap() returns a token that can be used with readl(). > On most architectures readl() is just a memory access. > This all fails on an I/O BAR. > > For an I/O BAR you want a similar pair of functions. > On x86 the access function would need to use the 'in/out' instructions but > there is no requirement for the token to be the native io port number. > On many non-x86 architectures the access function would be a simple memory > access - but a specific system (eg many ppc) may never actually allow > such mappings to succeed. > > You might also want a third PCI mapping function that can map a memory > BAR or an I/O BAR - with a suitable access function. > On x86 this would need something like ioread8() for accesses. > Except that function uses small integers for io port numbers > (which is inherently dangerous). > > Nothing except the architecture specific implementation of the function > to access an io BAR would ever go near a function called inb(). > > The same is really true for other bus type - including ISA and EISA. > (Ignoring the horrid of probing ISI bus devices - hopefully they > are in the ACPI tables??_ > If a driver is probed on a ISA bus there ought to be functions > equivalent to pci_ioremap() (for both memory and IO addresses) > that return tokens appropriate for the specific bus. > > That is all a different load of churn. A loooong time ago, it was suggested to add register accessor functions to struct device, so e.g. readl(dev, offset) would call into these accessors, which would implement the bus-specific behavior. No more worries about readl(), __raw_readl(), ioread32b(), or whatever quirk is needed, at the (small on nowadays' machines) expense of some indirection... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Fri, 6 May 2022, David Laight wrote: > > The PCI configuration space was retrofitted into x86 systems (and is > > accessed in an awkward manner with them), but with a new design such a > > clean approach is most welcome IMHO. Thank you for your explanation. > > Actually I think x86 was the initial system for PCI. > The PCI config space 'mess' is all about trying to make > something that wouldn't break existing memory maps. It was retrofitted in that x86 systems already existed for ~15 years when PCI came into picture. Therefore the makers of the CPU ISA couldn't have envisaged the need for config access instructions like they did for memory and port access. Maciej
On Fri, 2022-05-06 at 14:53 +0200, Arnd Bergmann wrote: > On Fri, May 6, 2022 at 2:27 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote: > > On Fri, 6 May 2022, Arnd Bergmann wrote: > > > > > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > > > pattern put on the bus/in the TLP in the address phase. So what is there > > > > inherent to the s390 architecture that prevents that different bit pattern > > > > from being used? > > > > > > The hardware design for PCI on s390 is very different from any other > > > architecture, and more abstract. Rather than implementing MMIO register > > > access as pointer dereference, this is a separate CPU instruction that > > > takes a device/bar plus offset as arguments rather than a pointer, and > > > Linux encodes this back into a fake __iomem token. > > > > OK, that seems to me like a reasonable and quite a clean design (on the > > hardware side). > > > > So what happens if the instruction is given an I/O rather than memory BAR > > as the relevant argument? Is the address space indicator bit (bit #0) > > simply ignored or what? > > Not sure. My best guess is that it would actually work as you'd expect, > but is deliberately left out of the architecture specification so they don't > have to to validate the correctness. Note that only a small number of > PCIe cards are actually supported by IBM, and I think the firmware > only passes devices to the OS if they are whitelisted. > > Arnd Yes, though in Linux we do try hard to work with whatever is plugged in. We did benefit from this in the past working with a new NIC from a different vendor with 0 additional changes. Also you can use vfio-pci to pass-through arbitrary PCI devices to a QEMU emulating s390x.
On Fri, 6 May 2022, Arnd Bergmann wrote: > > So what happens if the instruction is given an I/O rather than memory BAR > > as the relevant argument? Is the address space indicator bit (bit #0) > > simply ignored or what? > > Not sure. My best guess is that it would actually work as you'd expect, > but is deliberately left out of the architecture specification so they don't > have to to validate the correctness. Note that only a small number of > PCIe cards are actually supported by IBM, and I think the firmware > only passes devices to the OS if they are whitelisted. That makes sense, thanks! Maciej
From: Maciej W. Rozycki > Sent: 06 May 2022 14:15 > On Fri, 6 May 2022, David Laight wrote: > > > > The PCI configuration space was retrofitted into x86 systems (and is > > > accessed in an awkward manner with them), but with a new design such a > > > clean approach is most welcome IMHO. Thank you for your explanation. > > > > Actually I think x86 was the initial system for PCI. > > The PCI config space 'mess' is all about trying to make > > something that wouldn't break existing memory maps. > > It was retrofitted in that x86 systems already existed for ~15 years when > PCI came into picture. Therefore the makers of the CPU ISA couldn't have > envisaged the need for config access instructions like they did for memory > and port access. Rev 2.0 of the PCI spec (1993) defines two mechanisms for config cycles. #2 is probably the first one and maps all of PCI config space into 4k of IO space (PCI bridges aren't supported). #1 requires a pair of accesses (and SMP locking). Neither is really horrid. For horrid try the ISApnp interface. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, 6 May 2022, Geert Uytterhoeven wrote: > A loooong time ago, it was suggested to add register accessor > functions to struct device, so e.g. readl(dev, offset) would call > into these accessors, which would implement the bus-specific behavior. > No more worries about readl(), __raw_readl(), ioread32b(), or whatever > quirk is needed, at the (small on nowadays' machines) expense of > some indirection... I guess you'd need an additional parameter for the endianness policy required (to match either bit or byte lanes, according to ultimate data interpretation) where crossing between buses of a different endianness each. Otherwise you'd end up with the mess elsewhere. Maciej
From: Geert Uytterhoeven > Sent: 06 May 2022 14:09 ... > > The same is really true for other bus type - including ISA and EISA. > > (Ignoring the horrid of probing ISI bus devices - hopefully they > > are in the ACPI tables??_ > > If a driver is probed on a ISA bus there ought to be functions > > equivalent to pci_ioremap() (for both memory and IO addresses) > > that return tokens appropriate for the specific bus. > > > > That is all a different load of churn. > > A loooong time ago, it was suggested to add register accessor > functions to struct device, so e.g. readl(dev, offset) would call > into these accessors, which would implement the bus-specific behavior. > No more worries about readl(), __raw_readl(), ioread32b(), or whatever > quirk is needed, at the (small on nowadays' machines) expense of > some indirection... I was just thinking that the access functions might need a 'device'. Although you also need the BAR (or equivalent). So readl(dev, bar_token, offset) or readl(dev, bar_token + offset). Clearly the 'dev' parameter could be compiled out for non-DEBUG build on x86 - leaving the current(ish) object code. You don't want an indirect call (this year), but maybe real function call and a few tests won't make that much difference. They might affect PCIe writes, but PCIe reads are so slow you need to avoid them whenever possible. I've not timed reads into something like an ethernet chip, but into our fpga they are probably 1000 clocks+. OTOH I wouldn't want any overhead on the PIO fifo reads on one of our small ppc devices. We push a lot of data though that fifo and anything extra would kill performance. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, 6 May 2022, David Laight wrote: > > It was retrofitted in that x86 systems already existed for ~15 years when > > PCI came into picture. Therefore the makers of the CPU ISA couldn't have > > envisaged the need for config access instructions like they did for memory > > and port access. > > Rev 2.0 of the PCI spec (1993) defines two mechanisms for config cycles. > #2 is probably the first one and maps all of PCI config space into > 4k of IO space (PCI bridges aren't supported). This one is even more horrid than #1 in that it requires two separate preparatory I/O writes rather than just one, one to the Forward Register (at 0xcfa) to set the bus number, and another to the Configuration Space Enable Register (at 0xcf8) to set the function number, before you can issue a configuration read or write to a device. So you need MP locking too. NB only peer bridges aren't supported with this mechanism, normal PCI-PCI bridges are, via the Forward Register. > #1 requires a pair of accesses (and SMP locking). > > Neither is really horrid. Both are. First neither is MP-safe and second both are indirect in that you need to poke at some chipset registers before you can issue the actual read or write. Sane access would require a single CPU instruction to read or write from the configuration space. To access the conventional PCI configuration space in a direct linear manner you need 256 * 21 * 8 * 256 = 10.5MiB of address space. Such amount of address space seems affordable even with 32-bit systems. Maciej
Hi Maciej, On Fri, May 6, 2022 at 4:44 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote: > On Fri, 6 May 2022, David Laight wrote: > > > It was retrofitted in that x86 systems already existed for ~15 years when > > > PCI came into picture. Therefore the makers of the CPU ISA couldn't have > > > envisaged the need for config access instructions like they did for memory > > > and port access. > > > > Rev 2.0 of the PCI spec (1993) defines two mechanisms for config cycles. > > #2 is probably the first one and maps all of PCI config space into > > 4k of IO space (PCI bridges aren't supported). > > This one is even more horrid than #1 in that it requires two separate > preparatory I/O writes rather than just one, one to the Forward Register > (at 0xcfa) to set the bus number, and another to the Configuration Space > Enable Register (at 0xcf8) to set the function number, before you can > issue a configuration read or write to a device. So you need MP locking > too. > > NB only peer bridges aren't supported with this mechanism, normal PCI-PCI > bridges are, via the Forward Register. > > > #1 requires a pair of accesses (and SMP locking). > > > > Neither is really horrid. > > Both are. First neither is MP-safe and second both are indirect in that > you need to poke at some chipset registers before you can issue the actual > read or write. > > Sane access would require a single CPU instruction to read or write from > the configuration space. To access the conventional PCI configuration > space in a direct linear manner you need 256 * 21 * 8 * 256 = 10.5MiB of > address space. Such amount of address space seems affordable even with > 32-bit systems. Won't have fit in the legacy 1 MiB space ("640 KiB..."). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Hi David On Fri, May 6, 2022 at 4:05 PM David Laight <David.Laight@aculab.com> wrote: > From: Geert Uytterhoeven > > Sent: 06 May 2022 14:09 > > > The same is really true for other bus type - including ISA and EISA. > > > (Ignoring the horrid of probing ISI bus devices - hopefully they > > > are in the ACPI tables??_ > > > If a driver is probed on a ISA bus there ought to be functions > > > equivalent to pci_ioremap() (for both memory and IO addresses) > > > that return tokens appropriate for the specific bus. > > > > > > That is all a different load of churn. > > > > A loooong time ago, it was suggested to add register accessor > > functions to struct device, so e.g. readl(dev, offset) would call > > into these accessors, which would implement the bus-specific behavior. > > No more worries about readl(), __raw_readl(), ioread32b(), or whatever > > quirk is needed, at the (small on nowadays' machines) expense of > > some indirection... > > I was just thinking that the access functions might need a 'device'. > Although you also need the BAR (or equivalent). > So readl(dev, bar_token, offset) or readl(dev, bar_token + offset). Note that we do have such a system: regmap. > Clearly the 'dev' parameter could be compiled out for non-DEBUG > build on x86 - leaving the current(ish) object code. Assumed all devices are PCI devices. E.g. USB devices would still need the indirection. > You don't want an indirect call (this year), but maybe real > function call and a few tests won't make that much difference. > They might affect PCIe writes, but PCIe reads are so slow you > need to avoid them whenever possible. > I've not timed reads into something like an ethernet chip, > but into our fpga they are probably 1000 clocks+. > > OTOH I wouldn't want any overhead on the PIO fifo reads > on one of our small ppc devices. > We push a lot of data though that fifo and anything extra > would kill performance. Right. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Hi Geert, > > Sane access would require a single CPU instruction to read or write from > > the configuration space. To access the conventional PCI configuration > > space in a direct linear manner you need 256 * 21 * 8 * 256 = 10.5MiB of > > address space. Such amount of address space seems affordable even with > > 32-bit systems. > > Won't have fit in the legacy 1 MiB space ("640 KiB..."). Haha, but anyway you're supposed to use BIOS calls under DOS and the like so it doesn't really matter. You can't poke at the APIC in the legacy space either. Maciej
On Fri, 6 May 2022, Niklas Schnelle wrote: > On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote: > > > > On Thu, 5 May 2022, Bjorn Helgaas wrote: > > > > > On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote: > > > > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote: > > > > > > The main goal is to avoid c), which is what happens on s390, > > > > > > but can also happen elsewhere. Catching b) would be nice as > > > > > > well, but is much harder to do from generic code as you'd need > > > > > > an architecture specific inline asm statement to insert a > > > > > > ex_table fixup, or a runtime conditional on each access. > > > > > > > > > > Or s390 could implement its own inb(). > > > > > > > > > > I'm hearing that generic powerpc kernels have to run both on > > > > > machines that have I/O port space and those that don't. That > > > > > makes me think s390 could do something similar. > > > > > > > > No, this is actually the current situation, and it makes > > > > absolutely no sense. s390 has no way of implementing inb()/outb() > > > > because there are no instructions for it and it cannot tunnel them > > > > through a virtual address mapping like on most of the other > > > > architectures. (it has special instructions for accessing memory > > > > space, which is not the same as a pointer dereference here). > > > > > > > > The existing implementation gets flagged as a NULL pointer > > > > dereference by a compiler warning because it effectively is. > > > > > > I think s390 currently uses the inb() in asm-generic/io.h, i.e., > > > "__raw_readb(PCI_IOBASE + addr)". I understand that's a NULL > > > pointer dereference because the default PCI_IOBASE is 0. > > > > > > I mooted a s390 inb() implementation like "return ~0" because that's > > > what happens on most arches when there's no device to respond to the > > > inb(). > > > > > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter > > > drivers that use I/O ports in some cases but not others. But maybe > > > it's the most practical way. > > > > > > > Do you mean, "the most practical way to avoid a compiler warning on > > s390"? What about "#pragma GCC diagnostic ignored"? > > This actually happens with clang. That suggests a clang bug to me. If you believe GCC should behave like clang, then I guess the pragma above really is the one you want. If you somehow feel that the kernel should cater to gcc and clang even where they disagree then you would have to use "#pragma clang diagnostic ignored". > Apart from that, I think this would also fall under the same argument as > the original patch Linus unpulled. We would just paint over someting > that we know at compile time won't work: > > https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/ > I wasn't advocating adding any warnings. If you know at compile time that a driver won't work, the usual solution is scripts/config -d CONFIG_SOME_UNDESIRED_DRIVER. Why is that no longer appropriate for drivers that use IO ports?
On Sat, May 7, 2022 at 2:01 AM Finn Thain <fthain@linux-m68k.org> wrote: > On Fri, 6 May 2022, Niklas Schnelle wrote: > > On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote: > > > On Thu, 5 May 2022, Bjorn Helgaas wrote: > > > > > > > > I mooted a s390 inb() implementation like "return ~0" because that's > > > > what happens on most arches when there's no device to respond to the > > > > inb(). > > > > > > > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter > > > > drivers that use I/O ports in some cases but not others. But maybe > > > > it's the most practical way. > > > > > > > > > > Do you mean, "the most practical way to avoid a compiler warning on > > > s390"? What about "#pragma GCC diagnostic ignored"? > > > > This actually happens with clang. > > That suggests a clang bug to me. If you believe GCC should behave like > clang, then I guess the pragma above really is the one you want. If you > somehow feel that the kernel should cater to gcc and clang even where they > disagree then you would have to use "#pragma clang diagnostic ignored". I don't see how you can blame the compiler for this. On architectures with a zero PCI_IOBASE, an inb(0x2f8) literally becomes var = *(u8*)((NULL + 0x2f8); If you run a driver that does this, the kernel gets a page fault for the NULL page and reports an Oops. clang tells you 'warning: performing pointer arithmetic on a null pointer has undefined behavior', which is not exactly spot on, but close enough to warn you that you probably shouldn't do this. gcc doesn't warn here, but it does warn about an array out-of-bounds access when you pass such a pointer into memcpy or another string function. > > Apart from that, I think this would also fall under the same argument as > > the original patch Linus unpulled. We would just paint over someting > > that we know at compile time won't work: > > > > https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/ > > > > I wasn't advocating adding any warnings. > > If you know at compile time that a driver won't work, the usual solution > is scripts/config -d CONFIG_SOME_UNDESIRED_DRIVER. Why is that no > longer appropriate for drivers that use IO ports? This was never an option, we rely on 'make allmodconfig' to build without warnings on all architectures for finding regressions. Any driver that depends on architecture specific interfaces must not get selected on architectures that don't have those interfaces. Arnd
Hi Arnd, On Sat, 7 May 2022, Arnd Bergmann wrote: > On Sat, May 7, 2022 at 2:01 AM Finn Thain <fthain@linux-m68k.org> wrote: > > On Fri, 6 May 2022, Niklas Schnelle wrote: > > > On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote: > > > > On Thu, 5 May 2022, Bjorn Helgaas wrote: > > > > > > > > > > I mooted a s390 inb() implementation like "return ~0" because that's > > > > > what happens on most arches when there's no device to respond to the > > > > > inb(). > > > > > > > > > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter > > > > > drivers that use I/O ports in some cases but not others. But maybe > > > > > it's the most practical way. > > > > > > > > > > > > > Do you mean, "the most practical way to avoid a compiler warning on > > > > s390"? What about "#pragma GCC diagnostic ignored"? > > > > > > This actually happens with clang. > > > > That suggests a clang bug to me. If you believe GCC should behave like > > clang, then I guess the pragma above really is the one you want. If you > > somehow feel that the kernel should cater to gcc and clang even where they > > disagree then you would have to use "#pragma clang diagnostic ignored". > > I don't see how you can blame the compiler for this. On architectures > with a zero PCI_IOBASE, an inb(0x2f8) literally becomes > > var = *(u8*)((NULL + 0x2f8); > > If you run a driver that does this, the kernel gets a page fault for > the NULL page > and reports an Oops. clang tells you 'warning: performing pointer > arithmetic on a null pointer has undefined behavior', which is not exactly > spot on, but close enough to warn you that you probably shouldn't do this. gcc > doesn't warn here, but it does warn about an array out-of-bounds access when > you pass such a pointer into memcpy or another string function. > The appeal to UB is weak IMHO. Pointer arithmetic with a zero value is unambiguous and the compiler generates the code to implement the expected behaviour just fine. UB is literally an omission in the standard. Well, low level programming has always been beyond the scope of C standards. If architectural-level code wants to do arithmetic with an arbitrary integer values, and the compiler doesn't like it, then the relevant warnings should be disabled for those expressions. > > > Apart from that, I think this would also fall under the same argument as > > > the original patch Linus unpulled. We would just paint over someting > > > that we know at compile time won't work: > > > > > > https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/ > > > > > > > I wasn't advocating adding any warnings. > > > > If you know at compile time that a driver won't work, the usual solution > > is scripts/config -d CONFIG_SOME_UNDESIRED_DRIVER. Why is that no > > longer appropriate for drivers that use IO ports? > > This was never an option, we rely on 'make allmodconfig' to build > without warnings on all architectures for finding regressions. "All modules on all architectures with all compilers and checkers with all warnings enabled"? That's not even vaguely realistic. How about, "All modules on all architectures with a nominated compiler with the appropriate warnings enabled." > Any driver that depends on architecture specific interfaces must not get > selected on architectures that don't have those interfaces. > Kconfig always met that need before we got saddled with -Werror. That suggests to me that we need a "bool CONFIG_WARINGS_INTO_ERRORS" to control -Werror, which could be disabled for .config files (like make allmodconfig) where it is not helping.
On Sun, 8 May 2022, I wrote: > > That suggests to me that we need a "bool CONFIG_WARINGS_INTO_ERRORS" to > control -Werror, which could be disabled for .config files (like make > allmodconfig) where it is not helping. > I just noticed that we already have CONFIG_WERROR. So perhaps something like this would help. diff --git a/init/Kconfig b/init/Kconfig index ddcbefe535e9..765d83fb148e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -150,6 +150,8 @@ config WERROR However, if you have a new (or very old) compiler with odd and unusual warnings, or you have some architecture with problems, + or if you are using a compiler that doesn't happen to interpret + the C standards in quite the same way as some other compilers, you may need to disable this config option in order to successfully build the kernel.
diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index 7d0d26b5b3f5..2b9cf1b0bdb8 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -27,6 +27,7 @@ config ALPHA select AUDIT_ARCH select GENERIC_CPU_VULNERABILITIES select GENERIC_SMP_IDLE_THREAD + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL select HAVE_MOD_ARCH_SPECIFIC select MODULES_USE_ELF_RELA diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 2e8091e2d8a8..603ce00033a5 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -69,6 +69,7 @@ config ARM select GENERIC_SCHED_CLOCK select GENERIC_SMP_IDLE_THREAD select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6 select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 20ea89d9ac2f..234dc89a7654 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -136,6 +136,7 @@ config ARM64 select GENERIC_GETTIMEOFDAY select GENERIC_VDSO_TIME_NS select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_MOVE_PMD select HAVE_MOVE_PUD select HAVE_PCI diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index cb93769a9f2a..0fffe5130a80 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -25,6 +25,7 @@ config IA64 select PCI_DOMAINS if PCI select PCI_MSI select PCI_SYSCALL if PCI + select HAS_IOPORT select HAVE_ASM_MODVERSIONS select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_EXIT_THREAD diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 936cce42ae9a..54bf0a40c2f0 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -18,6 +18,7 @@ config M68K select GENERIC_CPU_DEVICES select GENERIC_IOMAP select GENERIC_IRQ_SHOW + select HAS_IOPORT if PCI || ISA || ATARI_ROM_ISA select HAVE_ASM_MODVERSIONS select HAVE_DEBUG_BUGVERBOSE select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_HAS_NO_UNALIGNED diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index 8cf429ad1c84..966a6682f1fc 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -21,6 +21,7 @@ config MICROBLAZE select GENERIC_IRQ_SHOW select GENERIC_PCI_IOMAP select GENERIC_SCHED_CLOCK + select HAS_IOPORT if PCI select HAVE_ARCH_HASH select HAVE_ARCH_KGDB select HAVE_ARCH_SECCOMP diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index de3b32a507d2..4c55df08d6f1 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -47,6 +47,7 @@ config MIPS select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT + select HAS_IOPORT select HAVE_ARCH_COMPILER_H select HAVE_ARCH_JUMP_LABEL select HAVE_ARCH_KGDB if MIPS_FP_SUPPORT diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index 52e550b45692..741c5c64c173 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -46,6 +46,7 @@ config PARISC select MODULES_USE_ELF_RELA select CLONE_BACKWARDS select TTY # Needed for pdc_cons.c + select HAS_IOPORT if PCI || EISA select HAVE_DEBUG_STACKOVERFLOW select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HASH diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 174edabb74fa..7133cc35b777 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -182,6 +182,7 @@ config PPC select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL select GENERIC_VDSO_TIME_NS + select HAS_IOPORT if PCI select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP select HAVE_ARCH_HUGE_VMAP if PPC_RADIX_MMU || PPC_8xx diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 00fd9c548f26..27fc8a450478 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -67,6 +67,7 @@ config RISCV select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL if MMU && 64BIT select GENERIC_VDSO_TIME_NS if HAVE_GENERIC_VDSO + select HAS_IOPORT if MMU select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL select HAVE_ARCH_JUMP_LABEL_RELATIVE if !XIP_KERNEL diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index 5f220e903e5a..6c1694e82b89 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -25,6 +25,7 @@ config SUPERH select GENERIC_SCHED_CLOCK select GENERIC_SMP_IDLE_THREAD select GUP_GET_PTE_LOW_HIGH if X2TLB + select HAS_IOPORT if HAS_IOPORT_MAP select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_KGDB select HAVE_ARCH_SECCOMP_FILTER diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 9200bc04701c..64736476dde8 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -32,6 +32,7 @@ config SPARC select GENERIC_IRQ_SHOW select ARCH_WANT_IPC_PARSE_VERSION select GENERIC_PCI_IOMAP + select HAS_IOPORT select HAVE_NMI_WATCHDOG if SPARC64 select HAVE_CBPF_JIT if SPARC32 select HAVE_EBPF_JIT if SPARC64 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index b0142e01002e..9ef0438d1b7d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -155,6 +155,7 @@ config X86 select GUP_GET_PTE_LOW_HIGH if X86_PAE select HARDIRQS_SW_RESEND select HARDLOCKUP_CHECK_TIMESTAMP if X86_64 + select HAS_IOPORT select HAVE_ACPI_APEI if ACPI select HAVE_ACPI_APEI_NMI if ACPI select HAVE_ALIGNED_STRUCT_PAGE if SLUB diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig index 3c68e174a113..a61285100224 100644 --- a/drivers/bus/Kconfig +++ b/drivers/bus/Kconfig @@ -81,7 +81,7 @@ config MOXTET config HISILICON_LPC bool "Support for ISA I/O space on HiSilicon Hip06/7" depends on (ARM64 && ARCH_HISI) || (COMPILE_TEST && !ALPHA && !HEXAGON && !PARISC) - depends on HAS_IOMEM + depends on HAS_IOPORT select INDIRECT_PIO if ARM64 help Driver to enable I/O access to devices attached to the Low Pin diff --git a/lib/Kconfig b/lib/Kconfig index 087e06b4cdfd..177fed9cf20a 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -91,6 +91,7 @@ config ARCH_USE_SYM_ANNOTATIONS config INDIRECT_PIO bool "Access I/O in non-MMIO mode" depends on ARM64 + depends on HAS_IOPORT help On some platforms where no separate I/O space exists, there are I/O hosts which can not be accessed in MMIO mode. Using the logical PIO @@ -493,6 +494,9 @@ config HAS_IOMEM depends on !NO_IOMEM default y +config HAS_IOPORT + def_bool ISA + config HAS_IOPORT_MAP bool depends on HAS_IOMEM && !NO_IOPORT_MAP diff --git a/lib/Kconfig.kgdb b/lib/Kconfig.kgdb index 05dae05b6cc9..c68e4d9dcecb 100644 --- a/lib/Kconfig.kgdb +++ b/lib/Kconfig.kgdb @@ -121,6 +121,7 @@ config KDB_DEFAULT_ENABLE config KDB_KEYBOARD bool "KGDB_KDB: keyboard as input device" + depends on HAS_IOPORT depends on VT && KGDB_KDB default n help
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O Port access. In a future patch HAS_IOPORT=n will disable compilation of the I/O accessor functions inb()/outb() and friends on architectures which can not meaningfully support legacy I/O spaces such as s390 or where such support is optional. The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs for HAS_IOPORT specific sections will be added in subsequent patches on a per subsystem basis. Co-developed-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> --- arch/alpha/Kconfig | 1 + arch/arm/Kconfig | 1 + arch/arm64/Kconfig | 1 + arch/ia64/Kconfig | 1 + arch/m68k/Kconfig | 1 + arch/microblaze/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/parisc/Kconfig | 1 + arch/powerpc/Kconfig | 1 + arch/riscv/Kconfig | 1 + arch/sh/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig | 1 + drivers/bus/Kconfig | 2 +- lib/Kconfig | 4 ++++ lib/Kconfig.kgdb | 1 + 16 files changed, 19 insertions(+), 1 deletion(-)