diff mbox

Issue with the emulated PCI bridge implementation

Message ID 20131226165241.2c50b244@skate (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Thomas Petazzoni Dec. 26, 2013, 3:52 p.m. UTC
Jason,

On Thu, 26 Dec 2013 16:05:34 +0100, Thomas Petazzoni wrote:

> For now, my only idea would be to do the pci_ioremap_io()
> unconditionally for all PCIe interfaces when the PCIe host controller
> driver is initialized. We know the maximum size of the I/O region for
> each PCIe interface, and this size is small (64 KB). We can keep the
> creation of the corresponding MBus window as something dynamic done by
> the bridge.

Here is an implementation of this idea, tested to work with an e1000e
card, with the driver modified to do a few read/write to the I/O
region. What do you think about it?

Thanks!

Thomas

From c062af329fa07ddc6d0ca305b4c9633f7e1dca00 Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Thu, 26 Dec 2013 16:46:24 +0100
Subject: [PATCH] pci: mvebu: call pci_ioremap_io() at startup instead of
 dynamically

The mvebu PCI host controller driver uses an emulated PCI-to-PCI
bridge to leverage the core PCI kernel enumeration logic to
dynamically create and remove the MBus windows needed to access the
memory and I/O regions of each PCI interface.

In the context of this PCI-to-PCI bridge emulation, the driver
emulates all reads and writes to the PCI bridge registers. Upon a
write to the registers conguring the I/O base and limit, the driver
was creating the MBus window, and calling pci_ioremap_io() to setup
the mapping.

However, it turns out that accesses to these registers are made in an
IRQ disabled context, while pci_ioremap_io() is a potentially sleeping
function. Not only this is wrong, but it is causing so fairly loud
warnings at boot time when the appropriate kernel hacking options are
enabled.

This patch solves this by moving the pci_ioremap_io() call at the
startup of the driver. At this point, we don't know how many PCI
interfaces will be enabled, so we are simply remapping the entire PCI
I/O space to virtual addresses. This is reasonable since this I/O
space is limited to 1 MB in size, and also because the MBus windows
continue to be created in a dynamic fashion only when devices need
them.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/pci/host/pci-mvebu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Bjorn Helgaas Jan. 2, 2014, 9:41 p.m. UTC | #1
On Thu, Dec 26, 2013 at 8:52 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Jason,
>
> On Thu, 26 Dec 2013 16:05:34 +0100, Thomas Petazzoni wrote:
>
>> For now, my only idea would be to do the pci_ioremap_io()
>> unconditionally for all PCIe interfaces when the PCIe host controller
>> driver is initialized. We know the maximum size of the I/O region for
>> each PCIe interface, and this size is small (64 KB). We can keep the
>> creation of the corresponding MBus window as something dynamic done by
>> the bridge.
>
> Here is an implementation of this idea, tested to work with an e1000e
> card, with the driver modified to do a few read/write to the I/O
> region. What do you think about it?
>
> Thanks!
>
> Thomas
>
> From c062af329fa07ddc6d0ca305b4c9633f7e1dca00 Mon Sep 17 00:00:00 2001
> From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Date: Thu, 26 Dec 2013 16:46:24 +0100
> Subject: [PATCH] pci: mvebu: call pci_ioremap_io() at startup instead of
>  dynamically
>
> The mvebu PCI host controller driver uses an emulated PCI-to-PCI
> bridge to leverage the core PCI kernel enumeration logic to
> dynamically create and remove the MBus windows needed to access the
> memory and I/O regions of each PCI interface.
>
> In the context of this PCI-to-PCI bridge emulation, the driver
> emulates all reads and writes to the PCI bridge registers. Upon a
> write to the registers conguring the I/O base and limit, the driver
> was creating the MBus window, and calling pci_ioremap_io() to setup
> the mapping.
>
> However, it turns out that accesses to these registers are made in an
> IRQ disabled context, while pci_ioremap_io() is a potentially sleeping
> function. Not only this is wrong, but it is causing so fairly loud
> warnings at boot time when the appropriate kernel hacking options are
> enabled.
>
> This patch solves this by moving the pci_ioremap_io() call at the
> startup of the driver. At this point, we don't know how many PCI
> interfaces will be enabled, so we are simply remapping the entire PCI
> I/O space to virtual addresses. This is reasonable since this I/O
> space is limited to 1 MB in size, and also because the MBus windows
> continue to be created in a dynamic fashion only when devices need
> them.
>
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

Applied to my pci/host-mvebu branch for v3.14, thanks!

Bjorn

> ---
>  drivers/pci/host/pci-mvebu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
> index 2aa7b77c..476dd72 100644
> --- a/drivers/pci/host/pci-mvebu.c
> +++ b/drivers/pci/host/pci-mvebu.c
> @@ -330,8 +330,6 @@ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
>         mvebu_mbus_add_window_remap_by_id(port->io_target, port->io_attr,
>                                           port->iowin_base, port->iowin_size,
>                                           iobase);
> -
> -       pci_ioremap_io(iobase, port->iowin_base);
>  }
>
>  static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
> @@ -969,6 +967,10 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
>         }
>
>         pcie->nports = i;
> +
> +       for (i = 0; i < (IO_SPACE_LIMIT - SZ_64K); i += SZ_64K)
> +               pci_ioremap_io(i, pcie->io.start + i);
> +
>         mvebu_pcie_msi_enable(pcie);
>         mvebu_pcie_enable(pcie);
>
> --
> 1.8.3.2
>
> --
> Thomas Petazzoni, CTO, Free Electrons
> Embedded Linux, Kernel and Android engineering
> http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Jan. 3, 2014, 12:26 a.m. UTC | #2
On Thu, Dec 26, 2013 at 04:52:41PM +0100, Thomas Petazzoni wrote:

> Here is an implementation of this idea, tested to work with an e1000e
> card, with the driver modified to do a few read/write to the I/O
> region. What do you think about it?

This seems reasonable, the only down side is that a stray read to an
unused portion of the Linux IO mapping will lock the machine instead
of getting a page fault - however I don't see that as a blocker.

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Jan. 3, 2014, 12:22 p.m. UTC | #3
On Friday 03 January 2014, Jason Gunthorpe wrote:
> On Thu, Dec 26, 2013 at 04:52:41PM +0100, Thomas Petazzoni wrote:
> 
> > Here is an implementation of this idea, tested to work with an e1000e
> > card, with the driver modified to do a few read/write to the I/O
> > region. What do you think about it?
> 
> This seems reasonable, the only down side is that a stray read to an
> unused portion of the Linux IO mapping will lock the machine instead
> of getting a page fault - however I don't see that as a blocker.
> 

I've scratched my head a bit over this patch, and I couldn't find anything
wrong with it in the end, as long as we don't have any other device
in the system that also wants its share of the I/O space (e.g. a
PCMCIA port on the SRAM interface), but then we'd probably need other
changes as well.

However the part that made me wonder is that an e1000e with a PCI bridge
actually /should/not/ need to allocate an I/O space window with a precious
mbus resource, since AFAIK this adapter does not have an I/O space BARs.

Thomas, can you send the 'lspci -v' output of the system with this card
to confirm that the I/O space is actually needed? If not, we should probably
review the core PCI code to see why the bridge code tries to add this window.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Jan. 3, 2014, 6:44 p.m. UTC | #4
On Fri, Jan 03, 2014 at 01:22:31PM +0100, Arnd Bergmann wrote:

> However the part that made me wonder is that an e1000e with a PCI bridge
> actually /should/not/ need to allocate an I/O space window with a precious
> mbus resource, since AFAIK this adapter does not have an I/O space BARs.

IIRC the e1000 still has a legacy IO port BAR.. (guessing it is used
for PXE boot on x86?)

My patch set to allow the DT to turn off IO port allocation is already
in mainline - drop the IO ranges from the DT and no IO resources are
consumed at all.

Though, that is an interesting point, a small refinement would be to
not allocate the pci io map if IO is turned off as well..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Petazzoni Jan. 3, 2014, 7:01 p.m. UTC | #5
Dear Arnd Bergmann,

On Fri, 3 Jan 2014 13:22:31 +0100, Arnd Bergmann wrote:

> However the part that made me wonder is that an e1000e with a PCI bridge
> actually /should/not/ need to allocate an I/O space window with a precious
> mbus resource, since AFAIK this adapter does not have an I/O space BARs.
> 
> Thomas, can you send the 'lspci -v' output of the system with this card
> to confirm that the I/O space is actually needed? If not, we should probably
> review the core PCI code to see why the bridge code tries to add this window.

I don't have the hardware next to me, but IIRC the card does have an
I/O space. It is not used by the mainline kernel driver, so I have a
hacky patch that modifies the e1000e driver to make it use the I/O
space and do a few I/O reads and writes. I've used that since the
beginning to test I/O space functionality of the pci-mvebu driver.

Digging in the LAKML archive, I found a lspci -v output about the
e1000e, and it has an I/O space:

03:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
	Subsystem: Intel Corporation PRO/1000 PT Server Adapter
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 105
	Region 0: Memory at c1000000 (32-bit, non-prefetchable) [size=128K]
	Region 1: Memory at c1020000 (32-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at c0000000 [disabled] [size=32]
	[virtual] Expansion ROM at c1100000 [disabled] [size=128K]
	Capabilities: [c8] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 <4us, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Device Serial Number 00-1b-21-ff-ff-c1-c4-fe
	Kernel driver in use: e1000e

Best regards,

Thomas
Arnd Bergmann Jan. 3, 2014, 7:03 p.m. UTC | #6
On Friday 03 January 2014 11:44:24 Jason Gunthorpe wrote:
> On Fri, Jan 03, 2014 at 01:22:31PM +0100, Arnd Bergmann wrote:
> 
> > However the part that made me wonder is that an e1000e with a PCI bridge
> > actually /should/not/ need to allocate an I/O space window with a precious
> > mbus resource, since AFAIK this adapter does not have an I/O space BARs.
> 
> IIRC the e1000 still has a legacy IO port BAR.. (guessing it is used
> for PXE boot on x86?)

Ok, that would certainly explain it. I actually found a reference to the
I/O BAR in the e1000 driver now.

> My patch set to allow the DT to turn off IO port allocation is already
> in mainline - drop the IO ranges from the DT and no IO resources are
> consumed at all.

Ah, nice! I haven't followed that, but it sounds like a very good
solution.

> Though, that is an interesting point, a small refinement would be to
> not allocate the pci io map if IO is turned off as well..

Do you mean not call pci_ioremap_io() and pci_add_resource_offset(), or
something else? Either way, it sounds like the correct thing to do.
I wonder if the PCI core has any problems when there is no IO resource,
but it's probably fine.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Jan. 3, 2014, 7:04 p.m. UTC | #7
On Friday 03 January 2014 20:01:06 Thomas Petazzoni wrote:
> I don't have the hardware next to me, but IIRC the card does have an
> I/O space. It is not used by the mainline kernel driver, so I have a
> hacky patch that modifies the e1000e driver to make it use the I/O
> space and do a few I/O reads and writes. I've used that since the
> beginning to test I/O space functionality of the pci-mvebu driver.
> 
> Digging in the LAKML archive, I found a lspci -v output about the
> e1000e, and it has an I/O space:

Ok, thanks for the confirmation!

Do you mind posting your hack here? It may be useful for others as
well, such as the xgene developers that seem to be doing funny things
with their I/O space.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 2aa7b77c..476dd72 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -330,8 +330,6 @@  static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 	mvebu_mbus_add_window_remap_by_id(port->io_target, port->io_attr,
 					  port->iowin_base, port->iowin_size,
 					  iobase);
-
-	pci_ioremap_io(iobase, port->iowin_base);
 }
 
 static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
@@ -969,6 +967,10 @@  static int mvebu_pcie_probe(struct platform_device *pdev)
 	}
 
 	pcie->nports = i;
+
+	for (i = 0; i < (IO_SPACE_LIMIT - SZ_64K); i += SZ_64K)
+		pci_ioremap_io(i, pcie->io.start + i);
+
 	mvebu_pcie_msi_enable(pcie);
 	mvebu_pcie_enable(pcie);