Message ID | 20090420223305.GA15340@jurassic.park.msu.ru (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Ivan Kokshaysky wrote: > On Sun, Apr 19, 2009 at 11:06:15AM +0200, Ingo Molnar wrote: >> Hm, there's one patch in that lot that does: >> >> drivers/pci/bus.c | 8 +++++++- >> drivers/pci/probe.c | 8 ++++++-- >> drivers/pci/setup-bus.c | 40 +++++++++++++++++++++++++++++++--------- >> 3 files changed, 44 insertions(+), 12 deletions(-) >> >> Which should go via the PCI tree. > > Here is a replacement for that patch which doesn't touch > the generic code. > > Ivan. > > --- > x86 pci: first cut on 64-bit resource allocation > > I believe that we should consider PCI memory above 4G as yet another > type of address space. This actually makes sense, as even accesses to that > memory are physically different - Dual Address Cycle (DAC) vs. 32-bit > Single Address Cycle (SAC). > > So, platform that can deal with 64-bit allocations would set up an > additional root bus resource and mark it with IORESOURCE_MEM64 flag. > > The main problem here is how the kernel would detect that hardware can > actually access a DAC memory (I know for a fact that a lot of Intel chipsets > cannot access MMIO >4G, even though subordinate p2p bridges are 64-bit > capable). > On the other hand, there are PCI devices with 64-bit BARs that do not > work properly being placed above 4G boundary. For example, some > radeon cards have 64-bit BAR for video RAM, but allocating that BAR in > the DAC area doesn't work for various reasons, like video-BIOS > limitations or drivers not taking into account that GPU is 32-bit. > > So moving stuff into MEM64 area should be considered as generally unsafe > operation, and the best default policy is to not enable MEM64 resource > unless we find that BIOS has allocated something there. > At the same time, MEM64 can be easily enabled/disabled based on host > bridge PCI IDs, kernel command line options and so on. > > Here is a basic implementation of the above for x86. I think it's > reasonably good starting point for PCI64 work - the next step would > be to teach pci_bus_alloc_resource() about IORESOURCE_MEM64: logic is > similar to prefetch vs non-prefetch case - MEM can hold MEM64 resource, > but not vice versa. And eventually bridge sizing code will be updated > for reasonable 64-bit allocations (it's a non-trivial task, though). > > This patch alone should fix cardbus >4G allocations and similar > nonsense. > > Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> > --- > arch/x86/include/asm/pci.h | 8 ++++++++ > arch/x86/pci/Makefile | 2 ++ > arch/x86/pci/dac_64bit.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ > arch/x86/pci/i386.c | 10 ++++++++++ > include/linux/ioport.h | 2 ++ > 5 files changed, 66 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h > index b51a1e8..5a9c54e 100644 > --- a/arch/x86/include/asm/pci.h > +++ b/arch/x86/include/asm/pci.h > @@ -86,6 +86,14 @@ static inline void early_quirks(void) { } > > extern void pci_iommu_alloc(void); > > +#ifdef CONFIG_ARCH_PHYS_ADDR_T_64BIT > +extern void pcibios_pci64_setup(void); > +extern void pcibios_pci64_verify(void); > +#else > +static inline void pcibios_pci64_setup(void) { } > +static inline void pcibios_pci64_verify(void) { } > +#endif > + > /* MSI arch hook */ > #define arch_setup_msi_irqs arch_setup_msi_irqs > > diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile > index d49202e..1b6c576 100644 > --- a/arch/x86/pci/Makefile > +++ b/arch/x86/pci/Makefile > @@ -13,5 +13,7 @@ obj-$(CONFIG_X86_VISWS) += visws.o > > obj-$(CONFIG_X86_NUMAQ) += numaq_32.o > > +obj-$(CONFIG_ARCH_PHYS_ADDR_T_64BIT) += dac_64bit.o > + > obj-y += common.o early.o > obj-y += amd_bus.o > diff --git a/arch/x86/pci/dac_64bit.c b/arch/x86/pci/dac_64bit.c > new file mode 100644 > index 0000000..ee03c4a > --- /dev/null > +++ b/arch/x86/pci/dac_64bit.c > @@ -0,0 +1,44 @@ > +/* > + * Set up the 64-bit bus resource for allocations > 4G if the hardware > + * is capable of generating Dual Address Cycle (DAC). > + */ > + > +#include <linux/pci.h> > + > +static struct resource mem64 = { > + .name = "PCI mem64", > + .start = (resource_size_t)1 << 32, /* 4Gb */ > + .end = -1, > + .flags = IORESOURCE_MEM, > +}; > + > +void pcibios_pci64_setup(void) > +{ > + struct resource *r64 = &mem64, *root = &iomem_resource; > + struct pci_bus *b; > + > + if (insert_resource(root, r64)) { > + printk(KERN_WARNING "PCI: Failed to allocate PCI64 space\n"); > + return; > + } > + > + list_for_each_entry(b, &pci_root_buses, node) { > + /* Is this a "standard" root bus created by pci_create_bus? */ > + if (b->resource[1] != root || b->resource[2]) > + continue; > + b->resource[2] = r64; /* create DAC resource */ > + } > +} > + > +void pcibios_pci64_verify(void) > +{ > + struct pci_bus *b; > + > + if (mem64.flags & IORESOURCE_MEM64) > + return; /* presumably DAC works */ > + list_for_each_entry(b, &pci_root_buses, node) { > + if (b->resource[2] == &mem64) > + b->resource[2] = NULL; > + } > + printk(KERN_INFO "PCI: allocations above 4G disabled\n"); > +} > diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c > index f1817f7..bf8eb75 100644 > --- a/arch/x86/pci/i386.c > +++ b/arch/x86/pci/i386.c > @@ -137,6 +137,10 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list) > * range. > */ > r->flags = 0; > + } else { > + /* Successful allocation */ > + if (upper_32_bits(r->start)) > + pr->flags |= IORESOURCE_MEM64; > } > } > } > @@ -174,6 +178,10 @@ static void __init pcibios_allocate_resources(int pass) > /* We'll assign a new address later */ > r->end -= r->start; > r->start = 0; > + } else { > + /* Successful allocation */ > + if (upper_32_bits(r->start)) > + pr->flags |= IORESOURCE_MEM64; > } > } > } > @@ -225,9 +233,11 @@ static int __init pcibios_assign_resources(void) > void __init pcibios_resource_survey(void) > { > DBG("PCI: Allocating resources\n"); > + pcibios_pci64_setup(); > pcibios_allocate_bus_resources(&pci_root_buses); > pcibios_allocate_resources(0); > pcibios_allocate_resources(1); > + pcibios_pci64_verify(); > > e820_reserve_resources_late(); > } > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > index 32e4b2f..30403b3 100644 > --- a/include/linux/ioport.h > +++ b/include/linux/ioport.h > @@ -49,6 +49,8 @@ struct resource_list { > #define IORESOURCE_SIZEALIGN 0x00020000 /* size indicates alignment */ > #define IORESOURCE_STARTALIGN 0x00040000 /* start field is alignment */ > > +#define IORESOURCE_MEM64 0x00080000 /* 64-bit addressing, >4G */ > + > #define IORESOURCE_EXCLUSIVE 0x08000000 /* Userland may not map this resource */ > #define IORESOURCE_DISABLED 0x10000000 > #define IORESOURCE_UNSET 0x20000000 i don't think this going to work with AMD system with 2 and more HT chains. also it may have problem with new intel system with 2 IOHs. YH -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ivan Kokshaysky wrote: > On Sun, Apr 19, 2009 at 11:06:15AM +0200, Ingo Molnar wrote: >> Hm, there's one patch in that lot that does: >> >> drivers/pci/bus.c | 8 +++++++- >> drivers/pci/probe.c | 8 ++++++-- >> drivers/pci/setup-bus.c | 40 +++++++++++++++++++++++++++++++--------- >> 3 files changed, 44 insertions(+), 12 deletions(-) >> >> Which should go via the PCI tree. > > Here is a replacement for that patch which doesn't touch > the generic code. > > Ivan. > > --- > x86 pci: first cut on 64-bit resource allocation > > I believe that we should consider PCI memory above 4G as yet another > type of address space. This actually makes sense, as even accesses to that > memory are physically different - Dual Address Cycle (DAC) vs. 32-bit > Single Address Cycle (SAC). > > So, platform that can deal with 64-bit allocations would set up an > additional root bus resource and mark it with IORESOURCE_MEM64 flag. > > The main problem here is how the kernel would detect that hardware can > actually access a DAC memory (I know for a fact that a lot of Intel chipsets > cannot access MMIO >4G, even though subordinate p2p bridges are 64-bit > capable). > On the other hand, there are PCI devices with 64-bit BARs that do not > work properly being placed above 4G boundary. For example, some > radeon cards have 64-bit BAR for video RAM, but allocating that BAR in > the DAC area doesn't work for various reasons, like video-BIOS > limitations or drivers not taking into account that GPU is 32-bit. > > So moving stuff into MEM64 area should be considered as generally unsafe > operation, and the best default policy is to not enable MEM64 resource > unless we find that BIOS has allocated something there. > At the same time, MEM64 can be easily enabled/disabled based on host > bridge PCI IDs, kernel command line options and so on. > > Here is a basic implementation of the above for x86. I think it's > reasonably good starting point for PCI64 work - the next step would > be to teach pci_bus_alloc_resource() about IORESOURCE_MEM64: logic is > similar to prefetch vs non-prefetch case - MEM can hold MEM64 resource, > but not vice versa. And eventually bridge sizing code will be updated > for reasonable 64-bit allocations (it's a non-trivial task, though). > > This patch alone should fix cardbus >4G allocations and similar > nonsense. > > Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> > --- > arch/x86/include/asm/pci.h | 8 ++++++++ > arch/x86/pci/Makefile | 2 ++ > arch/x86/pci/dac_64bit.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ > arch/x86/pci/i386.c | 10 ++++++++++ > include/linux/ioport.h | 2 ++ > 5 files changed, 66 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h > index b51a1e8..5a9c54e 100644 > --- a/arch/x86/include/asm/pci.h > +++ b/arch/x86/include/asm/pci.h > @@ -86,6 +86,14 @@ static inline void early_quirks(void) { } > > extern void pci_iommu_alloc(void); > > +#ifdef CONFIG_ARCH_PHYS_ADDR_T_64BIT > +extern void pcibios_pci64_setup(void); > +extern void pcibios_pci64_verify(void); > +#else > +static inline void pcibios_pci64_setup(void) { } > +static inline void pcibios_pci64_verify(void) { } > +#endif > + > /* MSI arch hook */ > #define arch_setup_msi_irqs arch_setup_msi_irqs > > diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile > index d49202e..1b6c576 100644 > --- a/arch/x86/pci/Makefile > +++ b/arch/x86/pci/Makefile > @@ -13,5 +13,7 @@ obj-$(CONFIG_X86_VISWS) += visws.o > > obj-$(CONFIG_X86_NUMAQ) += numaq_32.o > > +obj-$(CONFIG_ARCH_PHYS_ADDR_T_64BIT) += dac_64bit.o > + > obj-y += common.o early.o > obj-y += amd_bus.o > diff --git a/arch/x86/pci/dac_64bit.c b/arch/x86/pci/dac_64bit.c > new file mode 100644 > index 0000000..ee03c4a > --- /dev/null > +++ b/arch/x86/pci/dac_64bit.c > @@ -0,0 +1,44 @@ > +/* > + * Set up the 64-bit bus resource for allocations > 4G if the hardware > + * is capable of generating Dual Address Cycle (DAC). > + */ > + > +#include <linux/pci.h> > + > +static struct resource mem64 = { > + .name = "PCI mem64", > + .start = (resource_size_t)1 << 32, /* 4Gb */ > + .end = -1, > + .flags = IORESOURCE_MEM, > +}; > + > +void pcibios_pci64_setup(void) > +{ > + struct resource *r64 = &mem64, *root = &iomem_resource; > + struct pci_bus *b; > + > + if (insert_resource(root, r64)) { > + printk(KERN_WARNING "PCI: Failed to allocate PCI64 space\n"); > + return; > + } > + > + list_for_each_entry(b, &pci_root_buses, node) { > + /* Is this a "standard" root bus created by pci_create_bus? */ > + if (b->resource[1] != root || b->resource[2]) > + continue; > + b->resource[2] = r64; /* create DAC resource */ > + } > +} > + > +void pcibios_pci64_verify(void) > +{ > + struct pci_bus *b; > + > + if (mem64.flags & IORESOURCE_MEM64) > + return; /* presumably DAC works */ > + list_for_each_entry(b, &pci_root_buses, node) { > + if (b->resource[2] == &mem64) > + b->resource[2] = NULL; > + } > + printk(KERN_INFO "PCI: allocations above 4G disabled\n"); > +} > diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c > index f1817f7..bf8eb75 100644 > --- a/arch/x86/pci/i386.c > +++ b/arch/x86/pci/i386.c > @@ -137,6 +137,10 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list) > * range. > */ > r->flags = 0; > + } else { > + /* Successful allocation */ > + if (upper_32_bits(r->start)) > + pr->flags |= IORESOURCE_MEM64; > } > } > } > @@ -174,6 +178,10 @@ static void __init pcibios_allocate_resources(int pass) > /* We'll assign a new address later */ > r->end -= r->start; > r->start = 0; > + } else { > + /* Successful allocation */ > + if (upper_32_bits(r->start)) > + pr->flags |= IORESOURCE_MEM64; > } > } > } > @@ -225,9 +233,11 @@ static int __init pcibios_assign_resources(void) > void __init pcibios_resource_survey(void) > { > DBG("PCI: Allocating resources\n"); > + pcibios_pci64_setup(); > pcibios_allocate_bus_resources(&pci_root_buses); > pcibios_allocate_resources(0); > pcibios_allocate_resources(1); > + pcibios_pci64_verify(); > > e820_reserve_resources_late(); > } > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > index 32e4b2f..30403b3 100644 > --- a/include/linux/ioport.h > +++ b/include/linux/ioport.h > @@ -49,6 +49,8 @@ struct resource_list { > #define IORESOURCE_SIZEALIGN 0x00020000 /* size indicates alignment */ > #define IORESOURCE_STARTALIGN 0x00040000 /* start field is alignment */ > > +#define IORESOURCE_MEM64 0x00080000 /* 64-bit addressing, >4G */ > + > #define IORESOURCE_EXCLUSIVE 0x08000000 /* Userland may not map this resource */ > #define IORESOURCE_DISABLED 0x10000000 > #define IORESOURCE_UNSET 0x20000000 also it seems logical is wrong. we should make sure if one pci resource support 64 from pci_read_bases() instead of pcibios_allocate_resources. thinking about: if pci bridge on bus 0 (aka first peer root bus), some device under bridge doesn't get allocated resource from BIOS. and those bridge/device does support pref mem64 then your patch will not getIORESORCE_MEM64 set before pcibios_pci64_verify... Correct logic should be record all device if support 64bit pref mem (with pci_read_bases), and make sure bridge to be consistent with that of device under if. and that is my patch doing: pci: don't assume pref memio are 64bit -v2 YH -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 20, 2009 at 03:52:26PM -0700, Yinghai Lu wrote: > i don't think this going to work with AMD system with 2 and more HT chains. > > also it may have problem with new intel system with 2 IOHs. There are no problems with these systems, my patch is just a no-op for them, which is fine for now - please note this is only a first step to get 64-bit allocations right. Ivan. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 20, 2009 at 05:09:32PM -0700, Yinghai Lu wrote: > also it seems logical is wrong. > > we should make sure if one pci resource support 64 from pci_read_bases() instead of > pcibios_allocate_resources. pci_read_bases() is already providing all necessary information. > thinking about: if pci bridge on bus 0 (aka first peer root bus), some device under > bridge doesn't get allocated resource from BIOS. and those bridge/device does support pref mem64 > then your patch will not getIORESORCE_MEM64 set before pcibios_pci64_verify... Yes, that's my point - even if host bridge, p2p bridge and device are 64-bit, there is absolutely no guarantee that after moving the BARs above 4G the device will work correctly. > Correct logic should be > record all device if support 64bit pref mem (with pci_read_bases), and make sure bridge to be > consistent with that of device under if. Your view is very x86 centric, please don't forget that drivers/pci code is used by other architectures as well: - limiting 32-bit allocations to 0xffffffff simply breaks non-x86 architectures. Alpha doesn't even boot with your patch; - there are lots of devices with 64-bit non-prefetchable memory BARs, you don't seem to care about that. And your patch doesn't work even on x86: 00:01.0 PCI bridge: Intel Corporation 82945G/GZ/P/PL PCI Express Root Port (rev 02) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 16 bytes Bus: primary=00, secondary=04, subordinate=04, sec-latency=0 I/O behind bridge: 0000e000-0000efff Memory behind bridge: cdf00000-cfffffff Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [88] Subsystem: Intel Corporation Device 0000 Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+ Address: fee0300c Data: 4159 Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag- RBE- FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <4us ClockPM- Suprise- LLActRep- BwNot- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise- Slot # 0, PowerLimit 75.000000; Interlock- NoCompl- SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- Control: AttnInd Off, PwrInd On, Power- Interlock- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- Changed: MRL- PresDet+ LinkState- RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible- RootCap: CRSVisible- RootSta: PME ReqID 0000, PMEStatus- PMEPending- 00: 86 80 71 27 07 05 10 00 02 00 04 06 04 00 01 00 10: 00 00 00 00 00 00 00 00 00 04 04 00 e0 e0 00 00 20: f0 cd f0 cf 01 d0 f1 df 00 00 00 00 00 00 00 00 30: 00 00 00 00 88 00 00 00 00 00 00 00 0b 01 0a 00 As one can see, the prefetchable base register (0x24) is 0xd001, the bit 0 indicates 64-bitness. Which is not true, as i82945G/GZ/P/PL only supports 32-bit addressing (please check the datasheet). Also, your patch can't handle transparent bridges. And it doesn't bode well for bus sizing code. > and that is my patch doing: pci: don't assume pref memio are 64bit -v2 Your patch is all wrong, sorry. Ivan. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 21 Apr 2009 02:33:05 +0400 Ivan Kokshaysky <ink@jurassic.park.msu.ru> wrote: > x86 pci: first cut on 64-bit resource allocation > > I believe that we should consider PCI memory above 4G as yet another > type of address space. This actually makes sense, as even accesses to > that memory are physically different - Dual Address Cycle (DAC) vs. > 32-bit Single Address Cycle (SAC). > > So, platform that can deal with 64-bit allocations would set up an > additional root bus resource and mark it with IORESOURCE_MEM64 flag. > > The main problem here is how the kernel would detect that hardware can > actually access a DAC memory (I know for a fact that a lot of Intel > chipsets cannot access MMIO >4G, even though subordinate p2p bridges > are 64-bit capable). > On the other hand, there are PCI devices with 64-bit BARs that do not > work properly being placed above 4G boundary. For example, some > radeon cards have 64-bit BAR for video RAM, but allocating that BAR in > the DAC area doesn't work for various reasons, like video-BIOS > limitations or drivers not taking into account that GPU is 32-bit. > > So moving stuff into MEM64 area should be considered as generally > unsafe operation, and the best default policy is to not enable MEM64 > resource unless we find that BIOS has allocated something there. > At the same time, MEM64 can be easily enabled/disabled based on host > bridge PCI IDs, kernel command line options and so on. This sounds like reasonable default behavior given the variety of chipsets and device quirks out there. This does muddy up the arch vs. generic code just a little bit more though; iirc mips & ia64 use full 64 bit ranges for their current IORESOURCE_MEM types (hm now that I've checked it appears ia64 has changed a bit here, but still other arches should probably get cleaned up to use the new 64 bit type at some point). > Here is a basic implementation of the above for x86. I think it's > reasonably good starting point for PCI64 work - the next step would > be to teach pci_bus_alloc_resource() about IORESOURCE_MEM64: logic is > similar to prefetch vs non-prefetch case - MEM can hold MEM64 > resource, but not vice versa. And eventually bridge sizing code will > be updated for reasonable 64-bit allocations (it's a non-trivial > task, though). > > This patch alone should fix cardbus >4G allocations and similar > nonsense. > > Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Nice. Any cleanups to existing arch code could be done at the same time as the updates to the bus allocation. Anyone care to send me some tested-by lines? Thanks,
Ivan Kokshaysky wrote: > On Mon, Apr 20, 2009 at 05:09:32PM -0700, Yinghai Lu wrote: >> also it seems logical is wrong. >> >> we should make sure if one pci resource support 64 from pci_read_bases() instead of >> pcibios_allocate_resources. > > pci_read_bases() is already providing all necessary information. current pci_read_bases() does not tell us if that device support 32bit pref mem or 64bit pref. aka there is type about that, but that is not recording in res->flag > >> thinking about: if pci bridge on bus 0 (aka first peer root bus), some device under >> bridge doesn't get allocated resource from BIOS. and those bridge/device does support pref mem64 >> then your patch will not getIORESORCE_MEM64 set before pcibios_pci64_verify... > > Yes, that's my point - even if host bridge, p2p bridge and device are 64-bit, > there is absolutely no guarantee that after moving the BARs above 4G > the device will work correctly. > >> Correct logic should be >> record all device if support 64bit pref mem (with pci_read_bases), and make sure bridge to be >> consistent with that of device under if. > > Your view is very x86 centric, please don't forget that drivers/pci > code is used by other architectures as well: > - limiting 32-bit allocations to 0xffffffff simply breaks non-x86 > architectures. Alpha doesn't even boot with your patch; will look at it, may limit that to x86 arch. > - there are lots of devices with 64-bit non-prefetchable memory BARs, > you don't seem to care about that. the fact is current p2p spec said: mem ( non pref) is 32bit. and pref mem could be 64bit or 32 bit. with devices on root buses directly, we may need to expand current res->flags & IORESOURCE_PREFETCH checking to support it. > > And your patch doesn't work even on x86: > > 00:01.0 PCI bridge: Intel Corporation 82945G/GZ/P/PL PCI Express Root Port (rev 02) (prog-if 00 [Normal decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0, Cache Line Size: 16 bytes > Bus: primary=00, secondary=04, subordinate=04, sec-latency=0 > I/O behind bridge: 0000e000-0000efff > Memory behind bridge: cdf00000-cfffffff > Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- > BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B- > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > Capabilities: [88] Subsystem: Intel Corporation Device 0000 > Capabilities: [80] Power Management version 2 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D0 PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+ > Address: fee0300c Data: 4159 > Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00 > DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us > ExtTag- RBE- FLReset- > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- > MaxPayload 128 bytes, MaxReadReq 128 bytes > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- > LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <4us > ClockPM- Suprise- LLActRep- BwNot- > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise- > Slot # 0, PowerLimit 75.000000; Interlock- NoCompl- > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- > Control: AttnInd Off, PwrInd On, Power- Interlock- > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- > Changed: MRL- PresDet+ LinkState- > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible- > RootCap: CRSVisible- > RootSta: PME ReqID 0000, PMEStatus- PMEPending- > 00: 86 80 71 27 07 05 10 00 02 00 04 06 04 00 01 00 > 10: 00 00 00 00 00 00 00 00 00 04 04 00 e0 e0 00 00 > 20: f0 cd f0 cf 01 d0 f1 df 00 00 00 00 00 00 00 00 > 30: 00 00 00 00 88 00 00 00 00 00 00 00 0b 01 0a 00 > > As one can see, the prefetchable base register (0x24) is 0xd001, the > bit 0 indicates 64-bitness. Which is not true, as i82945G/GZ/P/PL > only supports 32-bit addressing (please check the datasheet). that should be done with quirks way to handle it. > > Also, your patch can't handle transparent bridges. And it doesn't > bode well for bus sizing code. my patch is only trying to handle the case: don't assume all pref mem is 64bit so don't assign resource above to 4g to pci bridge that has device under it but the device only support 32 bit pref. for the bus resize, may could be use your mem64 res, but set that flag always, and in pci_bus_size_bridges, try to expand mask = IORESOURCE_MEM; prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH; if (pbus_size_mem(bus, prefmask, prefmask)) mask = prefmask; /* Success, size non-prefetch only. */ pbus_size_mem(bus, mask, IORESOURCE_MEM); to have more tries, and that flags. YH -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h index b51a1e8..5a9c54e 100644 --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -86,6 +86,14 @@ static inline void early_quirks(void) { } extern void pci_iommu_alloc(void); +#ifdef CONFIG_ARCH_PHYS_ADDR_T_64BIT +extern void pcibios_pci64_setup(void); +extern void pcibios_pci64_verify(void); +#else +static inline void pcibios_pci64_setup(void) { } +static inline void pcibios_pci64_verify(void) { } +#endif + /* MSI arch hook */ #define arch_setup_msi_irqs arch_setup_msi_irqs diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile index d49202e..1b6c576 100644 --- a/arch/x86/pci/Makefile +++ b/arch/x86/pci/Makefile @@ -13,5 +13,7 @@ obj-$(CONFIG_X86_VISWS) += visws.o obj-$(CONFIG_X86_NUMAQ) += numaq_32.o +obj-$(CONFIG_ARCH_PHYS_ADDR_T_64BIT) += dac_64bit.o + obj-y += common.o early.o obj-y += amd_bus.o diff --git a/arch/x86/pci/dac_64bit.c b/arch/x86/pci/dac_64bit.c new file mode 100644 index 0000000..ee03c4a --- /dev/null +++ b/arch/x86/pci/dac_64bit.c @@ -0,0 +1,44 @@ +/* + * Set up the 64-bit bus resource for allocations > 4G if the hardware + * is capable of generating Dual Address Cycle (DAC). + */ + +#include <linux/pci.h> + +static struct resource mem64 = { + .name = "PCI mem64", + .start = (resource_size_t)1 << 32, /* 4Gb */ + .end = -1, + .flags = IORESOURCE_MEM, +}; + +void pcibios_pci64_setup(void) +{ + struct resource *r64 = &mem64, *root = &iomem_resource; + struct pci_bus *b; + + if (insert_resource(root, r64)) { + printk(KERN_WARNING "PCI: Failed to allocate PCI64 space\n"); + return; + } + + list_for_each_entry(b, &pci_root_buses, node) { + /* Is this a "standard" root bus created by pci_create_bus? */ + if (b->resource[1] != root || b->resource[2]) + continue; + b->resource[2] = r64; /* create DAC resource */ + } +} + +void pcibios_pci64_verify(void) +{ + struct pci_bus *b; + + if (mem64.flags & IORESOURCE_MEM64) + return; /* presumably DAC works */ + list_for_each_entry(b, &pci_root_buses, node) { + if (b->resource[2] == &mem64) + b->resource[2] = NULL; + } + printk(KERN_INFO "PCI: allocations above 4G disabled\n"); +} diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c index f1817f7..bf8eb75 100644 --- a/arch/x86/pci/i386.c +++ b/arch/x86/pci/i386.c @@ -137,6 +137,10 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list) * range. */ r->flags = 0; + } else { + /* Successful allocation */ + if (upper_32_bits(r->start)) + pr->flags |= IORESOURCE_MEM64; } } } @@ -174,6 +178,10 @@ static void __init pcibios_allocate_resources(int pass) /* We'll assign a new address later */ r->end -= r->start; r->start = 0; + } else { + /* Successful allocation */ + if (upper_32_bits(r->start)) + pr->flags |= IORESOURCE_MEM64; } } } @@ -225,9 +233,11 @@ static int __init pcibios_assign_resources(void) void __init pcibios_resource_survey(void) { DBG("PCI: Allocating resources\n"); + pcibios_pci64_setup(); pcibios_allocate_bus_resources(&pci_root_buses); pcibios_allocate_resources(0); pcibios_allocate_resources(1); + pcibios_pci64_verify(); e820_reserve_resources_late(); } diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 32e4b2f..30403b3 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -49,6 +49,8 @@ struct resource_list { #define IORESOURCE_SIZEALIGN 0x00020000 /* size indicates alignment */ #define IORESOURCE_STARTALIGN 0x00040000 /* start field is alignment */ +#define IORESOURCE_MEM64 0x00080000 /* 64-bit addressing, >4G */ + #define IORESOURCE_EXCLUSIVE 0x08000000 /* Userland may not map this resource */ #define IORESOURCE_DISABLED 0x10000000 #define IORESOURCE_UNSET 0x20000000