diff mbox

x86/pci: make pci_mem_start to be aligned only -v4

Message ID 20090420223305.GA15340@jurassic.park.msu.ru (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Ivan Kokshaysky April 20, 2009, 10:33 p.m. UTC
On Sun, Apr 19, 2009 at 11:06:15AM +0200, Ingo Molnar wrote:
> Hm, there's one patch in that lot that does:
> 
>  drivers/pci/bus.c	 |    8 +++++++-
>  drivers/pci/probe.c     |    8 ++++++--
>  drivers/pci/setup-bus.c |   40 +++++++++++++++++++++++++++++++---------
>  3 files changed, 44 insertions(+), 12 deletions(-)
> 
> Which should go via the PCI tree.

Here is a replacement for that patch which doesn't touch
the generic code.

Ivan.

---
x86 pci: first cut on 64-bit resource allocation

I believe that we should consider PCI memory above 4G as yet another
type of address space. This actually makes sense, as even accesses to that
memory are physically different - Dual Address Cycle (DAC) vs. 32-bit
Single Address Cycle (SAC).

So, platform that can deal with 64-bit allocations would set up an
additional root bus resource and mark it with IORESOURCE_MEM64 flag.

The main problem here is how the kernel would detect that hardware can
actually access a DAC memory (I know for a fact that a lot of Intel chipsets
cannot access MMIO >4G, even though subordinate p2p bridges are 64-bit
capable).
On the other hand, there are PCI devices with 64-bit BARs that do not
work properly being placed above 4G boundary. For example, some
radeon cards have 64-bit BAR for video RAM, but allocating that BAR in
the DAC area doesn't work for various reasons, like video-BIOS
limitations or drivers not taking into account that GPU is 32-bit.

So moving stuff into MEM64 area should be considered as generally unsafe
operation, and the best default policy is to not enable MEM64 resource
unless we find that BIOS has allocated something there.
At the same time, MEM64 can be easily enabled/disabled based on host
bridge PCI IDs, kernel command line options and so on.

Here is a basic implementation of the above for x86. I think it's
reasonably good starting point for PCI64 work - the next step would
be to teach pci_bus_alloc_resource() about IORESOURCE_MEM64: logic is
similar to prefetch vs non-prefetch case - MEM can hold MEM64 resource,
but not vice versa. And eventually bridge sizing code will be updated
for reasonable 64-bit allocations (it's a non-trivial task, though).

This patch alone should fix cardbus >4G allocations and similar
nonsense.

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
---
 arch/x86/include/asm/pci.h |    8 ++++++++
 arch/x86/pci/Makefile      |    2 ++
 arch/x86/pci/dac_64bit.c   |   44 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/pci/i386.c        |   10 ++++++++++
 include/linux/ioport.h     |    2 ++
 5 files changed, 66 insertions(+), 0 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Yinghai Lu April 20, 2009, 10:52 p.m. UTC | #1
Ivan Kokshaysky wrote:
> On Sun, Apr 19, 2009 at 11:06:15AM +0200, Ingo Molnar wrote:
>> Hm, there's one patch in that lot that does:
>>
>>  drivers/pci/bus.c	 |    8 +++++++-
>>  drivers/pci/probe.c     |    8 ++++++--
>>  drivers/pci/setup-bus.c |   40 +++++++++++++++++++++++++++++++---------
>>  3 files changed, 44 insertions(+), 12 deletions(-)
>>
>> Which should go via the PCI tree.
> 
> Here is a replacement for that patch which doesn't touch
> the generic code.
> 
> Ivan.
> 
> ---
> x86 pci: first cut on 64-bit resource allocation
> 
> I believe that we should consider PCI memory above 4G as yet another
> type of address space. This actually makes sense, as even accesses to that
> memory are physically different - Dual Address Cycle (DAC) vs. 32-bit
> Single Address Cycle (SAC).
> 
> So, platform that can deal with 64-bit allocations would set up an
> additional root bus resource and mark it with IORESOURCE_MEM64 flag.
> 
> The main problem here is how the kernel would detect that hardware can
> actually access a DAC memory (I know for a fact that a lot of Intel chipsets
> cannot access MMIO >4G, even though subordinate p2p bridges are 64-bit
> capable).
> On the other hand, there are PCI devices with 64-bit BARs that do not
> work properly being placed above 4G boundary. For example, some
> radeon cards have 64-bit BAR for video RAM, but allocating that BAR in
> the DAC area doesn't work for various reasons, like video-BIOS
> limitations or drivers not taking into account that GPU is 32-bit.
> 
> So moving stuff into MEM64 area should be considered as generally unsafe
> operation, and the best default policy is to not enable MEM64 resource
> unless we find that BIOS has allocated something there.
> At the same time, MEM64 can be easily enabled/disabled based on host
> bridge PCI IDs, kernel command line options and so on.
> 
> Here is a basic implementation of the above for x86. I think it's
> reasonably good starting point for PCI64 work - the next step would
> be to teach pci_bus_alloc_resource() about IORESOURCE_MEM64: logic is
> similar to prefetch vs non-prefetch case - MEM can hold MEM64 resource,
> but not vice versa. And eventually bridge sizing code will be updated
> for reasonable 64-bit allocations (it's a non-trivial task, though).
> 
> This patch alone should fix cardbus >4G allocations and similar
> nonsense.
> 
> Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> ---
>  arch/x86/include/asm/pci.h |    8 ++++++++
>  arch/x86/pci/Makefile      |    2 ++
>  arch/x86/pci/dac_64bit.c   |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/pci/i386.c        |   10 ++++++++++
>  include/linux/ioport.h     |    2 ++
>  5 files changed, 66 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
> index b51a1e8..5a9c54e 100644
> --- a/arch/x86/include/asm/pci.h
> +++ b/arch/x86/include/asm/pci.h
> @@ -86,6 +86,14 @@ static inline void early_quirks(void) { }
>  
>  extern void pci_iommu_alloc(void);
>  
> +#ifdef CONFIG_ARCH_PHYS_ADDR_T_64BIT
> +extern void pcibios_pci64_setup(void);
> +extern void pcibios_pci64_verify(void);
> +#else
> +static inline void pcibios_pci64_setup(void) { }
> +static inline void pcibios_pci64_verify(void) { }
> +#endif
> +
>  /* MSI arch hook */
>  #define arch_setup_msi_irqs arch_setup_msi_irqs
>  
> diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
> index d49202e..1b6c576 100644
> --- a/arch/x86/pci/Makefile
> +++ b/arch/x86/pci/Makefile
> @@ -13,5 +13,7 @@ obj-$(CONFIG_X86_VISWS)		+= visws.o
>  
>  obj-$(CONFIG_X86_NUMAQ)		+= numaq_32.o
>  
> +obj-$(CONFIG_ARCH_PHYS_ADDR_T_64BIT)	+= dac_64bit.o
> +
>  obj-y				+= common.o early.o
>  obj-y				+= amd_bus.o
> diff --git a/arch/x86/pci/dac_64bit.c b/arch/x86/pci/dac_64bit.c
> new file mode 100644
> index 0000000..ee03c4a
> --- /dev/null
> +++ b/arch/x86/pci/dac_64bit.c
> @@ -0,0 +1,44 @@
> +/*
> + * Set up the 64-bit bus resource for allocations > 4G if the hardware
> + * is capable of generating Dual Address Cycle (DAC).
> + */
> +
> +#include <linux/pci.h>
> +
> +static struct resource mem64 = {
> +	.name	= "PCI mem64",
> +	.start	= (resource_size_t)1 << 32,	/* 4Gb */
> +	.end	= -1,
> +	.flags	= IORESOURCE_MEM,
> +};
> +
> +void pcibios_pci64_setup(void)
> +{
> +	struct resource *r64 = &mem64, *root = &iomem_resource;
> +	struct pci_bus *b;
> +
> +	if (insert_resource(root, r64)) {
> +		printk(KERN_WARNING "PCI: Failed to allocate PCI64 space\n");
> +		return;
> +	}
> +
> +	list_for_each_entry(b, &pci_root_buses, node) {
> +		/* Is this a "standard" root bus created by pci_create_bus? */
> +		if (b->resource[1] != root || b->resource[2])
> +			continue;
> +		b->resource[2] = r64;	/* create DAC resource */
> +	}
> +}
> +
> +void pcibios_pci64_verify(void)
> +{
> +	struct pci_bus *b;
> +
> +	if (mem64.flags & IORESOURCE_MEM64)
> +		return;		/* presumably DAC works */
> +	list_for_each_entry(b, &pci_root_buses, node) {
> +		if (b->resource[2] == &mem64)
> +			b->resource[2] = NULL;
> +	}
> +	printk(KERN_INFO "PCI: allocations above 4G disabled\n");
> +}
> diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
> index f1817f7..bf8eb75 100644
> --- a/arch/x86/pci/i386.c
> +++ b/arch/x86/pci/i386.c
> @@ -137,6 +137,10 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list)
>  					 * range.
>  					 */
>  					r->flags = 0;
> +				} else {
> +					/* Successful allocation */
> +					if (upper_32_bits(r->start))
> +						pr->flags |= IORESOURCE_MEM64;
>  				}
>  			}
>  		}
> @@ -174,6 +178,10 @@ static void __init pcibios_allocate_resources(int pass)
>  					/* We'll assign a new address later */
>  					r->end -= r->start;
>  					r->start = 0;
> +				} else {
> +					/* Successful allocation */
> +					if (upper_32_bits(r->start))
> +						pr->flags |= IORESOURCE_MEM64;
>  				}
>  			}
>  		}
> @@ -225,9 +233,11 @@ static int __init pcibios_assign_resources(void)
>  void __init pcibios_resource_survey(void)
>  {
>  	DBG("PCI: Allocating resources\n");
> +	pcibios_pci64_setup();
>  	pcibios_allocate_bus_resources(&pci_root_buses);
>  	pcibios_allocate_resources(0);
>  	pcibios_allocate_resources(1);
> +	pcibios_pci64_verify();
>  
>  	e820_reserve_resources_late();
>  }
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 32e4b2f..30403b3 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -49,6 +49,8 @@ struct resource_list {
>  #define IORESOURCE_SIZEALIGN	0x00020000	/* size indicates alignment */
>  #define IORESOURCE_STARTALIGN	0x00040000	/* start field is alignment */
>  
> +#define IORESOURCE_MEM64	0x00080000	/* 64-bit addressing, >4G */
> +
>  #define IORESOURCE_EXCLUSIVE	0x08000000	/* Userland may not map this resource */
>  #define IORESOURCE_DISABLED	0x10000000
>  #define IORESOURCE_UNSET	0x20000000

i don't think this going to work with AMD system with 2 and more HT chains.

also it may have problem with new intel system with 2 IOHs.

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu April 21, 2009, 12:09 a.m. UTC | #2
Ivan Kokshaysky wrote:
> On Sun, Apr 19, 2009 at 11:06:15AM +0200, Ingo Molnar wrote:
>> Hm, there's one patch in that lot that does:
>>
>>  drivers/pci/bus.c	 |    8 +++++++-
>>  drivers/pci/probe.c     |    8 ++++++--
>>  drivers/pci/setup-bus.c |   40 +++++++++++++++++++++++++++++++---------
>>  3 files changed, 44 insertions(+), 12 deletions(-)
>>
>> Which should go via the PCI tree.
> 
> Here is a replacement for that patch which doesn't touch
> the generic code.
> 
> Ivan.
> 
> ---
> x86 pci: first cut on 64-bit resource allocation
> 
> I believe that we should consider PCI memory above 4G as yet another
> type of address space. This actually makes sense, as even accesses to that
> memory are physically different - Dual Address Cycle (DAC) vs. 32-bit
> Single Address Cycle (SAC).
> 
> So, platform that can deal with 64-bit allocations would set up an
> additional root bus resource and mark it with IORESOURCE_MEM64 flag.
> 
> The main problem here is how the kernel would detect that hardware can
> actually access a DAC memory (I know for a fact that a lot of Intel chipsets
> cannot access MMIO >4G, even though subordinate p2p bridges are 64-bit
> capable).
> On the other hand, there are PCI devices with 64-bit BARs that do not
> work properly being placed above 4G boundary. For example, some
> radeon cards have 64-bit BAR for video RAM, but allocating that BAR in
> the DAC area doesn't work for various reasons, like video-BIOS
> limitations or drivers not taking into account that GPU is 32-bit.
> 
> So moving stuff into MEM64 area should be considered as generally unsafe
> operation, and the best default policy is to not enable MEM64 resource
> unless we find that BIOS has allocated something there.
> At the same time, MEM64 can be easily enabled/disabled based on host
> bridge PCI IDs, kernel command line options and so on.
> 
> Here is a basic implementation of the above for x86. I think it's
> reasonably good starting point for PCI64 work - the next step would
> be to teach pci_bus_alloc_resource() about IORESOURCE_MEM64: logic is
> similar to prefetch vs non-prefetch case - MEM can hold MEM64 resource,
> but not vice versa. And eventually bridge sizing code will be updated
> for reasonable 64-bit allocations (it's a non-trivial task, though).
> 
> This patch alone should fix cardbus >4G allocations and similar
> nonsense.
> 
> Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> ---
>  arch/x86/include/asm/pci.h |    8 ++++++++
>  arch/x86/pci/Makefile      |    2 ++
>  arch/x86/pci/dac_64bit.c   |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/pci/i386.c        |   10 ++++++++++
>  include/linux/ioport.h     |    2 ++
>  5 files changed, 66 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
> index b51a1e8..5a9c54e 100644
> --- a/arch/x86/include/asm/pci.h
> +++ b/arch/x86/include/asm/pci.h
> @@ -86,6 +86,14 @@ static inline void early_quirks(void) { }
>  
>  extern void pci_iommu_alloc(void);
>  
> +#ifdef CONFIG_ARCH_PHYS_ADDR_T_64BIT
> +extern void pcibios_pci64_setup(void);
> +extern void pcibios_pci64_verify(void);
> +#else
> +static inline void pcibios_pci64_setup(void) { }
> +static inline void pcibios_pci64_verify(void) { }
> +#endif
> +
>  /* MSI arch hook */
>  #define arch_setup_msi_irqs arch_setup_msi_irqs
>  
> diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
> index d49202e..1b6c576 100644
> --- a/arch/x86/pci/Makefile
> +++ b/arch/x86/pci/Makefile
> @@ -13,5 +13,7 @@ obj-$(CONFIG_X86_VISWS)		+= visws.o
>  
>  obj-$(CONFIG_X86_NUMAQ)		+= numaq_32.o
>  
> +obj-$(CONFIG_ARCH_PHYS_ADDR_T_64BIT)	+= dac_64bit.o
> +
>  obj-y				+= common.o early.o
>  obj-y				+= amd_bus.o
> diff --git a/arch/x86/pci/dac_64bit.c b/arch/x86/pci/dac_64bit.c
> new file mode 100644
> index 0000000..ee03c4a
> --- /dev/null
> +++ b/arch/x86/pci/dac_64bit.c
> @@ -0,0 +1,44 @@
> +/*
> + * Set up the 64-bit bus resource for allocations > 4G if the hardware
> + * is capable of generating Dual Address Cycle (DAC).
> + */
> +
> +#include <linux/pci.h>
> +
> +static struct resource mem64 = {
> +	.name	= "PCI mem64",
> +	.start	= (resource_size_t)1 << 32,	/* 4Gb */
> +	.end	= -1,
> +	.flags	= IORESOURCE_MEM,
> +};
> +
> +void pcibios_pci64_setup(void)
> +{
> +	struct resource *r64 = &mem64, *root = &iomem_resource;
> +	struct pci_bus *b;
> +
> +	if (insert_resource(root, r64)) {
> +		printk(KERN_WARNING "PCI: Failed to allocate PCI64 space\n");
> +		return;
> +	}
> +
> +	list_for_each_entry(b, &pci_root_buses, node) {
> +		/* Is this a "standard" root bus created by pci_create_bus? */
> +		if (b->resource[1] != root || b->resource[2])
> +			continue;
> +		b->resource[2] = r64;	/* create DAC resource */
> +	}
> +}
> +
> +void pcibios_pci64_verify(void)
> +{
> +	struct pci_bus *b;
> +
> +	if (mem64.flags & IORESOURCE_MEM64)
> +		return;		/* presumably DAC works */
> +	list_for_each_entry(b, &pci_root_buses, node) {
> +		if (b->resource[2] == &mem64)
> +			b->resource[2] = NULL;
> +	}
> +	printk(KERN_INFO "PCI: allocations above 4G disabled\n");
> +}
> diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
> index f1817f7..bf8eb75 100644
> --- a/arch/x86/pci/i386.c
> +++ b/arch/x86/pci/i386.c
> @@ -137,6 +137,10 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list)
>  					 * range.
>  					 */
>  					r->flags = 0;
> +				} else {
> +					/* Successful allocation */
> +					if (upper_32_bits(r->start))
> +						pr->flags |= IORESOURCE_MEM64;
>  				}
>  			}
>  		}
> @@ -174,6 +178,10 @@ static void __init pcibios_allocate_resources(int pass)
>  					/* We'll assign a new address later */
>  					r->end -= r->start;
>  					r->start = 0;
> +				} else {
> +					/* Successful allocation */
> +					if (upper_32_bits(r->start))
> +						pr->flags |= IORESOURCE_MEM64;
>  				}
>  			}
>  		}
> @@ -225,9 +233,11 @@ static int __init pcibios_assign_resources(void)
>  void __init pcibios_resource_survey(void)
>  {
>  	DBG("PCI: Allocating resources\n");
> +	pcibios_pci64_setup();
>  	pcibios_allocate_bus_resources(&pci_root_buses);
>  	pcibios_allocate_resources(0);
>  	pcibios_allocate_resources(1);
> +	pcibios_pci64_verify();
>  
>  	e820_reserve_resources_late();
>  }
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 32e4b2f..30403b3 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -49,6 +49,8 @@ struct resource_list {
>  #define IORESOURCE_SIZEALIGN	0x00020000	/* size indicates alignment */
>  #define IORESOURCE_STARTALIGN	0x00040000	/* start field is alignment */
>  
> +#define IORESOURCE_MEM64	0x00080000	/* 64-bit addressing, >4G */
> +
>  #define IORESOURCE_EXCLUSIVE	0x08000000	/* Userland may not map this resource */
>  #define IORESOURCE_DISABLED	0x10000000
>  #define IORESOURCE_UNSET	0x20000000

also it seems logical is wrong.

we should make sure if one pci resource support 64 from pci_read_bases() instead of 
pcibios_allocate_resources.

thinking about: if pci bridge on bus 0 (aka first peer root bus), some device under
bridge doesn't get allocated resource from BIOS. and those bridge/device does support pref mem64
then your patch will not getIORESORCE_MEM64 set before pcibios_pci64_verify...

Correct logic should be
record all device if support 64bit pref mem (with pci_read_bases), and make sure bridge to be
 consistent with that of device under if.
and that is my patch doing: pci: don't assume pref memio are 64bit -v2

YH

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ivan Kokshaysky April 21, 2009, 10:54 a.m. UTC | #3
On Mon, Apr 20, 2009 at 03:52:26PM -0700, Yinghai Lu wrote:
> i don't think this going to work with AMD system with 2 and more HT chains.
> 
> also it may have problem with new intel system with 2 IOHs.

There are no problems with these systems, my patch is just a no-op
for them, which is fine for now - please note this is only a first
step to get 64-bit allocations right.

Ivan.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ivan Kokshaysky April 21, 2009, 10:56 a.m. UTC | #4
On Mon, Apr 20, 2009 at 05:09:32PM -0700, Yinghai Lu wrote:
> also it seems logical is wrong.
> 
> we should make sure if one pci resource support 64 from pci_read_bases() instead of 
> pcibios_allocate_resources.

pci_read_bases() is already providing all necessary information.

> thinking about: if pci bridge on bus 0 (aka first peer root bus), some device under
> bridge doesn't get allocated resource from BIOS. and those bridge/device does support pref mem64
> then your patch will not getIORESORCE_MEM64 set before pcibios_pci64_verify...

Yes, that's my point - even if host bridge, p2p bridge and device are 64-bit,
there is absolutely no guarantee that after moving the BARs above 4G
the device will work correctly.

> Correct logic should be
> record all device if support 64bit pref mem (with pci_read_bases), and make sure bridge to be
>  consistent with that of device under if.

Your view is very x86 centric, please don't forget that drivers/pci
code is used by other architectures as well:
- limiting 32-bit allocations to 0xffffffff simply breaks non-x86
  architectures. Alpha doesn't even boot with your patch;
- there are lots of devices with 64-bit non-prefetchable memory BARs,
  you don't seem to care about that.

And your patch doesn't work even on x86:

00:01.0 PCI bridge: Intel Corporation 82945G/GZ/P/PL PCI Express Root Port (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 16 bytes
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: cdf00000-cfffffff
        Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [88] Subsystem: Intel Corporation Device 0000
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
                Address: fee0300c  Data: 4159
        Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <4us
                        ClockPM- Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
                        Slot #  0, PowerLimit 75.000000; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
00: 86 80 71 27 07 05 10 00 02 00 04 06 04 00 01 00
10: 00 00 00 00 00 00 00 00 00 04 04 00 e0 e0 00 00
20: f0 cd f0 cf 01 d0 f1 df 00 00 00 00 00 00 00 00
30: 00 00 00 00 88 00 00 00 00 00 00 00 0b 01 0a 00

As one can see, the prefetchable base register (0x24) is 0xd001, the
bit 0 indicates 64-bitness. Which is not true, as i82945G/GZ/P/PL
only supports 32-bit addressing (please check the datasheet).

Also, your patch can't handle transparent bridges. And it doesn't
bode well for bus sizing code.

> and that is my patch doing: pci: don't assume pref memio are 64bit -v2

Your patch is all wrong, sorry.

Ivan.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Barnes April 21, 2009, 3:41 p.m. UTC | #5
On Tue, 21 Apr 2009 02:33:05 +0400
Ivan Kokshaysky <ink@jurassic.park.msu.ru> wrote:
> x86 pci: first cut on 64-bit resource allocation
> 
> I believe that we should consider PCI memory above 4G as yet another
> type of address space. This actually makes sense, as even accesses to
> that memory are physically different - Dual Address Cycle (DAC) vs.
> 32-bit Single Address Cycle (SAC).
> 
> So, platform that can deal with 64-bit allocations would set up an
> additional root bus resource and mark it with IORESOURCE_MEM64 flag.
> 
> The main problem here is how the kernel would detect that hardware can
> actually access a DAC memory (I know for a fact that a lot of Intel
> chipsets cannot access MMIO >4G, even though subordinate p2p bridges
> are 64-bit capable).
> On the other hand, there are PCI devices with 64-bit BARs that do not
> work properly being placed above 4G boundary. For example, some
> radeon cards have 64-bit BAR for video RAM, but allocating that BAR in
> the DAC area doesn't work for various reasons, like video-BIOS
> limitations or drivers not taking into account that GPU is 32-bit.
> 
> So moving stuff into MEM64 area should be considered as generally
> unsafe operation, and the best default policy is to not enable MEM64
> resource unless we find that BIOS has allocated something there.
> At the same time, MEM64 can be easily enabled/disabled based on host
> bridge PCI IDs, kernel command line options and so on.

This sounds like reasonable default behavior given the variety of
chipsets and device quirks out there.  This does muddy up the arch vs.
generic code just a little bit more though; iirc mips & ia64 use full
64 bit ranges for their current IORESOURCE_MEM types (hm now that I've
checked it appears ia64 has changed a bit here, but still other
arches should probably get cleaned up to use the new 64 bit type at
some point).

> Here is a basic implementation of the above for x86. I think it's
> reasonably good starting point for PCI64 work - the next step would
> be to teach pci_bus_alloc_resource() about IORESOURCE_MEM64: logic is
> similar to prefetch vs non-prefetch case - MEM can hold MEM64
> resource, but not vice versa. And eventually bridge sizing code will
> be updated for reasonable 64-bit allocations (it's a non-trivial
> task, though).
> 
> This patch alone should fix cardbus >4G allocations and similar
> nonsense.
> 
> Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

Nice.  Any cleanups to existing arch code could be done at the same
time as the updates to the bus allocation.

Anyone care to send me some tested-by lines?

Thanks,
Yinghai Lu April 21, 2009, 3:57 p.m. UTC | #6
Ivan Kokshaysky wrote:
> On Mon, Apr 20, 2009 at 05:09:32PM -0700, Yinghai Lu wrote:
>> also it seems logical is wrong.
>>
>> we should make sure if one pci resource support 64 from pci_read_bases() instead of 
>> pcibios_allocate_resources.
> 
> pci_read_bases() is already providing all necessary information.

current pci_read_bases() does not tell us if that device support 32bit pref mem or 64bit pref.
aka there is type about that, but that is not recording in res->flag

> 
>> thinking about: if pci bridge on bus 0 (aka first peer root bus), some device under
>> bridge doesn't get allocated resource from BIOS. and those bridge/device does support pref mem64
>> then your patch will not getIORESORCE_MEM64 set before pcibios_pci64_verify...
> 
> Yes, that's my point - even if host bridge, p2p bridge and device are 64-bit,
> there is absolutely no guarantee that after moving the BARs above 4G
> the device will work correctly.
> 
>> Correct logic should be
>> record all device if support 64bit pref mem (with pci_read_bases), and make sure bridge to be
>>  consistent with that of device under if.
> 
> Your view is very x86 centric, please don't forget that drivers/pci
> code is used by other architectures as well:
> - limiting 32-bit allocations to 0xffffffff simply breaks non-x86
>   architectures. Alpha doesn't even boot with your patch;

will look at it, may limit that to x86 arch.

> - there are lots of devices with 64-bit non-prefetchable memory BARs,
>   you don't seem to care about that.

the fact is current p2p spec said: mem ( non pref) is 32bit.
and pref mem could be 64bit or 32 bit.

with devices on root buses directly, we may need to expand current 
res->flags & IORESOURCE_PREFETCH checking to support it.

> 
> And your patch doesn't work even on x86:
> 
> 00:01.0 PCI bridge: Intel Corporation 82945G/GZ/P/PL PCI Express Root Port (rev 02) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 16 bytes
>         Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
>         I/O behind bridge: 0000e000-0000efff
>         Memory behind bridge: cdf00000-cfffffff
>         Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [88] Subsystem: Intel Corporation Device 0000
>         Capabilities: [80] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
>                 Address: fee0300c  Data: 4159
>         Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- RBE- FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <4us
>                         ClockPM- Suprise- LLActRep- BwNot-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
>                         Slot #  0, PowerLimit 75.000000; Interlock- NoCompl-
>                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
>                         Control: AttnInd Off, PwrInd On, Power- Interlock-
>                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>                         Changed: MRL- PresDet+ LinkState-
>                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
>                 RootCap: CRSVisible-
>                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> 00: 86 80 71 27 07 05 10 00 02 00 04 06 04 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 04 04 00 e0 e0 00 00
> 20: f0 cd f0 cf 01 d0 f1 df 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 88 00 00 00 00 00 00 00 0b 01 0a 00
> 
> As one can see, the prefetchable base register (0x24) is 0xd001, the
> bit 0 indicates 64-bitness. Which is not true, as i82945G/GZ/P/PL
> only supports 32-bit addressing (please check the datasheet).

that should be done with quirks way to handle it.

> 
> Also, your patch can't handle transparent bridges. And it doesn't
> bode well for bus sizing code.

my patch is only trying to handle the case:
don't assume all pref mem is 64bit so don't assign resource above to 4g to pci bridge that
has device under it but the device only support 32 bit pref.

for the bus resize, may could be use your mem64 res, but set that flag always, 
and in pci_bus_size_bridges, try to  expand 
                mask = IORESOURCE_MEM;
                prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
                if (pbus_size_mem(bus, prefmask, prefmask))
                        mask = prefmask; /* Success, size non-prefetch only. */
                pbus_size_mem(bus, mask, IORESOURCE_MEM);

to have more tries, and that flags.

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index b51a1e8..5a9c54e 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -86,6 +86,14 @@  static inline void early_quirks(void) { }
 
 extern void pci_iommu_alloc(void);
 
+#ifdef CONFIG_ARCH_PHYS_ADDR_T_64BIT
+extern void pcibios_pci64_setup(void);
+extern void pcibios_pci64_verify(void);
+#else
+static inline void pcibios_pci64_setup(void) { }
+static inline void pcibios_pci64_verify(void) { }
+#endif
+
 /* MSI arch hook */
 #define arch_setup_msi_irqs arch_setup_msi_irqs
 
diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
index d49202e..1b6c576 100644
--- a/arch/x86/pci/Makefile
+++ b/arch/x86/pci/Makefile
@@ -13,5 +13,7 @@  obj-$(CONFIG_X86_VISWS)		+= visws.o
 
 obj-$(CONFIG_X86_NUMAQ)		+= numaq_32.o
 
+obj-$(CONFIG_ARCH_PHYS_ADDR_T_64BIT)	+= dac_64bit.o
+
 obj-y				+= common.o early.o
 obj-y				+= amd_bus.o
diff --git a/arch/x86/pci/dac_64bit.c b/arch/x86/pci/dac_64bit.c
new file mode 100644
index 0000000..ee03c4a
--- /dev/null
+++ b/arch/x86/pci/dac_64bit.c
@@ -0,0 +1,44 @@ 
+/*
+ * Set up the 64-bit bus resource for allocations > 4G if the hardware
+ * is capable of generating Dual Address Cycle (DAC).
+ */
+
+#include <linux/pci.h>
+
+static struct resource mem64 = {
+	.name	= "PCI mem64",
+	.start	= (resource_size_t)1 << 32,	/* 4Gb */
+	.end	= -1,
+	.flags	= IORESOURCE_MEM,
+};
+
+void pcibios_pci64_setup(void)
+{
+	struct resource *r64 = &mem64, *root = &iomem_resource;
+	struct pci_bus *b;
+
+	if (insert_resource(root, r64)) {
+		printk(KERN_WARNING "PCI: Failed to allocate PCI64 space\n");
+		return;
+	}
+
+	list_for_each_entry(b, &pci_root_buses, node) {
+		/* Is this a "standard" root bus created by pci_create_bus? */
+		if (b->resource[1] != root || b->resource[2])
+			continue;
+		b->resource[2] = r64;	/* create DAC resource */
+	}
+}
+
+void pcibios_pci64_verify(void)
+{
+	struct pci_bus *b;
+
+	if (mem64.flags & IORESOURCE_MEM64)
+		return;		/* presumably DAC works */
+	list_for_each_entry(b, &pci_root_buses, node) {
+		if (b->resource[2] == &mem64)
+			b->resource[2] = NULL;
+	}
+	printk(KERN_INFO "PCI: allocations above 4G disabled\n");
+}
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index f1817f7..bf8eb75 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -137,6 +137,10 @@  static void __init pcibios_allocate_bus_resources(struct list_head *bus_list)
 					 * range.
 					 */
 					r->flags = 0;
+				} else {
+					/* Successful allocation */
+					if (upper_32_bits(r->start))
+						pr->flags |= IORESOURCE_MEM64;
 				}
 			}
 		}
@@ -174,6 +178,10 @@  static void __init pcibios_allocate_resources(int pass)
 					/* We'll assign a new address later */
 					r->end -= r->start;
 					r->start = 0;
+				} else {
+					/* Successful allocation */
+					if (upper_32_bits(r->start))
+						pr->flags |= IORESOURCE_MEM64;
 				}
 			}
 		}
@@ -225,9 +233,11 @@  static int __init pcibios_assign_resources(void)
 void __init pcibios_resource_survey(void)
 {
 	DBG("PCI: Allocating resources\n");
+	pcibios_pci64_setup();
 	pcibios_allocate_bus_resources(&pci_root_buses);
 	pcibios_allocate_resources(0);
 	pcibios_allocate_resources(1);
+	pcibios_pci64_verify();
 
 	e820_reserve_resources_late();
 }
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 32e4b2f..30403b3 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -49,6 +49,8 @@  struct resource_list {
 #define IORESOURCE_SIZEALIGN	0x00020000	/* size indicates alignment */
 #define IORESOURCE_STARTALIGN	0x00040000	/* start field is alignment */
 
+#define IORESOURCE_MEM64	0x00080000	/* 64-bit addressing, >4G */
+
 #define IORESOURCE_EXCLUSIVE	0x08000000	/* Userland may not map this resource */
 #define IORESOURCE_DISABLED	0x10000000
 #define IORESOURCE_UNSET	0x20000000