diff mbox series

[v4] Subject: PCI: Enable io space 1k granularity for intel cpu root port

Message ID 20240702035649.26039-1-zhoushengqing@ttyinfo.com (mailing list archive)
State Changes Requested
Delegated to: Bjorn Helgaas
Headers show
Series [v4] Subject: PCI: Enable io space 1k granularity for intel cpu root port | expand

Commit Message

Zhou Shengqing July 2, 2024, 3:56 a.m. UTC
This patch add 1k granularity for intel root port bridge. Intel latest
server CPU support 1K granularity, And there is an BIOS setup item named
"EN1K", but linux doesn't support it. if an IIO has 5 IOU (SPR has 5 IOUs)
all are bifurcated 2x8.In a 2P server system,There are 20 P2P bridges
present. if keep 4K granularity allocation,it need 20*4=80k io space,
exceeding 64k. I test it in a 16*nvidia 4090s system under intel eaglestrem
platform. There are six 4090s that cannot be allocated I/O resources.
So I applied this patch. And I found a similar implementation in quirks.c,
but it only targets the Intel P64H2 platform.

Signed-off-by: Zhou Shengqing <zhoushengqing@ttyinfo.com>
---
 drivers/pci/quirks.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Comments

Bjorn Helgaas July 12, 2024, 6:48 p.m. UTC | #1
On Tue, Jul 02, 2024 at 03:56:49AM +0000, Zhou Shengqing wrote:
> This patch add 1k granularity for intel root port bridge. Intel latest
> server CPU support 1K granularity, And there is an BIOS setup item named
> "EN1K", but linux doesn't support it. if an IIO has 5 IOU (SPR has 5 IOUs)
> all are bifurcated 2x8.In a 2P server system,There are 20 P2P bridges
> present. if keep 4K granularity allocation,it need 20*4=80k io space,
> exceeding 64k. I test it in a 16*nvidia 4090s system under intel eaglestrem
> platform. There are six 4090s that cannot be allocated I/O resources.
> So I applied this patch. And I found a similar implementation in quirks.c,
> but it only targets the Intel P64H2 platform.

I think this has potential.  Can you include a more complete citation
for the Intel spec?  Complete name, document number if available,
revision, section?  Hopefully it's publically available?

> Signed-off-by: Zhou Shengqing <zhoushengqing@ttyinfo.com>
> ---
>  drivers/pci/quirks.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 568410e64ce6..f30083d51e15 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2562,6 +2562,36 @@ static void quirk_p64h2_1k_io(struct pci_dev *dev)
>  }
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1460, quirk_p64h2_1k_io);
>  
> +/* Enable 1k I/O space granularity on the intel root port */
> +static void quirk_intel_rootport_1k_io(struct pci_dev *dev)
> +{
> +	struct pci_dev *d = NULL;
> +	u16 en1k = 0;
> +	struct pci_dev *root_port = pcie_find_root_port(dev);
> +
> +	if (!root_port)
> +		return;

This doesn't seem quite right to me.  The point is to set
dev->io_window_1k when "dev" is a Root Port itself and when the EN1K
bit is set in a [8086:09a2] device.

So I don't think we need to *look* for the Root Port, we just need to
check that "dev" itself *is* a Root Port, e.g.,

  if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
    return;

> +	/*
> +	 * Per intel sever CPU EDS vol2(register) spec,
> +	 * Intel Memory Map/Intel VT-d configuration space,
> +	 * IIO MISC Control (IIOMISCCTRL_1_5_0_CFG) — Offset 1C0h
> +	 * bit 2.
> +	 * Enable 1K (EN1K):
> +	 * This bit when set, enables 1K granularity for I/O space decode
> +	 * in each of the virtual P2P bridges
> +	 * corresponding to root ports, and DMI ports.
> +	 */
> +	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {

To be safe, "d" (the [8086:09a2] device) should be on the same bus as
"dev" (with VMD, I think we get Root Ports *below* the VMD bridge,
which would be a different bus, and they presumably are not influenced
by the EN1K bit.

> +		pci_read_config_word(d, 0x1c0, &en1k);
> +		if (en1k & 0x4) {
> +			pci_info(d, "INTEL: System should support 1k io window\n");

If we log this, I think it should be with "dev", not "d", since we
likely will have several Root Ports, and this would lead to several
identical messages.  Maybe something like this:

  pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));

> +			dev->io_window_1k = 1;
> +		}
> +	}
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,	PCI_ANY_ID,	quirk_intel_rootport_1k_io);
> +
>  /*
>   * Under some circumstances, AER is not linked with extended capabilities.
>   * Force it to be linked by setting the corresponding control bit in the
> -- 
> 2.39.2
>
Zhou Shengqing July 23, 2024, 8:04 a.m. UTC | #2
> I think this has potential.  Can you include a more complete citation
> for the Intel spec?  Complete name, document number if available,
> revision, section?  Hopefully it's publically available?

Most of intel CPU EDS specs are under NDA. But you can refer to
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v2-datasheet-vol-2.pdf
keyword:"EN1K".

> + /*
> > + * Per intel sever CPU EDS vol2(register) spec,
> > + * Intel Memory Map/Intel VT-d configuration space,
> > + * IIO MISC Control (IIOMISCCTRL_1_5_0_CFG) — Offset 1C0h
> > + * bit 2.
> > + * Enable 1K (EN1K):
> > + * This bit when set, enables 1K granularity for I/O space decode
> > + * in each of the virtual P2P bridges
> > + * corresponding to root ports, and DMI ports.
> > + */
> > + while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
> 
> To be safe, "d" (the [8086:09a2] device) should be on the same bus as
> "dev" (with VMD, I think we get Root Ports *below* the VMD bridge,
> which would be a different bus, and they presumably are not influenced
> by the EN1K bit.

I modified the code as follows, can you help me review it? 

/* Enable 1k I/O space granularity on the intel root port */
static void quirk_intel_rootport_1k_io(struct pci_dev *dev)
{
	struct pci_dev *d = NULL;
	u16 en1k = 0;

	if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
		return;

	/*
	 * Per intel sever CPU (ICX SPR GNR)EDS vol2(register) spec,
	 * Intel Memory Map/Intel VT-d configuration space,
	 * IIO MISC Control (IIOMISCCTRL_1_5_0_CFG) — Offset 1C0h
	 * bit 2.
	 * Enable 1K (EN1K):
	 * This bit when set, enables 1K granularity for I/O space decode 
	 * in each of the virtual P2P bridges
	 * corresponding to root ports, and DMI ports.
	 */
	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
		if (pci_domain_nr(d->bus) == pci_domain_nr(dev->bus)) {
			pci_read_config_word(d, 0x1c0, &en1k);
			if (en1k & 0x4) {
				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
				dev->io_window_1k = 1;
			}
		}
	}
}
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,	PCI_ANY_ID,	quirk_intel_rootport_1k_io);

If you have a better method, please let me know. If there are no issues, 
I can submit a new patch.
Ethan Zhao July 24, 2024, 2:34 a.m. UTC | #3
On 7/23/2024 4:04 PM, Zhou Shengqing wrote:
>> I think this has potential.  Can you include a more complete citation
>> for the Intel spec?  Complete name, document number if available,
>> revision, section?  Hopefully it's publically available?
> Most of intel CPU EDS specs are under NDA. But you can refer to
> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v2-datasheet-vol-2.pdf
> keyword:"EN1K".
>
>> + /*
>>> + * Per intel sever CPU EDS vol2(register) spec,
>>> + * Intel Memory Map/Intel VT-d configuration space,
>>> + * IIO MISC Control (IIOMISCCTRL_1_5_0_CFG) — Offset 1C0h
>>> + * bit 2.
>>> + * Enable 1K (EN1K):
>>> + * This bit when set, enables 1K granularity for I/O space decode
>>> + * in each of the virtual P2P bridges
>>> + * corresponding to root ports, and DMI ports.
>>> + */
>>> + while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
>> To be safe, "d" (the [8086:09a2] device) should be on the same bus as
>> "dev" (with VMD, I think we get Root Ports *below* the VMD bridge,
>> which would be a different bus, and they presumably are not influenced
>> by the EN1K bit.
> I modified the code as follows, can you help me review it?
>
> /* Enable 1k I/O space granularity on the intel root port */
> static void quirk_intel_rootport_1k_io(struct pci_dev *dev)
> {
> 	struct pci_dev *d = NULL;
> 	u16 en1k = 0;
>
> 	if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
> 		return;
>
> 	/*
> 	 * Per intel sever CPU (ICX SPR GNR)EDS vol2(register) spec,
> 	 * Intel Memory Map/Intel VT-d configuration space,
> 	 * IIO MISC Control (IIOMISCCTRL_1_5_0_CFG) — Offset 1C0h
> 	 * bit 2.
> 	 * Enable 1K (EN1K):
> 	 * This bit when set, enables 1K granularity for I/O space decode
> 	 * in each of the virtual P2P bridges
> 	 * corresponding to root ports, and DMI ports.
> 	 */
> 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
> 		if (pci_domain_nr(d->bus) == pci_domain_nr(dev->bus)) {

Perhaps it is enough to check if the 0x09a2 VT-d and the rootport are on the smae bus
e.g. On my SPR, domain 0000

00:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
00:0f.0 PCI bridge: Intel Corporation Device 1bbf (rev 10) (prog-if 00 [Normal decode])

  
15:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
15:01.0 PCI bridge: Intel Corporation Device 352a (rev 04) (prog-if 00 [Normal decode])

and if you check domain number only, they might sit on different bus, perhaps that
would make thing complex, could you make sure the VT-d is on the upstream bus of the
bridge ?

Thanks,
Ethan
  

> 			pci_read_config_word(d, 0x1c0, &en1k);
> 			if (en1k & 0x4) {
> 				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
> 				dev->io_window_1k = 1;
> 			}
> 		}
> 	}
> }
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,	PCI_ANY_ID,	quirk_intel_rootport_1k_io);
>
> If you have a better method, please let me know. If there are no issues,
> I can submit a new patch.
>
Zhou Shengqing July 24, 2024, 3:38 a.m. UTC | #4
> On 7/23/2024 4:04 PM, Zhou Shengqing wrote:
> >> I think this has potential.  Can you include a more complete citation
> >> for the Intel spec?  Complete name, document number if available,
> >> revision, section?  Hopefully it's publically available?
> > Most of intel CPU EDS specs are under NDA. But you can refer to
> > https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v2-datasheet-vol-2.pdf
> > keyword:"EN1K".
> > ...
> > 
> > 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
> > 		if (pci_domain_nr(d->bus) == pci_domain_nr(dev->bus)) {
> 
> Perhaps it is enough to check if the 0x09a2 VT-d and the rootport are on the smae bus
> e.g. On my SPR, domain 0000

Thank you for your comment.

Do you mean it shoud be like this?

	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
		if (d->bus->number == dev->bus->number) {
			pci_read_config_word(d, 0x1c0, &en1k);
			if (en1k & 0x4) {
				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
				dev->io_window_1k = 1;
			}
		}
	}

> 
> 00:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
> 00:0f.0 PCI bridge: Intel Corporation Device 1bbf (rev 10) (prog-if 00 [Normal decode])
> 
>   
> 15:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
> 15:01.0 PCI bridge: Intel Corporation Device 352a (rev 04) (prog-if 00 [Normal decode])
> 
> and if you check domain number only, they might sit on different bus, perhaps that
> would make thing complex, could you make sure the VT-d is on the upstream bus of the
> bridge ?

I checked it on ICX SPR EMR GNR, VT-d is always on the same bus with root port,
and VT-d device and function number is always 0. 

Please let me know if further modifications are needed.

> 
> Thanks,
> Ethan
Ethan Zhao July 24, 2024, 5:39 a.m. UTC | #5
On 7/24/2024 11:38 AM, Zhou Shengqing wrote:
>> On 7/23/2024 4:04 PM, Zhou Shengqing wrote:
>>>> I think this has potential.  Can you include a more complete citation
>>>> for the Intel spec?  Complete name, document number if available,
>>>> revision, section?  Hopefully it's publically available?
>>> Most of intel CPU EDS specs are under NDA. But you can refer to
>>> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v2-datasheet-vol-2.pdf
>>> keyword:"EN1K".
>>> ...
>>>
>>> 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
>>> 		if (pci_domain_nr(d->bus) == pci_domain_nr(dev->bus)) {
>> Perhaps it is enough to check if the 0x09a2 VT-d and the rootport are on the smae bus
>> e.g. On my SPR, domain 0000
> Thank you for your comment.
>
> Do you mean it shoud be like this?
>
> 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
> 		if (d->bus->number == dev->bus->number) {
> 			pci_read_config_word(d, 0x1c0, &en1k);
> 			if (en1k & 0x4) {
> 				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
> 				dev->io_window_1k = 1;
> 			}
> 		}
> 	}
>
>> 00:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
>> 00:0f.0 PCI bridge: Intel Corporation Device 1bbf (rev 10) (prog-if 00 [Normal decode])
>>
>>    
>> 15:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
>> 15:01.0 PCI bridge: Intel Corporation Device 352a (rev 04) (prog-if 00 [Normal decode])
>>
>> and if you check domain number only, they might sit on different bus, perhaps that
>> would make thing complex, could you make sure the VT-d is on the upstream bus of the
>> bridge ?
> I checked it on ICX SPR EMR GNR, VT-d is always on the same bus with root port,
> and VT-d device and function number is always 0.

Yes, every VT-d instance in the root complex and the root port integrated are
on the same bus. and VT-d is the first device of that bus.

The EDS doesn't say if there is exception one of the VT-d instances in an
system its EN1K wasn't set while others were set, vice vesa. so I suggest
just check the VT-d and then set the root port's io_windows_1k of the same bus.

Hope that works for your case.


Thanks,
Ethan

>
> Please let me know if further modifications are needed.
>
>> Thanks,
>> Ethan
Zhou Shengqing July 24, 2024, 6:35 a.m. UTC | #6
> > Do you mean it shoud be like this?
> >
> > 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
> > 		if (d->bus->number == dev->bus->number) {
> > 			pci_read_config_word(d, 0x1c0, &en1k);
> > 			if (en1k & 0x4) {
> > 				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
> > 				dev->io_window_1k = 1;
> > 			}
> > 		}
> > 	}
> >
> >> 00:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
> >> 00:0f.0 PCI bridge: Intel Corporation Device 1bbf (rev 10) (prog-if 00 [Normal decode])
> >>
> >>    
> >> 15:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
> >> 15:01.0 PCI bridge: Intel Corporation Device 352a (rev 04) (prog-if 00 [Normal decode])
> >>
> >> and if you check domain number only, they might sit on different bus, perhaps that
> >> would make thing complex, could you make sure the VT-d is on the upstream bus of the
> >> bridge ?
> > I checked it on ICX SPR EMR GNR, VT-d is always on the same bus with root port,
> > and VT-d device and function number is always 0.
> 
> Yes, every VT-d instance in the root complex and the root port integrated are
> on the same bus. and VT-d is the first device of that bus.
> 
> The EDS doesn't say if there is exception one of the VT-d instances in an
> system its EN1K wasn't set while others were set, vice vesa. so I suggest
> just check the VT-d and then set the root port's io_windows_1k of the same bus.

But as Bjorn mentioned at July 12, 2024, 6:48 p.m.,

"To be safe, "d" (the [8086:09a2] device) should be on the same bus as
"dev" (with VMD, I think we get Root Ports *below* the VMD bridge,
which would be a different bus, and they presumably are not influenced
by the EN1K bit."

When VMD enabled, just check bus number identical may lead to enable
1k io windows for VMD domain root port. E.g. 0000:80:00.0 is a 
VT-d(09a2). If VMD enabled, there might be a root port 10000:80:01.0 present.
this code may lead to enable 10000:80:01.0 io_window_1k = 1. 
This is probably not expected.

If I modify it like this,

	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
	---if (d->bus->number == dev->bus->number) {
	+++if (d->bus == dev->bus) {
			pci_read_config_word(d, 0x1c0, &en1k);
			if (en1k & 0x4) {
				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
				dev->io_window_1k = 1;
			}
		}
	}
    
Can the situation mentioned above be avoided?

Hope for your suggestion.

> 
> Hope that works for your case.
>
Ethan Zhao July 24, 2024, 7:51 a.m. UTC | #7
On 7/24/2024 2:35 PM, Zhou Shengqing wrote:
>>> Do you mean it shoud be like this?
>>>
>>> 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
>>> 		if (d->bus->number == dev->bus->number) {
>>> 			pci_read_config_word(d, 0x1c0, &en1k);
>>> 			if (en1k & 0x4) {
>>> 				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
>>> 				dev->io_window_1k = 1;
>>> 			}
>>> 		}
>>> 	}
>>>
>>>> 00:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
>>>> 00:0f.0 PCI bridge: Intel Corporation Device 1bbf (rev 10) (prog-if 00 [Normal decode])
>>>>
>>>>     
>>>> 15:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
>>>> 15:01.0 PCI bridge: Intel Corporation Device 352a (rev 04) (prog-if 00 [Normal decode])
>>>>
>>>> and if you check domain number only, they might sit on different bus, perhaps that
>>>> would make thing complex, could you make sure the VT-d is on the upstream bus of the
>>>> bridge ?
>>> I checked it on ICX SPR EMR GNR, VT-d is always on the same bus with root port,
>>> and VT-d device and function number is always 0.
>> Yes, every VT-d instance in the root complex and the root port integrated are
>> on the same bus. and VT-d is the first device of that bus.
>>
>> The EDS doesn't say if there is exception one of the VT-d instances in an
>> system its EN1K wasn't set while others were set, vice vesa. so I suggest
>> just check the VT-d and then set the root port's io_windows_1k of the same bus.
> But as Bjorn mentioned at July 12, 2024, 6:48 p.m.,
>
> "To be safe, "d" (the [8086:09a2] device) should be on the same bus as
> "dev" (with VMD, I think we get Root Ports *below* the VMD bridge,
> which would be a different bus, and they presumably are not influenced
> by the EN1K bit."
>
> When VMD enabled, just check bus number identical may lead to enable
> 1k io windows for VMD domain root port. E.g. 0000:80:00.0 is a
> VT-d(09a2). If VMD enabled, there might be a root port 10000:80:01.0 present.
> this code may lead to enable 10000:80:01.0 io_window_1k = 1.
> This is probably not expected.
>
> If I modify it like this,
>
> 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {

BTW, don't save letters to use single letter variable 'd', please use 'vtd_dev' or
something else to express the VT-d device.

> 	---if (d->bus->number == dev->bus->number) {
> 	+++if (d->bus == dev->bus) {

What if their 'bus' are NULL, though it is almost impossible. :)

> 			pci_read_config_word(d, 0x1c0, &en1k);
> 			if (en1k & 0x4) {
> 				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
> 				dev->io_window_1k = 1;
> 			}
> 		}
> 	}
>      
> Can the situation mentioned above be avoided?

Yes, my understanding, as Bjorn pointed out root port extended from VMD
bridge not on the same bus as VT-d.



Thanks,
Ethan

>
> Hope for your suggestion.
>
>> Hope that works for your case.
>>
Zhou Shengqing July 25, 2024, 7:44 a.m. UTC | #8
> On 7/24/2024 2:35 PM, Zhou Shengqing wrote:
> >>> Do you mean it shoud be like this?
> >>>
> >>> 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
> >>> 		if (d->bus->number == dev->bus->number) {
> >>> 			pci_read_config_word(d, 0x1c0, &en1k);
> >>> 			if (en1k & 0x4) {
> >>> 				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
> >>> 				dev->io_window_1k = 1;
> >>> 			}
> >>> 		}
> >>> 	}
> >>>
> >>>> 00:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
> >>>> 00:0f.0 PCI bridge: Intel Corporation Device 1bbf (rev 10) (prog-if 00 [Normal decode])
> >>>>
> >>>>     
> >>>> 15:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
> >>>> 15:01.0 PCI bridge: Intel Corporation Device 352a (rev 04) (prog-if 00 [Normal decode])
> >>>>
> >>>> and if you check domain number only, they might sit on different bus, perhaps that
> >>>> would make thing complex, could you make sure the VT-d is on the upstream bus of the
> >>>> bridge ?
> >>> I checked it on ICX SPR EMR GNR, VT-d is always on the same bus with root port,
> >>> and VT-d device and function number is always 0.
> >> Yes, every VT-d instance in the root complex and the root port integrated are
> >> on the same bus. and VT-d is the first device of that bus.
> >>
> >> The EDS doesn't say if there is exception one of the VT-d instances in an
> >> system its EN1K wasn't set while others were set, vice vesa. so I suggest
> >> just check the VT-d and then set the root port's io_windows_1k of the same bus.
> > But as Bjorn mentioned at July 12, 2024, 6:48 p.m.,
> >
> > "To be safe, "d" (the [8086:09a2] device) should be on the same bus as
> > "dev" (with VMD, I think we get Root Ports *below* the VMD bridge,
> > which would be a different bus, and they presumably are not influenced
> > by the EN1K bit."
> >
> > When VMD enabled, just check bus number identical may lead to enable
> > 1k io windows for VMD domain root port. E.g. 0000:80:00.0 is a
> > VT-d(09a2). If VMD enabled, there might be a root port 10000:80:01.0 present.
> > this code may lead to enable 10000:80:01.0 io_window_1k = 1.
> > This is probably not expected.
> >
> > If I modify it like this,
> >
> > 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
> 
> BTW, don't save letters to use single letter variable 'd', please use 'vtd_dev' or
> something else to express the VT-d device.

Got it!

> 
> > 	---if (d->bus->number == dev->bus->number) {
> > 	+++if (d->bus == dev->bus) {
> 
> What if their 'bus' are NULL, though it is almost impossible. :)
> 
> > 			pci_read_config_word(d, 0x1c0, &en1k);
> > 			if (en1k & 0x4) {
> > 				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
> > 				dev->io_window_1k = 1;
> > 			}
> > 		}
> > 	}
> >      
> > Can the situation mentioned above be avoided?
> 
> Yes, my understanding, as Bjorn pointed out root port extended from VMD
> bridge not on the same bus as VT-d.

For the root port extended from VMD, should the 1k window be set
when BIOS setup EN1K knob enabled? 
In my case, I think  EN1K should not apply to the VMD root port.

But what I'm confused about is, how can I reasonably exclude the VMD root port
in the code?

> 
>
Ethan Zhao July 26, 2024, 2:27 a.m. UTC | #9
On 7/25/2024 3:44 PM, Zhou Shengqing wrote:
>> On 7/24/2024 2:35 PM, Zhou Shengqing wrote:
>>>>> Do you mean it shoud be like this?
>>>>>
>>>>> 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
>>>>> 		if (d->bus->number == dev->bus->number) {
>>>>> 			pci_read_config_word(d, 0x1c0, &en1k);
>>>>> 			if (en1k & 0x4) {
>>>>> 				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
>>>>> 				dev->io_window_1k = 1;
>>>>> 			}
>>>>> 		}
>>>>> 	}
>>>>>
>>>>>> 00:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
>>>>>> 00:0f.0 PCI bridge: Intel Corporation Device 1bbf (rev 10) (prog-if 00 [Normal decode])
>>>>>>
>>>>>>      
>>>>>> 15:00.0 System peripheral: Intel Corporation Device 09a2 (rev 20)
>>>>>> 15:01.0 PCI bridge: Intel Corporation Device 352a (rev 04) (prog-if 00 [Normal decode])
>>>>>>
>>>>>> and if you check domain number only, they might sit on different bus, perhaps that
>>>>>> would make thing complex, could you make sure the VT-d is on the upstream bus of the
>>>>>> bridge ?
>>>>> I checked it on ICX SPR EMR GNR, VT-d is always on the same bus with root port,
>>>>> and VT-d device and function number is always 0.
>>>> Yes, every VT-d instance in the root complex and the root port integrated are
>>>> on the same bus. and VT-d is the first device of that bus.
>>>>
>>>> The EDS doesn't say if there is exception one of the VT-d instances in an
>>>> system its EN1K wasn't set while others were set, vice vesa. so I suggest
>>>> just check the VT-d and then set the root port's io_windows_1k of the same bus.
>>> But as Bjorn mentioned at July 12, 2024, 6:48 p.m.,
>>>
>>> "To be safe, "d" (the [8086:09a2] device) should be on the same bus as
>>> "dev" (with VMD, I think we get Root Ports *below* the VMD bridge,
>>> which would be a different bus, and they presumably are not influenced
>>> by the EN1K bit."
>>>
>>> When VMD enabled, just check bus number identical may lead to enable
>>> 1k io windows for VMD domain root port. E.g. 0000:80:00.0 is a
>>> VT-d(09a2). If VMD enabled, there might be a root port 10000:80:01.0 present.
>>> this code may lead to enable 10000:80:01.0 io_window_1k = 1.
>>> This is probably not expected.
>>>
>>> If I modify it like this,
>>>
>>> 	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
>> BTW, don't save letters to use single letter variable 'd', please use 'vtd_dev' or
>> something else to express the VT-d device.
> Got it!
>
>>> 	---if (d->bus->number == dev->bus->number) {
>>> 	+++if (d->bus == dev->bus) {
>> What if their 'bus' are NULL, though it is almost impossible. :)
>>
>>> 			pci_read_config_word(d, 0x1c0, &en1k);
>>> 			if (en1k & 0x4) {
>>> 				pci_info(dev, "1K I/O windows enabled per %s EN1K setting\n", pci_name(d));
>>> 				dev->io_window_1k = 1;
>>> 			}
>>> 		}
>>> 	}
>>>       
>>> Can the situation mentioned above be avoided?
>> Yes, my understanding, as Bjorn pointed out root port extended from VMD
>> bridge not on the same bus as VT-d.
> For the root port extended from VMD, should the 1k window be set
> when BIOS setup EN1K knob enabled?
> In my case, I think  EN1K should not apply to the VMD root port.
>
> But what I'm confused about is, how can I reasonably exclude the VMD root port
> in the code?

VMD, if enabled, is EP, not RP. and its RPs are mapped into its own space, and
sit at different buses as VT-d, no need to care about them if am correct.

Thanks,
Ethan

>>
>
diff mbox series

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 568410e64ce6..f30083d51e15 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2562,6 +2562,36 @@  static void quirk_p64h2_1k_io(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1460, quirk_p64h2_1k_io);
 
+/* Enable 1k I/O space granularity on the intel root port */
+static void quirk_intel_rootport_1k_io(struct pci_dev *dev)
+{
+	struct pci_dev *d = NULL;
+	u16 en1k = 0;
+	struct pci_dev *root_port = pcie_find_root_port(dev);
+
+	if (!root_port)
+		return;
+
+	/*
+	 * Per intel sever CPU EDS vol2(register) spec,
+	 * Intel Memory Map/Intel VT-d configuration space,
+	 * IIO MISC Control (IIOMISCCTRL_1_5_0_CFG) — Offset 1C0h
+	 * bit 2.
+	 * Enable 1K (EN1K):
+	 * This bit when set, enables 1K granularity for I/O space decode
+	 * in each of the virtual P2P bridges
+	 * corresponding to root ports, and DMI ports.
+	 */
+	while ((d = pci_get_device(PCI_VENDOR_ID_INTEL, 0x09a2, d))) {
+		pci_read_config_word(d, 0x1c0, &en1k);
+		if (en1k & 0x4) {
+			pci_info(d, "INTEL: System should support 1k io window\n");
+			dev->io_window_1k = 1;
+		}
+	}
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,	PCI_ANY_ID,	quirk_intel_rootport_1k_io);
+
 /*
  * Under some circumstances, AER is not linked with extended capabilities.
  * Force it to be linked by setting the corresponding control bit in the