Message ID | 20231221-thunderbolt-pci-patch-4-v4-1-2e136e57c9bc@chromium.org (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [v4] PCI: Relabel JHL6540 on Lenovo X1 Carbon 7,8 | expand |
On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote: > +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev_set_removable(&dev->dev, DEVICE_FIXED); > + > + /* Not all 0x15d3 components are external facing */ > + if (dev->device == 0x15d3 && > + dev->devfn != 0x08 && > + dev->devfn != 0x20) { > + return; > + } > + > + dev->external_facing = true; > +} > + > +/* > + * We also need to relabel the root port as a consequence of changing > + * the JHL6540's PCIE hierarchy. > + */ > +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } This ventures into the realm of nitpicking, but this can be factored out into a helper. I'll shut up now and let PCI folks handle this. Thanks.
On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote: > On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to > distrust removable PCI devices, the build-in USB-C ports stop working at > all. > This happens because these X1 Carbon models have a unique feature; a > Thunderbolt controller that is discrete from the SoC. The software sees > this controller, and incorrectly assumes it is a removable PCI device, > even though it is fixed to the computer and is wired to the computer's > own USB-C ports. > > Relabel all the components of the JHL6540 controller as DEVICE_FIXED, > and where applicable, external_facing. > > Ensure that the security policy to distrust external PCI devices works > as intended, and that the device's USB-C ports are able to enumerate > even when the policy is enabled. Thanks for all your work here. This is going to be a maintenance problem. We typically use quirks to work around hardware defects, e.g., a device that advertises a feature that doesn't work per spec. But I don't see where the defect is here. And I doubt that this is really a unique situation. So it's likely that this will happen on other systems, and we don't want to have to add quirks every time another one shows up. If this is a firmware defect, e.g., if this platform is using "ExternalFacingPort" incorrectly, we can add a quirk to work around that, too. But I'm not sure that's the case. > Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org> > --- > Changes in v4: > - replaced a dmi check in the rootport quirk with a subsystem vendor and > device check. > - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org > > Changes in v3: > - removed redundant dmi check, as the subsystem vendor check is > sufficient > - switched to PCI_VENDOR_ID_LENOVO instead of hex code > - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org > > Changes in v2: > - nothing new, v1 was just a test run to see if the ASCII diagram would > be rendered properly in mutt and k-9 > - for folks using gmail, make sure to select "show original" on the top > right, as otherwise the diagram will be garbled by the standard > non-monospace font > - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org > --- > drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 112 insertions(+) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index ea476252280a..34e43323ff14 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, > quirk_apple_poweroff_thunderbolt); > #endif > > +/* > + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set > + * incorrectly as DEVICE_REMOVABLE despite being built into the device. > + * This is the side effect of a unique hardware configuration. > + * > + * Normally, Thunderbolt functionality is integrated to the SoC and > + * its root ports. > + * > + * Most devices: > + * root port --> USB-C port > + * > + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which > + * don't come with Thunderbolt functionality. Therefore, a discrete > + * Thunderbolt Host PCI controller was added between the root port and > + * the USB-C port. > + * > + * Thinkpad Carbon 7/8s > + * (w/ Whiskey Lake and Comet Lake SoC): > + * root port --> JHL6540 --> USB-C port > + * > + * Because the root port is labeled by FW as "ExternalFacingPort", as > + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately Can you include a citation (spec name, revision, section) for this DMAR requirement? > + * labeled as DEVICE_REMOVABLE by the kernel pci driver. > + * Therefore, the built-in USB-C ports do not enumerate when policies > + * forbidding external pci devices are enforced. > + * > + * This fix relabels the pci components in the built-in JHL6540 chip as > + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate > + * properly as intended. > + * > + * This fix also labels the external facing components of the JHL6540 as > + * external_facing, so that the pci attach policy works as intended. > + * > + * The ASCII diagram below describes the pci layout of the JHL6540 chip. > + * > + * Root Port > + * [8086:02b4] or [8086:9db4] > + * | > + * JHL6540 Chip > + * __________________________________________________ > + * | Bridge | > + * | PCI ID -> [8086:15d3] | > + * | DEVFN -> (00) | > + * | _________________|__________________ | > + * | | | | | | > + * | Bridge Bridge Bridge Bridge | > + * | [8086:15d3] [8086:15d3] [8086:15d3] [8086:15d3] | > + * | (00) (08) (10) (20) | > + * | | | | | | > + * | NHI | USB Controller | | > + * | [8086:15d2] | [8086:15d4] | | > + * | (00) | (00) | | > + * | |___________| |___________| | > + * |____________|________________________|____________| > + * | | > + * USB-C Port USB-C Port > + * > + * > + * Based on what a JHL6549 pci component's pci id, subsystem device id > + * and devfn are, we can infer if it is fixed and if it faces a usb port; > + * which would mean it is external facing. > + * This quirk uses these values to identify the pci components and set the > + * properties accordingly. Random nits: Capitalize spec terms like "PCI", "USB", "Root Port", etc., consistently in English text. Rewrap to fill 78 columns or so. Add blank lines between paragraphs. This applies to the commit log (which should be wrapped to 75 to allow for "git log" indent) as well as comments. But honestly, I hope we can figure out a solution that doesn't require a big comment like this. Checking for random device IDs to deduce the topology is not the way PCI is supposed to work. PCI is designed to be enumerable, so software can learn what it needs to know by interrogating the hardware directly. For cases where that's impossible, ACPI is supposed to fill the gaps, e.g., with "ExternalFacingPort". If this patch covers over a gap that firmware doesn't handle yet, maybe we need to add some new firmware interface. If that's the case, we can add quirks for platforms that don't have the new interface. But we at least need a plan that doesn't require quirks indefinitely. > + */ > +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev_set_removable(&dev->dev, DEVICE_FIXED); > + > + /* Not all 0x15d3 components are external facing */ > + if (dev->device == 0x15d3 && > + dev->devfn != 0x08 && > + dev->devfn != 0x20) { > + return; > + } > + > + dev->external_facing = true; > +} > + > +/* > + * We also need to relabel the root port as a consequence of changing > + * the JHL6540's PCIE hierarchy. > + */ > +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev->external_facing = false; > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable); > + > /* > * Following are device-specific reset methods which can be used to > * reset a single function if other methods (e.g. FLR, PM D0->D3) are > > --- > base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86 > change-id: 20231219-thunderbolt-pci-patch-4-ede71cb833c4 > > Best regards, > -- > Esther Shimanovich <eshimanovich@chromium.org> >
[adding Mika and Mario to the list of recipients, original patch is here: https://lore.kernel.org/all/20231221-thunderbolt-pci-patch-4-v4-1-2e136e57c9bc@chromium.org/ a lot of folks are on vacation, replies might not be sent before Jan 8; full quote without further comments below] On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote: > On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to > distrust removable PCI devices, the build-in USB-C ports stop working at > all. > This happens because these X1 Carbon models have a unique feature; a > Thunderbolt controller that is discrete from the SoC. The software sees > this controller, and incorrectly assumes it is a removable PCI device, > even though it is fixed to the computer and is wired to the computer's > own USB-C ports. > > Relabel all the components of the JHL6540 controller as DEVICE_FIXED, > and where applicable, external_facing. > > Ensure that the security policy to distrust external PCI devices works > as intended, and that the device's USB-C ports are able to enumerate > even when the policy is enabled. > > Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org> > --- > Changes in v4: > - replaced a dmi check in the rootport quirk with a subsystem vendor and > device check. > - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org > > Changes in v3: > - removed redundant dmi check, as the subsystem vendor check is > sufficient > - switched to PCI_VENDOR_ID_LENOVO instead of hex code > - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org > > Changes in v2: > - nothing new, v1 was just a test run to see if the ASCII diagram would > be rendered properly in mutt and k-9 > - for folks using gmail, make sure to select "show original" on the top > right, as otherwise the diagram will be garbled by the standard > non-monospace font > - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org > --- > drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 112 insertions(+) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index ea476252280a..34e43323ff14 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, > quirk_apple_poweroff_thunderbolt); > #endif > > +/* > + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set > + * incorrectly as DEVICE_REMOVABLE despite being built into the device. > + * This is the side effect of a unique hardware configuration. > + * > + * Normally, Thunderbolt functionality is integrated to the SoC and > + * its root ports. > + * > + * Most devices: > + * root port --> USB-C port > + * > + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which > + * don't come with Thunderbolt functionality. Therefore, a discrete > + * Thunderbolt Host PCI controller was added between the root port and > + * the USB-C port. > + * > + * Thinkpad Carbon 7/8s > + * (w/ Whiskey Lake and Comet Lake SoC): > + * root port --> JHL6540 --> USB-C port > + * > + * Because the root port is labeled by FW as "ExternalFacingPort", as > + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately > + * labeled as DEVICE_REMOVABLE by the kernel pci driver. > + * Therefore, the built-in USB-C ports do not enumerate when policies > + * forbidding external pci devices are enforced. > + * > + * This fix relabels the pci components in the built-in JHL6540 chip as > + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate > + * properly as intended. > + * > + * This fix also labels the external facing components of the JHL6540 as > + * external_facing, so that the pci attach policy works as intended. > + * > + * The ASCII diagram below describes the pci layout of the JHL6540 chip. > + * > + * Root Port > + * [8086:02b4] or [8086:9db4] > + * | > + * JHL6540 Chip > + * __________________________________________________ > + * | Bridge | > + * | PCI ID -> [8086:15d3] | > + * | DEVFN -> (00) | > + * | _________________|__________________ | > + * | | | | | | > + * | Bridge Bridge Bridge Bridge | > + * | [8086:15d3] [8086:15d3] [8086:15d3] [8086:15d3] | > + * | (00) (08) (10) (20) | > + * | | | | | | > + * | NHI | USB Controller | | > + * | [8086:15d2] | [8086:15d4] | | > + * | (00) | (00) | | > + * | |___________| |___________| | > + * |____________|________________________|____________| > + * | | > + * USB-C Port USB-C Port > + * > + * > + * Based on what a JHL6549 pci component's pci id, subsystem device id > + * and devfn are, we can infer if it is fixed and if it faces a usb port; > + * which would mean it is external facing. > + * This quirk uses these values to identify the pci components and set the > + * properties accordingly. > + */ > +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev_set_removable(&dev->dev, DEVICE_FIXED); > + > + /* Not all 0x15d3 components are external facing */ > + if (dev->device == 0x15d3 && > + dev->devfn != 0x08 && > + dev->devfn != 0x20) { > + return; > + } > + > + dev->external_facing = true; > +} > + > +/* > + * We also need to relabel the root port as a consequence of changing > + * the JHL6540's PCIE hierarchy. > + */ > +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev->external_facing = false; > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable); > + > /* > * Following are device-specific reset methods which can be used to > * reset a single function if other methods (e.g. FLR, PM D0->D3) are > > --- > base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86 > change-id: 20231219-thunderbolt-pci-patch-4-ede71cb833c4 > > Best regards, > -- > Esther Shimanovich <eshimanovich@chromium.org>
Thanks Lukas for including me. On Thu, Dec 28, 2023 at 02:25:17PM +0100, Lukas Wunner wrote: > [adding Mika and Mario to the list of recipients, original patch is here: > https://lore.kernel.org/all/20231221-thunderbolt-pci-patch-4-v4-1-2e136e57c9bc@chromium.org/ > a lot of folks are on vacation, replies might not be sent before Jan 8; > full quote without further comments below] > > On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote: > > On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to > > distrust removable PCI devices, the build-in USB-C ports stop working at > > all. > > This happens because these X1 Carbon models have a unique feature; a > > Thunderbolt controller that is discrete from the SoC. The software sees > > this controller, and incorrectly assumes it is a removable PCI device, > > even though it is fixed to the computer and is wired to the computer's > > own USB-C ports. Yes, the ExternalFacingPort applies to root ports so that includes everything below that. > > Relabel all the components of the JHL6540 controller as DEVICE_FIXED, > > and where applicable, external_facing. > > > > Ensure that the security policy to distrust external PCI devices works > > as intended, and that the device's USB-C ports are able to enumerate > > even when the policy is enabled. This has been working just fine so far and as far as I can tell there is no such "policy" in place in the mainline kernel. > > > > Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org> > > --- > > Changes in v4: > > - replaced a dmi check in the rootport quirk with a subsystem vendor and > > device check. > > - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org > > > > Changes in v3: > > - removed redundant dmi check, as the subsystem vendor check is > > sufficient > > - switched to PCI_VENDOR_ID_LENOVO instead of hex code > > - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org > > > > Changes in v2: > > - nothing new, v1 was just a test run to see if the ASCII diagram would > > be rendered properly in mutt and k-9 > > - for folks using gmail, make sure to select "show original" on the top > > right, as otherwise the diagram will be garbled by the standard > > non-monospace font > > - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org > > --- > > drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 112 insertions(+) > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > index ea476252280a..34e43323ff14 100644 > > --- a/drivers/pci/quirks.c > > +++ b/drivers/pci/quirks.c > > @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, > > quirk_apple_poweroff_thunderbolt); > > #endif > > > > +/* > > + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set > > + * incorrectly as DEVICE_REMOVABLE despite being built into the device. > > + * This is the side effect of a unique hardware configuration. There is really nothing "unique" here. It's exactly as specified by this: https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports and being used in many many system already out there and those have been working just fine. > > + * > > + * Normally, Thunderbolt functionality is integrated to the SoC and > > + * its root ports. > > + * > > + * Most devices: > > + * root port --> USB-C port > > + * > > + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which > > + * don't come with Thunderbolt functionality. Therefore, a discrete > > + * Thunderbolt Host PCI controller was added between the root port and > > + * the USB-C port. > > + * > > + * Thinkpad Carbon 7/8s > > + * (w/ Whiskey Lake and Comet Lake SoC): > > + * root port --> JHL6540 --> USB-C port > > + * > > + * Because the root port is labeled by FW as "ExternalFacingPort", as > > + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately > > + * labeled as DEVICE_REMOVABLE by the kernel pci driver. > > + * Therefore, the built-in USB-C ports do not enumerate when policies > > + * forbidding external pci devices are enforced. > > + * > > + * This fix relabels the pci components in the built-in JHL6540 chip as > > + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate > > + * properly as intended. > > + * > > + * This fix also labels the external facing components of the JHL6540 as > > + * external_facing, so that the pci attach policy works as intended. > > + * > > + * The ASCII diagram below describes the pci layout of the JHL6540 chip. > > + * > > + * Root Port > > + * [8086:02b4] or [8086:9db4] > > + * | > > + * JHL6540 Chip > > + * __________________________________________________ > > + * | Bridge | > > + * | PCI ID -> [8086:15d3] | > > + * | DEVFN -> (00) | > > + * | _________________|__________________ | > > + * | | | | | | > > + * | Bridge Bridge Bridge Bridge | > > + * | [8086:15d3] [8086:15d3] [8086:15d3] [8086:15d3] | > > + * | (00) (08) (10) (20) | > > + * | | | | | | > > + * | NHI | USB Controller | | > > + * | [8086:15d2] | [8086:15d4] | | > > + * | (00) | (00) | | > > + * | |___________| |___________| | > > + * |____________|________________________|____________| > > + * | | > > + * USB-C Port USB-C Port > > + * > > + * > > + * Based on what a JHL6549 pci component's pci id, subsystem device id > > + * and devfn are, we can infer if it is fixed and if it faces a usb port; > > + * which would mean it is external facing. > > + * This quirk uses these values to identify the pci components and set the > > + * properties accordingly. > > + */ > > +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) > > +{ > > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > > + return; > > + > > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > > + if (dev->subsystem_device != 0x22be && // Gen 8 > > + dev->subsystem_device != 0x2292) { // Gen 7 > > + return; > > + } > > + > > + dev_set_removable(&dev->dev, DEVICE_FIXED); > > + > > + /* Not all 0x15d3 components are external facing */ > > + if (dev->device == 0x15d3 && > > + dev->devfn != 0x08 && > > + dev->devfn != 0x20) { > > + return; > > + } > > + > > + dev->external_facing = true; > > +} > > + > > +/* > > + * We also need to relabel the root port as a consequence of changing > > + * the JHL6540's PCIE hierarchy. > > + */ > > +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) > > +{ > > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > > + return; > > + > > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > > + if (dev->subsystem_device != 0x22be && // Gen 8 > > + dev->subsystem_device != 0x2292) { // Gen 7 > > + return; > > + } > > + > > + dev->external_facing = false; > > +} > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge); > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge); > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge); > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable); > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable); This is not scalable at all. You would need to include lots of systems here. And there should be no issue at all anyways. Can you elaborate what the issue is and which mainline kernel you are using to reproduce this?
Thank you for all your comments! I really appreciate all your help with this. I will address the style feedback once we reach a decision on how we will fix this bug. I first will respond to your comments, and then I will list out the possible solutions to this bug, in a way that takes into account all of your insights. On Tue, Dec 26, 2023 at 7:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > Can you include a citation (spec name, revision, section) for this > DMAR requirement? > This was my mistake–I misinterpreted what a firmware developer told me. This is a firmware ACPI requirement from windows, which is not in the DMAR spec. Windows uses it to identify externally exposed PCIE root ports. https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports > But I don't see where the defect is here. And I doubt that this is > really a unique situation. So it's likely that this will happen on > other systems, and we don't want to have to add quirks every time > another one shows up. ... > don't have the new interface. But we at least need a plan that > doesn't require quirks indefinitely. ... On Thu, Dec 28, 2023 at 8:41 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > This is not scalable at all. You would need to include lots of systems > here. And there should be no issue at all anyways. My team tests hundreds of different devices, and this is the only one which exhibited this issue that we’ve seen so far. No other devices we’ve seen so far have a discrete internal Thunderbolt controller which is treated as a removable device. Therefore, we don’t expect that a large number of devices will need this quirk. > There is really nothing "unique" here. It's exactly as specified by > this: > > https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports > > and being used in many many system already out there and those have been > working just fine. I don’t know how many computers have a discrete Thunderbolt chip that is separate from their CPU, but this doesn’t seem to be a common occurrence. These devices were made during a narrow window of time when CPUs didn’t have Thunderbolt features yet, so a separate JHL6540 chip had to be added so that Lenovo could include Thunderbolt on X1 Carbon Gen 7/8. As you said, these devices do indeed work fine in cases where you don’t care if a PCI Thunderbolt device is internal or external, which is most cases. Problems happen only whenever someone adds a security policy, or some other feature that cares about the distinction between a fixed or removable PCI device. > This has been working just fine so far and as far as I can tell there is > no such "policy" in place in the mainline kernel. Correct, there is no such policy in the mainline kernel as of now. The bug is that the linux kernel’s “removable” property is inaccurate for this device. > Can you elaborate what the issue is and which mainline kernel you are > using to reproduce this? Thanks for this question! On a Lenovo Thinkpad Gen 7/Gen 8 computer with the linux kernel installed, when you look at the properties of the JHL6540 Thunderbolt controller, you see that it is incorrectly labeled as removable. I have replicated this bug on the b85ea95d0864 Linux 6.7-rc1 kernel. Before my patch, you see that the JHL6540 controller is inaccurately labeled “removable”: $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e {removable} -e {device} -e {vendor} -e looking looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': ATTR{device}=="0x15d3" ATTR{removable}=="removable" ATTR{vendor}=="0x8086" looking at parent device '/devices/pci0000:00/0000:00:1d.4': ATTRS{device}=="0x02b4" ATTRS{vendor}=="0x8086" looking at parent device '/devices/pci0000:00': After applying the patch in this ticket, we see the JHL6540 controller is now labeled as “fixed”: $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e {removable} -e {device} -e {vendor} -e looking looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': ATTR{device}=="0x15d3" ATTR{removable}=="fixed" ATTR{vendor}=="0x8086" looking at parent device '/devices/pci0000:00/0000:00:1d.4': ATTRS{device}=="0x02b4" ATTRS{vendor}=="0x8086" looking at parent device '/devices/pci0000:00': OK so here is the part where I share what I’ve developed as a result of your comments: The two options I see to resolve this are as follows: 1) Either we fix this by adding a new firmware interface as Bjorn Helgaas brought up. 2) Alternatively we may address this through a cleaned-up version of this patch If the solution is to add a firmware interface, how would I go about that process? Could you put me in touch with someone with that know-how? Would we have a temporary software quirk in place while the firmware spec is being updated? I am deferring to your expertise and knowledge in solving this bug. Thank you for all your help.
On Wed, Jan 17, 2024 at 04:21:18PM -0500, Esther Shimanovich wrote: > Thank you for all your comments! I really appreciate all your help > with this. I will address the style feedback once we reach a decision > on how we will fix this bug. > I first will respond to your comments, and then I will list out the > possible solutions to this bug, in a way that takes into account all > of your insights. > > On Tue, Dec 26, 2023 at 7:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > Can you include a citation (spec name, revision, section) for this > > DMAR requirement? > > > This was my mistake–I misinterpreted what a firmware developer told > me. This is a firmware ACPI requirement from windows, which is not in > the DMAR spec. Windows uses it to identify externally exposed PCIE > root ports. > https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports > > > But I don't see where the defect is here. And I doubt that this is > > really a unique situation. So it's likely that this will happen on > > other systems, and we don't want to have to add quirks every time > > another one shows up. > ... > > don't have the new interface. But we at least need a plan that > > doesn't require quirks indefinitely. > ... > On Thu, Dec 28, 2023 at 8:41 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > This is not scalable at all. You would need to include lots of systems > > here. And there should be no issue at all anyways. > My team tests hundreds of different devices, and this is the only one > which exhibited this issue that we’ve seen so far. > No other devices we’ve seen so far have a discrete internal > Thunderbolt controller which is treated as a removable device. > Therefore, we don’t expect that a large number of devices will need > this quirk. Well that's pretty much all Intel Titan Ridge and Maple Ridge based systems. Some early ones did not use IOMMU but all the rest do. > > There is really nothing "unique" here. It's exactly as specified by > > this: > > > > https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports > > > > and being used in many many system already out there and those have been > > working just fine. > I don’t know how many computers have a discrete Thunderbolt chip that > is separate from their CPU, but this doesn’t seem to be a common > occurrence. > These devices were made during a narrow window of time when CPUs > didn’t have Thunderbolt features yet, so a separate JHL6540 chip had > to be added so that Lenovo could include Thunderbolt on X1 Carbon Gen > 7/8. Before Intel Ice Lake it was all discrete and it is still discrete with the Barlow Ridge controller who will have exact same ExternalFacing port as the previous models. > As you said, these devices do indeed work fine in cases where you > don’t care if a PCI Thunderbolt device is internal or external, which > is most cases. > Problems happen only whenever someone adds a security policy, or some > other feature that cares about the distinction between a fixed or > removable PCI device. Do we have such security policy in the mainline kernel? > > This has been working just fine so far and as far as I can tell there is > > no such "policy" in place in the mainline kernel. > Correct, there is no such policy in the mainline kernel as of now. The > bug is that the linux kernel’s “removable” property is inaccurate for > this device. Or perhaps the "policy" should know this better? IIRC there were some "exceptions" in the Chrome kernel that allowed to put these devices into "allowlist" is this not the case anymore? > > Can you elaborate what the issue is and which mainline kernel you are > > using to reproduce this? > Thanks for this question! On a Lenovo Thinkpad Gen 7/Gen 8 computer > with the linux kernel installed, when you look at the properties of > the JHL6540 Thunderbolt controller, you see that it is incorrectly > labeled as removable. I have replicated this bug on the b85ea95d0864 > Linux 6.7-rc1 kernel. > > Before my patch, you see that the JHL6540 controller is inaccurately > labeled “removable”: > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > {removable} -e {device} -e {vendor} -e looking > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > ATTR{device}=="0x15d3" > ATTR{removable}=="removable" > ATTR{vendor}=="0x8086" This is actually accurate. The Thunderbolt controller is itself hot-removable and that BTW happens to be hot-removed when fwupd applies firmware upgrades to the device.
On 1/18/2024 00:00, Mika Westerberg wrote: >> Before my patch, you see that the JHL6540 controller is inaccurately >> labeled “removable”: >> $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e >> {removable} -e {device} -e {vendor} -e looking >> looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': >> ATTR{device}=="0x15d3" >> ATTR{removable}=="removable" >> ATTR{vendor}=="0x8086" > > This is actually accurate. The Thunderbolt controller is itself > hot-removable and that BTW happens to be hot-removed when fwupd applies > firmware upgrades to the device. Depending on the consumers of this removable attribute I wonder if we need to a new ATTR of "external" instead of overloading "removable".
On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > On 1/18/2024 00:00, Mika Westerberg wrote: > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > labeled “removable”: > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > {removable} -e {device} -e {vendor} -e looking > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > ATTR{device}=="0x15d3" > > > ATTR{removable}=="removable" > > > ATTR{vendor}=="0x8086" > > > > This is actually accurate. The Thunderbolt controller is itself > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > firmware upgrades to the device. This is quite interesting take. Does fwupd rip the controller out of the box to update it? By that account your touchpad is also removable as it may stop functioning when its firmware gets updated. > > Depending on the consumers of this removable attribute I wonder if we need > to a new ATTR of "external" instead of overloading "removable". Isn't this the same thing? From Documentation/ABI/testing/sysfs-devices-removable: What: /sys/devices/.../removable Date: May 2021 Contact: Rajat Jain <rajatxjain@gmail.com> Description: Information about whether a given device can be removed from the platform by the user. This is determined by its subsystem in a bus / platform-specific way. This attribute is only present for devices that can support determining such information: =========== =================================================== "removable" device can be removed from the platform by the user "fixed" device is fixed to the platform / cannot be removed by the user. Note this "by the user". Maybe we should add word "physically" here to qualify the meaning completely, but that is what it is. Not that it disappears from the bus or stops operating for some time because of firmware updates, but it can be physically detached from the platform/system. Thanks.
On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > labeled “removable”: > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > {removable} -e {device} -e {vendor} -e looking > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > ATTR{device}=="0x15d3" > > > > ATTR{removable}=="removable" > > > > ATTR{vendor}=="0x8086" > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > firmware upgrades to the device. > > This is quite interesting take. Does fwupd rip the controller out of the > box to update it? By that account your touchpad is also removable as it > may stop functioning when its firmware gets updated. > > > > > Depending on the consumers of this removable attribute I wonder if we need > > to a new ATTR of "external" instead of overloading "removable". > > Isn't this the same thing? From > Documentation/ABI/testing/sysfs-devices-removable: > > What: /sys/devices/.../removable > Date: May 2021 > Contact: Rajat Jain <rajatxjain@gmail.com> > Description: > Information about whether a given device can be removed from the > platform by the user. This is determined by its subsystem in a > bus / platform-specific way. This attribute is only present for > devices that can support determining such information: > > =========== =================================================== > "removable" device can be removed from the platform by the user > "fixed" device is fixed to the platform / cannot be removed > by the user. > > Note this "by the user". Maybe we should add word "physically" here to > qualify the meaning completely, but that is what it is. Not that it > disappears from the bus or stops operating for some time because of > firmware updates, but it can be physically detached from the > platform/system. FTR this is where Rajat and Bjorn have agreed what the attribute means when it was moved from USB-specific to general device property: https://lore.kernel.org/all/20210511230228.GA2429744@bjorn-Precision-5520/ Thanks.
On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > labeled “removable”: > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > {removable} -e {device} -e {vendor} -e looking > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > ATTR{device}=="0x15d3" > > > > ATTR{removable}=="removable" > > > > ATTR{vendor}=="0x8086" > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > firmware upgrades to the device. > > This is quite interesting take. Does fwupd rip the controller out of the > box to update it? By that account your touchpad is also removable as it > may stop functioning when its firmware gets updated. The Thunderbolt controller is connected to a hotpluggable PCIe root port so it will be dissappear from the userspace so that "removable" in that sense is accurate. > > Depending on the consumers of this removable attribute I wonder if we need > > to a new ATTR of "external" instead of overloading "removable". > > Isn't this the same thing? From > Documentation/ABI/testing/sysfs-devices-removable: > > What: /sys/devices/.../removable > Date: May 2021 > Contact: Rajat Jain <rajatxjain@gmail.com> > Description: > Information about whether a given device can be removed from the > platform by the user. This is determined by its subsystem in a > bus / platform-specific way. This attribute is only present for > devices that can support determining such information: > > =========== =================================================== > "removable" device can be removed from the platform by the user > "fixed" device is fixed to the platform / cannot be removed > by the user. > > Note this "by the user". Maybe we should add word "physically" here to > qualify the meaning completely, but that is what it is. Not that it > disappears from the bus or stops operating for some time because of > firmware updates, but it can be physically detached from the > platform/system. That would be good to fully qualify what this means. But I get you it is intented to identify devices that can be physically unplugged from the system.
On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote: > On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > > labeled “removable”: > > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > > {removable} -e {device} -e {vendor} -e looking > > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > > ATTR{device}=="0x15d3" > > > > > ATTR{removable}=="removable" > > > > > ATTR{vendor}=="0x8086" > > > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > > firmware upgrades to the device. > > > > This is quite interesting take. Does fwupd rip the controller out of the > > box to update it? By that account your touchpad is also removable as it > > may stop functioning when its firmware gets updated. > > The Thunderbolt controller is connected to a hotpluggable PCIe root port > so it will be dissappear from the userspace so that "removable" in that > sense is accurate. There are systems as well where the Thunderbolt (and/or xHCI) controller only appears if there is anything plugged to the physical Type-C ports and it gets removed pretty soon after the physical device gets unplugged. These are also the same Alpine Ridge and Titan Ridge controllers that this patch is dealing with. I tried to think about some sort of more generic heuristic how to figure out that the controller is actually inside the physical system but there is a problem that the same controller can appear on the bus as well, eg. you plug in Thunderbolt dock and that one has xHCI controller too. That device should definitely be "removable". With the "software CM" systems we have a couple of additional hints in the ACPI tables that can be used to identify the "tunneled" ports but this does not apply to the older systems I'm afraid. Now if I understand the reason behind this patch is actually not about "removability" that much than about identifying a trusted vs. untrusted device and attaching a driver to those. I was under impression that there is already a solution to this in ChromeOS kernel. It has an allowlist of drivers that are allowed to attach these devices and that includes the PCIe port drivers, xhci_hcd and the thunderbolt driver, possibly something else too. Is this not working for your case?
On Fri, Jan 19, 2024 at 09:48:29AM +0200, Mika Westerberg wrote: > On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote: > > On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > > > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > > > labeled “removable”: > > > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > > > {removable} -e {device} -e {vendor} -e looking > > > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > > > ATTR{device}=="0x15d3" > > > > > > ATTR{removable}=="removable" > > > > > > ATTR{vendor}=="0x8086" > > > > > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > > > firmware upgrades to the device. > > > > > > This is quite interesting take. Does fwupd rip the controller out of the > > > box to update it? By that account your touchpad is also removable as it > > > may stop functioning when its firmware gets updated. > > > > The Thunderbolt controller is connected to a hotpluggable PCIe root port > > so it will be dissappear from the userspace so that "removable" in that > > sense is accurate. > > There are systems as well where the Thunderbolt (and/or xHCI) controller > only appears if there is anything plugged to the physical Type-C ports > and it gets removed pretty soon after the physical device gets > unplugged. These are also the same Alpine Ridge and Titan Ridge > controllers that this patch is dealing with. > > I tried to think about some sort of more generic heuristic how to figure > out that the controller is actually inside the physical system but there > is a problem that the same controller can appear on the bus as well, eg. > you plug in Thunderbolt dock and that one has xHCI controller too. That > device should definitely be "removable". With the "software CM" systems > we have a couple of additional hints in the ACPI tables that can be used > to identify the "tunneled" ports but this does not apply to the older > systems I'm afraid. The below "might" work: 1. A device that is directly behind a PCIe root or downstream port that has ->external_facing == 1. 2. It is a PCIe endpoint. 3. It is a sibling to or has any of the below PCI IDs (see drivers/thunderbolt/nhi.h for the definitions): PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_NHI PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_NHI PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_USBONLY_NHI PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_USBONLY_NHI PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_USBONLY_NHI PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_NHI PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_NHI And for all USB4 we can use the PCI class: PCI_CLASS_SERIAL_USB_USB4 (4. With USB4 systems we could also add the check that the device is not below the tunneled PCIe ports but that's kind of redundant). I think this covers the existing devices as well as future discrete USB4 controllers and marks both the NHI and the xHCI as "fixed" which we could also use to lift the bounce buffering restriction from them. @Mario, did I miss anything?
On Thu, Jan 18, 2024 at 1:01 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > Well that's pretty much all Intel Titan Ridge and Maple Ridge based > systems. Some early ones did not use IOMMU but all the rest do. ... > Before Intel Ice Lake it was all discrete and it is still discrete with > the Barlow Ridge controller who will have exact same ExternalFacing port > as the previous models. Next week I'll try those devices in our inventory to see if I can find another one with this bug. I'll get back to you on that! On Fri, Jan 19, 2024 at 2:58 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > Now if I understand the reason behind this patch is actually not about > "removability" that much than about identifying a trusted vs. untrusted > device and attaching a driver to those. I was under impression that > there is already a solution to this in ChromeOS kernel. It has an > allowlist of drivers that are allowed to attach these devices and that > includes the PCIe port drivers, xhci_hcd and the thunderbolt driver, > possibly something else too. Is this not working for your case? This device shouldn’t be treated as a removable thunderbolt device that is enabled by policy because it is an internal device that should be trusted in the first place. Even so, while learning about this problem I tried modifying the ChromeOS policy but it ended up not fixing the issue because it seems like there is an expectation for it to see an existing “fixed” thunderbolt port before it loads anything else. But the fixed thunderbolt port is prevented from enumerating during bootup, before the policy has a chance to work. On Fri, Jan 19, 2024 at 5:23 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > The below "might" work: > > 1. A device that is directly behind a PCIe root or downstream port that > has ->external_facing == 1. > > 2. It is a PCIe endpoint. > > 3. It is a sibling to or has any of the below PCI IDs (see > drivers/thunderbolt/nhi.h for the definitions): External pci devices seem to have the same kinds of chips in them. So this wouldn’t distinguish the “fixed but discrete” embedded pci devices from the “removable” pcie through usb devices. My monitor with thunderbolt capabilities has the JHL7540 chip in it. From the kernel's perspective, I have only found that the subsystem id is what distinguishes these devices. That is, unless I am missing something in your proposal that would distinguish a fixed JHL6540 chip from an external JHL6540 chip. Please correct me on any assumptions I get wrong!
On Fri, Jan 19, 2024 at 11:03:18AM -0500, Esther Shimanovich wrote: > On Thu, Jan 18, 2024 at 1:01 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > Well that's pretty much all Intel Titan Ridge and Maple Ridge based > > systems. Some early ones did not use IOMMU but all the rest do. > ... > > Before Intel Ice Lake it was all discrete and it is still discrete with > > the Barlow Ridge controller who will have exact same ExternalFacing port > > as the previous models. > > Next week I'll try those devices in our inventory to see if I can find > another one with this bug. I'll get back to you on that! > > On Fri, Jan 19, 2024 at 2:58 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > Now if I understand the reason behind this patch is actually not about > > "removability" that much than about identifying a trusted vs. untrusted > > device and attaching a driver to those. I was under impression that > > there is already a solution to this in ChromeOS kernel. It has an > > allowlist of drivers that are allowed to attach these devices and that > > includes the PCIe port drivers, xhci_hcd and the thunderbolt driver, > > possibly something else too. Is this not working for your case? > > This device shouldn’t be treated as a removable thunderbolt device > that is enabled by policy because it is an internal device that should > be trusted in the first place. > Even so, while learning about this problem I tried modifying the > ChromeOS policy but it ended up not fixing the issue because it seems > like there is an expectation for it to see an existing “fixed” > thunderbolt port before it loads anything else. But the fixed > thunderbolt port is prevented from enumerating during bootup, before > the policy has a chance to work. Have you contacted the Chrome kernel folks about this? Perhaps they have thought about this already? > On Fri, Jan 19, 2024 at 5:23 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > The below "might" work: > > > > 1. A device that is directly behind a PCIe root or downstream port that > > has ->external_facing == 1. > > > > 2. It is a PCIe endpoint. > > > > 3. It is a sibling to or has any of the below PCI IDs (see > > drivers/thunderbolt/nhi.h for the definitions): > > External pci devices seem to have the same kinds of chips in them. So > this wouldn’t distinguish the “fixed but discrete” embedded pci > devices from the “removable” pcie through usb devices. My monitor with > thunderbolt capabilities has the JHL7540 chip in it. From the kernel's > perspective, I have only found that the subsystem id is what > distinguishes these devices. > > That is, unless I am missing something in your proposal that would > distinguish a fixed JHL6540 chip from an external JHL6540 chip. Please > correct me on any assumptions I get wrong! Yes, you are missing the 1. that it needs to be directly behind the PCIe root or downstream port that is marked as ->external_facing, and the fact that there can't be NHI's (that's the host controller with the IDs I listed in 3.) anywhere else except starting the topology according the USB4 spec (and the same applies to Thunderbolt 1-3).
On 1/19/2024 04:22, Mika Westerberg wrote: > On Fri, Jan 19, 2024 at 09:48:29AM +0200, Mika Westerberg wrote: >> On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote: >>> On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: >>>> On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: >>>>> On 1/18/2024 00:00, Mika Westerberg wrote: >>>>>>> Before my patch, you see that the JHL6540 controller is inaccurately >>>>>>> labeled “removable”: >>>>>>> $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e >>>>>>> {removable} -e {device} -e {vendor} -e looking >>>>>>> looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': >>>>>>> ATTR{device}=="0x15d3" >>>>>>> ATTR{removable}=="removable" >>>>>>> ATTR{vendor}=="0x8086" >>>>>> >>>>>> This is actually accurate. The Thunderbolt controller is itself >>>>>> hot-removable and that BTW happens to be hot-removed when fwupd applies >>>>>> firmware upgrades to the device. >>>> >>>> This is quite interesting take. Does fwupd rip the controller out of the >>>> box to update it? By that account your touchpad is also removable as it >>>> may stop functioning when its firmware gets updated. >>> >>> The Thunderbolt controller is connected to a hotpluggable PCIe root port >>> so it will be dissappear from the userspace so that "removable" in that >>> sense is accurate. >> >> There are systems as well where the Thunderbolt (and/or xHCI) controller >> only appears if there is anything plugged to the physical Type-C ports >> and it gets removed pretty soon after the physical device gets >> unplugged. These are also the same Alpine Ridge and Titan Ridge >> controllers that this patch is dealing with. >> >> I tried to think about some sort of more generic heuristic how to figure >> out that the controller is actually inside the physical system but there >> is a problem that the same controller can appear on the bus as well, eg. >> you plug in Thunderbolt dock and that one has xHCI controller too. That >> device should definitely be "removable". With the "software CM" systems >> we have a couple of additional hints in the ACPI tables that can be used >> to identify the "tunneled" ports but this does not apply to the older >> systems I'm afraid. > > The below "might" work: > > 1. A device that is directly behind a PCIe root or downstream port that > has ->external_facing == 1. > > 2. It is a PCIe endpoint. > > 3. It is a sibling to or has any of the below PCI IDs (see > drivers/thunderbolt/nhi.h for the definitions): > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_NHI > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_NHI > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_USBONLY_NHI > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_USBONLY_NHI > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_USBONLY_NHI > PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_NHI > PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_NHI > > And for all USB4 we can use the PCI class: > > PCI_CLASS_SERIAL_USB_USB4 > > (4. With USB4 systems we could also add the check that the device is not > below the tunneled PCIe ports but that's kind of redundant). > > I think this covers the existing devices as well as future discrete USB4 > controllers and marks both the NHI and the xHCI as "fixed" which we > could also use to lift the bounce buffering restriction from them. > > @Mario, did I miss anything? The bounce buffering restriction is only for unaligned DMA isn't it? Does that tend to happen a lot? But otherwise I think this does a good job. It will cover external enclosure cases too because of having to check it's directly behind a root port. But it should also include comments about why it's not needed on newer systems (IE the ACPI hints for someone with no prior knowledge looking at this to find).
On Mon, Jan 22, 2024 at 05:50:24PM -0600, Mario Limonciello wrote: > On 1/19/2024 04:22, Mika Westerberg wrote: > > On Fri, Jan 19, 2024 at 09:48:29AM +0200, Mika Westerberg wrote: > > > On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote: > > > > On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > > > > > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > > > > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > > > > > labeled “removable”: > > > > > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > > > > > {removable} -e {device} -e {vendor} -e looking > > > > > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > > > > > ATTR{device}=="0x15d3" > > > > > > > > ATTR{removable}=="removable" > > > > > > > > ATTR{vendor}=="0x8086" > > > > > > > > > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > > > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > > > > > firmware upgrades to the device. > > > > > > > > > > This is quite interesting take. Does fwupd rip the controller out of the > > > > > box to update it? By that account your touchpad is also removable as it > > > > > may stop functioning when its firmware gets updated. > > > > > > > > The Thunderbolt controller is connected to a hotpluggable PCIe root port > > > > so it will be dissappear from the userspace so that "removable" in that > > > > sense is accurate. > > > > > > There are systems as well where the Thunderbolt (and/or xHCI) controller > > > only appears if there is anything plugged to the physical Type-C ports > > > and it gets removed pretty soon after the physical device gets > > > unplugged. These are also the same Alpine Ridge and Titan Ridge > > > controllers that this patch is dealing with. > > > > > > I tried to think about some sort of more generic heuristic how to figure > > > out that the controller is actually inside the physical system but there > > > is a problem that the same controller can appear on the bus as well, eg. > > > you plug in Thunderbolt dock and that one has xHCI controller too. That > > > device should definitely be "removable". With the "software CM" systems > > > we have a couple of additional hints in the ACPI tables that can be used > > > to identify the "tunneled" ports but this does not apply to the older > > > systems I'm afraid. > > > > The below "might" work: > > > > 1. A device that is directly behind a PCIe root or downstream port that > > has ->external_facing == 1. > > > > 2. It is a PCIe endpoint. > > > > 3. It is a sibling to or has any of the below PCI IDs (see > > drivers/thunderbolt/nhi.h for the definitions): > > > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_NHI > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_NHI > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_USBONLY_NHI > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_USBONLY_NHI > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_USBONLY_NHI > > PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_NHI > > PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_NHI > > > > And for all USB4 we can use the PCI class: > > > > PCI_CLASS_SERIAL_USB_USB4 > > > > (4. With USB4 systems we could also add the check that the device is not > > below the tunneled PCIe ports but that's kind of redundant). > > > > I think this covers the existing devices as well as future discrete USB4 > > controllers and marks both the NHI and the xHCI as "fixed" which we > > could also use to lift the bounce buffering restriction from them. > > > > @Mario, did I miss anything? > > The bounce buffering restriction is only for unaligned DMA isn't it? Does > that tend to happen a lot? AFAICT no but this would allow to use IOMMU identity mappings instead of full mappings with these devices. > But otherwise I think this does a good job. It will cover external > enclosure cases too because of having to check it's directly behind a root > port. > > But it should also include comments about why it's not needed on newer > systems (IE the ACPI hints for someone with no prior knowledge looking at > this to find). Agree.
On Fri, Jan 19, 2024 at 11:03 AM Esther Shimanovich <eshimanovich@chromium.org> wrote: > Next week I'll try those devices in our inventory to see if I can find > another one with this bug. I'll get back to you on that! I ended up finding 11 additional models with this bug, from various manufacturers such as HP, Dell and Lenovo, that use the following thunderbolt controllers: JHL6240, JHL6340, JHL6540, JHL7540. So making this fix apply to all affected devices would make sense. On Mon, Jan 22, 2024 at 1:10 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > Yes, you are missing the 1. that it needs to be directly behind the PCIe > root or downstream port that is marked as ->external_facing, and the > fact that there can't be NHI's (that's the host controller with the IDs > I listed in 3.) anywhere else except starting the topology according the > USB4 spec (and the same applies to Thunderbolt 1-3). Thanks for the explanation. I'll write up a patch that implements this and takes into account all the feedback. Then I'll test it on multiple models, and then send it your way. Let me know if it makes sense to add you as a co-author. Very much appreciate your insights.
Hey! Asking for some help on implementation. So I implemented most of this, and successfully tested the quirk on 6 different devices with various types of discrete fixed JHL Thunderbolt chips. However I want to add an additional check. A device wouldn't need this quirk if it already has Thunderbolt functionality built in within its Root Port. I tried to use "is_thunderbolt" as an identifier for that type of device--- but when I tested on a device with a thunderbolt root port, "is_thunderbolt" was false for all the Thunderbolt PCI components listed below. ~ # lspci -nn | grep Thunderbolt 00:07.0 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #1 [8086:9a25] (rev 01) 00:07.2 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #2 [8086:9a27] (rev 01) 00:0d.0 USB controller [0c03]: Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller [8086:9a13] (rev 01) 00:0d.2 USB controller [0c03]: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0 [8086:9a1b] (rev 01) 00:0d.3 USB controller [0c03]: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #1 [8086:9a1d] (rev 01) The device I tested was the Lenovo carbon X1 Gen 9. When set_pcie_thunderbolt is run, the devices listed above return false on the pci_find_next_ext_capability check. I have spent some time trying to see if there are any ACPI values or any alternative indicators to reliably know if a root port has thunderbolt capabilities. I do see that ADBG is often set to TBT but that looks like a debugging tool in ACPI. I also looked through lspci -vvv and compared the output with a device that has no Thunderbolt built into its CPU, which gave me nothing. I also tried looking through values in /sys/bus and searching through the kernel, looking through logs etc. The only option I see now is figuring out how to get the string value returned by the lspci and parsing it, but I'm not 100% sure if all Thunderbolt CPUs would have Root ports specifically labeled as "Thunderbolt Root Port". I'm also not sure if that value is supposed to be used in that way. I was hoping that someone may have a suggestion here. Just for reference, this is the fix I have so far. I don't want to submit it as a new patch, until I resolve the above question. +static bool get_pci_exp_upstream_port(struct pci_dev *dev, + struct pci_dev **upstream_port) +{ + struct pci_dev *parent = dev; + + // If the dev is an upstream port, return itself + if (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) { + *upstream_port = dev; + return true; + } + + // Iterate through the dev's parents to find its upstream port + while ((parent = pci_upstream_bridge(parent))) { + if (pci_pcie_type(parent) == PCI_EXP_TYPE_UPSTREAM) { + *upstream_port = parent; + return true; + } + } + return false; +} + +static void relabel_internal_thunderbolt_chip(struct pci_dev *dev) +{ + struct pci_dev *upstream_port; + struct pci_dev *upstream_ports_parent; + + // Get the upstream port closest to the dev + if (!(get_pci_exp_upstream_port(dev, &upstream_port))) + return; + + // Next, we confirm if the upstream port is directly behind a PCIe root + // port. + if (!(upstream_ports_parent == pci_upstream_bridge(upstream_port))) + return; + if (!(pci_pcie_type(upstream_ports_parent) == PCI_EXP_TYPE_ROOT_PORT)) + return; // quirk does not apply + + // If a device's root port already has thunderbolt capabilities, then + // it shouldn't be using this quirk. + // TODO: THIS CHECK DOES NOT WORK + // I ALSO TRIED USING pci_is_thunderbolt_attached WHICH DIDN'T WORK EITHER + if (!(upstream_ports_parent->is_thunderbolt)) + return; // thunderbolt functionality is built into root port + + // Apply quirk + dev_set_removable(&dev->dev, DEVICE_FIXED); + + // External facing bridges must be marked as such so that devices + // below them can correctly be labeled as REMOVABLE + if ((pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) + && (dev->devfn == 0x08 | dev->devfn == 0x20)) + dev->external_facing = true; +}
Hi, On Mon, Apr 15, 2024 at 06:34:00PM -0400, Esther Shimanovich wrote: > Hey! > Asking for some help on implementation. > So I implemented most of this, and successfully tested the quirk on 6 > different devices with various types of discrete fixed JHL Thunderbolt > chips. > > However I want to add an additional check. A device wouldn't need this > quirk if it already has Thunderbolt functionality built in within its > Root Port. There is really no "Thunderbolt functionality built in within its Root Port". More accurate is that the Thunderbolt/USB4 controller is integrated into the CPU and the PCIe tunnels go out through the CPU PCIe Root Ports. > I tried to use "is_thunderbolt" as an identifier for that type of > device--- but when I tested on a device with a thunderbolt root port, > "is_thunderbolt" was false for all the Thunderbolt PCI components > listed below. You should not use is_thunderbolt for anything else than with the legacy Apple systems. > ~ # lspci -nn | grep Thunderbolt > 00:07.0 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt > 4 PCI Express Root Port #1 [8086:9a25] (rev 01) > 00:07.2 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt > 4 PCI Express Root Port #2 [8086:9a27] (rev 01) > 00:0d.0 USB controller [0c03]: Intel Corporation Tiger Lake-LP > Thunderbolt 4 USB Controller [8086:9a13] (rev 01) > 00:0d.2 USB controller [0c03]: Intel Corporation Tiger Lake-LP > Thunderbolt 4 NHI #0 [8086:9a1b] (rev 01) > 00:0d.3 USB controller [0c03]: Intel Corporation Tiger Lake-LP > Thunderbolt 4 NHI #1 [8086:9a1d] (rev 01) Okay this is integrated Thunderbolt/USB4 controller. > The device I tested was the Lenovo carbon X1 Gen 9. When > set_pcie_thunderbolt is run, the devices listed above return false on > the pci_find_next_ext_capability check. > > I have spent some time trying to see if there are any ACPI values or > any alternative indicators to reliably know if a root port has > thunderbolt capabilities. I do see that ADBG is often set to TBT but > that looks like a debugging tool in ACPI. > > I also looked through lspci -vvv and compared the output with a device > that has no Thunderbolt built into its CPU, which gave me nothing. For integrated and most recent systems (that's with the software connection manager) the tunneled PCIe ports (Root or Downstream) can be identified by looking at "usb4-host-interface" ACPI _DSD property. In addition to this there is the "External Facing Port" ACPI _DSD property attached to them too. Maybe these help? With the firmware connection manager there is only the "External Facing Port" _DSD though. The Microsoft documentation for this that we also use in Linux is here: https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports
Thank you for your response! It is very much appreciated. On the Tiger Lake device I was testing on, the usb4-host-interface value is NOT listed in its ACPI. I then decided to query the ACPI values collected from devices in my office, to see if this issue is limited to my device. Ice Lake - 4 devices, none had "usb4-host-interface" Tiger Lake - 31 devices, none have "usb4-host-interface" Alder Lake - 32 devices, I see that 15 of them have "usb4-host-interface" in their ACPI Raptor Lake - 1 device, does not have "usb4-host-interface" It looks like only Alder Lake has usb4-host-interface listed in its ACPI for whatever reason. It seems like I cannot use usb4-host-interface as a determinant whether the CPU has Thunderbolt capabilities (thus not needing a discrete Thunderbolt chip). ExternalFacingPort is listed in devices that don't have CPUs with Thunderbolts, so that can't be a determinant. Am I missing something?
Hi, On Thu, Apr 18, 2024 at 03:43:08PM -0400, Esther Shimanovich wrote: > Thank you for your response! It is very much appreciated. > > On the Tiger Lake device I was testing on, the usb4-host-interface > value is NOT listed in its ACPI. > > I then decided to query the ACPI values collected from devices in my > office, to see if this issue is limited to my device. > Ice Lake - 4 devices, none had "usb4-host-interface" > Tiger Lake - 31 devices, none have "usb4-host-interface" > Alder Lake - 32 devices, I see that 15 of them have > "usb4-host-interface" in their ACPI > Raptor Lake - 1 device, does not have "usb4-host-interface" > > It looks like only Alder Lake has usb4-host-interface listed in its > ACPI for whatever reason. The reason is that it is there only for "software connection manager" systems. That's Chrome TGL and then all ADL and beyond (with integrated Thunderbolt/USB4). > It seems like I cannot use usb4-host-interface as a determinant > whether the CPU has Thunderbolt capabilities (thus not needing a > discrete Thunderbolt chip). > ExternalFacingPort is listed in devices that don't have CPUs with > Thunderbolts, so that can't be a determinant. > > Am I missing something? "ExternalFacingPort" is there for all firmware and software connection manager systems but only if they use IOMMU for DMA security (also /sys/bus/thunderbolt/devices/domainX/iommu_dma_protection == 1). So this is all integrated (Ice Lake+), then Alpine Ridge, Titan Ridge and Maple Ridge based recent systems. That leaves out still older systems with firmware connection manager with the "legacy" security settings. This is pretty much Alpine and Titan Ridge and I think those you can identify using their PCI IDs directly as the list will not be too big. Something like this taken from the drivers/thunderbolt/nhi.h: #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_BRIDGE 0x15c0 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_BRIDGE 0x15d3 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_BRIDGE 0x15da #define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_BRIDGE 0x15e7 #define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_BRIDGE 0x15ea #define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_DD_BRIDGE 0x15ef In addition to this some of the systems used the BIOS "assisted" enumeration which means that the controller is actually only present if you have either USB or Thunderbolt device connected. Othertimes it is not present at all (not sure if you want to label these differently though). In summary if you want to find out if the CPU has an integrated Thunderbolt/USB4 controller (I think this is what you ask above) then the best bet would be to use "ExternalFacingPort" because that is present in all those systems.
Thanks for the explanation! I still don't fully understand how that would work for my use case. Perhaps it would be better for me to describe the case I am trying to protect against. To rehash, this quirk was written for devices with discrete Thunderbolt controllers. For example, CometLake_CPU -> AlpineRidge_Chip -> USB-C Port This device has the ExternalFacingPort property in ACPI. My quirk relabels the Alpine Ridge chip as "fixed" and external-facing, so that devices attached to the USB-C port could be labeled as "removable" Let's say we have a TigerLake CPU, which has integrated Thunderbolt/USB4 capabilities: TigerLake_ThunderboltCPU -> USB-C Port This device also has the ExternalFacingPort property in ACPI and lacks the usb4-host-interface property in the ACPI. My worry is that someone could take an Alpine Ridge Chip Thunderbolt Dock and attach it to the TigerLake CPU TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock If that were to happen, this quirk would incorrectly label the Alpine Ridge Dock as "fixed" instead of "removable". My thinking was that we could prevent this scenario from occurring if we filtered this quirk not to apply on CPU's like Tiger Lake, with integrated Thunderbolt/USB4 capabilities. ExternalFacingPort is found both on the Comet Lake ACPI and also on the Tiger Lake ACPI. So I can't use that to distinguish between CPUs which don't have integrated Thunderbolt, like Comet Lake, and CPUs with integrated Thunderbolt, like Tiger Lake. I am looking for something that can tell me if the device's Root Port has the Thunderbolt controller upstream to it or not. Is there anything like that? Or perhaps should I add a check which compares the name of the device's CPU with a list of CPUs that this quirk can be applied to? Or is there some way I can identify the Thunderbolt controller, then determine if it's upstream or downstream from the root port? Or are Alpine Ridge docks not something to worry about at all?
On 4/22/2024 14:17, Esther Shimanovich wrote: > Thanks for the explanation! I still don't fully understand how that > would work for my use case. > > Perhaps it would be better for me to describe the case I am trying to > protect against. > > To rehash, this quirk was written for devices with discrete > Thunderbolt controllers. > > For example, > CometLake_CPU -> AlpineRidge_Chip -> USB-C Port > This device has the ExternalFacingPort property in ACPI. > My quirk relabels the Alpine Ridge chip as "fixed" and > external-facing, so that devices attached to the USB-C port could be > labeled as "removable" > > Let's say we have a TigerLake CPU, which has integrated > Thunderbolt/USB4 capabilities: > > TigerLake_ThunderboltCPU -> USB-C Port > This device also has the ExternalFacingPort property in ACPI and lacks > the usb4-host-interface property in the ACPI. > > My worry is that someone could take an Alpine Ridge Chip Thunderbolt > Dock and attach it to the TigerLake CPU > > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock > > If that were to happen, this quirk would incorrectly label the Alpine > Ridge Dock as "fixed" instead of "removable". > > My thinking was that we could prevent this scenario from occurring if > we filtered this quirk not to apply on CPU's like Tiger Lake, with > integrated Thunderbolt/USB4 capabilities. > > ExternalFacingPort is found both on the Comet Lake ACPI and also on > the Tiger Lake ACPI. So I can't use that to distinguish between CPUs > which don't have integrated Thunderbolt, like Comet Lake, and CPUs > with integrated Thunderbolt, like Tiger Lake. > > I am looking for something that can tell me if the device's Root Port > has the Thunderbolt controller upstream to it or not. > Is there anything like that? > Or perhaps should I add a check which compares the name of the > device's CPU with a list of CPUs that this quirk can be applied to? > Or is there some way I can identify the Thunderbolt controller, then > determine if it's upstream or downstream from the root port? > Or are Alpine Ridge docks not something to worry about at all? My thought was once you have a device as untrusted, everything else connected to it should "also" be untrusted.
On Mon, Apr 22, 2024 at 02:21:18PM -0500, Mario Limonciello wrote: > On 4/22/2024 14:17, Esther Shimanovich wrote: > > Thanks for the explanation! I still don't fully understand how that > > would work for my use case. > > > > Perhaps it would be better for me to describe the case I am trying to > > protect against. > > > > To rehash, this quirk was written for devices with discrete > > Thunderbolt controllers. > > > > For example, > > CometLake_CPU -> AlpineRidge_Chip -> USB-C Port > > This device has the ExternalFacingPort property in ACPI. > > My quirk relabels the Alpine Ridge chip as "fixed" and > > external-facing, so that devices attached to the USB-C port could be > > labeled as "removable" > > > > Let's say we have a TigerLake CPU, which has integrated > > Thunderbolt/USB4 capabilities: > > > > TigerLake_ThunderboltCPU -> USB-C Port > > This device also has the ExternalFacingPort property in ACPI and lacks > > the usb4-host-interface property in the ACPI. > > > > My worry is that someone could take an Alpine Ridge Chip Thunderbolt > > Dock and attach it to the TigerLake CPU > > > > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock > > > > If that were to happen, this quirk would incorrectly label the Alpine > > Ridge Dock as "fixed" instead of "removable". > > > > My thinking was that we could prevent this scenario from occurring if > > we filtered this quirk not to apply on CPU's like Tiger Lake, with > > integrated Thunderbolt/USB4 capabilities. > > > > ExternalFacingPort is found both on the Comet Lake ACPI and also on > > the Tiger Lake ACPI. So I can't use that to distinguish between CPUs > > which don't have integrated Thunderbolt, like Comet Lake, and CPUs > > with integrated Thunderbolt, like Tiger Lake. > > > > I am looking for something that can tell me if the device's Root Port > > has the Thunderbolt controller upstream to it or not. > > Is there anything like that? > > Or perhaps should I add a check which compares the name of the > > device's CPU with a list of CPUs that this quirk can be applied to? > > Or is there some way I can identify the Thunderbolt controller, then > > determine if it's upstream or downstream from the root port? > > Or are Alpine Ridge docks not something to worry about at all? > > My thought was once you have a device as untrusted, everything else > connected to it should "also" be untrusted. I think what you are looking for is that anything behind a PCIe tunnel should not have this applied. IIRC the AMD GPU or some code there were going to add identification of "virtual" links to the bandwidth calculation functionality. @Mario, do you remember if this was done already and if that could maybe be re-used here? The other way I think is something like this: - If it does not have "usb4-host-interface" property (or behind a port that has that). These are all tunneled (e.g virtual). - It is directly connected to a PCIe root port with "ExternalFacingPort" and it has sibling device that is "Thunderbolt NHI". This is because you can only have "NHI" on a host router according to the USB4 spec. I may be forgetting something though.
On Tue, Apr 23, 2024 at 08:33:12AM +0300, Mika Westerberg wrote: > I think what you are looking for is that anything behind a PCIe tunnel > should not have this applied. IIRC the AMD GPU or some code there were > going to add identification of "virtual" links to the bandwidth > calculation functionality. I guess I could resurrect my correlation patch: https://lore.kernel.org/all/f53ea40a7487e145aa1a62c347cef1814072e140.1536517047.git.lukas@wunner.de/ The last time I forward-ported it was for v5.13. I still have that code running on my development machine. The problem is that it only allows lookup from tb_port to pci_dev. I'd have to add a pointer to struct pci_dev to allow lookups in the inverse direction. Though I think we have such PCI companion devices for CXL as well, so such a pointer could be useful in general. I'm knee-deep in PCI device authentication code but could probably dedicate a weekend to the correlation patch if there's interest? Once we have correlation, we can expose more precise bandwidth for virtual PCI links in sysfs. Thanks, Lukas
On Tue, Apr 23, 2024 at 10:31:30AM +0200, Lukas Wunner wrote: > On Tue, Apr 23, 2024 at 08:33:12AM +0300, Mika Westerberg wrote: > > I think what you are looking for is that anything behind a PCIe tunnel > > should not have this applied. IIRC the AMD GPU or some code there were > > going to add identification of "virtual" links to the bandwidth > > calculation functionality. > > I guess I could resurrect my correlation patch: > > https://lore.kernel.org/all/f53ea40a7487e145aa1a62c347cef1814072e140.1536517047.git.lukas@wunner.de/ > > The last time I forward-ported it was for v5.13. I still have that code > running on my development machine. > > The problem is that it only allows lookup from tb_port to pci_dev. > I'd have to add a pointer to struct pci_dev to allow lookups in the > inverse direction. Though I think we have such PCI companion devices > for CXL as well, so such a pointer could be useful in general. > > I'm knee-deep in PCI device authentication code but could probably > dedicate a weekend to the correlation patch if there's interest? > > Once we have correlation, we can expose more precise bandwidth > for virtual PCI links in sysfs. Sounds good to me :) There are also some additions in USB4 spec that allows discovery of mapping between PCIe adapters and the corresponding PCIe downstream/root port. Perhaps these can be added there too?
On 4/23/2024 00:33, Mika Westerberg wrote: > On Mon, Apr 22, 2024 at 02:21:18PM -0500, Mario Limonciello wrote: >> On 4/22/2024 14:17, Esther Shimanovich wrote: >>> Thanks for the explanation! I still don't fully understand how that >>> would work for my use case. >>> >>> Perhaps it would be better for me to describe the case I am trying to >>> protect against. >>> >>> To rehash, this quirk was written for devices with discrete >>> Thunderbolt controllers. >>> >>> For example, >>> CometLake_CPU -> AlpineRidge_Chip -> USB-C Port >>> This device has the ExternalFacingPort property in ACPI. >>> My quirk relabels the Alpine Ridge chip as "fixed" and >>> external-facing, so that devices attached to the USB-C port could be >>> labeled as "removable" >>> >>> Let's say we have a TigerLake CPU, which has integrated >>> Thunderbolt/USB4 capabilities: >>> >>> TigerLake_ThunderboltCPU -> USB-C Port >>> This device also has the ExternalFacingPort property in ACPI and lacks >>> the usb4-host-interface property in the ACPI. >>> >>> My worry is that someone could take an Alpine Ridge Chip Thunderbolt >>> Dock and attach it to the TigerLake CPU >>> >>> TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock >>> >>> If that were to happen, this quirk would incorrectly label the Alpine >>> Ridge Dock as "fixed" instead of "removable". >>> >>> My thinking was that we could prevent this scenario from occurring if >>> we filtered this quirk not to apply on CPU's like Tiger Lake, with >>> integrated Thunderbolt/USB4 capabilities. >>> >>> ExternalFacingPort is found both on the Comet Lake ACPI and also on >>> the Tiger Lake ACPI. So I can't use that to distinguish between CPUs >>> which don't have integrated Thunderbolt, like Comet Lake, and CPUs >>> with integrated Thunderbolt, like Tiger Lake. >>> >>> I am looking for something that can tell me if the device's Root Port >>> has the Thunderbolt controller upstream to it or not. >>> Is there anything like that? >>> Or perhaps should I add a check which compares the name of the >>> device's CPU with a list of CPUs that this quirk can be applied to? >>> Or is there some way I can identify the Thunderbolt controller, then >>> determine if it's upstream or downstream from the root port? >>> Or are Alpine Ridge docks not something to worry about at all? >> >> My thought was once you have a device as untrusted, everything else >> connected to it should "also" be untrusted. > > I think what you are looking for is that anything behind a PCIe tunnel > should not have this applied. IIRC the AMD GPU or some code there were > going to add identification of "virtual" links to the bandwidth > calculation functionality. > > @Mario, do you remember if this was done already and if that could maybe > be re-used here? Yeah there was a series that I worked on a few spins a while back specifically in the context of eGPUs to identify virtual links and take them out of bandwidth calculations. It didn't get merged, I recall it got stalled on various feedback and I didn't dust it off because the series also did prompt discussions about the reasoning that amdgpu was doing this in the first place. It turned out to be a bad assumption in the code and I instead made a change to amdgpu to not look at the whole topology but just the link partner (466a7d115326ece682c2b60d1c77d1d0b9010b4f if anyone is curious). > > The other way I think is something like this: > > - If it does not have "usb4-host-interface" property (or behind a port > that has that). These are all tunneled (e.g virtual). > > - It is directly connected to a PCIe root port with > "ExternalFacingPort" and it has sibling device that is "Thunderbolt > NHI". This is because you can only have "NHI" on a host router > according to the USB4 spec. > > I may be forgetting something though.
On Tue, Apr 23, 2024 at 11:59:36AM -0500, Mario Limonciello wrote: > On 4/23/2024 00:33, Mika Westerberg wrote: > > On Mon, Apr 22, 2024 at 02:21:18PM -0500, Mario Limonciello wrote: > > > On 4/22/2024 14:17, Esther Shimanovich wrote: > > > > Thanks for the explanation! I still don't fully understand how that > > > > would work for my use case. > > > > > > > > Perhaps it would be better for me to describe the case I am trying to > > > > protect against. > > > > > > > > To rehash, this quirk was written for devices with discrete > > > > Thunderbolt controllers. > > > > > > > > For example, > > > > CometLake_CPU -> AlpineRidge_Chip -> USB-C Port > > > > This device has the ExternalFacingPort property in ACPI. > > > > My quirk relabels the Alpine Ridge chip as "fixed" and > > > > external-facing, so that devices attached to the USB-C port could be > > > > labeled as "removable" > > > > > > > > Let's say we have a TigerLake CPU, which has integrated > > > > Thunderbolt/USB4 capabilities: > > > > > > > > TigerLake_ThunderboltCPU -> USB-C Port > > > > This device also has the ExternalFacingPort property in ACPI and lacks > > > > the usb4-host-interface property in the ACPI. > > > > > > > > My worry is that someone could take an Alpine Ridge Chip Thunderbolt > > > > Dock and attach it to the TigerLake CPU > > > > > > > > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock > > > > > > > > If that were to happen, this quirk would incorrectly label the Alpine > > > > Ridge Dock as "fixed" instead of "removable". > > > > > > > > My thinking was that we could prevent this scenario from occurring if > > > > we filtered this quirk not to apply on CPU's like Tiger Lake, with > > > > integrated Thunderbolt/USB4 capabilities. > > > > > > > > ExternalFacingPort is found both on the Comet Lake ACPI and also on > > > > the Tiger Lake ACPI. So I can't use that to distinguish between CPUs > > > > which don't have integrated Thunderbolt, like Comet Lake, and CPUs > > > > with integrated Thunderbolt, like Tiger Lake. > > > > > > > > I am looking for something that can tell me if the device's Root Port > > > > has the Thunderbolt controller upstream to it or not. > > > > Is there anything like that? > > > > Or perhaps should I add a check which compares the name of the > > > > device's CPU with a list of CPUs that this quirk can be applied to? > > > > Or is there some way I can identify the Thunderbolt controller, then > > > > determine if it's upstream or downstream from the root port? > > > > Or are Alpine Ridge docks not something to worry about at all? > > > > > > My thought was once you have a device as untrusted, everything else > > > connected to it should "also" be untrusted. > > > > I think what you are looking for is that anything behind a PCIe tunnel > > should not have this applied. IIRC the AMD GPU or some code there were > > going to add identification of "virtual" links to the bandwidth > > calculation functionality. > > > > @Mario, do you remember if this was done already and if that could maybe > > be re-used here? > > Yeah there was a series that I worked on a few spins a while back > specifically in the context of eGPUs to identify virtual links and take them > out of bandwidth calculations. > > It didn't get merged, I recall it got stalled on various feedback and I > didn't dust it off because the series also did prompt discussions about the > reasoning that amdgpu was doing this in the first place. It turned out to > be a bad assumption in the code and I instead made a change to amdgpu to not > look at the whole topology but just the link partner > (466a7d115326ece682c2b60d1c77d1d0b9010b4f if anyone is curious). Okay that makes sense. Thanks!
Thank you for all your help! On Tue, Apr 23, 2024 at 1:33 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > The other way I think is something like this: > > - If it does not have "usb4-host-interface" property (or behind a port > that has that). These are all tunneled (e.g virtual). > > - It is directly connected to a PCIe root port with > "ExternalFacingPort" and it has sibling device that is "Thunderbolt > NHI". This is because you can only have "NHI" on a host router > according to the USB4 spec. > I did find one example of a docking station that uses the DSL6540 chip, which has PCI IDs defined in include/linux/pci_ids.h: #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 It seems like it has an NHI, despite being in an external, removable docking station. This appears to contradict what you say about only having "NHI" on a host router. I am assuming that by host router, you mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt controller upstream to the root port. Please correct me if I got anything wrong! Looking at 18-241_ThunderboltController_Brief_HI.pdf, it seems like these Alpine Ridge chips can be used either on a computer or a peripheral. (Expected usage: Computer or peripheral) So I'm not sure if finding an NHI would guarantee that the device is not a peripheral. My original question was how to distinguish a Thunderbolt controller that is on a removable peripheral, like a docking station-- from one that is a discrete chip fixed to a computer or upstream to the root port. So unless I am misunderstanding something, it appears that my only option is waiting for Lukas's patches. Please correct me if that is not the case!
Hi, On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > Thank you for all your help! > > On Tue, Apr 23, 2024 at 1:33 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > The other way I think is something like this: > > > > - If it does not have "usb4-host-interface" property (or behind a port > > that has that). These are all tunneled (e.g virtual). > > > > - It is directly connected to a PCIe root port with > > "ExternalFacingPort" and it has sibling device that is "Thunderbolt > > NHI". This is because you can only have "NHI" on a host router > > according to the USB4 spec. > > > I did find one example of a docking station that uses the DSL6540 > chip, which has PCI IDs defined in include/linux/pci_ids.h: > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 > It seems like it has an NHI, despite being in an external, removable > docking station. This appears to contradict what you say about only > having "NHI" on a host router. I am assuming that by host router, you > mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt > controller upstream to the root port. Please correct me if I got > anything wrong! So it goes same way with other discrete chips from Intel at least. It is the same silicon but the NHI is disabled on device routers. That said, it is entirely possible for a "malicious" device to pretend to have one so we need to be careful. > Looking at 18-241_ThunderboltController_Brief_HI.pdf, it seems like > these Alpine Ridge chips can be used either on a computer or a > peripheral. (Expected usage: Computer or peripheral) Yes as above. Most our chips are such. > So I'm not sure if finding an NHI would guarantee that the device is > not a peripheral. My original question was how to distinguish a > Thunderbolt controller that is on a removable peripheral, like a > docking station-- from one that is a discrete chip fixed to a computer > or upstream to the root port. Yes that's the problem. We can figure that out from full USB4 system by looking at the "usb4-host-interface" ACPI _DSD properties of the tunneled PCIe Root/Downstream ports. But this does not work with the pre-USB4 hosts so that requires some sort of heuristics, unfortunately. > So unless I am misunderstanding something, it appears that my only > option is waiting for Lukas's patches. Please correct me if that is > not the case! I think with Lukas' patches (Lukas please correct me if I got that wrong) you can find out the PCIe ports which have their link going over a tunnel. His series works with pre-USB4 devices so that should cover your case. (In addition to "usb4-host-interface" ACPI _DSD property there is a special router operation that allows extracting the same PCIe downstream port mapping that Lukas' patches are doing from DROM so this should also allow identify all tunneled links.) This way you can identify the xHCI (well and NHI) that are not behind PCIe tunnel so that should mean they are really part of the host system (being soldered or plugged to the PCIe slot or so). If I understood right this is what you were looking for, correct?
> This way you can identify the xHCI (well and NHI) that are not behind > PCIe tunnel so that should mean they are really part of the host system > (being soldered or plugged to the PCIe slot or so). If I understood > right this is what you were looking for, correct? Yes! Thank you for the explanation.
On Fri, Apr 26, 2024 at 07:52:07AM +0300, Mika Westerberg wrote: > On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > > I did find one example of a docking station that uses the DSL6540 > > chip, which has PCI IDs defined in include/linux/pci_ids.h: > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 > > It seems like it has an NHI, despite being in an external, removable > > docking station. This appears to contradict what you say about only > > having "NHI" on a host router. I am assuming that by host router, you > > mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt > > controller upstream to the root port. Please correct me if I got > > anything wrong! > > So it goes same way with other discrete chips from Intel at least. It is > the same silicon but the NHI is disabled on device routers. > > That said, it is entirely possible for a "malicious" device to pretend > to have one so we need to be careful. If a device (accidentally or maliciously) exposes an NHI, the thunderbolt driver will try to bind to it. Do we take any precautions to prevent that? AFAICS we'd be allocating a duplicate root_switch with route 0. Seems dangerous if two driver instances talk to the same Root Switch. This looks like something that needs to be checked by Intel Validation Engineering folks. Have they been testing this? (+cc Gil) Thanks, Lukas
On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > So unless I am misunderstanding something, it appears that my only > option is waiting for Lukas's patches. I've just pushed this branch: https://github.com/l1k/linux/commits/thunderbolt_associate_v1 But it's not even compile-tested yet. 0-day is crunching through it now and I'll force-push if/when fixing bugs. It also doesn't contain support yet for the "Get PCIe Downstream Entry" functionality that's apparently needed for USB4. The branch marks PCIe Downstream Adapters on a Root Switch as fixed and external_facing. However that doesn't appear to be sufficient: I notice that in your patch, you're also clearing the external_facing bit on the Root Port above the discrete host controller. Additionally you're marking the bridges leading to the NHI and XHCI as fixed. You're also marking the NHI and XHCI themselves as fixed and external_facing. I just want to confirm whether all of this is actually necessary. At least marking the NHI and XHCI as external_facing seems nonsensical. I need to know the *minimal* set of attribute changes to fix the problem. I also need to know exactly what the user-visible issue is and how it comes about. Your commit message merely says "the build-in USB-C ports stop working at all". Does that mean that no driver is bound to the NHI or XHCI? The solution implemented by the above-linked branch hinges on the NHI driver fixing up the device attributes. But if the NHI driver is not bound, it can't fix up the attributes. I notice that commit 86eaf4a5b431 ("thunderbolt: Make iommu_dma_protection more accurate") bemoans the inability to determine external_facing ports clearly. The above-linked branch may solve that, so it seems there may be overlap between the issue discussed here and the one that the commit sought to solve. Thanks, Lukas
On Sat, Apr 27, 2024 at 07:35:33AM +0200, Lukas Wunner wrote: > On Fri, Apr 26, 2024 at 07:52:07AM +0300, Mika Westerberg wrote: > > On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > > > I did find one example of a docking station that uses the DSL6540 > > > chip, which has PCI IDs defined in include/linux/pci_ids.h: > > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 > > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 > > > It seems like it has an NHI, despite being in an external, removable > > > docking station. This appears to contradict what you say about only > > > having "NHI" on a host router. I am assuming that by host router, you > > > mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt > > > controller upstream to the root port. Please correct me if I got > > > anything wrong! > > > > So it goes same way with other discrete chips from Intel at least. It is > > the same silicon but the NHI is disabled on device routers. > > > > That said, it is entirely possible for a "malicious" device to pretend > > to have one so we need to be careful. > > If a device (accidentally or maliciously) exposes an NHI, the thunderbolt > driver will try to bind to it. > > Do we take any precautions to prevent that? Not at the moment but it will be behind the IOMMU so it cannot access any other memory that what is reserved for it. > AFAICS we'd be allocating a duplicate root_switch with route 0. > Seems dangerous if two driver instances talk to the same Root Switch. I don't think it even works because it cannot have topology ID of 0 if it is a device router which it is in this case since it is being first enumerated by the "real" host. That said we have not tested this either so can be 100% sure.
On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > I did find one example of a docking station that uses the DSL6540 > chip, which has PCI IDs defined in include/linux/pci_ids.h: > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 > It seems like it has an NHI, despite being in an external, removable > docking station. Could you provide full output of dmesg and lspci -vvv of a machine with this docking station attached? Perhaps open a bug at bugzilla.kernel.org and attach it there? Could you then try the below patch and see if it prevents the Thunderbolt driver from binding to the hot-plugged device? Thanks! -- >8 -- From a10a294a650232c16447a43c2b591f34d3cb399f Mon Sep 17 00:00:00 2001 From: Lukas Wunner <lukas@wunner.de> Date: Sat, 27 Apr 2024 16:24:18 +0200 Subject: [PATCH] thunderbolt: Do not bind to NHI exposed by a hot-plugged device An NHI should only be exposed by Thunderbolt host controllers, not by hot-plugged devices. Avoid binding to an NHI erroneously or maliciously exposed by a device by looking at the upstream port of its switch: On a host controller, the upstream port is of type TB_TYPE_NHI, whereas on hot-plugged devices it is of type TB_TYPE_PORT. Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org --- drivers/thunderbolt/tb.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index e00e62de53f3..d95ff9ed4d96 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -2786,6 +2786,13 @@ static int tb_start(struct tb *tb, bool reset) if (IS_ERR(tb->root_switch)) return PTR_ERR(tb->root_switch); + /* NHI erroneously exposed by a hot-plugged device? */ + if (!tb_port_is_nhi(tb_upstream_port(tb->root_switch))) { + tb_err(tb, "not a host controller\n"); + tb_switch_put(tb->root_switch); + return -ENODEV; + } + /* * ICM firmware upgrade needs running firmware and in native * mode that is not available so disable firmware upgrade of the
On Sat, Apr 27, 2024 at 3:17 AM Lukas Wunner <lukas@wunner.de> wrote: > > However that doesn't appear to be sufficient: I notice that in your > patch, you're also clearing the external_facing bit on the Root Port > above the discrete host controller. Rajat (rajatja@google.com) in an internal review had suggested I add that, and leave it up to kernel maintainers to decide if it's strictly necessary. > Additionally you're marking the bridges leading to the NHI and XHCI > as fixed. You're also marking the NHI and XHCI themselves as fixed > and external_facing. > > I just want to confirm whether all of this is actually necessary. > At least marking the NHI and XHCI as external_facing seems nonsensical. > I need to know the *minimal* set of attribute changes to fix the problem. Labeling the NHI and XHCI external-facing was my mistake as I got the ASCII diagram wrong in the beginning. (NHI and XHCI should not be shown as connected to the USB-C port). If you look at my latest draft of the patch (included in one of my messages in this email chain), you will see that I ended up removing that mistake. https://lore.kernel.org/lkml/CA+Y6NJGz6f8hE4kRDUTGgCj+jddUyHeD_8ocFBkARr7w90jmBw@mail.gmail.com/ I labeled the bridges leading to the NHI and XHCI as fixed because they are technically part of the discrete chip, and fixed. I do understand that if they were labeled as “removable”, that *might* not affect anything even though that is not technically true. Happy to leave that decision up to what you think makes more sense. > I also need to know exactly what the user-visible issue is and how > it comes about. Your commit message merely says "the build-in USB-C > ports stop working at all". Does that mean that no driver is bound > to the NHI or XHCI? That is correct, when the user-visible issue occurs, no driver is bound to the NHI and XHCI. The discrete JHL chip is not permitted to attach to the external-facing root port because of the security policy, so the NHI and XHCI are not seen by the computer. When the user connects a non-thunderbolt peripheral to the USB-C port that is downstream from the JHL chip, nothing happens in the logs, and the computer does not see the peripheral. However, power chargers still are able to charge the device through that port. > The solution implemented by the above-linked branch hinges on the > NHI driver fixing up the device attributes. But if the NHI driver > is not bound, it can't fix up the attributes. I could try to experiment with your patch, see what happens, and if I can get around that. On Sat, Apr 27, 2024 at 11:09 AM Lukas Wunner <lukas@wunner.de> wrote: > > On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > > I did find one example of a docking station that uses the DSL6540 > > chip, which has PCI IDs defined in include/linux/pci_ids.h: > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 > > It seems like it has an NHI, despite being in an external, removable > > docking station. > > Could you provide full output of dmesg and lspci -vvv of a machine > with this docking station attached? > > Perhaps open a bug at bugzilla.kernel.org and attach it there? > I don’t have this device available at my office. I just saw that StarTech sells a universal laptop docking station with chipset-id Intel - Alpine Ridge DSL6540. Then I looked up the device, and found it here: https://linux-hardware.org/?id=pci:8086-1577-8086-0000 Therefore, I concluded that the DSL6540 has an NHI component. If these logs are important, I could probably make a case to purchase that docking station and get the info that you need. Please let me know! > Could you then try the below patch and see if it prevents the > Thunderbolt driver from binding to the hot-plugged device? I could give it a try. Thank you!
Hi, On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: > I don’t have this device available at my office. I just saw that > StarTech sells a universal laptop docking station with chipset-id > Intel - Alpine Ridge DSL6540. Then I looked up the device, and found > it here: https://linux-hardware.org/?id=pci:8086-1577-8086-0000 > > Therefore, I concluded that the DSL6540 has an NHI component. Okay understood. Yes Alpine Ridge can be both host and device router. In device configuration such as the above it does not expose NHI. If it is host as in the above list you shared then it includes one.
On 5/1/2024 23:38, Mika Westerberg wrote: > Hi, > > On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: >> I don’t have this device available at my office. I just saw that >> StarTech sells a universal laptop docking station with chipset-id >> Intel - Alpine Ridge DSL6540. Then I looked up the device, and found >> it here: https://linux-hardware.org/?id=pci:8086-1577-8086-0000 >> >> Therefore, I concluded that the DSL6540 has an NHI component. > > Okay understood. Yes Alpine Ridge can be both host and device router. In > device configuration such as the above it does not expose NHI. If it is > host as in the above list you shared then it includes one. There are different PCI IDs for AR for host vs device though, right? But I guess that could technically be spoofed. Is there a fixed PCI ID for the RP used to tunnel for host AR? If so you could special case that anything connected to that PCI ID for RP used to tunnel isn't trusted.
On Thu, May 02, 2024 at 04:54:56AM -0500, Mario Limonciello wrote: > On 5/1/2024 23:38, Mika Westerberg wrote: > > Hi, > > > > On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: > > > I don’t have this device available at my office. I just saw that > > > StarTech sells a universal laptop docking station with chipset-id > > > Intel - Alpine Ridge DSL6540. Then I looked up the device, and found > > > it here: https://linux-hardware.org/?id=pci:8086-1577-8086-0000 > > > > > > Therefore, I concluded that the DSL6540 has an NHI component. > > > > Okay understood. Yes Alpine Ridge can be both host and device router. In > > device configuration such as the above it does not expose NHI. If it is > > host as in the above list you shared then it includes one. > > There are different PCI IDs for AR for host vs device though, right? No they are the same. Well I think Titan Ridge has different for device. Here's my system with AR host and AR dock: 06:00.0 0604: 8086:15d3 (rev 02) 07:00.0 0604: 8086:15d3 (rev 02) 07:01.0 0604: 8086:15d3 (rev 02) 07:02.0 0604: 8086:15d3 (rev 02) 07:04.0 0604: 8086:15d3 (rev 02) 08:00.0 0880: 8086:15d2 (rev 02) 09:00.0 0604: 8086:15d3 (rev 02) 0a:00.0 0604: 8086:15d3 (rev 02) 0a:01.0 0604: 8086:15d3 (rev 02) 0a:02.0 0604: 8086:15d3 (rev 02) 0a:03.0 0604: 8086:15d3 (rev 02) 0a:04.0 0604: 8086:15d3 (rev 02) > But I guess that could technically be spoofed. > Is there a fixed PCI ID for the RP used to tunnel for host AR? No, I don't think so. The Root Port can be anything even non-Intel. > If so you could special case that anything connected to that PCI ID for RP > used to tunnel isn't trusted. None of the PCIe tunnels are trusted. We have IOMMU already in place that blocks accesses outside. Also it is not possible to have anoter host router in a domain because: * The connection manager in the real host router needs to enumerate the device router before any tunnels can be established. * Once it does that it has TopologyID > 0 so it is not a host router and cannot receive packets with CM bit set in the route string. * If it still exposes a PCIe endpoint with NHI that is behind a PCIe tunnel from the host router PCIe downstream port so it is behind the full IOMMU mappings (as these ports are "external_facing"). Yes if there is such PCIe device the Thunderbolt driver may attach to it but I don't see what the harm would be since it cannot really affect the topology (except maybe trying to crash the driver by sending unexpected replies, or so).
On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: > On Sat, Apr 27, 2024 at 3:17AM Lukas Wunner <lukas@wunner.de> wrote: > That is correct, when the user-visible issue occurs, no driver is > bound to the NHI and XHCI. The discrete JHL chip is not permitted to > attach to the external-facing root port because of the security > policy, so the NHI and XHCI are not seen by the computer. Could you rework your patch to only rectify the NHI's and XHCI's device properties and leave the bridges untouched? The thunderbolt driver will then rectify the bridge's properties using the patches on this branch (particularly the one named "thunderbolt: Mark PCIe Adapters on Root Switch as non-removable"): https://github.com/l1k/linux/commits/thunderbolt_associate_v1 This approach keeps most of the code in the thunderbolt driver (which has a very clear picture which PCI bridges belong to the Host Router and which to Device Routers). The footprint in the PCI core is thus kept minimal, which increases upstream acceptability of your patch. You can match the NHI using DECLARE_PCI_FIXUP_CLASS_FINAL(): * Search for PCI_CLASS_SERIAL_USB_USB4 with class shift 0 to match a USB4 Host Interface from any vendor. * Seach for PCI_CLASS_SYSTEM_OTHER with class shift 8 to match a Thunderbolt 1 to 3 Host Interface. I recommend checking the is_thunderbolt bit on those devices to avoid matching a non-NHI. Then fixup the device properties of the NHI so that it can bind. To also rectify the properties of the XHCI, you'd have to use pci_upstream_bridge() to find the Downstream Port above, check whether that's non-NULL. The bus on which the Downstream Port resides is pdev->bus. On all Host Routers I know, the XHCI is below slot 02.0 on that bus, so you could use pci_get_slot() to find that Downstream Port, then use pci_get_slot() again to find slot 00.0 on that bridge's subordinate bus. If that device has class PCI_CLASS_SERIAL_USB_XHCI, you've found the XHCI and can rectify its properties. Device references acquired with pci_get_*() need to be returned with pci_dev_put(). The quirk should be #ifdef'ed to CONFIG_ACPI. Alternatively, it could be declared in pci-acpi.c near pci_acpi_set_external_facing(). > > However that doesn't appear to be sufficient: I notice that in your > > patch, you're also clearing the external_facing bit on the Root Port > > above the discrete host controller. > > Rajat (rajatja@google.com) in an internal review had suggested I add > that, and leave it up to kernel maintainers to decide if it's strictly > necessary. I'd recommend to leave the Root Port's properties untouched unless that's necessary. > I don???t have this device available at my office. I just saw that > StarTech sells a universal laptop docking station with chipset-id > Intel - Alpine Ridge DSL6540. Then I looked up the device, and found > it here: https://linux-hardware.org/?id=pci:8086-1577-8086-0000 > > Therefore, I concluded that the DSL6540 has an NHI component. > > If these logs are important, I could probably make a case to purchase > that docking station and get the info that you need. Please let me > know! Never mind, this scenario is being tested internally at Intel and the above-linked branch contains a commit to avoid binding to a Host Interface exposed by a Device Router. Thanks, Lukas
Hi, On Wed, May 08, 2024 at 07:14:37AM +0200, Lukas Wunner wrote: > On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: > > On Sat, Apr 27, 2024 at 3:17AM Lukas Wunner <lukas@wunner.de> wrote: > > That is correct, when the user-visible issue occurs, no driver is > > bound to the NHI and XHCI. The discrete JHL chip is not permitted to > > attach to the external-facing root port because of the security > > policy, so the NHI and XHCI are not seen by the computer. > > Could you rework your patch to only rectify the NHI's and XHCI's > device properties and leave the bridges untouched? As an alternative, I also spent some time to figure out whether there is a way to detect the integrated Thunderbolt PCIe Root Ports and turns out it is not "impossible" at least :) Basically it is a list of Ice Lake and Tiger Lake Thunderbolt PCIe Root Ports. Everything after this is using the "usb4-host-interface" device property. I went a head and did a patch that, instead of relabeling, sets the "untrusted" and "removable" flags based on this and some heuristics (to figure out the discrete controller) directly on the source. I did some testing over the hardware I have here and it sets the flags like this: - Everything below integrated Thunderbolt/USB4 PCIe root ports are marked as "untrusted" and "removable", this includes the Ice Lake and Tiger Lake ones. Whereas the NHI and xHCI here are untouched. - Everything below discrete Thunderbolt/USB4 host controller PCIe downstream ports that are behind a PCIe Root Port with "external_facing" set are marked as "untrusted" and "removable" whereas endpoints are still "trusted" and "fixed". I'm sharing the code below. @Esther, you may use it as you like, parts of it or just ignore the whole thing completely. diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 1325fbae2f28..38bc80c931d6 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1612,6 +1612,127 @@ static void set_pcie_thunderbolt(struct pci_dev *dev) dev->is_thunderbolt = 1; } +static bool pcie_switch_directly_under(struct pci_dev *bridge, + struct pci_dev *parent, + struct pci_dev *pdev) +{ + u64 serial, upstream_serial; + + /* + * Check the type of the device to make sure it is part of a PCIe + * switch and if it is try to match the serial numbers too with + * the assumption that they all share the same serial number. + */ + switch (pci_pcie_type(pdev)) { + case PCI_EXP_TYPE_UPSTREAM: + if (parent == bridge) + return pci_get_dsn(pdev) != 0; + break; + + case PCI_EXP_TYPE_DOWNSTREAM: + if (pci_pcie_type(parent) == PCI_EXP_TYPE_UPSTREAM) { + upstream_serial = pci_get_dsn(parent); + if (!upstream_serial) + return false; + serial = pci_get_dsn(pdev); + if (!serial) + return false; + if (serial != upstream_serial) + return false; + parent = pci_upstream_bridge(parent); + if (parent == bridge) + return true; + } + break; + + case PCI_EXP_TYPE_ENDPOINT: + if (pci_pcie_type(parent) == PCI_EXP_TYPE_DOWNSTREAM) { + serial = pci_get_dsn(parent); + if (!serial) + return false; + parent = pci_upstream_bridge(parent); + if (parent && pci_pcie_type(parent) == PCI_EXP_TYPE_UPSTREAM) { + upstream_serial = pci_get_dsn(parent); + if (!upstream_serial) + return false; + if (serial != upstream_serial) + return false; + parent = pci_upstream_bridge(parent); + if (parent == bridge) + return true; + } + } + break; + } + + return false; +} + +static bool pcie_has_usb4_host_interface(struct pci_dev *pdev) +{ + struct fwnode_handle *fwnode; + + /* + * For USB4 the tunneled PCIe root or downstream ports are marked with + * the "usb4-host-interface" property so we look for that first. This + * should cover the most cases. + */ + fwnode = fwnode_find_reference(dev_fwnode(&pdev->dev), + "usb4-host-interface", 0); + if (!IS_ERR(fwnode)) { + fwnode_handle_put(fwnode); + return true; + } + + /* + * Any integrated Thunderbolt 3/4 PCIe root ports from Intel + * before Alder Lake do not have the above device property so we + * use their PCI IDs instead. All these are tunneled. This list + * is not expected to grow. + */ + if (pdev->vendor == PCI_VENDOR_ID_INTEL) { + switch (pdev->device) { + /* Ice Lake Thunderbolt 3 PCIe Root Ports */ + case 0x8a1d: + case 0x8a1f: + case 0x8a21: + case 0x8a23: + /* Tiger Lake-LP Thunderbolt 4 PCIe Root Ports */ + case 0x9a23: + case 0x9a25: + case 0x9a27: + case 0x9a29: + /* Tiger Lake-H Thunderbolt 4 PCIe Root Ports */ + case 0x9a2b: + case 0x9a2d: + case 0x9a2f: + case 0x9a31: + return true; + } + } + + return false; +} + +/* root->external_facing is true, parent != NULL */ +static bool pcie_is_tunneled(struct pci_dev *root, struct pci_dev *parent, + struct pci_dev *pdev) +{ + /* Anything directly behind a "usb4-host-interface" is tunneled */ + if (pcie_has_usb4_host_interface(parent)) + return true; + + /* + * Check if this is a discrete Thunderbolt/USB4 controller that is + * directly behind a PCIe Root Port marked as "ExternalFacingPort". + * These are not behind a PCIe tunnel. + */ + if (pcie_switch_directly_under(root, parent, pdev)) + return false; + + return true; +} + static void set_pcie_untrusted(struct pci_dev *dev) { struct pci_dev *parent; @@ -1621,8 +1742,32 @@ static void set_pcie_untrusted(struct pci_dev *dev) * untrusted as well. */ parent = pci_upstream_bridge(dev); - if (parent && (parent->untrusted || parent->external_facing)) - dev->untrusted = true; + if (parent) { + struct pci_dev *root; + + /* If parent is untrusted so are we */ + if (parent->untrusted) { + pci_dbg(dev, "marking as untrusted\n"); + dev->untrusted = true; + return; + } + + root = pcie_find_root_port(dev); + if (root && root->external_facing) { + /* + * Only PCIe root ports can be marked as + * "ExternalFacingPort", However, in case of a + * discrete Thunderbolt/USB4 controller only its + * downstream facing ports are actually + * something that are exposed to the wild so we + * only mark devices behind those as untrusted. + */ + if (pcie_is_tunneled(root, parent, dev)) { + pci_dbg(dev, "marking as untrusted\n"); + dev->untrusted = true; + } + } + } } static void pci_set_removable(struct pci_dev *dev) @@ -1639,10 +1784,15 @@ static void pci_set_removable(struct pci_dev *dev) * the port is marked with external_facing, such devices are less * accessible to user / may not be removed by end user, and thus not * exposed as "removable" to userspace. + * + * These are the same devices marked as untrusted by the above + * function. The ports and endpoints part of the discrete + * Thunderbolt/USB4 controller are not marked as removable. */ - if (parent && - (parent->external_facing || dev_is_removable(&parent->dev))) + if (dev->untrusted || (parent && dev_is_removable(&parent->dev))) { + pci_dbg(dev, "marking as removable\n"); dev_set_removable(&dev->dev, DEVICE_REMOVABLE); + } } /**
Thank you Lukas and Mika! This is very useful and helpful! I am setting up two alternative builds with both of your suggested approaches and will test on devices once I get back into the office, hopefully around next week. > + /* > + * Any integrated Thunderbolt 3/4 PCIe root ports from Intel > + * before Alder Lake do not have the above device property so we > + * use their PCI IDs instead. All these are tunneled. This list > + * is not expected to grow. > + */ > + if (pdev->vendor == PCI_VENDOR_ID_INTEL) { > + switch (pdev->device) { > + /* Ice Lake Thunderbolt 3 PCIe Root Ports */ > + case 0x8a1d: > + case 0x8a1f: > + case 0x8a21: > + case 0x8a23: > + /* Tiger Lake-LP Thunderbolt 4 PCIe Root Ports */ > + case 0x9a23: > + case 0x9a25: > + case 0x9a27: > + case 0x9a29: > + /* Tiger Lake-H Thunderbolt 4 PCIe Root Ports */ > + case 0x9a2b: > + case 0x9a2d: > + case 0x9a2f: > + case 0x9a31: > + return true; > + } > + } > + Something I noticed is that the list of root ports you have there does not include [8086:02b4] or [8086:9db4], the Comet Lake and Whiskey/Cannon Point root ports that I saw on the laptops I tested on. Those laptops do not have the usb4-host-interface property. This makes me think that the patch won't work as is. Then I queried for up all the root ports on all of our devices that are confirmed to be affected by this bug. Here they are as a reference: Cannon Point (6 devices in our lab with different combos of these root ports) 9db8 #1 9dbc #5 9dbe: #7 9dbf #8 9db0 #9 9db4 #13 Comet Lake (7 devices in our lab with different combos of these root ports) 02b8 #1 02bc #5 02b0 #9 06b0 #9 (one device had this variation of #9) 02b4 #13 Tiger Lake (1 device in our lab) a0bf #8 (the root port's device id is different from all the ones on your list) After first trying your fix as-is, I can vary the list to add all these root port devices to see if that has an effect. It seems like this list might have to be longer than it currently is though.
Hi, On Fri, May 10, 2024 at 11:44:12AM -0400, Esther Shimanovich wrote: > Thank you Lukas and Mika! > This is very useful and helpful! > I am setting up two alternative builds with both of your suggested > approaches and will test on devices once I get back into the office, > hopefully around next week. > > > + /* > > + * Any integrated Thunderbolt 3/4 PCIe root ports from Intel > > + * before Alder Lake do not have the above device property so we > > + * use their PCI IDs instead. All these are tunneled. This list > > + * is not expected to grow. > > + */ > > + if (pdev->vendor == PCI_VENDOR_ID_INTEL) { > > + switch (pdev->device) { > > + /* Ice Lake Thunderbolt 3 PCIe Root Ports */ > > + case 0x8a1d: > > + case 0x8a1f: > > + case 0x8a21: > > + case 0x8a23: > > + /* Tiger Lake-LP Thunderbolt 4 PCIe Root Ports */ > > + case 0x9a23: > > + case 0x9a25: > > + case 0x9a27: > > + case 0x9a29: > > + /* Tiger Lake-H Thunderbolt 4 PCIe Root Ports */ > > + case 0x9a2b: > > + case 0x9a2d: > > + case 0x9a2f: > > + case 0x9a31: > > + return true; > > + } > > + } > > + > > Something I noticed is that the list of root ports you have there does > not include [8086:02b4] or [8086:9db4], the Comet Lake and > Whiskey/Cannon Point root ports that I saw on the laptops I tested on. > Those laptops do not have the usb4-host-interface property. This makes > me think that the patch won't work as is. They are not integrated Thunderbolt PCIe root ports. They should be "matched" with the second "rule" that looks for a discrete controller directly behind an ExternalFacing PCIe root port.
On Sat, May 11, 2024 at 07:38:32AM +0300, Mika Westerberg wrote:
> They are not integrated Thunderbolt PCIe root ports.
For the clarity, Intel integrated Thunderbolt 3 controller first in Ice
Lake, then Thunderbolt 4 controller in Tiger Lake and forward (Alder
Lake, Raptor Lake, Meteor Lake). Anything else, including Comet Lake and
the like are using discrete controllers such as Alpine Ridge, Titan
Ridge (both Thunderbolt 3) and Maple Ridge (Thunderbolt 4), and Barlow
Ridge (Thunderbolt 5) where the controller is either soldered on the
motherboard or connected to a PCIe slot.
Sorry for not opening this up earlier.
There are some combinations where the integrated controller is disabled
and a discrete one is being used but again that should match the second
"rule".
I tried both patches! Build with Lukas's commits: On Wed, May 8, 2024 at 1:23 AM Lukas Wunner <lukas@wunner.de> wrote: > > On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: > > On Sat, Apr 27, 2024 at 3:17AM Lukas Wunner <lukas@wunner.de> wrote: > > That is correct, when the user-visible issue occurs, no driver is > > bound to the NHI and XHCI. The discrete JHL chip is not permitted to > > attach to the external-facing root port because of the security > > policy, so the NHI and XHCI are not seen by the computer. > > Could you rework your patch to only rectify the NHI's and XHCI's > device properties and leave the bridges untouched? So I tried a build with that patch, but it never reached the tb_pci_fixup function, even when NHI and XHCI were both labeled as fixed and external facing in the quirk. Also, I don't see where you distinguish between an integrated Thunderbolt PCIe root port and a root port with no thunderbolt functionality built in. Could you point that out to me? I'm not sure how your patch protects against the following case scenario I described earlier: > Let's say we have a TigerLake CPU, which has integrated > Thunderbolt/USB4 capabilities: > > TigerLake_ThunderboltCPU -> USB-C Port > This device also has the ExternalFacingPort property in ACPI and lacks > the usb4-host-interface property in the ACPI. > > My worry is that someone could take an Alpine Ridge Chip Thunderbolt > Dock and attach it to the TigerLake CPU > > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock > > If that were to happen, this quirk would incorrectly label the Alpine > Ridge Dock as "fixed" instead of "removable". Build with Mika's Patch: On Sat, May 11, 2024 at 1:43 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > On Sat, May 11, 2024 at 07:38:32AM +0300, Mika Westerberg wrote: > > They are not integrated Thunderbolt PCIe root ports.m > > For the clarity, Intel integrated Thunderbolt 3 controller first in Ice > Lake, then Thunderbolt 4 controller in Tiger Lake and forward (Alder > Lake, Raptor Lake, Meteor Lake). Anything else, including Comet Lake and > the like are using discrete controllers such as Alpine Ridge, Titan > Ridge (both Thunderbolt 3) and Maple Ridge (Thunderbolt 4), and Barlow > Ridge (Thunderbolt 5) where the controller is either soldered on the > motherboard or connected to a PCIe slot. Thanks for the explanation! This patch worked smoothly on the device I tried. I'd be happy to go forward with this patch, and test it on more devices. Is that fine? Or should I try something else on the build I made with Lukas's commits?
On Wed, May 15, 2024 at 02:53:54PM -0400, Esther Shimanovich wrote: > On Wed, May 8, 2024 at 1:23???AM Lukas Wunner <lukas@wunner.de> wrote: > > On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: > > > On Sat, Apr 27, 2024 at 3:17AM Lukas Wunner <lukas@wunner.de> wrote: > > > That is correct, when the user-visible issue occurs, no driver is > > > bound to the NHI and XHCI. The discrete JHL chip is not permitted to > > > attach to the external-facing root port because of the security > > > policy, so the NHI and XHCI are not seen by the computer. > > > > Could you rework your patch to only rectify the NHI's and XHCI's > > device properties and leave the bridges untouched? > > So I tried a build with that patch, but it never reached the > tb_pci_fixup function That means that for some reason, the PCI devices are not associated with the Thunderbolt ports. Could you add this to the command line: thunderbolt.dyndbg ignore_loglevel log_buf_len=10M and this to your kernel config: CONFIG_DYNAMIC_DEBUG=y You should see "... is associated with ..." messages in dmesg. This did work for Mika during his testing with recent Thunderbolt chips. I amended the patches after his testing but wouldn't expect that to cause issues. @Mika, would you mind re-testing if you've got cycles to spare? > even when NHI and XHCI were both labeled as > fixed and external facing in the quirk. Setting the two as fixed and trusted should be sufficient. The external_facing bit should not be needed on the NHI and XHCI. > Also, I don't see where you distinguish between an integrated > Thunderbolt PCIe root port and a root port with no thunderbolt > functionality built in. Could you point that out to me? Hm, why would I have to distinguish between the two? I distinguish between Thunderbolt PCIe Adapters on the root switch and ones on non-root switches. The latter are attached Device Routers, the former is the Host Router. I just set the ones on the former to external_facing, fixed and trusted. Everything downstream is untrusted and removable. > I'm not sure how your patch protects against the following case > scenario I described earlier: > > Let's say we have a TigerLake CPU, which has integrated > > Thunderbolt/USB4 capabilities: > > > > TigerLake_ThunderboltCPU -> USB-C Port > > This device also has the ExternalFacingPort property in ACPI and lacks > > the usb4-host-interface property in the ACPI. > > > > My worry is that someone could take an Alpine Ridge Chip Thunderbolt > > Dock and attach it to the TigerLake CPU > > > > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock > > > > If that were to happen, this quirk would incorrectly label the Alpine > > Ridge Dock as "fixed" instead of "removable". See above, the Alpine Ridge Dock is never the root switch. The Tiger Lake CPU is. Thanks, Lukas
On Wed, May 15, 2024 at 10:35:22PM +0200, Lukas Wunner wrote: > On Wed, May 15, 2024 at 02:53:54PM -0400, Esther Shimanovich wrote: > > On Wed, May 8, 2024 at 1:23???AM Lukas Wunner <lukas@wunner.de> wrote: > > > On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: > > > > On Sat, Apr 27, 2024 at 3:17AM Lukas Wunner <lukas@wunner.de> wrote: > > > > That is correct, when the user-visible issue occurs, no driver is > > > > bound to the NHI and XHCI. The discrete JHL chip is not permitted to > > > > attach to the external-facing root port because of the security > > > > policy, so the NHI and XHCI are not seen by the computer. > > > > > > Could you rework your patch to only rectify the NHI's and XHCI's > > > device properties and leave the bridges untouched? > > > > So I tried a build with that patch, but it never reached the > > tb_pci_fixup function Hm, re-reading your e-mail I'm irritated that you're referring to "that patch" (singular). You need at least these five commits: thunderbolt: Move struct tb_cm to tb.h thunderbolt: Obtain PCIe Device/Function number from DROM thunderbolt: Obtain PCIe Device/Function number via Router Operation thunderbolt: Associate PCI devices with PCIe Adapters thunderbolt: Mark PCIe Adapters on root switch as non-removable on this branch: https://github.com/l1k/linux/commits/thunderbolt_associate_v1 Thanks, Lukas
> Hm, why would I have to distinguish between the two? > > I distinguish between Thunderbolt PCIe Adapters on the root switch > and ones on non-root switches. The latter are attached Device Routers, > the former is the Host Router. I just set the ones on the former to > external_facing, fixed and trusted. Everything downstream is untrusted > and removable. Thanks for the explanation. I'll try adding the kernel config and command line parameters you mentioned, and share any additional debug info I find. What I did previously was use a pr_info line inside the tb_pci_fixup function. That's how I knew that the function wasn't visited. I am testing on ChromeOS on a non-ChromeBook so there might be some additional security settings that prevent the Thunderbolt driver from being loaded. I remember running into them when I first tried to debug this. That may be what is preventing your patch from working as intended on my device. I will have to look into what could be causing that. On Wed, May 15, 2024 at 4:51 PM Lukas Wunner <lukas@wunner.de> wrote: > > Hm, re-reading your e-mail I'm irritated that you're referring to > "that patch" (singular). You need at least these five commits: > > thunderbolt: Move struct tb_cm to tb.h > thunderbolt: Obtain PCIe Device/Function number from DROM > thunderbolt: Obtain PCIe Device/Function number via Router Operation > thunderbolt: Associate PCI devices with PCIe Adapters > thunderbolt: Mark PCIe Adapters on root switch as non-removable > > on this branch: > > https://github.com/l1k/linux/commits/thunderbolt_associate_v1 I should have specified "patch series" instead of patch. I did include all of your patches, and I used that branch you linked earlier! git log --pretty=short --oneline 2e83bc05d285 (HEAD -> integration) thunderbolt: Do not bind to Host Interface exposed by Device Router 01782fd462f2 thunderbolt: Mark PCIe Adapters on Root Switch as non-removable 814d0a7b0da8 thunderbolt: Associate PCI devices with PCIe Adapters 06b2f200532f thunderbolt: Obtain PCIe Device/Function number via Router Operation 03d532c8a59f thunderbolt: Obtain PCIe Device/Function number from DROM d710d8b828c5 thunderbolt: Move struct tb_cm to tb.h I very much appreciate your help with this! I am happy to dig deeper to figure out what is going on, I just wanted to share where I was so that I could get additional support, and be guided in the right direction. Thank you!
Hi, On Wed, May 15, 2024 at 10:35:22PM +0200, Lukas Wunner wrote: > On Wed, May 15, 2024 at 02:53:54PM -0400, Esther Shimanovich wrote: > > On Wed, May 8, 2024 at 1:23???AM Lukas Wunner <lukas@wunner.de> wrote: > > > On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: > > > > On Sat, Apr 27, 2024 at 3:17AM Lukas Wunner <lukas@wunner.de> wrote: > > > > That is correct, when the user-visible issue occurs, no driver is > > > > bound to the NHI and XHCI. The discrete JHL chip is not permitted to > > > > attach to the external-facing root port because of the security > > > > policy, so the NHI and XHCI are not seen by the computer. > > > > > > Could you rework your patch to only rectify the NHI's and XHCI's > > > device properties and leave the bridges untouched? > > > > So I tried a build with that patch, but it never reached the > > tb_pci_fixup function > > That means that for some reason, the PCI devices are not associated with > the Thunderbolt ports. Could you add this to the command line: > > thunderbolt.dyndbg ignore_loglevel log_buf_len=10M > > and this to your kernel config: > > CONFIG_DYNAMIC_DEBUG=y > > You should see "... is associated with ..." messages in dmesg. > This did work for Mika during his testing with recent Thunderbolt chips. > I amended the patches after his testing but wouldn't expect that to > cause issues. > > @Mika, would you mind re-testing if you've got cycles to spare? Sure, I'll try this today and update.
Hi, On Wed, May 15, 2024 at 02:53:54PM -0400, Esther Shimanovich wrote: > > For the clarity, Intel integrated Thunderbolt 3 controller first in Ice > > Lake, then Thunderbolt 4 controller in Tiger Lake and forward (Alder > > Lake, Raptor Lake, Meteor Lake). Anything else, including Comet Lake and > > the like are using discrete controllers such as Alpine Ridge, Titan > > Ridge (both Thunderbolt 3) and Maple Ridge (Thunderbolt 4), and Barlow > > Ridge (Thunderbolt 5) where the controller is either soldered on the > > motherboard or connected to a PCIe slot. > > Thanks for the explanation! > This patch worked smoothly on the device I tried. I'd be happy to go > forward with this patch, and test it on more devices. > Is that fine? Or should I try something else on the build I made with > Lukas's commits? I suggest trying on more devices just to be sure it covers all of them.
On Thu, May 16, 2024 at 11:30:17AM +0300, Mika Westerberg wrote: > Hi, > > On Wed, May 15, 2024 at 10:35:22PM +0200, Lukas Wunner wrote: > > On Wed, May 15, 2024 at 02:53:54PM -0400, Esther Shimanovich wrote: > > > On Wed, May 8, 2024 at 1:23???AM Lukas Wunner <lukas@wunner.de> wrote: > > > > On Wed, May 01, 2024 at 06:23:28PM -0400, Esther Shimanovich wrote: > > > > > On Sat, Apr 27, 2024 at 3:17AM Lukas Wunner <lukas@wunner.de> wrote: > > > > > That is correct, when the user-visible issue occurs, no driver is > > > > > bound to the NHI and XHCI. The discrete JHL chip is not permitted to > > > > > attach to the external-facing root port because of the security > > > > > policy, so the NHI and XHCI are not seen by the computer. > > > > > > > > Could you rework your patch to only rectify the NHI's and XHCI's > > > > device properties and leave the bridges untouched? > > > > > > So I tried a build with that patch, but it never reached the > > > tb_pci_fixup function > > > > That means that for some reason, the PCI devices are not associated with > > the Thunderbolt ports. Could you add this to the command line: > > > > thunderbolt.dyndbg ignore_loglevel log_buf_len=10M > > > > and this to your kernel config: > > > > CONFIG_DYNAMIC_DEBUG=y > > > > You should see "... is associated with ..." messages in dmesg. > > This did work for Mika during his testing with recent Thunderbolt chips. > > I amended the patches after his testing but wouldn't expect that to > > cause issues. > > > > @Mika, would you mind re-testing if you've got cycles to spare? > > Sure, I'll try this today and update. Okay now tried with your latest branch on Meteor Lake-P (integrated Thunderbolt). I do get these: [ 12.911728] thunderbolt 0000:00:0d.2: 0:8: associated with 0000:00:07.0 [ 12.911732] thunderbolt 0000:00:0d.2: 0:9: associated with 0000:00:07.1 ... [ 13.250242] thunderbolt 0000:00:0d.3: 0:8: associated with 0000:00:07.2 [ 13.250245] thunderbolt 0000:00:0d.3: 0:9: associated with 0000:00:07.3 The adapters 8 and 9 are PCIe as expected # tbadapters -r 0 -a 8 -a 9 8: PCIe Down Disabled 9: PCIe Down Disabled # tbadapters -d1 -r 0 -a 8 -a 9 8: PCIe Down Disabled 9: PCIe Down Disabled And the 07.[0-3] are the PCIe Thunderbolt Root Ports: # lspci ... 00:07.0 PCI bridge: Intel Corporation Meteor Lake-P Thunderbolt 4 PCI Express Root Port #0 (rev 10) 00:07.1 PCI bridge: Intel Corporation Meteor Lake-P Thunderbolt 4 PCI Express Root Port #1 (rev 10) 00:07.2 PCI bridge: Intel Corporation Meteor Lake-P Thunderbolt 4 PCI Express Root Port #2 (rev 10) 00:07.3 PCI bridge: Intel Corporation Meteor Lake-P Thunderbolt 4 PCI Express Root Port #3 (rev 10) ...
On Thu, May 16, 2024 at 5:16 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > I suggest trying on more devices just to be sure it covers all of them. I tested Mika's on additional devices, Lenovo ThinkPad T490 (JHL6250), Dell Latitude 7400 (JHL6340), Lenovo Thinkpad Carbon X1 Gen 7 (JHL6540) and the HP Elitebook 840 G6 (JHL7540) and the patch worked smoothly on all of those devices. It looks good to me! If you have any specific types of additional devices you'd want me to test this on, then let me know! On Wed, May 15, 2024 at 4:45 PM Lukas Wunner <lukas@wunner.de> wrote: > > Could you add this to the command line: > > thunderbolt.dyndbg ignore_loglevel log_buf_len=10M > > and this to your kernel config: > > CONFIG_DYNAMIC_DEBUG=y > > You should see "... is associated with ..." messages in dmesg. I tried Lukas's patches again, after enabling the Thunderbolt driver in the config and also verbose messages, so that I can see "thunderbolt:" messages, but it still never reaches the tb_pci_notifier_call function. I don't see "associated with" in any of the logs. The config on the image I am testing does not have the thunderbolt driver enabled by default, so this patch wouldn't help my use case even if I did manage to get it to work. I did spend some time on this -- it seems like it would take more time to test this out. And I would like maybe an explanation on why it is worth it to look into this, before I continue. I appreciate your patience with this! Things I tried: -searching through all existing logs for "associated with" and nothing came up. -I tried adding my own logs into the added function and compiling. -I did cherry pick Lukas's commits onto an earlier version of the kernel (5_15)--- I'm not sure if that could have affected anything. -I tried littering pci.c with logs but none of them showed up after deployment. I'm not sure at which point the thunderbolt driver decides that PCI tunneling is occurring. That could be something to look into. - I wonder if I am running into some additional ChromeOS-related restrictions surrounding tunneling. This seems to be the likely case. Also, this is what happens when I plug in an external thunderbolt device. Maybe there's a reason in the logs why the tb_pci_notifier_call function wasn't reached?: ``` 2024-06-24T15:19:20.215183Z INFO kernel: [183837.056612] usb 1-5: new high-speed USB device number 10 using xhci_hcd 2024-06-24T15:19:20.509844Z DEBUG kernel: [183837.349927] thunderbolt 0000:07:00.0: control channel starting... 2024-06-24T15:19:20.509867Z DEBUG kernel: [183837.349932] thunderbolt 0000:07:00.0: starting TX ring 0 2024-06-24T15:19:20.509870Z DEBUG kernel: [183837.349939] thunderbolt 0000:07:00.0: enabling interrupt at register 0x38200 bit 0 (0x0 -> 0x1) 2024-06-24T15:19:20.509872Z DEBUG kernel: [183837.349942] thunderbolt 0000:07:00.0: starting RX ring 0 2024-06-24T15:19:20.509876Z DEBUG kernel: [183837.349948] thunderbolt 0000:07:00.0: enabling interrupt at register 0x38200 bit 12 (0x1 -> 0x1001) 2024-06-24T15:19:21.879381Z DEBUG kernel: [183838.720640] thunderbolt 0000:07:00.0: current switch config: 2024-06-24T15:19:21.879389Z DEBUG kernel: [183838.720655] thunderbolt 0000:07:00.0: Thunderbolt 3 Switch: 8086:15ef (Revision: 6, TB Version: 16) 2024-06-24T15:19:21.879396Z DEBUG kernel: [183838.720667] thunderbolt 0000:07:00.0: Max Port Number: 13 2024-06-24T15:19:21.879402Z DEBUG kernel: [183838.720674] thunderbolt 0000:07:00.0: Config: 2024-06-24T15:19:21.879407Z DEBUG kernel: [183838.720679] thunderbolt 0000:07:00.0: Upstream Port Number: 1 Depth: 1 Route String: 0x3 Enabled: 1, PlugEventsDelay: 254ms 2024-06-24T15:19:21.879414Z DEBUG kernel: [183838.720689] thunderbolt 0000:07:00.0: unknown1: 0x0 unknown4: 0x0 2024-06-24T15:19:21.912146Z DEBUG kernel: [183838.753606] thunderbolt 0000:07:00.0: 3: reading drom (length: 0x80) 2024-06-24T15:19:22.511672Z DEBUG kernel: [183839.353220] thunderbolt 0000:07:00.0: 3: DROM version: 1 2024-06-24T15:19:22.517258Z DEBUG kernel: [183839.358696] thunderbolt 0000:07:00.0: 3: uid: 0x175942e1c49f800 [SOME VERSION OF THIS IS REPEATED FOR PORTS 1-13] 2024-06-24T15:19:22.523134Z DEBUG kernel: [183839.365301] thunderbolt 0000:07:00.0: Port 1: 8086:15ef (Revision: 6, TB Version: 1, Type: Port (0x1)) 2024-06-24T15:19:22.523155Z DEBUG kernel: [183839.365304] thunderbolt 0000:07:00.0: Max hop id (in/out): 19/19 2024-06-24T15:19:22.523158Z DEBUG kernel: [183839.365305] thunderbolt 0000:07:00.0: Max counters: 16 2024-06-24T15:19:22.523159Z DEBUG kernel: [183839.365306] thunderbolt 0000:07:00.0: NFC Credits: 0x7800048 2024-06-24T15:19:22.523160Z DEBUG kernel: [183839.365308] thunderbolt 0000:07:00.0: Credits (total/control): 120/2 2024-06-24T15:19:22.531155Z INFO kernel: [183839.373585] thunderbolt 0-3: new device found, vendor=0x175 device=0x87f 2024-06-24T15:19:22.531156Z INFO kernel: [183839.373587] thunderbolt 0-3: SAMSUNG ELECTRONICS CO.,LTD F32TU87x 2024-06-24T15:19:22.535373Z DEBUG kernel: [183839.377053] thunderbolt 0-3: GPIO lookup for consumer wp 2024-06-24T15:19:22.535379Z DEBUG kernel: [183839.377056] thunderbolt 0-3: using lookup tables for GPIO lookup 2024-06-24T15:19:22.535381Z DEBUG kernel: [183839.377057] thunderbolt 0-3: No GPIO consumer wp found 2024-06-24T15:19:22.535383Z DEBUG kernel: [183839.377082] thunderbolt 0-3: GPIO lookup for consumer wp 2024-06-24T15:19:22.535389Z DEBUG kernel: [183839.377083] thunderbolt 0-3: using lookup tables for GPIO lookup 2024-06-24T15:19:22.535390Z DEBUG kernel: [183839.377084] thunderbolt 0-3: No GPIO consumer wp found 2024-06-24T15:19:22.538463Z INFO pciguard[782]: INFO pciguard: [udev_monitor.cc(94)] UdevEvent: thunderbolt add /sys/devices/pci0000:00/0000:00:1d.4/0000:05:00.0/0000:06:00.0/0000:07:00.0/domain0/0-0/0-3 2024-06-24T15:19:22.538474Z INFO pciguard[782]: INFO pciguard: [event_handler.cc(28)] CurrentState=NO_USER_LOGGED_IN, UserPermission=0, received event=New-Thunderbolt-Dev [SOME VERSION OF THIS SECTION IS REPEATED 7 TIMES] 2024-06-24T15:19:24.038230Z DEBUG kernel: [183840.880242] i915 0000:00:02.0: HDMI infoframe: Dynamic Range and Mastering, version 1, length 26 2024-06-24T15:19:24.038271Z DEBUG kernel: [183840.880256] i915 0000:00:02.0: length: 26 2024-06-24T15:19:24.038277Z DEBUG kernel: [183840.880261] i915 0000:00:02.0: metadata type: 0 2024-06-24T15:19:24.038282Z DEBUG kernel: [183840.880266] i915 0000:00:02.0: eotf: 2 2024-06-24T15:19:24.038286Z DEBUG kernel: [183840.880270] i915 0000:00:02.0: x[0]: 35400 2024-06-24T15:19:24.038290Z DEBUG kernel: [183840.880275] i915 0000:00:02.0: y[0]: 14600 2024-06-24T15:19:24.038294Z DEBUG kernel: [183840.880279] i915 0000:00:02.0: x[1]: 8500 2024-06-24T15:19:24.038327Z DEBUG kernel: [183840.880283] i915 0000:00:02.0: y[1]: 39850 2024-06-24T15:19:24.038334Z DEBUG kernel: [183840.880287] i915 0000:00:02.0: x[2]: 6550 2024-06-24T15:19:24.038338Z DEBUG kernel: [183840.880290] i915 0000:00:02.0: y[2]: 2300 2024-06-24T15:19:24.038342Z DEBUG kernel: [183840.880294] i915 0000:00:02.0: white point x: 15635 2024-06-24T15:19:24.038346Z DEBUG kernel: [183840.880298] i915 0000:00:02.0: white point y: 16450 2024-06-24T15:19:24.038350Z DEBUG kernel: [183840.880302] i915 0000:00:02.0: max_display_mastering_luminance: 0 2024-06-24T15:19:24.038354Z DEBUG kernel: [183840.880307] i915 0000:00:02.0: min_display_mastering_luminance: 0 2024-06-24T15:19:24.038358Z DEBUG kernel: [183840.880310] i915 0000:00:02.0: max_cll: 0 2024-06-24T15:19:24.038362Z DEBUG kernel: [183840.880314] i915 0000:00:02.0: max_fall: 0 2024-06-24T15:19:40.375221Z DEBUG kernel: [183857.217231] thunderbolt 0000:07:00.0: stopping RX ring 0 2024-06-24T15:19:40.375267Z DEBUG kernel: [183857.217254] thunderbolt 0000:07:00.0: disabling interrupt at register 0x38200 bit 12 (0x1001 -> 0x1) 2024-06-24T15:19:40.375275Z DEBUG kernel: [183857.217286] thunderbolt 0000:07:00.0: stopping TX ring 0 2024-06-24T15:19:40.375280Z DEBUG kernel: [183857.217298] thunderbolt 0000:07:00.0: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0) 2024-06-24T15:19:40.375323Z DEBUG kernel: [183857.217317] thunderbolt 0000:07:00.0: control channel stopped ``` I could share other thunderbolt logs if that helps!
Hi, On Mon, Jun 24, 2024 at 11:58:46AM -0400, Esther Shimanovich wrote: > On Thu, May 16, 2024 at 5:16 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > I suggest trying on more devices just to be sure it covers all of them. > > I tested Mika's on additional devices, Lenovo ThinkPad T490 (JHL6250), > Dell Latitude 7400 (JHL6340), Lenovo Thinkpad Carbon X1 Gen 7 > (JHL6540) and the HP Elitebook 840 G6 (JHL7540) and the patch worked > smoothly on all of those devices. It looks good to me! If you have any > specific types of additional devices you'd want me to test this on, > then let me know! Thanks for testing! I don't have any additional devices in mind. I will be on vacation starting next week for the whole July. The patch is kind of "pseudo-code" in that sense that it probably needs some additional work, cleanup, maybe drop the serial number checks and so on. You are free to use it as you see fit, or submit upstream as proper patch if nobody objects. If nothing has happened when I come back, I can pick up the work if I still remember this ;-)
On Mon, Jun 24, 2024 at 11:58:46AM -0400, Esther Shimanovich wrote: > On Wed, May 15, 2024 at 4:45???PM Lukas Wunner <lukas@wunner.de> wrote: > > Could you add this to the command line: > > thunderbolt.dyndbg ignore_loglevel log_buf_len=10M > > > > and this to your kernel config: > > CONFIG_DYNAMIC_DEBUG=y > > > > You should see "... is associated with ..." messages in dmesg. > > I tried Lukas's patches again, after enabling the Thunderbolt driver > in the config and also verbose messages, so that I can see > "thunderbolt:" messages, but it still never reaches the > tb_pci_notifier_call function. I don't see "associated with" in any of > the logs. The config on the image I am testing does not have the > thunderbolt driver enabled by default, so this patch wouldn't help my > use case even if I did manage to get it to work. Mika, what do you make of this? Are the ChromeBooks in question using ICM-based tunneling instead of native tunneling? I thought this is all native nowadays and ICM is only used on older (pre-USB4) products. Thanks, Lukas
On Wed, Jun 26, 2024 at 10:50:22AM +0200, Lukas Wunner wrote: > On Mon, Jun 24, 2024 at 11:58:46AM -0400, Esther Shimanovich wrote: > > On Wed, May 15, 2024 at 4:45???PM Lukas Wunner <lukas@wunner.de> wrote: > > > Could you add this to the command line: > > > thunderbolt.dyndbg ignore_loglevel log_buf_len=10M > > > > > > and this to your kernel config: > > > CONFIG_DYNAMIC_DEBUG=y > > > > > > You should see "... is associated with ..." messages in dmesg. > > > > I tried Lukas's patches again, after enabling the Thunderbolt driver > > in the config and also verbose messages, so that I can see > > "thunderbolt:" messages, but it still never reaches the > > tb_pci_notifier_call function. I don't see "associated with" in any of > > the logs. The config on the image I am testing does not have the > > thunderbolt driver enabled by default, so this patch wouldn't help my > > use case even if I did manage to get it to work. > > Mika, what do you make of this? Are the ChromeBooks in question > using ICM-based tunneling instead of native tunneling? I thought > this is all native nowadays and ICM is only used on older (pre-USB4) > products. I think these are not Chromebooks. They are "regular" PCs with Thunderbolt 3 host controller which is ICM as you suggest. There is still Maple Ridge and Tiger Lake (non-Chrome) that are ICM (firmware based connection manager) that are USB4 but everything after that is software based connection manager.
On Wed, Jun 26, 2024 at 4:05 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > I will be on vacation starting next week for the whole July. The patch > is kind of "pseudo-code" in that sense that it probably needs some > additional work, cleanup, maybe drop the serial number checks and so on. > You are free to use it as you see fit, or submit upstream as proper > patch if nobody objects. I cleaned it up, but I think I'd like to run it by you before submitting it, as you are the author and also some of my cleanups ended up being a bit more involved than I anticipated. For cleanup, I did the following: 1) I ended up moving the changes from pcie_set_pcie_untrusted to pci_set_removable for multiple reasons: - The downstream bug I ran into happened because of the "removable" attribute. - There seems to be a reason why both removable and untrusted exist despite both having the same logic. pci_fixup_early is run after pcie_set_pcie_untrusted, but before pci_set_removable. It seems like this was done on purpose so that downstream security policies can use quirks to set specific internal, fixed devices as untrusted. - The way you wrote it makes the attributes removable = untrusted, which wasn't the case before, and undos the pci_fixup_early quirks logic. - If you want to make sure that these non-tunneled discrete thunderbolt chips are labeled as trusted, we may have to duplicate this logic in both functions (which seems to be already the case anyways in their current state). I just don't fully know what the "untrusted" attribute entails, so I am erring on the more conservative side of only making changes I fully understand. 2) I changed this comment into code: > +/* root->external_facing is true, parent != NULL */ 3) I edited legacy comments to reflect what the code does now. I also changed your comments to reflect how I changed the code, but for the most part I kept your words in as they were really clear. 4) I removed the serial checks as you suggested > If nothing has happened when I come back, I can pick up the work if I > still remember this ;-) I did my best to clean up! I'm unsure if you will want me to duplicate this logic to pcie_set_pcie_untrusted, so just let me know if I should fix that, and I'll send it to the kernel! (I'm assuming with the Co-developed-by, and the Signed-off-by lines, to properly attribute you?) I hope you had a nice vacation! Both you and Lukas Wurner have been so helpful and attentive. The cleaned up patch is below: diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 43159965e09e9..fc3ef2cf66d58 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1613,24 +1613,161 @@ static void set_pcie_untrusted(struct pci_dev *dev) dev->untrusted = true; } +/* + * Checks if the PCIe switch that contains pdev is directly under + * the specified bridge. + */ +static bool pcie_switch_directly_under(struct pci_dev *bridge, + struct pci_dev *parent, + struct pci_dev *pdev) +{ + /* + * If the device has a PCIe type, that means it is part of a PCIe + * switch. + */ + switch (pci_pcie_type(pdev)) { + case PCI_EXP_TYPE_UPSTREAM: + if (parent == bridge) + return true; + break; + + case PCI_EXP_TYPE_DOWNSTREAM: + if (pci_pcie_type(parent) == PCI_EXP_TYPE_UPSTREAM) { + parent = pci_upstream_bridge(parent); + if (parent == bridge) + return true; + } + break; + + case PCI_EXP_TYPE_ENDPOINT: + if (pci_pcie_type(parent) == PCI_EXP_TYPE_DOWNSTREAM) { + parent = pci_upstream_bridge(parent); + if (parent && pci_pcie_type(parent) == PCI_EXP_TYPE_UPSTREAM) { + parent = pci_upstream_bridge(parent); + if (parent == bridge) + return true; + } + } + break; + } + + return false; +} + +static bool pcie_has_usb4_host_interface(struct pci_dev *pdev) +{ + struct fwnode_handle *fwnode; + + /* + * For USB4 the tunneled PCIe root or downstream ports are marked with + * the "usb4-host-interface" property so we look for that first. This + * should cover the most cases. + */ + fwnode = fwnode_find_reference(dev_fwnode(&pdev->dev), + "usb4-host-interface", 0); + if (!IS_ERR(fwnode)) { + fwnode_handle_put(fwnode); + return true; + } + + /* + * Any integrated Thunderbolt 3/4 PCIe root ports from Intel + * before Alder Lake do not have the above device property so we + * use their PCI IDs instead. All these are tunneled. This list + * is not expected to grow. + */ + if (pdev->vendor == PCI_VENDOR_ID_INTEL) { + switch (pdev->device) { + /* Ice Lake Thunderbolt 3 PCIe Root Ports */ + case 0x8a1d: + case 0x8a1f: + case 0x8a21: + case 0x8a23: + /* Tiger Lake-LP Thunderbolt 4 PCIe Root Ports */ + case 0x9a23: + case 0x9a25: + case 0x9a27: + case 0x9a29: + /* Tiger Lake-H Thunderbolt 4 PCIe Root Ports */ + case 0x9a2b: + case 0x9a2d: + case 0x9a2f: + case 0x9a31: + return true; + } + } + + return false; +} + +static bool pcie_is_tunneled(struct pci_dev *root, struct pci_dev *parent, + struct pci_dev *pdev) +{ + /* Return least trusted outcome if params are invalid */ + if (!(root && root->external_facing && parent)) + return true; + + /* Anything directly behind a "usb4-host-interface" is tunneled */ + if (pcie_has_usb4_host_interface(parent)) + return true; + + /* + * Check if this is a discrete Thunderbolt/USB4 controller that is + * directly behind a PCIe Root Port marked as "ExternalFacingPort". + * These are not behind a PCIe tunnel. + */ + if (pcie_switch_directly_under(root, parent, pdev)) + return false; + + return true; +} + static void pci_set_removable(struct pci_dev *dev) { - struct pci_dev *parent = pci_upstream_bridge(dev); + struct pci_dev *parent, *root; + + parent = pci_upstream_bridge(dev); /* - * We (only) consider everything downstream from an external_facing - * device to be removable by the user. We're mainly concerned with - * consumer platforms with user accessible thunderbolt ports that are - * vulnerable to DMA attacks, and we expect those ports to be marked by - * the firmware as external_facing. Devices in traditional hotplug - * slots can technically be removed, but the expectation is that unless - * the port is marked with external_facing, such devices are less - * accessible to user / may not be removed by end user, and thus not - * exposed as "removable" to userspace. + * We're mainly concerned with consumer platforms with user accessible + * thunderbolt ports that are vulnerable to DMA attacks. + * We expect those ports to be marked by the firmware as external_facing. + * Devices outside external_facing ports are labeled as removable, with + * the exception of discrete thunderbolt chips within the chassis. + * + * Devices in traditional hotplug slots can technically be removed, + * but the expectation is that unless the port is marked with + * external_facing, such devices are less accessible to user / may not + * be removed by end user, and thus not exposed as "removable" to + * userspace. */ - if (parent && - (parent->external_facing || dev_is_removable(&parent->dev))) + if (!parent) + return; + + if (dev_is_removable(&parent->dev)) dev_set_removable(&dev->dev, DEVICE_REMOVABLE); + + root = pcie_find_root_port(dev); + + if (root && root->external_facing) { + /* + * All devices behind a PCIe root port labeled as + * "ExternalFacingPort" are tunneled by definition, + * with the exception of discrete Thunderbolt/USB4 + * controllers that add Thunderbolt capabilities + * to CPUs that lack integrated Thunderbolt. + * They are identified because by definition, they + * aren't tunneled. + * + * Those discrete Thunderbolt/USB4 controllers are + * not removable. Only their downstream facing ports + * are actually something that are exposed to the + * wild so we only mark devices tunneled behind those + * as removable. + */ + if (pcie_is_tunneled(root, parent, dev)) + dev_set_removable(&dev->dev, DEVICE_REMOVABLE); + } } /**
On Wed, Jun 26, 2024 at 11:59:45AM +0300, Mika Westerberg wrote: > On Wed, Jun 26, 2024 at 10:50:22AM +0200, Lukas Wunner wrote: > > On Mon, Jun 24, 2024 at 11:58:46AM -0400, Esther Shimanovich wrote: > > > On Wed, May 15, 2024 at 4:45???PM Lukas Wunner <lukas@wunner.de> wrote: > > > > Could you add this to the command line: > > > > thunderbolt.dyndbg ignore_loglevel log_buf_len=10M > > > > > > > > and this to your kernel config: > > > > CONFIG_DYNAMIC_DEBUG=y > > > > > > > > You should see "... is associated with ..." messages in dmesg. > > > > > > I tried Lukas's patches again, after enabling the Thunderbolt driver > > > in the config and also verbose messages, so that I can see > > > "thunderbolt:" messages, but it still never reaches the > > > tb_pci_notifier_call function. I don't see "associated with" in any of > > > the logs. The config on the image I am testing does not have the > > > thunderbolt driver enabled by default, so this patch wouldn't help my > > > use case even if I did manage to get it to work. > > > > Mika, what do you make of this? Are the ChromeBooks in question > > using ICM-based tunneling instead of native tunneling? I thought > > this is all native nowadays and ICM is only used on older (pre-USB4) > > products. > > I think these are not Chromebooks. They are "regular" PCs with > Thunderbolt 3 host controller which is ICM as you suggest. > > There is still Maple Ridge and Tiger Lake (non-Chrome) that are ICM > (firmware based connection manager) that are USB4 but everything after > that is software based connection manager. Even with ICM, the DROM of the root switch seems to be retrieved: icm_start() tb_switch_add() tb_drom_read() Assuming the DROM contains proper PCIe Upstream and Downstream Adapter Entries, all the data needed to at least associate the PCIe Adapters on the root switch should be there. So I'm surprised Esther is not seeing *any* messages. Do the DROMs on ICM root switches generally lack PCIe Upstream and Downstream Adapter Entries? What am I missing? Thanks, Lukas
On Sun, Jul 28, 2024 at 05:41:09PM +0200, Lukas Wunner wrote: > On Wed, Jun 26, 2024 at 11:59:45AM +0300, Mika Westerberg wrote: > > On Wed, Jun 26, 2024 at 10:50:22AM +0200, Lukas Wunner wrote: > > > On Mon, Jun 24, 2024 at 11:58:46AM -0400, Esther Shimanovich wrote: > > > > On Wed, May 15, 2024 at 4:45???PM Lukas Wunner <lukas@wunner.de> wrote: > > > > > Could you add this to the command line: > > > > > thunderbolt.dyndbg ignore_loglevel log_buf_len=10M > > > > > > > > > > and this to your kernel config: > > > > > CONFIG_DYNAMIC_DEBUG=y > > > > > > > > > > You should see "... is associated with ..." messages in dmesg. > > > > > > > > I tried Lukas's patches again, after enabling the Thunderbolt driver > > > > in the config and also verbose messages, so that I can see > > > > "thunderbolt:" messages, but it still never reaches the > > > > tb_pci_notifier_call function. I don't see "associated with" in any of > > > > the logs. The config on the image I am testing does not have the > > > > thunderbolt driver enabled by default, so this patch wouldn't help my > > > > use case even if I did manage to get it to work. > > > > > > Mika, what do you make of this? Are the ChromeBooks in question > > > using ICM-based tunneling instead of native tunneling? I thought > > > this is all native nowadays and ICM is only used on older (pre-USB4) > > > products. > > > > I think these are not Chromebooks. They are "regular" PCs with > > Thunderbolt 3 host controller which is ICM as you suggest. > > > > There is still Maple Ridge and Tiger Lake (non-Chrome) that are ICM > > (firmware based connection manager) that are USB4 but everything after > > that is software based connection manager. > > Even with ICM, the DROM of the root switch seems to be retrieved: > > icm_start() > tb_switch_add() > tb_drom_read() > > Assuming the DROM contains proper PCIe Upstream and Downstream Adapter > Entries, all the data needed to at least associate the PCIe Adapters > on the root switch should be there. So I'm surprised Esther is not > seeing *any* messages. > > Do the DROMs on ICM root switches generally lack PCIe Upstream and > Downstream Adapter Entries? > What am I missing? My guess is that they are not populated for ICM host router DROM entries. These are pretty much Apple stuff and USB4 dropped them completely in favour of the router operations.
Hi, On Fri, Jul 26, 2024 at 02:17:46PM -0400, Esther Shimanovich wrote: > On Wed, Jun 26, 2024 at 4:05 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > I will be on vacation starting next week for the whole July. The patch > > is kind of "pseudo-code" in that sense that it probably needs some > > additional work, cleanup, maybe drop the serial number checks and so on. > > You are free to use it as you see fit, or submit upstream as proper > > patch if nobody objects. > > I cleaned it up, but I think I'd like to run it by you before > submitting it, as you are the author and also some of my cleanups > ended up being a bit more involved than I anticipated. > > For cleanup, I did the following: > > 1) I ended up moving the changes from pcie_set_pcie_untrusted to > pci_set_removable for multiple reasons: > > - The downstream bug I ran into happened because of the "removable" attribute. > > - There seems to be a reason why both removable and untrusted exist > despite both having the same logic. pci_fixup_early is run after > pcie_set_pcie_untrusted, but before pci_set_removable. It seems like > this was done on purpose so that downstream security policies can use > quirks to set specific internal, fixed devices as untrusted. > > - The way you wrote it makes the attributes removable = untrusted, > which wasn't the case before, and undos the pci_fixup_early quirks > logic. > > - If you want to make sure that these non-tunneled discrete > thunderbolt chips are labeled as trusted, we may have to duplicate > this logic in both functions (which seems to be already the case > anyways in their current state). > I just don't fully know what the "untrusted" attribute entails, so I > am erring on the more conservative side of only making changes I fully > understand. The "untrusted" means the device is something that is not "soldered down" or equivalent. E.g something you can hot-plug through a port on the laptop system such as USB4/TB. It is used to enable full IOMMU mappings for these devices to avoid malicious devices from accessing memory that does not belong to it. So it is pretty much same as "removable". Why we do want to have the USB4 host controller + xHCI not "untrusted" is because they don't need to go through the full IOMMU mappings and therefore we get for one more throughput for things like networking over USB4. > 2) I changed this comment into code: > > > +/* root->external_facing is true, parent != NULL */ > > 3) I edited legacy comments to reflect what the code does now. I also > changed your comments to reflect how I changed the code, but for the > most part I kept your words in as they were really clear. > > 4) I removed the serial checks as you suggested > > > If nothing has happened when I come back, I can pick up the work if I > > still remember this ;-) > > I did my best to clean up! I'm unsure if you will want me to duplicate > this logic to pcie_set_pcie_untrusted, so just let me know if I should > fix that, and I'll send it to the kernel! (I'm assuming with the > Co-developed-by, and the Signed-off-by lines, to properly attribute > you?) I think they should be the "same". You can add something like "Suggested-by" or so if you like but up to you. No need to add other tags from me. Please Cc IOMMU folks too so that they can take a look just in case. The patch is line-wrapped but otherwise looks good except the above. Thanks for cleaning it up. > I hope you had a nice vacation! Both you and Lukas Wurner have been so > helpful and attentive. > > The cleaned up patch is below: > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 43159965e09e9..fc3ef2cf66d58 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1613,24 +1613,161 @@ static void set_pcie_untrusted(struct pci_dev *dev) > dev->untrusted = true; > } > > +/* > + * Checks if the PCIe switch that contains pdev is directly under > + * the specified bridge. > + */ > +static bool pcie_switch_directly_under(struct pci_dev *bridge, > + struct pci_dev *parent, > + struct pci_dev *pdev) > +{ > + /* > + * If the device has a PCIe type, that means it is part of a PCIe > + * switch. > + */ > + switch (pci_pcie_type(pdev)) { > + case PCI_EXP_TYPE_UPSTREAM: > + if (parent == bridge) > + return true; > + break; > + > + case PCI_EXP_TYPE_DOWNSTREAM: > + if (pci_pcie_type(parent) == PCI_EXP_TYPE_UPSTREAM) { > + parent = pci_upstream_bridge(parent); > + if (parent == bridge) > + return true; > + } > + break; > + > + case PCI_EXP_TYPE_ENDPOINT: > + if (pci_pcie_type(parent) == PCI_EXP_TYPE_DOWNSTREAM) { > + parent = pci_upstream_bridge(parent); > + if (parent && pci_pcie_type(parent) == > PCI_EXP_TYPE_UPSTREAM) { > + parent = pci_upstream_bridge(parent); > + if (parent == bridge) > + return true; > + } > + } > + break; > + } > + > + return false; > +} > + > +static bool pcie_has_usb4_host_interface(struct pci_dev *pdev) > +{ > + struct fwnode_handle *fwnode; > + > + /* > + * For USB4 the tunneled PCIe root or downstream ports are marked with > + * the "usb4-host-interface" property so we look for that first. This > + * should cover the most cases. > + */ > + fwnode = fwnode_find_reference(dev_fwnode(&pdev->dev), > + "usb4-host-interface", 0); > + if (!IS_ERR(fwnode)) { > + fwnode_handle_put(fwnode); > + return true; > + } > + > + /* > + * Any integrated Thunderbolt 3/4 PCIe root ports from Intel > + * before Alder Lake do not have the above device property so we > + * use their PCI IDs instead. All these are tunneled. This list > + * is not expected to grow. > + */ > + if (pdev->vendor == PCI_VENDOR_ID_INTEL) { > + switch (pdev->device) { > + /* Ice Lake Thunderbolt 3 PCIe Root Ports */ > + case 0x8a1d: > + case 0x8a1f: > + case 0x8a21: > + case 0x8a23: > + /* Tiger Lake-LP Thunderbolt 4 PCIe Root Ports */ > + case 0x9a23: > + case 0x9a25: > + case 0x9a27: > + case 0x9a29: > + /* Tiger Lake-H Thunderbolt 4 PCIe Root Ports */ > + case 0x9a2b: > + case 0x9a2d: > + case 0x9a2f: > + case 0x9a31: > + return true; > + } > + } > + > + return false; > +} > + > +static bool pcie_is_tunneled(struct pci_dev *root, struct pci_dev *parent, > + struct pci_dev *pdev) > +{ > + /* Return least trusted outcome if params are invalid */ > + if (!(root && root->external_facing && parent)) > + return true; > + > + /* Anything directly behind a "usb4-host-interface" is tunneled */ > + if (pcie_has_usb4_host_interface(parent)) > + return true; > + > + /* > + * Check if this is a discrete Thunderbolt/USB4 controller that is > + * directly behind a PCIe Root Port marked as "ExternalFacingPort". > + * These are not behind a PCIe tunnel. > + */ > + if (pcie_switch_directly_under(root, parent, pdev)) > + return false; > + > + return true; > +} > + > static void pci_set_removable(struct pci_dev *dev) > { > - struct pci_dev *parent = pci_upstream_bridge(dev); > + struct pci_dev *parent, *root; > + > + parent = pci_upstream_bridge(dev); > > /* > - * We (only) consider everything downstream from an external_facing > - * device to be removable by the user. We're mainly concerned with > - * consumer platforms with user accessible thunderbolt ports that are > - * vulnerable to DMA attacks, and we expect those ports to be marked by > - * the firmware as external_facing. Devices in traditional hotplug > - * slots can technically be removed, but the expectation is that unless > - * the port is marked with external_facing, such devices are less > - * accessible to user / may not be removed by end user, and thus not > - * exposed as "removable" to userspace. > + * We're mainly concerned with consumer platforms with user accessible > + * thunderbolt ports that are vulnerable to DMA attacks. > + * We expect those ports to be marked by the firmware as > external_facing. > + * Devices outside external_facing ports are labeled as removable, with > + * the exception of discrete thunderbolt chips within the chassis. > + * > + * Devices in traditional hotplug slots can technically be removed, > + * but the expectation is that unless the port is marked with > + * external_facing, such devices are less accessible to user / may not > + * be removed by end user, and thus not exposed as "removable" to > + * userspace. > */ > - if (parent && > - (parent->external_facing || dev_is_removable(&parent->dev))) > + if (!parent) > + return; > + > + if (dev_is_removable(&parent->dev)) > dev_set_removable(&dev->dev, DEVICE_REMOVABLE); > + > + root = pcie_find_root_port(dev); > + > + if (root && root->external_facing) { > + /* > + * All devices behind a PCIe root port labeled as > + * "ExternalFacingPort" are tunneled by definition, > + * with the exception of discrete Thunderbolt/USB4 > + * controllers that add Thunderbolt capabilities > + * to CPUs that lack integrated Thunderbolt. > + * They are identified because by definition, they > + * aren't tunneled. > + * > + * Those discrete Thunderbolt/USB4 controllers are > + * not removable. Only their downstream facing ports > + * are actually something that are exposed to the > + * wild so we only mark devices tunneled behind those > + * as removable. > + */ > + if (pcie_is_tunneled(root, parent, dev)) > + dev_set_removable(&dev->dev, DEVICE_REMOVABLE); > + } > } > > /**
On Mon, Jul 29, 2024 at 11:04:41AM +0300, Mika Westerberg wrote: > On Sun, Jul 28, 2024 at 05:41:09PM +0200, Lukas Wunner wrote: > > Do the DROMs on ICM root switches generally lack PCIe Upstream and > > Downstream Adapter Entries? > > My guess is that they are not populated for ICM host router DROM > entries. These are pretty much Apple stuff and USB4 dropped them > completely in favour of the router operations. I note that Microsoft specifies a "usb4-port-number" for PCIe Downstream Adapters as well as DP and USB ports: https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#mapping-native-protocols-pcie-displayport-tunneled-through-usb4-to-usb4-host-routers Presumably the "usb4-port-number" allows for associating a pci_dev with a tb_port. So that's a third way to do that, on top of DROM entries and the USB4 router operation. What I don't quite understand is whether the "usb4-port-number" is only present on PCIe Adapters of the Host Router or whether it can also exist on PCIe Adapters of Device Routers? I would also like to know whether "usb4-port-number" is set on the machines that Esther's quirk seeks to fix? I found this DSDT of an unknown Lenovo model which does not have "usb4-port-number" set on any PCIe Adapters, only on USB ports: https://gist.github.com/64kramsystem/ab2410f081a4f47d4a205699828ab2f9 I assumed that my solution to this problem would not be viable as we seem to be lacking port numbers for Host Router PCIe Adapters. Those are needed to associate a pci_dev with a tb_port on the Host Router and adjust its "untrusted" and "external_facing" properties. However if we do have the "usb4-port-number" on Host Router PCIe Adapters, then my solution would still be viable and I could look into adding support for the "usb4-port-number" property to it. Thanks, Lukas
Hi Lukas, On Sun, Aug 25, 2024 at 04:29:34PM +0200, Lukas Wunner wrote: > On Mon, Jul 29, 2024 at 11:04:41AM +0300, Mika Westerberg wrote: > > On Sun, Jul 28, 2024 at 05:41:09PM +0200, Lukas Wunner wrote: > > > Do the DROMs on ICM root switches generally lack PCIe Upstream and > > > Downstream Adapter Entries? > > > > My guess is that they are not populated for ICM host router DROM > > entries. These are pretty much Apple stuff and USB4 dropped them > > completely in favour of the router operations. > > I note that Microsoft specifies a "usb4-port-number" for PCIe Downstream > Adapters as well as DP and USB ports: > > https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#mapping-native-protocols-pcie-displayport-tunneled-through-usb4-to-usb4-host-routers > > Presumably the "usb4-port-number" allows for associating a > pci_dev with a tb_port. So that's a third way to do that, > on top of DROM entries and the USB4 router operation. It matches the "USB4 port" number of the host router. > What I don't quite understand is whether the "usb4-port-number" > is only present on PCIe Adapters of the Host Router or whether > it can also exist on PCIe Adapters of Device Routers? It is is only present on host router and only for PCIe and USB ports (I'ven not seen it been used with DisplayPort). > I would also like to know whether "usb4-port-number" is set on > the machines that Esther's quirk seeks to fix? This is only present in USB4 software connection manager systems so not on any of those as far as I can tell.
> > I would also like to know whether "usb4-port-number" is set on > > the machines that Esther's quirk seeks to fix? > > This is only present in USB4 software connection manager systems so not > on any of those as far as I can tell. Can confirm. "usb4-port-number" is not found in the ACPI tables for any of the devices I have observed this bug on.
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index ea476252280a..34e43323ff14 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, quirk_apple_poweroff_thunderbolt); #endif +/* + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set + * incorrectly as DEVICE_REMOVABLE despite being built into the device. + * This is the side effect of a unique hardware configuration. + * + * Normally, Thunderbolt functionality is integrated to the SoC and + * its root ports. + * + * Most devices: + * root port --> USB-C port + * + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which + * don't come with Thunderbolt functionality. Therefore, a discrete + * Thunderbolt Host PCI controller was added between the root port and + * the USB-C port. + * + * Thinkpad Carbon 7/8s + * (w/ Whiskey Lake and Comet Lake SoC): + * root port --> JHL6540 --> USB-C port + * + * Because the root port is labeled by FW as "ExternalFacingPort", as + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately + * labeled as DEVICE_REMOVABLE by the kernel pci driver. + * Therefore, the built-in USB-C ports do not enumerate when policies + * forbidding external pci devices are enforced. + * + * This fix relabels the pci components in the built-in JHL6540 chip as + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate + * properly as intended. + * + * This fix also labels the external facing components of the JHL6540 as + * external_facing, so that the pci attach policy works as intended. + * + * The ASCII diagram below describes the pci layout of the JHL6540 chip. + * + * Root Port + * [8086:02b4] or [8086:9db4] + * | + * JHL6540 Chip + * __________________________________________________ + * | Bridge | + * | PCI ID -> [8086:15d3] | + * | DEVFN -> (00) | + * | _________________|__________________ | + * | | | | | | + * | Bridge Bridge Bridge Bridge | + * | [8086:15d3] [8086:15d3] [8086:15d3] [8086:15d3] | + * | (00) (08) (10) (20) | + * | | | | | | + * | NHI | USB Controller | | + * | [8086:15d2] | [8086:15d4] | | + * | (00) | (00) | | + * | |___________| |___________| | + * |____________|________________________|____________| + * | | + * USB-C Port USB-C Port + * + * + * Based on what a JHL6549 pci component's pci id, subsystem device id + * and devfn are, we can infer if it is fixed and if it faces a usb port; + * which would mean it is external facing. + * This quirk uses these values to identify the pci components and set the + * properties accordingly. + */ +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) +{ + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) + return; + + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ + if (dev->subsystem_device != 0x22be && // Gen 8 + dev->subsystem_device != 0x2292) { // Gen 7 + return; + } + + dev_set_removable(&dev->dev, DEVICE_FIXED); + + /* Not all 0x15d3 components are external facing */ + if (dev->device == 0x15d3 && + dev->devfn != 0x08 && + dev->devfn != 0x20) { + return; + } + + dev->external_facing = true; +} + +/* + * We also need to relabel the root port as a consequence of changing + * the JHL6540's PCIE hierarchy. + */ +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) +{ + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) + return; + + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ + if (dev->subsystem_device != 0x22be && // Gen 8 + dev->subsystem_device != 0x2292) { // Gen 7 + return; + } + + dev->external_facing = false; +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable); + /* * Following are device-specific reset methods which can be used to * reset a single function if other methods (e.g. FLR, PM D0->D3) are
On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to distrust removable PCI devices, the build-in USB-C ports stop working at all. This happens because these X1 Carbon models have a unique feature; a Thunderbolt controller that is discrete from the SoC. The software sees this controller, and incorrectly assumes it is a removable PCI device, even though it is fixed to the computer and is wired to the computer's own USB-C ports. Relabel all the components of the JHL6540 controller as DEVICE_FIXED, and where applicable, external_facing. Ensure that the security policy to distrust external PCI devices works as intended, and that the device's USB-C ports are able to enumerate even when the policy is enabled. Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org> --- Changes in v4: - replaced a dmi check in the rootport quirk with a subsystem vendor and device check. - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org Changes in v3: - removed redundant dmi check, as the subsystem vendor check is sufficient - switched to PCI_VENDOR_ID_LENOVO instead of hex code - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org Changes in v2: - nothing new, v1 was just a test run to see if the ASCII diagram would be rendered properly in mutt and k-9 - for folks using gmail, make sure to select "show original" on the top right, as otherwise the diagram will be garbled by the standard non-monospace font - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org --- drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 112 insertions(+) --- base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86 change-id: 20231219-thunderbolt-pci-patch-4-ede71cb833c4 Best regards,