Message ID | 20190418115859.2394-1-christian.koenig@amd.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI/P2PDMA: start with a whitelist for root complexes | expand |
On Thu, Apr 18, 2019 at 01:58:59PM +0200, Christian König wrote: > A lot of root complexes can still do P2P even when PCI devices > don't share a common upstream bridge. > > Start adding a whitelist and allow P2P if both participants are > attached to known good root complex. Is there a plan for addressing this in a generic way that doesn't require an OS modification for every new "known good root complex", e.g., some PCIe or ACPI spec update that allows the OS to discover this? > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/pci/p2pdma.c | 38 +++++++++++++++++++++++++++++++++++--- > 1 file changed, 35 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index c52298d76e64..212baaa7f93b 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -274,6 +274,31 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) > seq_buf_printf(buf, "%s;", pci_name(pdev)); > } > > +/* > + * If we can't find a common upstream bridge take a look at the root complex and > + * compare it to a whitelist of known good hardware. > + */ > +static bool root_complex_whitelist(struct pci_dev *dev) > +{ > + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > + struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); > + unsigned short vendor, device; > + > + if (!root) > + return false; > + > + vendor = root->vendor; > + device = root->device; > + pci_dev_put(root); > + > + /* AMD ZEN host bridges can do peer to peer */ > + if (vendor == PCI_VENDOR_ID_AMD && device == 0x1450) > + return true; > + > + /* TODO: Extend that to a proper whitelist */ > + return false; > +} > + > /* > * Find the distance through the nearest common upstream bridge between > * two PCI devices. > @@ -317,13 +342,13 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) > * In this case, a list of all infringing bridge addresses will be > * populated in acs_list (assuming it's non-null) for printk purposes. > */ > -static int upstream_bridge_distance(struct pci_dev *a, > - struct pci_dev *b, > +static int upstream_bridge_distance(struct pci_dev *provider, > + struct pci_dev *client, > struct seq_buf *acs_list) > { > + struct pci_dev *a = provider, *b = client, *bb; > int dist_a = 0; > int dist_b = 0; > - struct pci_dev *bb = NULL; > int acs_cnt = 0; > > /* > @@ -354,6 +379,13 @@ static int upstream_bridge_distance(struct pci_dev *a, > dist_a++; > } > > + /* Allow the connection if both devices are on a whitelisted root > + * complex, but add an arbitary large value to the distance. > + */ > + if (root_complex_whitelist(provider) && > + root_complex_whitelist(client)) > + return 0x1000 + dist_a + dist_b; > + > return -1; > > check_b_path_acs: > -- > 2.17.1 >
On 2019-04-18 5:58 a.m., Christian König wrote: > A lot of root complexes can still do P2P even when PCI devices > don't share a common upstream bridge. > > Start adding a whitelist and allow P2P if both participants are > attached to known good root complex. > > Signed-off-by: Christian König <christian.koenig@amd.com> This looks simple enough and sounds sensible to me. Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Probably worth copying Bjorn who would be the maintainer to merge it. Logan
On 2019-04-18 10:33 a.m., Bjorn Helgaas wrote: > On Thu, Apr 18, 2019 at 01:58:59PM +0200, Christian König wrote: >> A lot of root complexes can still do P2P even when PCI devices >> don't share a common upstream bridge. >> >> Start adding a whitelist and allow P2P if both participants are >> attached to known good root complex. > > Is there a plan for addressing this in a generic way that doesn't > require an OS modification for every new "known good root complex", > e.g., some PCIe or ACPI spec update that allows the OS to discover > this? I'm aware of work going on in the PCI SIG to address this [1]. But I expect it's going to be a long time before actual hardware advertises this capability to indicate support. So in the interim we either need to not use p2pdma on root complexes or create a white list. I'm in favour of the white list approach. Logan [1] https://lore.kernel.org/linux-pci/20181210115653.0000615a@huawei.com/
On Thu, Apr 18, 2019 at 10:58:55AM -0600, Logan Gunthorpe wrote: > On 2019-04-18 10:33 a.m., Bjorn Helgaas wrote: > > On Thu, Apr 18, 2019 at 01:58:59PM +0200, Christian König wrote: > > > A lot of root complexes can still do P2P even when PCI devices > > > don't share a common upstream bridge. > > > > > > Start adding a whitelist and allow P2P if both participants are > > > attached to known good root complex. > > > > Is there a plan for addressing this in a generic way that doesn't > > require an OS modification for every new "known good root complex", > > e.g., some PCIe or ACPI spec update that allows the OS to discover > > this? > > I'm aware of work going on in the PCI SIG to address this [1]. > > But I expect it's going to be a long time before actual hardware advertises > this capability to indicate support. So in the interim we either need to not > use p2pdma on root complexes or create a white list. I'm in favour of the > white list approach. I agree we need a whitelist; I just want to make sure we also make progress on some way to limit the amount of time we need to update it. > [1] https://lore.kernel.org/linux-pci/20181210115653.0000615a@huawei.com/
Am 18.04.19 um 19:24 schrieb Bjorn Helgaas: > On Thu, Apr 18, 2019 at 10:58:55AM -0600, Logan Gunthorpe wrote: >> On 2019-04-18 10:33 a.m., Bjorn Helgaas wrote: >>> On Thu, Apr 18, 2019 at 01:58:59PM +0200, Christian König wrote: >>>> A lot of root complexes can still do P2P even when PCI devices >>>> don't share a common upstream bridge. >>>> >>>> Start adding a whitelist and allow P2P if both participants are >>>> attached to known good root complex. >>> Is there a plan for addressing this in a generic way that doesn't >>> require an OS modification for every new "known good root complex", >>> e.g., some PCIe or ACPI spec update that allows the OS to discover >>> this? >> I'm aware of work going on in the PCI SIG to address this [1]. >> >> But I expect it's going to be a long time before actual hardware advertises >> this capability to indicate support. So in the interim we either need to not >> use p2pdma on root complexes or create a white list. I'm in favour of the >> white list approach. > I agree we need a whitelist; I just want to make sure we also make > progress on some way to limit the amount of time we need to update it. Well as far as I know at least for AMD there is also a vendor specific way to figure out if a chipset does P2P routing or not. So you don't need every possible combination listed. It's perfectly possible that Intel, ARM and PowerPC have something similar and with all those on board you have pretty much every interesting case covered. I will try to dig something up into that direction. Another problem is also that some root complexes have very very strange limitations on the routing which are hard to handle. For example some old chipsets can do P2P, but only for writes and not reads! Anyway at least for my current use case a simple whitelist with vendor/device should be sufficient, so I'm fine with maintaining that for now. But on the other hand I also understand that in 10+ years we don't want a list with 200 entries here.... Regards, Christian. > >> [1] https://lore.kernel.org/linux-pci/20181210115653.0000615a@huawei.com/
On Fri, Apr 19, 2019 at 02:24:18PM +0000, Koenig, Christian wrote: > Am 18.04.19 um 19:24 schrieb Bjorn Helgaas: > > On Thu, Apr 18, 2019 at 10:58:55AM -0600, Logan Gunthorpe wrote: > >> On 2019-04-18 10:33 a.m., Bjorn Helgaas wrote: > >>> On Thu, Apr 18, 2019 at 01:58:59PM +0200, Christian König wrote: > >>>> A lot of root complexes can still do P2P even when PCI devices > >>>> don't share a common upstream bridge. > >>>> > >>>> Start adding a whitelist and allow P2P if both participants are > >>>> attached to known good root complex. > >>> Is there a plan for addressing this in a generic way that doesn't > >>> require an OS modification for every new "known good root complex", > >>> e.g., some PCIe or ACPI spec update that allows the OS to discover > >>> this? > >> I'm aware of work going on in the PCI SIG to address this [1]. > >> > >> But I expect it's going to be a long time before actual hardware advertises > >> this capability to indicate support. So in the interim we either need to not > >> use p2pdma on root complexes or create a white list. I'm in favour of the > >> white list approach. > > I agree we need a whitelist; I just want to make sure we also make > > progress on some way to limit the amount of time we need to update it. > > Well as far as I know at least for AMD there is also a vendor specific > way to figure out if a chipset does P2P routing or not. So you don't > need every possible combination listed. > > It's perfectly possible that Intel, ARM and PowerPC have something > similar and with all those on board you have pretty much every > interesting case covered. > > I will try to dig something up into that direction. > > Another problem is also that some root complexes have very very strange > limitations on the routing which are hard to handle. For example some > old chipsets can do P2P, but only for writes and not reads! > > Anyway at least for my current use case a simple whitelist with > vendor/device should be sufficient, so I'm fine with maintaining that > for now. But on the other hand I also understand that in 10+ years we > don't want a list with 200 entries here.... Right. And it's not just *you*; maintaining a list like that is also a burden for me and every distro maintainer who has to backport updates. And the fact that it's not covered by any spec also means we're open to all kinds of quirks about how it interacts with things that *are* in the spec: ACS, ATS, the read/write thing you mentioned, ... > >> [1] https://lore.kernel.org/linux-pci/20181210115653.0000615a@huawei.com/
On Thu, Apr 18, 2019 at 01:58:59PM +0200, Christian König wrote: > A lot of root complexes can still do P2P even when PCI devices > don't share a common upstream bridge. > > Start adding a whitelist and allow P2P if both participants are > attached to known good root complex. > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/pci/p2pdma.c | 38 +++++++++++++++++++++++++++++++++++--- > 1 file changed, 35 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index c52298d76e64..212baaa7f93b 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -274,6 +274,31 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) > seq_buf_printf(buf, "%s;", pci_name(pdev)); > } > > +/* > + * If we can't find a common upstream bridge take a look at the root complex and > + * compare it to a whitelist of known good hardware. > + */ > +static bool root_complex_whitelist(struct pci_dev *dev) > +{ > + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > + struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); The fact that 00.0 is a host bridge and that its vendor/device ID tells us whether peer-to-peer is supported between two *other* devices (provider and client) is all totally vendor-specific. There's nothing in the PCI specs that says the host bridge even has to exist as a PCI device. Is there any chance you could identify it by the Root Port vendor/device instead of groping around for the 00:0 device? That would have been simpler, so I suppose that doesn't work for some reason. I'm surprised that you don't need both the provider and the client to figure this out. Wouldn't it would be possible to have two Root Ports under host bridge A that could do peer-to-peer to each other, but not to a Root Port under host bridge B? E.g., I'd think you would need something like this: if (pci_find_host_bridge(a->bus) == pci_find_host_bridge(b->bus) && a->vendor == PCI_VENDOR_ID_AMD && b->vendor == PCI_VENDOR_ID_AMD && a->device == X && b->device == X) return true; return false; But I guess that still implicitly relies on the notion that everything under one PNP0A03 device is potentially connected via P2P, and I don't think there's anything the PCI or ACPI specs that would support that. Maybe we should add a pci_dev.p2p_group and make it so "a.p2p_group == b.p2p_group" means that P2P is possible between them. You'd have to have a quirk for the root ports to initialize p2p_group using whatever vendor-specific information you need, then everything under them could inherit from the parent. > + unsigned short vendor, device; > + > + if (!root) > + return false; > + > + vendor = root->vendor; > + device = root->device; > + pci_dev_put(root); > + > + /* AMD ZEN host bridges can do peer to peer */ > + if (vendor == PCI_VENDOR_ID_AMD && device == 0x1450) > + return true; > + > + /* TODO: Extend that to a proper whitelist */ > + return false; > +} > + > /* > * Find the distance through the nearest common upstream bridge between > * two PCI devices. > @@ -317,13 +342,13 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) > * In this case, a list of all infringing bridge addresses will be > * populated in acs_list (assuming it's non-null) for printk purposes. > */ > -static int upstream_bridge_distance(struct pci_dev *a, > - struct pci_dev *b, > +static int upstream_bridge_distance(struct pci_dev *provider, > + struct pci_dev *client, > struct seq_buf *acs_list) > { > + struct pci_dev *a = provider, *b = client, *bb; > int dist_a = 0; > int dist_b = 0; > - struct pci_dev *bb = NULL; > int acs_cnt = 0; > > /* > @@ -354,6 +379,13 @@ static int upstream_bridge_distance(struct pci_dev *a, > dist_a++; > } > > + /* Allow the connection if both devices are on a whitelisted root > + * complex, but add an arbitary large value to the distance. > + */ > + if (root_complex_whitelist(provider) && > + root_complex_whitelist(client)) > + return 0x1000 + dist_a + dist_b; > + > return -1; > > check_b_path_acs: > -- > 2.17.1 >
Am 19.04.19 um 21:19 schrieb Bjorn Helgaas: > On Thu, Apr 18, 2019 at 01:58:59PM +0200, Christian König wrote: >> A lot of root complexes can still do P2P even when PCI devices >> don't share a common upstream bridge. >> >> Start adding a whitelist and allow P2P if both participants are >> attached to known good root complex. >> >> Signed-off-by: Christian König <christian.koenig@amd.com> >> --- >> drivers/pci/p2pdma.c | 38 +++++++++++++++++++++++++++++++++++--- >> 1 file changed, 35 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c >> index c52298d76e64..212baaa7f93b 100644 >> --- a/drivers/pci/p2pdma.c >> +++ b/drivers/pci/p2pdma.c >> @@ -274,6 +274,31 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) >> seq_buf_printf(buf, "%s;", pci_name(pdev)); >> } >> >> +/* >> + * If we can't find a common upstream bridge take a look at the root complex and >> + * compare it to a whitelist of known good hardware. >> + */ >> +static bool root_complex_whitelist(struct pci_dev *dev) >> +{ >> + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); >> + struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); > The fact that 00.0 is a host bridge and that its vendor/device ID > tells us whether peer-to-peer is supported between two *other* devices > (provider and client) is all totally vendor-specific. There's nothing > in the PCI specs that says the host bridge even has to exist as a PCI > device. Crap, I hoped that at least the notion that the root complex is always 00.0 is standardized somewhere. > Is there any chance you could identify it by the Root Port > vendor/device instead of groping around for the 00:0 device? That > would have been simpler, so I suppose that doesn't work for some > reason. I can give that a try. The main issue is that you got more different root ports than root complexes, so our list is going to get longer this way. > I'm surprised that you don't need both the provider and the client to > figure this out. Wouldn't it would be possible to have two Root Ports > under host bridge A that could do peer-to-peer to each other, but not > to a Root Port under host bridge B? E.g., I'd think you would need > something like this: At least for our hardware it's actually more likely that when devices are connected to different host bridges that they can talk directly to each other. In other words when you have devices connected to the same host bridge you need to have P2P support inside that host bridge. But when you have different host bridges P2P accesses look to the host bridge just like accesses coming from one of the CPU cores. Regards, Christian. > > if (pci_find_host_bridge(a->bus) == pci_find_host_bridge(b->bus) && > a->vendor == PCI_VENDOR_ID_AMD && > b->vendor == PCI_VENDOR_ID_AMD && > a->device == X && > b->device == X) > return true; > > return false; > > But I guess that still implicitly relies on the notion that everything > under one PNP0A03 device is potentially connected via P2P, and I don't > think there's anything the PCI or ACPI specs that would support that. > > Maybe we should add a pci_dev.p2p_group and make it so "a.p2p_group == > b.p2p_group" means that P2P is possible between them. You'd have to > have a quirk for the root ports to initialize p2p_group using whatever > vendor-specific information you need, then everything under them could > inherit from the parent. Well exactly that's not going to work. E.g. it can be that devices A can talk to devices B and C, but on the other hand B and C can't talk directly to each other. Regards, Christian. > >> + unsigned short vendor, device; >> + >> + if (!root) >> + return false; >> + >> + vendor = root->vendor; >> + device = root->device; >> + pci_dev_put(root); >> + >> + /* AMD ZEN host bridges can do peer to peer */ >> + if (vendor == PCI_VENDOR_ID_AMD && device == 0x1450) >> + return true; >> + >> + /* TODO: Extend that to a proper whitelist */ >> + return false; >> +} >> + >> /* >> * Find the distance through the nearest common upstream bridge between >> * two PCI devices. >> @@ -317,13 +342,13 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) >> * In this case, a list of all infringing bridge addresses will be >> * populated in acs_list (assuming it's non-null) for printk purposes. >> */ >> -static int upstream_bridge_distance(struct pci_dev *a, >> - struct pci_dev *b, >> +static int upstream_bridge_distance(struct pci_dev *provider, >> + struct pci_dev *client, >> struct seq_buf *acs_list) >> { >> + struct pci_dev *a = provider, *b = client, *bb; >> int dist_a = 0; >> int dist_b = 0; >> - struct pci_dev *bb = NULL; >> int acs_cnt = 0; >> >> /* >> @@ -354,6 +379,13 @@ static int upstream_bridge_distance(struct pci_dev *a, >> dist_a++; >> } >> >> + /* Allow the connection if both devices are on a whitelisted root >> + * complex, but add an arbitary large value to the distance. >> + */ >> + if (root_complex_whitelist(provider) && >> + root_complex_whitelist(client)) >> + return 0x1000 + dist_a + dist_b; >> + >> return -1; >> >> check_b_path_acs: >> -- >> 2.17.1 >>
On Thu, Apr 18, 2019 at 01:58:59PM +0200, Christian König wrote: > A lot of root complexes can still do P2P even when PCI devices > don't share a common upstream bridge. > > Start adding a whitelist and allow P2P if both participants are > attached to known good root complex. > > Signed-off-by: Christian König <christian.koenig@amd.com> I applied this with Logan's reviewed-by to pci/peer-to-peer for v5.2, thanks! This should be easy to build on in the future as we discover new hardware that supports this and as we find out any quirks in the way they do it. > --- > drivers/pci/p2pdma.c | 38 +++++++++++++++++++++++++++++++++++--- > 1 file changed, 35 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index c52298d76e64..212baaa7f93b 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -274,6 +274,31 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) > seq_buf_printf(buf, "%s;", pci_name(pdev)); > } > > +/* > + * If we can't find a common upstream bridge take a look at the root complex and > + * compare it to a whitelist of known good hardware. > + */ > +static bool root_complex_whitelist(struct pci_dev *dev) > +{ > + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > + struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); > + unsigned short vendor, device; > + > + if (!root) > + return false; > + > + vendor = root->vendor; > + device = root->device; > + pci_dev_put(root); > + > + /* AMD ZEN host bridges can do peer to peer */ > + if (vendor == PCI_VENDOR_ID_AMD && device == 0x1450) > + return true; > + > + /* TODO: Extend that to a proper whitelist */ > + return false; > +} > + > /* > * Find the distance through the nearest common upstream bridge between > * two PCI devices. > @@ -317,13 +342,13 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) > * In this case, a list of all infringing bridge addresses will be > * populated in acs_list (assuming it's non-null) for printk purposes. > */ > -static int upstream_bridge_distance(struct pci_dev *a, > - struct pci_dev *b, > +static int upstream_bridge_distance(struct pci_dev *provider, > + struct pci_dev *client, > struct seq_buf *acs_list) > { > + struct pci_dev *a = provider, *b = client, *bb; > int dist_a = 0; > int dist_b = 0; > - struct pci_dev *bb = NULL; > int acs_cnt = 0; > > /* > @@ -354,6 +379,13 @@ static int upstream_bridge_distance(struct pci_dev *a, > dist_a++; > } > > + /* Allow the connection if both devices are on a whitelisted root > + * complex, but add an arbitary large value to the distance. > + */ > + if (root_complex_whitelist(provider) && > + root_complex_whitelist(client)) > + return 0x1000 + dist_a + dist_b; > + > return -1; > > check_b_path_acs: > -- > 2.17.1 >
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index c52298d76e64..212baaa7f93b 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -274,6 +274,31 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) seq_buf_printf(buf, "%s;", pci_name(pdev)); } +/* + * If we can't find a common upstream bridge take a look at the root complex and + * compare it to a whitelist of known good hardware. + */ +static bool root_complex_whitelist(struct pci_dev *dev) +{ + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); + struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); + unsigned short vendor, device; + + if (!root) + return false; + + vendor = root->vendor; + device = root->device; + pci_dev_put(root); + + /* AMD ZEN host bridges can do peer to peer */ + if (vendor == PCI_VENDOR_ID_AMD && device == 0x1450) + return true; + + /* TODO: Extend that to a proper whitelist */ + return false; +} + /* * Find the distance through the nearest common upstream bridge between * two PCI devices. @@ -317,13 +342,13 @@ static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev) * In this case, a list of all infringing bridge addresses will be * populated in acs_list (assuming it's non-null) for printk purposes. */ -static int upstream_bridge_distance(struct pci_dev *a, - struct pci_dev *b, +static int upstream_bridge_distance(struct pci_dev *provider, + struct pci_dev *client, struct seq_buf *acs_list) { + struct pci_dev *a = provider, *b = client, *bb; int dist_a = 0; int dist_b = 0; - struct pci_dev *bb = NULL; int acs_cnt = 0; /* @@ -354,6 +379,13 @@ static int upstream_bridge_distance(struct pci_dev *a, dist_a++; } + /* Allow the connection if both devices are on a whitelisted root + * complex, but add an arbitary large value to the distance. + */ + if (root_complex_whitelist(provider) && + root_complex_whitelist(client)) + return 0x1000 + dist_a + dist_b; + return -1; check_b_path_acs:
A lot of root complexes can still do P2P even when PCI devices don't share a common upstream bridge. Start adding a whitelist and allow P2P if both participants are attached to known good root complex. Signed-off-by: Christian König <christian.koenig@amd.com> --- drivers/pci/p2pdma.c | 38 +++++++++++++++++++++++++++++++++++--- 1 file changed, 35 insertions(+), 3 deletions(-)