Message ID | 20210408170123.8788-3-logang@deltatee.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Add new DMA mapping operation for P2PDMA | expand |
On 4/8/21 10:01 AM, Logan Gunthorpe wrote: > In order to use upstream_bridge_distance_warn() from a dma_map function, > it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it > might sleep. > > In order to avoid this, try to get the host bridge's device from > bus->self, and if that is not set, just get the first element in the > device list. It should be impossible for the host bridge's device to > go away while references are held on child devices, so the first element > should not be able to change and, thus, this should be safe. > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > --- > drivers/pci/p2pdma.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index bd89437faf06..473a08940fbc 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry { > static bool __host_bridge_whitelist(struct pci_host_bridge *host, > bool same_host_bridge) > { > - struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); > const struct pci_p2pdma_whitelist_entry *entry; > + struct pci_dev *root = host->bus->self; > unsigned short vendor, device; > > + /* > + * This makes the assumption that the first device on the bus is the > + * bridge itself and it has the devfn of 00.0. This assumption should > + * hold for the devices in the white list above, and if there are cases > + * where this isn't true they will have to be dealt with when such a > + * case is added to the whitelist. Actually, it makes the assumption that the first device *in the list* (the host->bus-devices list) is 00.0. The previous code made the assumption that you wrote. By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems really short. From a naive point of view, I'd expect that there must be a lot more CPUs/chipsets that can do pci p2p, what do you think? I wonder if we have to be so super strict, anyway. It just seems extremely limited, and I suspect there will be some additions to the list as soon as we start to use this. > + */ > if (!root) > + root = list_first_entry_or_null(&host->bus->devices, > + struct pci_dev, bus_list); OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating. Why is it OK to avoid taking any locks in order to retrieve the first entry from the list, but in order to retrieve any other entry, you have to aquire the pci_bus_sem, and get a reference as well? Something is inconsistent there. The new version here also no longer takes a reference on the device, which is also cheating. But I'm guessing that the unstated assumption here is that there is always at least one entry in the list. But if that's true, then it's better to show clearly that assumption, instead of hiding it in an implicit call that skips both locking and reference counting. You could add a new function, which is a cut-down version of pci_get_slot(), like this, and call this from __host_bridge_whitelist(): /* * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is * up, it is safe to call this, because there will always be a top-level PCI * root device. * * Other assumptions: the root device is the first device in the list, and the * root device is numbered 00.0. */ struct pci_dev *pci_get_root_slot(struct pci_bus *bus) { struct pci_dev *root; unsigned devfn = PCI_DEVFN(0, 0); root = list_first_entry_or_null(&bus->devices, struct pci_dev, bus_list); if (root->devfn == devfn) goto out; root = NULL; out: pci_dev_get(root); return root; } EXPORT_SYMBOL(pci_get_root_slot); ...I think that's a lot clearer to the reader, about what's going on here. Note that I'm not really sure if it *is* safe, I would need to ask other PCIe subsystem developers with more experience. But I don't think anyone is trying to make p2pdma calls so early that PCIe buses are uninitialized. > + > + if (!root || root->devfn) > return false; > > vendor = root->vendor; > device = root->device; > - pci_dev_put(root); > > for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) { > if (vendor != entry->vendor || device != entry->device) > thanks,
On 2021-05-01 11:35 p.m., John Hubbard wrote: > On 4/8/21 10:01 AM, Logan Gunthorpe wrote: >> In order to use upstream_bridge_distance_warn() from a dma_map function, >> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it >> might sleep. >> >> In order to avoid this, try to get the host bridge's device from >> bus->self, and if that is not set, just get the first element in the >> device list. It should be impossible for the host bridge's device to >> go away while references are held on child devices, so the first element >> should not be able to change and, thus, this should be safe. >> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> >> --- >> drivers/pci/p2pdma.c | 14 ++++++++++++-- >> 1 file changed, 12 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c >> index bd89437faf06..473a08940fbc 100644 >> --- a/drivers/pci/p2pdma.c >> +++ b/drivers/pci/p2pdma.c >> @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry { >> static bool __host_bridge_whitelist(struct pci_host_bridge *host, >> bool same_host_bridge) >> { >> - struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); >> const struct pci_p2pdma_whitelist_entry *entry; >> + struct pci_dev *root = host->bus->self; >> unsigned short vendor, device; >> >> + /* >> + * This makes the assumption that the first device on the bus is the >> + * bridge itself and it has the devfn of 00.0. This assumption should >> + * hold for the devices in the white list above, and if there are cases >> + * where this isn't true they will have to be dealt with when such a >> + * case is added to the whitelist. > > Actually, it makes the assumption that the first device *in the list* > (the host->bus-devices list) is 00.0. The previous code made the > assumption that you wrote. The comment notes two assumptions (although the grammar is poor, which I will fix). Yes, the previous code made the second assumption, the new code makes both assumptions. > By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems > really short. From a naive point of view, I'd expect that there must be > a lot more CPUs/chipsets that can do pci p2p, what do you think? I > wonder if we have to be so super strict, anyway. It just seems extremely > limited, and I suspect there will be some additions to the list as soon > as we start to use this. Yes, well unfortunately we have no other way to determine what host bridges can communicate with P2P. We settled on a whitelist when the series was first patch. Nobody likes that situation, but nobody has found anything better. We've been hoping standards bodies would give us a flag but I haven't heard anything about that. At least AMD has been able to guarantee us that all CPUs newer than Zen will support so that covers a large swath. It would be nice if we could say something similar for Intel. > OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating. > Why is it OK to avoid taking any locks in order to retrieve the > first entry from the list, but in order to retrieve any other entry, you > have to aquire the pci_bus_sem, and get a reference as well? Something > is inconsistent there. > > The new version here also no longer takes a reference on the device, > which is also cheating. But I'm guessing that the unstated assumption > here is that there is always at least one entry in the list. But if > that's true, then it's better to show clearly that assumption, instead > of hiding it in an implicit call that skips both locking and reference > counting. Because we hold a reference to a child device of the bus. So the host bus device can't go away until the child device has been released. An earlier version of the P2PDMA patchset had a lot more extraneous get device calls until someone else pointed this out. > You could add a new function, which is a cut-down version of pci_get_slot(), > like this, and call this from __host_bridge_whitelist(): > > /* > * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem > * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is > * up, it is safe to call this, because there will always be a top-level PCI > * root device. > * > * Other assumptions: the root device is the first device in the list, and the > * root device is numbered 00.0. > */ > struct pci_dev *pci_get_root_slot(struct pci_bus *bus) > { > struct pci_dev *root; > unsigned devfn = PCI_DEVFN(0, 0); > > root = list_first_entry_or_null(&bus->devices, struct pci_dev, > bus_list); > if (root->devfn == devfn) > goto out; > > root = NULL; > out: > pci_dev_get(root); > return root; > } > EXPORT_SYMBOL(pci_get_root_slot); > > ...I think that's a lot clearer to the reader, about what's going on here. Per above, I think the reference count is unnecessary. But I could wrap it in a static function for clarity. (There's no reason to export this function). > Note that I'm not really sure if it *is* safe, I would need to ask other > PCIe subsystem developers with more experience. But I don't think anyone > is trying to make p2pdma calls so early that PCIe buses are uninitialized. Yeah, it's impossible to make a p2pdma call before the PCIe bus is initialized. They have to have access to at least one PCI device before they can even attempt it. Logan
On 5/3/21 9:08 AM, Logan Gunthorpe wrote: ... >> By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems >> really short. From a naive point of view, I'd expect that there must be >> a lot more CPUs/chipsets that can do pci p2p, what do you think? I >> wonder if we have to be so super strict, anyway. It just seems extremely >> limited, and I suspect there will be some additions to the list as soon >> as we start to use this. > > Yes, well unfortunately we have no other way to determine what host > bridges can communicate with P2P. We settled on a whitelist when the > series was first patch. Nobody likes that situation, but nobody has > found anything better. We've been hoping standards bodies would give us > a flag but I haven't heard anything about that. At least AMD has been > able to guarantee us that all CPUs newer than Zen will support so that > covers a large swath. It would be nice if we could say something similar > for Intel. Thanks for explaining the situation! > >> OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating. >> Why is it OK to avoid taking any locks in order to retrieve the >> first entry from the list, but in order to retrieve any other entry, you >> have to aquire the pci_bus_sem, and get a reference as well? Something >> is inconsistent there. >> >> The new version here also no longer takes a reference on the device, >> which is also cheating. But I'm guessing that the unstated assumption >> here is that there is always at least one entry in the list. But if >> that's true, then it's better to show clearly that assumption, instead >> of hiding it in an implicit call that skips both locking and reference >> counting. > > Because we hold a reference to a child device of the bus. So the host > bus device can't go away until the child device has been released. An > earlier version of the P2PDMA patchset had a lot more extraneous get > device calls until someone else pointed this out. > >> You could add a new function, which is a cut-down version of pci_get_slot(), >> like this, and call this from __host_bridge_whitelist(): >> >> /* >> * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem >> * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is >> * up, it is safe to call this, because there will always be a top-level PCI >> * root device. >> * >> * Other assumptions: the root device is the first device in the list, and the >> * root device is numbered 00.0. >> */ >> struct pci_dev *pci_get_root_slot(struct pci_bus *bus) >> { >> struct pci_dev *root; >> unsigned devfn = PCI_DEVFN(0, 0); >> >> root = list_first_entry_or_null(&bus->devices, struct pci_dev, >> bus_list); >> if (root->devfn == devfn) >> goto out; >> >> root = NULL; >> out: >> pci_dev_get(root); >> return root; >> } >> EXPORT_SYMBOL(pci_get_root_slot); >> >> ...I think that's a lot clearer to the reader, about what's going on here. > > Per above, I think the reference count is unnecessary. But I could wrap > it in a static function for clarity. (There's no reason to export this > function). > Yes, please. thanks,
On Mon, May 03, 2021 at 10:08:34AM -0600, Logan Gunthorpe wrote: > Per above, I think the reference count is unnecessary. But I could wrap > it in a static function for clarity. (There's no reason to export this > function). A well documented helper function would really help to improve the code for the casual reader I think.
On 4/8/21 1:01 PM, Logan Gunthorpe wrote: > In order to use upstream_bridge_distance_warn() from a dma_map function, > it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it > might sleep. > > In order to avoid this, try to get the host bridge's device from > bus->self, and if that is not set, just get the first element in the > device list. It should be impossible for the host bridge's device to > go away while references are held on child devices, so the first element > should not be able to change and, thus, this should be safe. Bjorn: Why wouldn't (shouldn't?) the bus->self field be set for a host bridge device? Should this situation be repaired in the host-brige config/setup code elsewhere in the kernel. ... and here, a check-and-fail with info of what doesn't have it setup (another new pci function to do the check & prinfo), so it can point to the offending host-bridge, and thus, the code that needs to be updated? > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > --- > drivers/pci/p2pdma.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index bd89437faf06..473a08940fbc 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry { > static bool __host_bridge_whitelist(struct pci_host_bridge *host, > bool same_host_bridge) > { > - struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); > const struct pci_p2pdma_whitelist_entry *entry; > + struct pci_dev *root = host->bus->self; > unsigned short vendor, device; > > + /* > + * This makes the assumption that the first device on the bus is the > + * bridge itself and it has the devfn of 00.0. This assumption should > + * hold for the devices in the white list above, and if there are cases > + * where this isn't true they will have to be dealt with when such a > + * case is added to the whitelist. > + */ > if (!root) > + root = list_first_entry_or_null(&host->bus->devices, > + struct pci_dev, bus_list); > + > + if (!root || root->devfn) > return false; > > vendor = root->vendor; > device = root->device; > - pci_dev_put(root); > > for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) { > if (vendor != entry->vendor || device != entry->device)
On 5/2/21 1:35 AM, John Hubbard wrote: > On 4/8/21 10:01 AM, Logan Gunthorpe wrote: >> In order to use upstream_bridge_distance_warn() from a dma_map function, >> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it >> might sleep. >> >> In order to avoid this, try to get the host bridge's device from >> bus->self, and if that is not set, just get the first element in the >> device list. It should be impossible for the host bridge's device to >> go away while references are held on child devices, so the first element >> should not be able to change and, thus, this should be safe. >> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> >> --- >> drivers/pci/p2pdma.c | 14 ++++++++++++-- >> 1 file changed, 12 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c >> index bd89437faf06..473a08940fbc 100644 >> --- a/drivers/pci/p2pdma.c >> +++ b/drivers/pci/p2pdma.c >> @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry { >> static bool __host_bridge_whitelist(struct pci_host_bridge *host, >> bool same_host_bridge) >> { >> - struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); >> const struct pci_p2pdma_whitelist_entry *entry; >> + struct pci_dev *root = host->bus->self; >> unsigned short vendor, device; >> + /* >> + * This makes the assumption that the first device on the bus is the >> + * bridge itself and it has the devfn of 00.0. This assumption should >> + * hold for the devices in the white list above, and if there are cases >> + * where this isn't true they will have to be dealt with when such a >> + * case is added to the whitelist. > > Actually, it makes the assumption that the first device *in the list* > (the host->bus-devices list) is 00.0. The previous code made the > assumption that you wrote. > > By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems > really short. From a naive point of view, I'd expect that there must be > a lot more CPUs/chipsets that can do pci p2p, what do you think? I > wonder if we have to be so super strict, anyway. It just seems extremely > limited, and I suspect there will be some additions to the list as soon > as we start to use this. > > >> + */ >> if (!root) >> + root = list_first_entry_or_null(&host->bus->devices, >> + struct pci_dev, bus_list); > > OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating. > Why is it OK to avoid taking any locks in order to retrieve the > first entry from the list, but in order to retrieve any other entry, you > have to aquire the pci_bus_sem, and get a reference as well? Something > is inconsistent there. > > The new version here also no longer takes a reference on the device, > which is also cheating. But I'm guessing that the unstated assumption > here is that there is always at least one entry in the list. But if > that's true, then it's better to show clearly that assumption, instead > of hiding it in an implicit call that skips both locking and reference > counting. > > You could add a new function, which is a cut-down version of pci_get_slot(), > like this, and call this from __host_bridge_whitelist(): > > /* > * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem > * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is > * up, it is safe to call this, because there will always be a top-level PCI > * root device. > * > * Other assumptions: the root device is the first device in the list, and the > * root device is numbered 00.0. > */ > struct pci_dev *pci_get_root_slot(struct pci_bus *bus) > { > struct pci_dev *root; > unsigned devfn = PCI_DEVFN(0, 0); > > root = list_first_entry_or_null(&bus->devices, struct pci_dev, > bus_list); > if (root->devfn == devfn) > goto out; > ... add a flag (set for p2pdma use) to the function to print out what the root->devfn is, and what the device is so the needed quirk &/or modification can added to handle when this assumption fails; or make it a prdebug that can be flipped on for this failing situation, again, to add needed change to accomodate. > root = NULL; > out: > pci_dev_get(root); > return root; > } > EXPORT_SYMBOL(pci_get_root_slot); > > ...I think that's a lot clearer to the reader, about what's going on here. > > Note that I'm not really sure if it *is* safe, I would need to ask other > PCIe subsystem developers with more experience. But I don't think anyone > is trying to make p2pdma calls so early that PCIe buses are uninitialized. > > >> + >> + if (!root || root->devfn) >> return false; >> vendor = root->vendor; >> device = root->device; >> - pci_dev_put(root); and the reason to remove the dev_put is b/c it can sleep as well? is that ok, given the dev_get that John put into the new pci_get_root_slot()? ... seems like a locking version with no get/put's is needed, or, fix the host-bridge setups so no !NULL self pointers. >> for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) { >> if (vendor != entry->vendor || device != entry->device) >> > > thanks,
On 2021-05-11 10:05 a.m., Don Dutile wrote: > On 4/8/21 1:01 PM, Logan Gunthorpe wrote: >> In order to use upstream_bridge_distance_warn() from a dma_map function, >> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it >> might sleep. >> >> In order to avoid this, try to get the host bridge's device from >> bus->self, and if that is not set, just get the first element in the >> device list. It should be impossible for the host bridge's device to >> go away while references are held on child devices, so the first element >> should not be able to change and, thus, this should be safe. > Bjorn: > Why wouldn't (shouldn't?) the bus->self field be set for a host bridge device? > Should this situation be repaired in the host-brige config/setup code elsewhere in the kernel. > ... and here, a check-and-fail with info of what doesn't have it setup (another new pci function to do the check & prinfo), so it can point to the offending host-bridge, and thus, the code that needs to be updated? I've dropped the bus->self thing in v2. Seems bus->self is explicitly unset for root bridges. There's remnants in the pci code that used to check bus->self to see if the bridge is the root bridge. I tried setting bus->self with the pci device of the root bridge but that just caused my machine not to boot and I didn't dig any further. Logan
On 2021-05-11 10:05 a.m., Don Dutile wrote: > ... add a flag (set for p2pdma use) to the function to print out what the root->devfn is, and what > the device is so the needed quirk &/or modification can added to handle when this assumption fails; > or make it a prdebug that can be flipped on for this failing situation, again, to add needed change to accomodate. Good idea! Will add. >> root = NULL; >> out: >> pci_dev_get(root); >> return root; >> } >> EXPORT_SYMBOL(pci_get_root_slot); >> >> ...I think that's a lot clearer to the reader, about what's going on here. >> >> Note that I'm not really sure if it *is* safe, I would need to ask other >> PCIe subsystem developers with more experience. But I don't think anyone >> is trying to make p2pdma calls so early that PCIe buses are uninitialized. >> >> >>> + >>> + if (!root || root->devfn) >>> return false; >>> vendor = root->vendor; >>> device = root->device; >>> - pci_dev_put(root); > and the reason to remove the dev_put is b/c it can sleep as well? > is that ok, given the dev_get that John put into the new pci_get_root_slot()? > ... seems like a locking version with no get/put's is needed, or, fix the host-bridge setups so no !NULL self pointers. The dev_get is redundant here seeing we hold references to child devices. It was only in the previous code because we were using pci_get_slot() to get the device which did the get for us. Logan
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index bd89437faf06..473a08940fbc 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry { static bool __host_bridge_whitelist(struct pci_host_bridge *host, bool same_host_bridge) { - struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); const struct pci_p2pdma_whitelist_entry *entry; + struct pci_dev *root = host->bus->self; unsigned short vendor, device; + /* + * This makes the assumption that the first device on the bus is the + * bridge itself and it has the devfn of 00.0. This assumption should + * hold for the devices in the white list above, and if there are cases + * where this isn't true they will have to be dealt with when such a + * case is added to the whitelist. + */ if (!root) + root = list_first_entry_or_null(&host->bus->devices, + struct pci_dev, bus_list); + + if (!root || root->devfn) return false; vendor = root->vendor; device = root->device; - pci_dev_put(root); for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) { if (vendor != entry->vendor || device != entry->device)
In order to use upstream_bridge_distance_warn() from a dma_map function, it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it might sleep. In order to avoid this, try to get the host bridge's device from bus->self, and if that is not set, just get the first element in the device list. It should be impossible for the host bridge's device to go away while references are held on child devices, so the first element should not be able to change and, thus, this should be safe. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- drivers/pci/p2pdma.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)