Message ID: 20130711210326.1701.56478.stgit@bling.home (mailing list archive)
State: New, archived
Delegated to: Bjorn Helgaas
On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > This provides interfaces for drivers to discover the visible PCIe > requester ID for a device, for things like IOMMU setup, and iterate IDs (plural) > over the device chain from requestee to requester, including DMA > quirks at each step. "requestee" doesn't make sense to me. The "-ee" suffix added to a verb normally makes a noun that refers to the object of the action. So "requestee" sounds like it means something like "target" or "responder," but that's not what you mean here. > Suggested-by: Bjorn Helgaas <bhelgaas@google.com> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > drivers/pci/search.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++ > include/linux/pci.h | 7 ++ > 2 files changed, 205 insertions(+) > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c > index d0627fa..4759c02 100644 > --- a/drivers/pci/search.c > +++ b/drivers/pci/search.c > @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem); > EXPORT_SYMBOL_GPL(pci_bus_sem); > > /* > + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID > + * @dev: device to test > + */ > +static bool pci_has_pcie_requester_id(struct pci_dev *dev) > +{ > + /* > + * XXX There's no indicator of the bus type, conventional PCI vs > + * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe > + * requester ID is a native PCIe based system (such as VT-d or > + * AMD-Vi). It's common that PCIe root complex devices do not > + * include a PCIe capability, but we can assume they are PCIe > + * devices based on their topology. > + */ > + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus)) > + return true; > + > + /* > + * PCI-X devices have a requester ID, but the bridge may still take > + * ownership of transactions and create a requester ID. We therefore > + * assume that the PCI-X requester ID is not the same one used on PCIe. > + */ > + > +#ifdef CONFIG_PCI_QUIRKS > + /* > + * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability. > + * If the device is a bridge, look to the next device upstream of it. > + * If that device is PCIe and not a PCIe-to-PCI bridge, then by > + * deduction, the device must be PCIe and therefore has a requester ID. > + */ > + if (dev->subordinate) { > + struct pci_dev *parent = dev->bus->self; > + > + if (pci_is_pcie(parent) && > + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE) > + return true; > + } > +#endif > + > + return false; > +} > + > +/* > + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID? > + * @dev: requester device > + * @bridge: upstream bridge (or NULL for root bus) > + */ > +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev, > + struct pci_dev *bridge) > +{ > + /* > + * The entire path must be tested, if any step does not have a > + * requester ID, the chain is broken. This allows us to support > + * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe > + */ > + while (dev != bridge) { > + if (!pci_has_pcie_requester_id(dev)) > + return false; > + > + if (pci_is_root_bus(dev->bus)) > + return !bridge; /* false if we don't hit @bridge */ > + > + dev = dev->bus->self; > + } > + > + return true; > +} > + > +/* > + * Legacy PCI bridges within a root complex (ex. Intel 82801) report > + * a different requester ID than a standard PCIe-to-PCI bridge. Instead > + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a s/the/they/ Did you learn about this empirically? Intel spec? 
I wonder if there's some way to derive this from the PCIe specs. > + * standard PCIe endpoint. This function detects them. > + * > + * XXX Is this Intel vendor ID specific? > + */ > +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge) > +{ > + if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus)) > + return true; > + > + return false; > +} > + > +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) | (dev)->devfn) > +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number << 8) > + > +/* > + * pci_get_visible_pcie_requester - Get requester and requester ID for > + * @requestee below @bridge > + * @requestee: requester device > + * @bridge: upstream bridge (or NULL for root bus) > + * @requester_id: location to store requester ID or NULL > + */ > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > + struct pci_dev *bridge, > + u16 *requester_id) I'm not sure it makes sense to return a struct pci_dev here because there's no requirement that a requester ID correspond to an actual pci_dev. > +{ > + struct pci_dev *requester = requestee; > + > + while (requester != bridge) { > + requester = pci_get_dma_source(requester); > + pci_dev_put(requester); /* XXX skip ref cnt */ > + > + if (pci_has_visible_pcie_requester_id(requester, bridge)) If we acquire the "requester" pointer via a ref-counting interface, it's illegal to use the pointer after dropping the reference, isn't it? Maybe that's what you mean by the "XXX" comment. > + break; > + > + if (pci_is_root_bus(requester->bus)) > + return NULL; /* @bridge not parent to @requestee */ > + > + requester = requester->bus->self; > + } > + > + if (requester_id) { > + if (requester->bus != requestee->bus && > + !pci_bridge_uses_endpoint_requester(requester)) > + *requester_id = PCI_BRIDGE_REQUESTER_ID(requester); > + else > + *requester_id = PCI_REQUESTER_ID(requester); > + } > + > + return requester; > +} > + > +static int pci_do_requester_callback(struct pci_dev **dev, > + int (*fn)(struct pci_dev *, > + u16 id, void *), > + void *data) > +{ > + struct pci_dev *dma_dev; > + int ret; > + > + ret = fn(*dev, PCI_REQUESTER_ID(*dev), data); > + if (ret) > + return ret; > + > + dma_dev = pci_get_dma_source(*dev); > + pci_dev_put(dma_dev); /* XXX skip ref cnt */ > + if (dma_dev == *dev) Same ref count question as above. > + return 0; > + > + ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data); > + if (ret) > + return ret; > + > + *dev = dma_dev; > + return 0; > +} > + > +/* > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source > + * from @requestee to the PCIe requester ID visible > + * to @bridge. Transactions from a device may appear with one of several requester IDs, but there's not necessarily an actual pci_dev for each ID, so I think the caller reads better if it's "...for_each_requester_id()" The "Call X on each devices and DMA source from Y to the requester ID" part doesn't quite make a sentence. > + * @requestee: Starting device > + * @bridge: upstream bridge (or NULL for root bus) You should say something about the significance of @bridge. I think the point is to call @fn for every possible requester ID @bridge could see for transactions from @requestee. This is a way to learn the requester IDs an IOMMU at @bridge needs to be prepared for. 
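A minimal sketch of that usage, assuming the interfaces proposed in this patch (the vtd_like_iommu type and program_context_entry() helper are invented here for illustration):

    /*
     * Sketch: program one IOMMU context entry for each requester ID
     * that may appear on DMA from the starting device.
     */
    static int map_requester_id(struct pci_dev *dev, u16 rid, void *data)
    {
            struct vtd_like_iommu *iommu = data;  /* hypothetical state */

            /* rid is (bus << 8 | devfn); split it for the context table */
            return program_context_entry(iommu, rid >> 8, rid & 0xff);
    }

    /* Prepare the IOMMU at @bridge for every ID @pdev may arrive with */
    ret = pcie_for_each_requester(pdev, bridge, map_requester_id, iommu);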
> + * @fn: callback function > + * @data: data to pass to callback > + */ > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, > + int (*fn)(struct pci_dev *, u16 id, void *), > + void *data) > +{ > + struct pci_dev *requester; > + struct pci_dev *dev = requestee; > + int ret = 0; > + > + requester = pci_get_visible_pcie_requester(requestee, bridge, NULL); > + if (!requester) > + return -EINVAL; > + > + do { > + ret = pci_do_requester_callback(&dev, fn, data); > + if (ret) > + return ret; > + > + if (dev == requester) > + return 0; > + > + /* > + * We always consider root bus devices to have a visible > + * requester ID, therefore this should never be true. > + */ > + BUG_ON(pci_is_root_bus(dev->bus)); What are we going to do if somebody hits this BUG_ON()? If it's impossible to hit, we should just remove it. If it's possible to hit it in some weird topology you didn't consider, we should see IOMMU faults for any requester ID we neglected to map, and that fault would be a better debugging hint than a BUG_ON() here. > + > + dev = dev->bus->self; > + > + } while (dev != requester); > + > + /* > + * If we've made it here, @requester is a bridge upstream from > + * @requestee. > + */ > + if (pci_bridge_uses_endpoint_requester(requester)) > + return pci_do_requester_callback(&requester, fn, data); > + > + return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data); > +} > + > +/* > * find the upstream PCIe-to-PCI bridge of a PCI device > * if the device is PCIE, return NULL > * if the device isn't connected to a PCIe bridge (that is its parent is a > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 3a24e4f..94e81d1 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) > } > #endif > > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > + struct pci_dev *bridge, > + u16 *requester_id); The structure of this interface implies that there is only one visible requester ID, but the whole point of this patch is that a transaction from @requestee may appear with one of several requester IDs. So which one will this return? > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, > + int (*fn)(struct pci_dev *, u16 id, void *), > + void *data); > + > /** > * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device > * @pdev: the PCI device > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
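To make the bridge handling concrete, a worked example using the patch's own macros (bus and device numbers are illustrative only):

    00:1c.0  PCIe root port
       |
    01:00.0  PCIe-to-PCI bridge (subordinate bus 02)
       |
    02:08.0  conventional PCI endpoint

    /* The endpoint's own tag, which the bridge may replace: */
    PCI_REQUESTER_ID(02:08.0)        == (0x02 << 8) | 0x40  /* devfn 0x40 = slot 8, fn 0 */
    /* What a standard PCIe-to-PCI bridge presents upstream: */
    PCI_BRIDGE_REQUESTER_ID(01:00.0) == 0x02 << 8           /* (subordinate << 8 | 0) */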
On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: > On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > > This provides interfaces for drivers to discover the visible PCIe > > requester ID for a device, for things like IOMMU setup, and iterate > > IDs (plural) How can a device have multiple requester IDs? Reading below, I'm not sure we're on the same page for the purpose of this patch. > > over the device chain from requestee to requester, including DMA > > quirks at each step. > > "requestee" doesn't make sense to me. The "-ee" suffix added to a verb > normally makes a noun that refers to the object of the action. So > "requestee" sounds like it means something like "target" or "responder," > but that's not what you mean here. Hmm, ok. I figured a request-er makes a request on behalf of a request-ee. Suggestions? > > Suggested-by: Bjorn Helgaas <bhelgaas@google.com> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > --- > > drivers/pci/search.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++ > > include/linux/pci.h | 7 ++ > > 2 files changed, 205 insertions(+) > > > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c > > index d0627fa..4759c02 100644 > > --- a/drivers/pci/search.c > > +++ b/drivers/pci/search.c > > @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem); > > EXPORT_SYMBOL_GPL(pci_bus_sem); > > > > /* > > + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID > > + * @dev: device to test > > + */ > > +static bool pci_has_pcie_requester_id(struct pci_dev *dev) > > +{ > > + /* > > + * XXX There's no indicator of the bus type, conventional PCI vs > > + * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe > > + * requester ID is a native PCIe based system (such as VT-d or > > + * AMD-Vi). It's common that PCIe root complex devices do not > > + * include a PCIe capability, but we can assume they are PCIe > > + * devices based on their topology. > > + */ > > + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus)) > > + return true; > > + > > + /* > > + * PCI-X devices have a requester ID, but the bridge may still take > > + * ownership of transactions and create a requester ID. We therefore > > + * assume that the PCI-X requester ID is not the same one used on PCIe. > > + */ > > + > > +#ifdef CONFIG_PCI_QUIRKS > > + /* > > + * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability. > > + * If the device is a bridge, look to the next device upstream of it. > > + * If that device is PCIe and not a PCIe-to-PCI bridge, then by > > + * deduction, the device must be PCIe and therefore has a requester ID. > > + */ > > + if (dev->subordinate) { > > + struct pci_dev *parent = dev->bus->self; > > + > > + if (pci_is_pcie(parent) && > > + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE) > > + return true; > > + } > > +#endif > > + > > + return false; > > +} > > + > > +/* > > + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID? > > + * @dev: requester device > > + * @bridge: upstream bridge (or NULL for root bus) > > + */ > > +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev, > > + struct pci_dev *bridge) > > +{ > > + /* > > + * The entire path must be tested, if any step does not have a > > + * requester ID, the chain is broken.
This allows us to support > > + * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe > > + */ > > + while (dev != bridge) { > > + if (!pci_has_pcie_requester_id(dev)) > > + return false; > > + > > + if (pci_is_root_bus(dev->bus)) > > + return !bridge; /* false if we don't hit @bridge */ > > + > > + dev = dev->bus->self; > > + } > > + > > + return true; > > +} > > + > > +/* > > + * Legacy PCI bridges within a root complex (ex. Intel 82801) report > > + * a different requester ID than a standard PCIe-to-PCI bridge. Instead > > + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a > > s/the/they/ > > Did you learn about this empirically? Intel spec? I wonder if there's > some way to derive this from the PCIe specs. It's in the current intel-iommu logic, pretty much anywhere it uses pci_find_upstream_pcie_bridge() it follows that up with a pci_is_pcie() check. If it's PCIe, it uses a traditional PCIe-to-PCI bridge ID. If it's a legacy PCI bridge it uses the bridge ID itself. static int domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev, int translation) { ... /* dependent device mapping */ tmp = pci_find_upstream_pcie_bridge(pdev); if (!tmp) return 0; ... if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */ return domain_context_mapping_one(domain, pci_domain_nr(tmp->subordinate), tmp->subordinate->number, 0, translation); else /* this is a legacy PCI bridge */ return domain_context_mapping_one(domain, pci_domain_nr(tmp->bus), tmp->bus->number, tmp->devfn, translation); The 82801 reference is from looking at when this would actually happen on one of my own systems. > > + * standard PCIe endpoint. This function detects them. > > + * > > + * XXX Is this Intel vendor ID specific? > > + */ > > +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge) > > +{ > > + if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus)) > > + return true; > > + > > + return false; > > +} > > + > > +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) | (dev)->devfn) > > +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number << 8) > > + > > +/* > > + * pci_get_visible_pcie_requester - Get requester and requester ID for > > + * @requestee below @bridge > > + * @requestee: requester device > > + * @bridge: upstream bridge (or NULL for root bus) > > + * @requester_id: location to store requester ID or NULL > > + */ > > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > > + struct pci_dev *bridge, > > + u16 *requester_id) > > I'm not sure it makes sense to return a struct pci_dev here because > there's no requirement that a requester ID correspond to an actual > pci_dev. That's why this function is named get_.._requester instead of requester ID. I believe there still has to be a struct pci_dev that does the request, but the requester ID for that device may not actually match. So I return both. In a PCIe-to-PCI bridge case, the pci_dev is the bridge, but the requester ID is either the bridge bus|devfn or subordinate|0 depending on the topology. If we want to support "ghost functions", we can return the real pci_dev and a ghost requester ID. I think if we used just a requester ID, it ends up being extremely difficult to pass that into anything else since we then have to search again for where that requester ID is rooted. 
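A sketch of the lookup Alex describes, using the patch's proposed interface (error handling abbreviated):

    u16 rid;
    struct pci_dev *requester;

    /* Who does the root bus see when pdev DMAs, and with what tag? */
    requester = pci_get_visible_pcie_requester(pdev, NULL, &rid);
    if (!requester)
            return -ENODEV;
    /*
     * @requester anchors the ID in the topology; @rid may be the
     * bridge's own (bus << 8 | devfn) or (subordinate << 8 | 0),
     * depending on the bridge type, which is why both are returned.
     */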
> > +{ > > + struct pci_dev *requester = requestee; > > + > > + while (requester != bridge) { > > + requester = pci_get_dma_source(requester); > > + pci_dev_put(requester); /* XXX skip ref cnt */ > > + > > + if (pci_has_visible_pcie_requester_id(requester, bridge)) > > If we acquire the "requester" pointer via a ref-counting interface, > it's illegal to use the pointer after dropping the reference, isn't it? > Maybe that's what you mean by the "XXX" comment. Yes, I was just following your RFC lead and didn't want to deal with reference counting until this approach had enough traction. > > + break; > > + > > + if (pci_is_root_bus(requester->bus)) > > + return NULL; /* @bridge not parent to @requestee */ > > + > > + requester = requester->bus->self; > > + } > > + > > + if (requester_id) { > > + if (requester->bus != requestee->bus && > > + !pci_bridge_uses_endpoint_requester(requester)) > > + *requester_id = PCI_BRIDGE_REQUESTER_ID(requester); > > + else > > + *requester_id = PCI_REQUESTER_ID(requester); > > + } > > + > > + return requester; > > +} > > + > > +static int pci_do_requester_callback(struct pci_dev **dev, > > + int (*fn)(struct pci_dev *, > > + u16 id, void *), > > + void *data) > > +{ > > + struct pci_dev *dma_dev; > > + int ret; > > + > > + ret = fn(*dev, PCI_REQUESTER_ID(*dev), data); > > + if (ret) > > + return ret; > > + > > + dma_dev = pci_get_dma_source(*dev); > > + pci_dev_put(dma_dev); /* XXX skip ref cnt */ > > + if (dma_dev == *dev) > > Same ref count question as above. > > > + return 0; > > + > > + ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data); > > + if (ret) > > + return ret; > > + > > + *dev = dma_dev; > > + return 0; > > +} > > + > > +/* > > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source > > + * from @requestee to the PCIe requester ID visible > > + * to @bridge. > > Transactions from a device may appear with one of several requester IDs, > but there's not necessarily an actual pci_dev for each ID, so I think the > caller reads better if it's "...for_each_requester_id()" Wouldn't you expect to pass a requester ID into a function with that name? I'm pretty sure I had it named that at one point but thought the parameters made more sense this way. I'll see if I can think of a better name. > The "Call X on each devices and DMA source from Y to the requester ID" > part doesn't quite make a sentence. Ok > > + * @requestee: Starting device > > + * @bridge: upstream bridge (or NULL for root bus) > > You should say something about the significance of @bridge. I think the > point is to call @fn for every possible requester ID @bridge could see for > transactions from @requestee. This is a way to learn the requester IDs an > IOMMU at @bridge needs to be prepared for. I can add something. @bridge is supposed to be for bridge-based IOMMUs. Essentially it's just a stopping point. There might be PCI topology upstream of @bridge, but if you only want the requester ID seen by the bridge, you don't care. 
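For reference, the conventional pattern that avoids the use-after-put noted above holds the reference across the use (a sketch of what the XXX-marked code would become):

    struct pci_dev *dma_dev = pci_get_dma_source(dev); /* counted ref */

    ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data); /* use while held */

    pci_dev_put(dma_dev);   /* drop the reference only after the last use */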
> > + * @fn: callback function > > + * @data: data to pass to callback > > + */ > > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, > > + int (*fn)(struct pci_dev *, u16 id, void *), > > + void *data) > > +{ > > + struct pci_dev *requester; > > + struct pci_dev *dev = requestee; > > + int ret = 0; > > + > > + requester = pci_get_visible_pcie_requester(requestee, bridge, NULL); > > + if (!requester) > > + return -EINVAL; > > + > > + do { > > + ret = pci_do_requester_callback(&dev, fn, data); > > + if (ret) > > + return ret; > > + > > + if (dev == requester) > > + return 0; > > + > > + /* > > + * We always consider root bus devices to have a visible > > + * requester ID, therefore this should never be true. > > + */ > > + BUG_ON(pci_is_root_bus(dev->bus)); > > What are we going to do if somebody hits this BUG_ON()? If it's impossible > to hit, we should just remove it. If it's possible to hit it in some weird > topology you didn't consider, we should see IOMMU faults for any requester > ID we neglected to map, and that fault would be a better debugging hint > than a BUG_ON() here. It's mostly for readability. I've learned that we never what to look at dev->bus->self without first testing pci_is_root_bus(dev->bus), so if I was reading this code I'd stumble when I got here and spend too long looking around for the assumption that makes it not needed. I suppose I could just make it a comment, but I thought why not make it official w/ a BUG. > > + > > + dev = dev->bus->self; > > + > > + } while (dev != requester); > > + > > + /* > > + * If we've made it here, @requester is a bridge upstream from > > + * @requestee. > > + */ > > + if (pci_bridge_uses_endpoint_requester(requester)) > > + return pci_do_requester_callback(&requester, fn, data); > > + > > + return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data); > > +} > > + > > +/* > > * find the upstream PCIe-to-PCI bridge of a PCI device > > * if the device is PCIE, return NULL > > * if the device isn't connected to a PCIe bridge (that is its parent is a > > diff --git a/include/linux/pci.h b/include/linux/pci.h > > index 3a24e4f..94e81d1 100644 > > --- a/include/linux/pci.h > > +++ b/include/linux/pci.h > > @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) > > } > > #endif > > > > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > > + struct pci_dev *bridge, > > + u16 *requester_id); > > The structure of this interface implies that there is only one visible > requester ID, but the whole point of this patch is that a transaction from > @requestee may appear with one of several requester IDs. So which one will > this return? I thought the point of this patch was to have an integrated interface for finding the requester ID and doing something across all devices with that requester ID and thereby remove pci_find_upstream_pcie_bridge(), provide a way to quirk broken PCIe-to-PCI bridge and quirk dma_ops for devices that use the wrong requester ID. In what case does a device appear to have multiple requester IDs? We have cases where devices use the wrong requester ID, but AFAIK they only use a single wrong requester ID. 
Thanks, Alex > > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, > > + int (*fn)(struct pci_dev *, u16 id, void *), > > + void *data); > > + > > /** > > * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device > > * @pdev: the PCI device
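One possible middle ground for the BUG_ON() question above: document the invariant but fail gracefully (a sketch, not part of the patch):

    /*
     * Root bus devices are always considered to have a visible
     * requester ID, so we should never walk past the root here.
     * Warn and bail rather than crash if a topology proves that
     * assumption wrong.
     */
    if (WARN_ON(pci_is_root_bus(dev->bus)))
            return -EINVAL;

    dev = dev->bus->self;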
On Wed, Jul 24, 2013 at 7:21 AM, Alex Williamson <alex.williamson@redhat.com> wrote: > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: >> > This provides interfaces for drivers to discover the visible PCIe >> > requester ID for a device, for things like IOMMU setup, and iterate >> >> IDs (plural) > > How does a device can't have multiple requester IDs? Reading below, I'm > not sure we're on the same page for the purpose of this patch. > >> > over the device chain from requestee to requester, including DMA >> > quirks at each step. >> >> "requestee" doesn't make sense to me. The "-ee" suffix added to a verb >> normally makes a noun that refers to the object of the action. So >> "requestee" sounds like it means something like "target" or "responder," >> but that's not what you mean here. > > Hmm, ok. I figured a request-er makes a request on behalf of a > request-ee. Suggestions? > >> > Suggested-by: Bjorn Helgaas <bhelgaas@google.com> >> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> >> > --- >> > drivers/pci/search.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++ >> > include/linux/pci.h | 7 ++ >> > 2 files changed, 205 insertions(+) >> > >> > diff --git a/drivers/pci/search.c b/drivers/pci/search.c >> > index d0627fa..4759c02 100644 >> > --- a/drivers/pci/search.c >> > +++ b/drivers/pci/search.c >> > @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem); >> > EXPORT_SYMBOL_GPL(pci_bus_sem); >> > >> > /* >> > + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID >> > + * @dev: device to test >> > + */ >> > +static bool pci_has_pcie_requester_id(struct pci_dev *dev) >> > +{ >> > + /* >> > + * XXX There's no indicator of the bus type, conventional PCI vs >> > + * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe >> > + * requester ID is a native PCIe based system (such as VT-d or >> > + * AMD-Vi). It's common that PCIe root complex devices do not >> > + * include a PCIe capability, but we can assume they are PCIe >> > + * devices based on their topology. >> > + */ >> > + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus)) >> > + return true; >> > + >> > + /* >> > + * PCI-X devices have a requester ID, but the bridge may still take >> > + * ownership of transactions and create a requester ID. We therefore >> > + * assume that the PCI-X requester ID is not the same one used on PCIe. >> > + */ >> > + >> > +#ifdef CONFIG_PCI_QUIRKS >> > + /* >> > + * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability. >> > + * If the device is a bridge, look to the next device upstream of it. >> > + * If that device is PCIe and not a PCIe-to-PCI bridge, then by >> > + * deduction, the device must be PCIe and therefore has a requester ID. >> > + */ >> > + if (dev->subordinate) { >> > + struct pci_dev *parent = dev->bus->self; >> > + >> > + if (pci_is_pcie(parent) && >> > + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE) >> > + return true; >> > + } >> > +#endif >> > + >> > + return false; >> > +} >> > + >> > +/* >> > + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID? >> > + * @dev: requester device >> > + * @bridge: upstream bridge (or NULL for root bus) >> > + */ >> > +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev, >> > + struct pci_dev *bridge) >> > +{ >> > + /* >> > + * The entire path must be tested, if any step does not have a >> > + * requester ID, the chain is broken. 
This allows us to support >> > + * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe >> > + */ >> > + while (dev != bridge) { >> > + if (!pci_has_pcie_requester_id(dev)) >> > + return false; >> > + >> > + if (pci_is_root_bus(dev->bus)) >> > + return !bridge; /* false if we don't hit @bridge */ >> > + >> > + dev = dev->bus->self; >> > + } >> > + >> > + return true; >> > +} >> > + >> > +/* >> > + * Legacy PCI bridges within a root complex (ex. Intel 82801) report >> > + * a different requester ID than a standard PCIe-to-PCI bridge. Instead >> > + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a >> >> s/the/they/ >> >> Did you learn about this empirically? Intel spec? I wonder if there's >> some way to derive this from the PCIe specs. > > It's in the current intel-iommu logic, pretty much anywhere it uses > pci_find_upstream_pcie_bridge() it follows that up with a pci_is_pcie() > check. If it's PCIe, it uses a traditional PCIe-to-PCI bridge ID. If > it's a legacy PCI bridge it uses the bridge ID itself. > > static int > domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev, > int translation) > { > ... > /* dependent device mapping */ > tmp = pci_find_upstream_pcie_bridge(pdev); > if (!tmp) > return 0; > ... > if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */ > return domain_context_mapping_one(domain, > pci_domain_nr(tmp->subordinate), > tmp->subordinate->number, 0, > translation); > else /* this is a legacy PCI bridge */ > return domain_context_mapping_one(domain, > pci_domain_nr(tmp->bus), > tmp->bus->number, > tmp->devfn, > translation); > > The 82801 reference is from looking at when this would actually happen > on one of my own systems. > > >> > + * standard PCIe endpoint. This function detects them. >> > + * >> > + * XXX Is this Intel vendor ID specific? >> > + */ >> > +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge) >> > +{ >> > + if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus)) >> > + return true; >> > + >> > + return false; >> > +} >> > + >> > +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) | (dev)->devfn) >> > +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number << 8) >> > + >> > +/* >> > + * pci_get_visible_pcie_requester - Get requester and requester ID for >> > + * @requestee below @bridge >> > + * @requestee: requester device >> > + * @bridge: upstream bridge (or NULL for root bus) >> > + * @requester_id: location to store requester ID or NULL >> > + */ >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, >> > + struct pci_dev *bridge, >> > + u16 *requester_id) >> >> I'm not sure it makes sense to return a struct pci_dev here because >> there's no requirement that a requester ID correspond to an actual >> pci_dev. > > That's why this function is named get_.._requester instead of requester > ID. I believe there still has to be a struct pci_dev that does the > request, but the requester ID for that device may not actually match. > So I return both. In a PCIe-to-PCI bridge case, the pci_dev is the > bridge, but the requester ID is either the bridge bus|devfn or > subordinate|0 depending on the topology. If we want to support "ghost > functions", we can return the real pci_dev and a ghost requester ID. > > I think if we used just a requester ID, it ends up being extremely > difficult to pass that into anything else since we then have to search > again for where that requester ID is rooted. 
> >> > +{ >> > + struct pci_dev *requester = requestee; >> > + >> > + while (requester != bridge) { >> > + requester = pci_get_dma_source(requester); >> > + pci_dev_put(requester); /* XXX skip ref cnt */ >> > + >> > + if (pci_has_visible_pcie_requester_id(requester, bridge)) >> >> If we acquire the "requester" pointer via a ref-counting interface, >> it's illegal to use the pointer after dropping the reference, isn't it? >> Maybe that's what you mean by the "XXX" comment. > > > Yes, I was just following your RFC lead and didn't want to deal with > reference counting until this approach had enough traction. > > >> > + break; >> > + >> > + if (pci_is_root_bus(requester->bus)) >> > + return NULL; /* @bridge not parent to @requestee */ >> > + >> > + requester = requester->bus->self; >> > + } >> > + >> > + if (requester_id) { >> > + if (requester->bus != requestee->bus && >> > + !pci_bridge_uses_endpoint_requester(requester)) >> > + *requester_id = PCI_BRIDGE_REQUESTER_ID(requester); >> > + else >> > + *requester_id = PCI_REQUESTER_ID(requester); >> > + } >> > + >> > + return requester; >> > +} >> > + >> > +static int pci_do_requester_callback(struct pci_dev **dev, >> > + int (*fn)(struct pci_dev *, >> > + u16 id, void *), >> > + void *data) >> > +{ >> > + struct pci_dev *dma_dev; >> > + int ret; >> > + >> > + ret = fn(*dev, PCI_REQUESTER_ID(*dev), data); >> > + if (ret) >> > + return ret; >> > + >> > + dma_dev = pci_get_dma_source(*dev); >> > + pci_dev_put(dma_dev); /* XXX skip ref cnt */ >> > + if (dma_dev == *dev) >> >> Same ref count question as above. >> >> > + return 0; >> > + >> > + ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data); >> > + if (ret) >> > + return ret; >> > + >> > + *dev = dma_dev; >> > + return 0; >> > +} >> > + >> > +/* >> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source >> > + * from @requestee to the PCIe requester ID visible >> > + * to @bridge. >> >> Transactions from a device may appear with one of several requester IDs, >> but there's not necessarily an actual pci_dev for each ID, so I think the >> caller reads better if it's "...for_each_requester_id()" > > Wouldn't you expect to pass a requester ID into a function with that > name? I'm pretty sure I had it named that at one point but thought the > parameters made more sense this way. I'll see if I can think of a > better name. > >> The "Call X on each devices and DMA source from Y to the requester ID" >> part doesn't quite make a sentence. > > > Ok > >> > + * @requestee: Starting device >> > + * @bridge: upstream bridge (or NULL for root bus) >> >> You should say something about the significance of @bridge. I think the >> point is to call @fn for every possible requester ID @bridge could see for >> transactions from @requestee. This is a way to learn the requester IDs an >> IOMMU at @bridge needs to be prepared for. > > I can add something. @bridge is supposed to be for bridge-based IOMMUs. > Essentially it's just a stopping point. There might be PCI topology > upstream of @bridge, but if you only want the requester ID seen by the > bridge, you don't care. 
> >> > + * @fn: callback function >> > + * @data: data to pass to callback >> > + */ >> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, >> > + int (*fn)(struct pci_dev *, u16 id, void *), >> > + void *data) >> > +{ >> > + struct pci_dev *requester; >> > + struct pci_dev *dev = requestee; >> > + int ret = 0; >> > + >> > + requester = pci_get_visible_pcie_requester(requestee, bridge, NULL); >> > + if (!requester) >> > + return -EINVAL; >> > + >> > + do { >> > + ret = pci_do_requester_callback(&dev, fn, data); >> > + if (ret) >> > + return ret; >> > + >> > + if (dev == requester) >> > + return 0; >> > + >> > + /* >> > + * We always consider root bus devices to have a visible >> > + * requester ID, therefore this should never be true. >> > + */ >> > + BUG_ON(pci_is_root_bus(dev->bus)); >> >> What are we going to do if somebody hits this BUG_ON()? If it's impossible >> to hit, we should just remove it. If it's possible to hit it in some weird >> topology you didn't consider, we should see IOMMU faults for any requester >> ID we neglected to map, and that fault would be a better debugging hint >> than a BUG_ON() here. > > It's mostly for readability. I've learned that we never what to look at > dev->bus->self without first testing pci_is_root_bus(dev->bus), so if I > was reading this code I'd stumble when I got here and spend too long > looking around for the assumption that makes it not needed. I suppose I > could just make it a comment, but I thought why not make it official w/ > a BUG. > >> > + >> > + dev = dev->bus->self; >> > + >> > + } while (dev != requester); >> > + >> > + /* >> > + * If we've made it here, @requester is a bridge upstream from >> > + * @requestee. >> > + */ >> > + if (pci_bridge_uses_endpoint_requester(requester)) >> > + return pci_do_requester_callback(&requester, fn, data); >> > + >> > + return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data); >> > +} >> > + >> > +/* >> > * find the upstream PCIe-to-PCI bridge of a PCI device >> > * if the device is PCIE, return NULL >> > * if the device isn't connected to a PCIe bridge (that is its parent is a >> > diff --git a/include/linux/pci.h b/include/linux/pci.h >> > index 3a24e4f..94e81d1 100644 >> > --- a/include/linux/pci.h >> > +++ b/include/linux/pci.h >> > @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) >> > } >> > #endif >> > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, >> > + struct pci_dev *bridge, >> > + u16 *requester_id); >> >> The structure of this interface implies that there is only one visible >> requester ID, but the whole point of this patch is that a transaction from >> @requestee may appear with one of several requester IDs. So which one will >> this return? > > I thought the point of this patch was to have an integrated interface > for finding the requester ID and doing something across all devices with > that requester ID and thereby remove pci_find_upstream_pcie_bridge(), > provide a way to quirk broken PCIe-to-PCI bridge and quirk dma_ops for > devices that use the wrong requester ID. In what case does a device > appear to have multiple requester IDs? We have cases where devices use > the wrong requester ID, but AFAIK they only use a single wrong requester > ID. Thanks, > The cases I've found where multiple requester IDs were used are all related to Marvell Sata controllers. Here's a table of affected devices with links to the bug reports. In each case both functions 0 and 1 are used. 
static const struct pci_dev_dma_multi_source_map { u16 vendor; u16 device; } pci_dev_dma_multi_source_map[] = { /* Reported by Patrick Bregman * https://bugzilla.redhat.com/show_bug.cgi?id=863653 */ { PCI_VENDOR_ID_MARVELL_EXT, 0x9120}, /* Reported by Paweł Żak, Korneliusz Jarzębski, Daniel Mayer * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by * Justin Piszcz https://lkml.org/lkml/2012/11/24/94 */ { PCI_VENDOR_ID_MARVELL_EXT, 0x9123}, /* Reported by Robert Cicconetti * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by * Fernando https://bugzilla.redhat.com/show_bug.cgi?id=757166 */ { PCI_VENDOR_ID_MARVELL_EXT, 0x9128}, /* Reported by Stijn Tintel * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */ { PCI_VENDOR_ID_MARVELL_EXT, 0x9130}, /* Reported by Gaudenz Steinlin * https://lkml.org/lkml/2013/3/5/288 */ { PCI_VENDOR_ID_MARVELL_EXT, 0x9143}, /* Reported by Andrew Cooks * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */ { PCI_VENDOR_ID_MARVELL_EXT, 0x9172} }; > > > Alex > > >> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, > >> > + int (*fn)(struct pci_dev *, u16 id, void *), > >> > + void *data); > >> > + > >> > /** > >> > * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device > >> > * @pdev: the PCI device
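A sketch of how such devices might be quirked (the function name is hypothetical; a pci_get_dma_alias()-style hook is only proposed later in this thread):

    /*
     * Marvell SATA: DMA arrives tagged as both function 0 and
     * function 1, so report the sibling function's ID as an
     * additional alias for the IOMMU to map.
     */
    static u16 marvell_sata_dma_alias(struct pci_dev *dev) /* hypothetical */
    {
            return (dev->bus->number << 8) |
                   PCI_DEVFN(PCI_SLOT(dev->devfn), 1);
    }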
On Wed, 2013-07-24 at 23:03 +0800, Andrew Cooks wrote: > On Wed, Jul 24, 2013 at 7:21 AM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > >> > This provides interfaces for drivers to discover the visible PCIe > >> > requester ID for a device, for things like IOMMU setup, and iterate > >> > >> IDs (plural) > > > > How does a device can't have multiple requester IDs? Reading below, I'm > > not sure we're on the same page for the purpose of this patch. > > > >> > over the device chain from requestee to requester, including DMA > >> > quirks at each step. > >> > >> "requestee" doesn't make sense to me. The "-ee" suffix added to a verb > >> normally makes a noun that refers to the object of the action. So > >> "requestee" sounds like it means something like "target" or "responder," > >> but that's not what you mean here. > > > > Hmm, ok. I figured a request-er makes a request on behalf of a > > request-ee. Suggestions? > > > >> > Suggested-by: Bjorn Helgaas <bhelgaas@google.com> > >> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > >> > --- > >> > drivers/pci/search.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++ > >> > include/linux/pci.h | 7 ++ > >> > 2 files changed, 205 insertions(+) > >> > > >> > diff --git a/drivers/pci/search.c b/drivers/pci/search.c > >> > index d0627fa..4759c02 100644 > >> > --- a/drivers/pci/search.c > >> > +++ b/drivers/pci/search.c > >> > @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem); > >> > EXPORT_SYMBOL_GPL(pci_bus_sem); > >> > > >> > /* > >> > + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID > >> > + * @dev: device to test > >> > + */ > >> > +static bool pci_has_pcie_requester_id(struct pci_dev *dev) > >> > +{ > >> > + /* > >> > + * XXX There's no indicator of the bus type, conventional PCI vs > >> > + * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe > >> > + * requester ID is a native PCIe based system (such as VT-d or > >> > + * AMD-Vi). It's common that PCIe root complex devices do not > >> > + * include a PCIe capability, but we can assume they are PCIe > >> > + * devices based on their topology. > >> > + */ > >> > + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus)) > >> > + return true; > >> > + > >> > + /* > >> > + * PCI-X devices have a requester ID, but the bridge may still take > >> > + * ownership of transactions and create a requester ID. We therefore > >> > + * assume that the PCI-X requester ID is not the same one used on PCIe. > >> > + */ > >> > + > >> > +#ifdef CONFIG_PCI_QUIRKS > >> > + /* > >> > + * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability. > >> > + * If the device is a bridge, look to the next device upstream of it. > >> > + * If that device is PCIe and not a PCIe-to-PCI bridge, then by > >> > + * deduction, the device must be PCIe and therefore has a requester ID. > >> > + */ > >> > + if (dev->subordinate) { > >> > + struct pci_dev *parent = dev->bus->self; > >> > + > >> > + if (pci_is_pcie(parent) && > >> > + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE) > >> > + return true; > >> > + } > >> > +#endif > >> > + > >> > + return false; > >> > +} > >> > + > >> > +/* > >> > + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID? 
> >> > + * @dev: requester device > >> > + * @bridge: upstream bridge (or NULL for root bus) > >> > + */ > >> > +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev, > >> > + struct pci_dev *bridge) > >> > +{ > >> > + /* > >> > + * The entire path must be tested, if any step does not have a > >> > + * requester ID, the chain is broken. This allows us to support > >> > + * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe > >> > + */ > >> > + while (dev != bridge) { > >> > + if (!pci_has_pcie_requester_id(dev)) > >> > + return false; > >> > + > >> > + if (pci_is_root_bus(dev->bus)) > >> > + return !bridge; /* false if we don't hit @bridge */ > >> > + > >> > + dev = dev->bus->self; > >> > + } > >> > + > >> > + return true; > >> > +} > >> > + > >> > +/* > >> > + * Legacy PCI bridges within a root complex (ex. Intel 82801) report > >> > + * a different requester ID than a standard PCIe-to-PCI bridge. Instead > >> > + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a > >> > >> s/the/they/ > >> > >> Did you learn about this empirically? Intel spec? I wonder if there's > >> some way to derive this from the PCIe specs. > > > > It's in the current intel-iommu logic, pretty much anywhere it uses > > pci_find_upstream_pcie_bridge() it follows that up with a pci_is_pcie() > > check. If it's PCIe, it uses a traditional PCIe-to-PCI bridge ID. If > > it's a legacy PCI bridge it uses the bridge ID itself. > > > > static int > > domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev, > > int translation) > > { > > ... > > /* dependent device mapping */ > > tmp = pci_find_upstream_pcie_bridge(pdev); > > if (!tmp) > > return 0; > > ... > > if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */ > > return domain_context_mapping_one(domain, > > pci_domain_nr(tmp->subordinate), > > tmp->subordinate->number, 0, > > translation); > > else /* this is a legacy PCI bridge */ > > return domain_context_mapping_one(domain, > > pci_domain_nr(tmp->bus), > > tmp->bus->number, > > tmp->devfn, > > translation); > > > > The 82801 reference is from looking at when this would actually happen > > on one of my own systems. > > > > > >> > + * standard PCIe endpoint. This function detects them. > >> > + * > >> > + * XXX Is this Intel vendor ID specific? > >> > + */ > >> > +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge) > >> > +{ > >> > + if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus)) > >> > + return true; > >> > + > >> > + return false; > >> > +} > >> > + > >> > +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) | (dev)->devfn) > >> > +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number << 8) > >> > + > >> > +/* > >> > + * pci_get_visible_pcie_requester - Get requester and requester ID for > >> > + * @requestee below @bridge > >> > + * @requestee: requester device > >> > + * @bridge: upstream bridge (or NULL for root bus) > >> > + * @requester_id: location to store requester ID or NULL > >> > + */ > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > >> > + struct pci_dev *bridge, > >> > + u16 *requester_id) > >> > >> I'm not sure it makes sense to return a struct pci_dev here because > >> there's no requirement that a requester ID correspond to an actual > >> pci_dev. > > > > That's why this function is named get_.._requester instead of requester > > ID. 
I believe there still has to be a struct pci_dev that does the > > request, but the requester ID for that device may not actually match. > > So I return both. In a PCIe-to-PCI bridge case, the pci_dev is the > > bridge, but the requester ID is either the bridge bus|devfn or > > subordinate|0 depending on the topology. If we want to support "ghost > > functions", we can return the real pci_dev and a ghost requester ID. > > > > I think if we used just a requester ID, it ends up being extremely > > difficult to pass that into anything else since we then have to search > > again for where that requester ID is rooted. > > > >> > +{ > >> > + struct pci_dev *requester = requestee; > >> > + > >> > + while (requester != bridge) { > >> > + requester = pci_get_dma_source(requester); > >> > + pci_dev_put(requester); /* XXX skip ref cnt */ > >> > + > >> > + if (pci_has_visible_pcie_requester_id(requester, bridge)) > >> > >> If we acquire the "requester" pointer via a ref-counting interface, > >> it's illegal to use the pointer after dropping the reference, isn't it? > >> Maybe that's what you mean by the "XXX" comment. > > > > > > Yes, I was just following your RFC lead and didn't want to deal with > > reference counting until this approach had enough traction. > > > > > >> > + break; > >> > + > >> > + if (pci_is_root_bus(requester->bus)) > >> > + return NULL; /* @bridge not parent to @requestee */ > >> > + > >> > + requester = requester->bus->self; > >> > + } > >> > + > >> > + if (requester_id) { > >> > + if (requester->bus != requestee->bus && > >> > + !pci_bridge_uses_endpoint_requester(requester)) > >> > + *requester_id = PCI_BRIDGE_REQUESTER_ID(requester); > >> > + else > >> > + *requester_id = PCI_REQUESTER_ID(requester); > >> > + } > >> > + > >> > + return requester; > >> > +} > >> > + > >> > +static int pci_do_requester_callback(struct pci_dev **dev, > >> > + int (*fn)(struct pci_dev *, > >> > + u16 id, void *), > >> > + void *data) > >> > +{ > >> > + struct pci_dev *dma_dev; > >> > + int ret; > >> > + > >> > + ret = fn(*dev, PCI_REQUESTER_ID(*dev), data); > >> > + if (ret) > >> > + return ret; > >> > + > >> > + dma_dev = pci_get_dma_source(*dev); > >> > + pci_dev_put(dma_dev); /* XXX skip ref cnt */ > >> > + if (dma_dev == *dev) > >> > >> Same ref count question as above. > >> > >> > + return 0; > >> > + > >> > + ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data); > >> > + if (ret) > >> > + return ret; > >> > + > >> > + *dev = dma_dev; > >> > + return 0; > >> > +} > >> > + > >> > +/* > >> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source > >> > + * from @requestee to the PCIe requester ID visible > >> > + * to @bridge. > >> > >> Transactions from a device may appear with one of several requester IDs, > >> but there's not necessarily an actual pci_dev for each ID, so I think the > >> caller reads better if it's "...for_each_requester_id()" > > > > Wouldn't you expect to pass a requester ID into a function with that > > name? I'm pretty sure I had it named that at one point but thought the > > parameters made more sense this way. I'll see if I can think of a > > better name. > > > >> The "Call X on each devices and DMA source from Y to the requester ID" > >> part doesn't quite make a sentence. > > > > > > Ok > > > >> > + * @requestee: Starting device > >> > + * @bridge: upstream bridge (or NULL for root bus) > >> > >> You should say something about the significance of @bridge. 
I think the > >> point is to call @fn for every possible requester ID @bridge could see for > >> transactions from @requestee. This is a way to learn the requester IDs an > >> IOMMU at @bridge needs to be prepared for. > > > > I can add something. @bridge is supposed to be for bridge-based IOMMUs. > > Essentially it's just a stopping point. There might be PCI topology > > upstream of @bridge, but if you only want the requester ID seen by the > > bridge, you don't care. > > > >> > + * @fn: callback function > >> > + * @data: data to pass to callback > >> > + */ > >> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, > >> > + int (*fn)(struct pci_dev *, u16 id, void *), > >> > + void *data) > >> > +{ > >> > + struct pci_dev *requester; > >> > + struct pci_dev *dev = requestee; > >> > + int ret = 0; > >> > + > >> > + requester = pci_get_visible_pcie_requester(requestee, bridge, NULL); > >> > + if (!requester) > >> > + return -EINVAL; > >> > + > >> > + do { > >> > + ret = pci_do_requester_callback(&dev, fn, data); > >> > + if (ret) > >> > + return ret; > >> > + > >> > + if (dev == requester) > >> > + return 0; > >> > + > >> > + /* > >> > + * We always consider root bus devices to have a visible > >> > + * requester ID, therefore this should never be true. > >> > + */ > >> > + BUG_ON(pci_is_root_bus(dev->bus)); > >> > >> What are we going to do if somebody hits this BUG_ON()? If it's impossible > >> to hit, we should just remove it. If it's possible to hit it in some weird > >> topology you didn't consider, we should see IOMMU faults for any requester > >> ID we neglected to map, and that fault would be a better debugging hint > >> than a BUG_ON() here. > > > > It's mostly for readability. I've learned that we never what to look at > > dev->bus->self without first testing pci_is_root_bus(dev->bus), so if I > > was reading this code I'd stumble when I got here and spend too long > > looking around for the assumption that makes it not needed. I suppose I > > could just make it a comment, but I thought why not make it official w/ > > a BUG. > > > >> > + > >> > + dev = dev->bus->self; > >> > + > >> > + } while (dev != requester); > >> > + > >> > + /* > >> > + * If we've made it here, @requester is a bridge upstream from > >> > + * @requestee. > >> > + */ > >> > + if (pci_bridge_uses_endpoint_requester(requester)) > >> > + return pci_do_requester_callback(&requester, fn, data); > >> > + > >> > + return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data); > >> > +} > >> > + > >> > +/* > >> > * find the upstream PCIe-to-PCI bridge of a PCI device > >> > * if the device is PCIE, return NULL > >> > * if the device isn't connected to a PCIe bridge (that is its parent is a > >> > diff --git a/include/linux/pci.h b/include/linux/pci.h > >> > index 3a24e4f..94e81d1 100644 > >> > --- a/include/linux/pci.h > >> > +++ b/include/linux/pci.h > >> > @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) > >> > } > >> > #endif > >> > > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > >> > + struct pci_dev *bridge, > >> > + u16 *requester_id); > >> > >> The structure of this interface implies that there is only one visible > >> requester ID, but the whole point of this patch is that a transaction from > >> @requestee may appear with one of several requester IDs. So which one will > >> this return? 
> > > > I thought the point of this patch was to have an integrated interface > > for finding the requester ID and doing something across all devices with > > that requester ID and thereby remove pci_find_upstream_pcie_bridge(), > > provide a way to quirk broken PCIe-to-PCI bridge and quirk dma_ops for > > devices that use the wrong requester ID. In what case does a device > > appear to have multiple requester IDs? We have cases where devices use > > the wrong requester ID, but AFAIK they only use a single wrong requester > > ID. Thanks, > > > > The cases I've found where multiple requester IDs were used are all related > to Marvell Sata controllers. > > Here's a table of affected devices with links to the > bug reports. In each case both functions 0 and 1 are used. I'm confused. There's only one requester ID for a given transaction. Therefore on a per transaction level there's a 1:1 relationship between a requester ID and a device. The requester ID may be incorrect, but there cannot be more than one per transaction. Are you suggesting that a given device (PCI function) will alternate using one requester ID for some transactions and a different requester ID for others? I'm thinking of cases like this bz: https://bugzilla.redhat.com/show_bug.cgi?id=757166#c16 In that case pci_get_visible_pcie_requester() (assuming no bridges are involved) should return the device and the requester ID that it would use if it worked correctly. pcie_for_each_requester() should then call the callback function using both the correct requester ID and the quirked requester ID. The reason I mentioned that this doesn't currently handle ghost functions is because it uses the pci_get_dma_source() quirk, which only handles actual devices. We'd need to add a pci_get_dma_alias() or something that would give us a list of ghost requester IDs to handle these broken Marvell devices. Thanks, Alex > static const struct pci_dev_dma_multi_source_map { > u16 vendor; > u16 device; > } pci_dev_dma_multi_source_map[] = { > /* Reported by Patrick Bregman > * https://bugzilla.redhat.com/show_bug.cgi?id=863653 */ > { PCI_VENDOR_ID_MARVELL_EXT, 0x9120}, > > /* Reported by Paweł Żak, Korneliusz Jarzębski, Daniel Mayer > * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by > * Justin Piszcz https://lkml.org/lkml/2012/11/24/94 */ > { PCI_VENDOR_ID_MARVELL_EXT, 0x9123}, > > /* Reported by Robert Cicconetti > * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by > * Fernando https://bugzilla.redhat.com/show_bug.cgi?id=757166 */ > { PCI_VENDOR_ID_MARVELL_EXT, 0x9128}, > > /* Reported by Stijn Tintel > * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */ > { PCI_VENDOR_ID_MARVELL_EXT, 0x9130}, > > /* Reported by Gaudenz Steinlin > * https://lkml.org/lkml/2013/3/5/288 */ > { PCI_VENDOR_ID_MARVELL_EXT, 0x9143}, > > /* Reported by Andrew Cooks > * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */ > { PCI_VENDOR_ID_MARVELL_EXT, 0x9172} }; > > > > Alex > > > >> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, > >> > + int (*fn)(struct pci_dev *, u16 id, void *), > >> > + void *data); > >> > + > >> > /** > >> > * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device > >> > * @pdev: the PCI device
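Concretely, for the 9128 case in that bz, the behavior Alex describes would have the iterator invoke the callback once per ID (bus/devfn values illustrative):

    /* DMA from 03:00.0 can carry either tag, so both get mapped: */
    fn(dev, 0x0300, data);  /* 03:00.0, the correct requester ID */
    fn(dev, 0x0301, data);  /* 03:00.1, the ghost/quirked requester ID */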
On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson <alex.williamson@redhat.com> wrote: > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: >> > This provides interfaces for drivers to discover the visible PCIe >> > requester ID for a device, for things like IOMMU setup, and iterate >> >> IDs (plural) > > How does a device can't have multiple requester IDs? Reading below, I'm > not sure we're on the same page for the purpose of this patch. >> "requestee" doesn't make sense to me. The "-ee" suffix added to a verb >> normally makes a noun that refers to the object of the action. So >> "requestee" sounds like it means something like "target" or "responder," >> but that's not what you mean here. > > Hmm, ok. I figured a request-er makes a request on behalf of a > request-ee. Suggestions? I would expect a request-er to make a request *to* a request-ee, just like a grant-or makes a grant to a grant-ee. My suggestion is to use "requester" consistently for only the originator of a DMA transaction. Any devices above it are by definition "bridges". As the DMA transaction propagates through the fabric, it may be tagged by bridges with different requester IDs. The requester IDs are needed outside PCI (by IOMMU drivers), but I'm not sure the intermediate pci_devs are. >> > + * pci_get_visible_pcie_requester - Get requester and requester ID for >> > + * @requestee below @bridge >> > + * @requestee: requester device >> > + * @bridge: upstream bridge (or NULL for root bus) >> > + * @requester_id: location to store requester ID or NULL >> > + */ >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, >> > + struct pci_dev *bridge, >> > + u16 *requester_id) >> >> I'm not sure it makes sense to return a struct pci_dev here because >> there's no requirement that a requester ID correspond to an actual >> pci_dev. > > That's why this function is named get_.._requester instead of requester > ID. I believe there still has to be a struct pci_dev that does the > request, but the requester ID for that device may not actually match. > So I return both. In a PCIe-to-PCI bridge case, the pci_dev is the > bridge, but the requester ID is either the bridge bus|devfn or > subordinate|0 depending on the topology. If we want to support "ghost > functions", we can return the real pci_dev and a ghost requester ID. > > I think if we used just a requester ID, it ends up being extremely > difficult to pass that into anything else since we then have to search > again for where that requester ID is rooted. Returning both a pci_dev and a requester ID makes it more complicated. At the hardware level, transactions contain only a requester ID, and bridges can either drop it, pass it unchanged, or assign a new one. I think the code will be simpler if we just model that. >> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source >> > + * from @requestee to the PCIe requester ID visible >> > + * to @bridge. >> >> Transactions from a device may appear with one of several requester IDs, >> but there's not necessarily an actual pci_dev for each ID, so I think the >> caller reads better if it's "...for_each_requester_id()" > > Wouldn't you expect to pass a requester ID into a function with that > name? I'm pretty sure I had it named that at one point but thought the > parameters made more sense this way. I'll see if I can think of a > better name. 
My thought was to pass a pci_dev (the originator of the DMA, which I would call the "requester") and a callback. The callback would accept the requester pci_dev (always the same requester device) and a requester ID. This would call @fn for each possible requester ID for transactions from the device. IOMMU drivers should only need the requester ID to manage mappings; they shouldn't need a pci_dev corresponding to any intermediate bridges. >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, >> > + struct pci_dev *bridge, >> > + u16 *requester_id); >> >> The structure of this interface implies that there is only one visible >> requester ID, but the whole point of this patch is that a transaction from >> @requestee may appear with one of several requester IDs. So which one will >> this return? > > I thought the point of this patch was to have an integrated interface > for finding the requester ID and doing something across all devices with > that requester ID Oh, here's the difference in our understanding. "Doing something across all devices with that requester ID" sounds like identifying the set of devices that have to be handled as a group by the IOMMU. That's certainly an issue, but I wasn't considering it at all. I was only concerned with the question of a single device that requires multiple IOMMU mappings because DMA requests might use any of several source IDs. This is mentioned in sec 3.6.1.1 of the VT-d spec, i.e., "requests arriving with the source-id in the original PCI-X transaction or the source-id provided by the bridge. ... software must program multiple context entries, each corresponding to the possible set of source-ids." My thought was that the IOMMU driver would use pcie_for_each_requester_id() and have the callback set up a mapping for one requester ID each time it was called. > and thereby remove pci_find_upstream_pcie_bridge(), > provide a way to quirk broken PCIe-to-PCI bridge and quirk dma_ops for > devices that use the wrong requester ID. In what case does a device > appear to have multiple requester IDs? We have cases where devices use > the wrong requester ID, but AFAIK they only use a single wrong requester > ID. I think the example in sec 3.6.1.1 answers this, doesn't it? Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
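[To make the shape of Bjorn's proposal concrete, a sketch of the interface as he describes it. The name and exact parameters are illustrative, not code from the posted series.]

    /* Sketch only: invoke @fn once for each requester ID that DMA from
     * @requester may carry by the time it reaches @bridge (NULL meaning
     * the root bus). The callback always receives the original
     * @requester, never an intermediate bridge. */
    int pcie_for_each_requester_id(struct pci_dev *requester,
    			       struct pci_dev *bridge,
    			       int (*fn)(struct pci_dev *requester,
    					 u16 requester_id, void *data),
    			       void *data);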
On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote: > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > >> > This provides interfaces for drivers to discover the visible PCIe > >> > requester ID for a device, for things like IOMMU setup, and iterate > >> > >> IDs (plural) > > > > How so? A device can't have multiple requester IDs. Reading below, I'm > > not sure we're on the same page for the purpose of this patch. > > >> "requestee" doesn't make sense to me. The "-ee" suffix added to a verb > >> normally makes a noun that refers to the object of the action. So > >> "requestee" sounds like it means something like "target" or "responder," > >> but that's not what you mean here. > > > > Hmm, ok. I figured a request-er makes a request on behalf of a > > request-ee. Suggestions? > > I would expect a request-er to make a request *to* a request-ee, just > like a grant-or makes a grant to a grant-ee. My suggestion is to use > "requester" consistently for only the originator of a DMA transaction. > Any devices above it are by definition "bridges". But there may not be a bridge, so are we then creating just a revised pci_find_upstream_pcie_bridge() function? I think we want to abstract away caring whether there's a bridge or a well behaved device or a broken device that requires quirks. > As the DMA > transaction propagates through the fabric, it may be tagged by bridges > with different requester IDs. > > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm > not sure the intermediate pci_devs are. A u16 requester ID doesn't mean much on its own though, it's not necessarily even unique. A requester ID associated with the context of a pci_dev is unique and gives us a reference point if we need to perform another operation on that requester ID. > >> > + * pci_get_visible_pcie_requester - Get requester and requester ID for > >> > + * @requestee below @bridge > >> > + * @requestee: requester device > >> > + * @bridge: upstream bridge (or NULL for root bus) > >> > + * @requester_id: location to store requester ID or NULL > >> > + */ > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > >> > + struct pci_dev *bridge, > >> > + u16 *requester_id) > >> > >> I'm not sure it makes sense to return a struct pci_dev here because > >> there's no requirement that a requester ID correspond to an actual > >> pci_dev. > > > > That's why this function is named get_.._requester instead of requester > > ID. I believe there still has to be a struct pci_dev that does the > > request, but the requester ID for that device may not actually match. > > So I return both. In a PCIe-to-PCI bridge case, the pci_dev is the > > bridge, but the requester ID is either the bridge bus|devfn or > > subordinate|0 depending on the topology. If we want to support "ghost > > functions", we can return the real pci_dev and a ghost requester ID. > > > > I think if we used just a requester ID, it ends up being extremely > > difficult to pass that into anything else since we then have to search > > again for where that requester ID is rooted. > > Returning both a pci_dev and a requester ID makes it more complicated. > At the hardware level, transactions contain only a requester ID, and > bridges can either drop it, pass it unchanged, or assign a new one. I > think the code will be simpler if we just model that.
I'm not convinced. Patch 2/2 makes use of both the returned pci_dev and the returned requester ID and it's a huge simplification overall. > >> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source > >> > + * from @requestee to the PCIe requester ID visible > >> > + * to @bridge. > >> > >> Transactions from a device may appear with one of several requester IDs, > >> but there's not necessarily an actual pci_dev for each ID, so I think the > >> caller reads better if it's "...for_each_requester_id()" > > > > Wouldn't you expect to pass a requester ID into a function with that > > name? I'm pretty sure I had it named that at one point but thought the > > parameters made more sense this way. I'll see if I can think of a > > better name. > > My thought was to pass a pci_dev (the originator of the DMA, which I > would call the "requester") and a callback. The callback would accept > the requester pci_dev (always the same requester device) and a > requester ID. > > This would call @fn for each possible requester ID for transactions > from the device. IOMMU drivers should only need the requester ID to > manage mappings; they shouldn't need a pci_dev corresponding to any > intermediate bridges. This implementation is almost the same with the only change being that the pci_dev passed to the callback is the one most closely associated with the requester ID. For the IOMMU driver, it doesn't matter since it's only using the requester ID, but why would the callback care about the original requester? If it needed to do something device specific, it's going to care about the closest device to the requester ID. > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > >> > + struct pci_dev *bridge, > >> > + u16 *requester_id); > >> > >> The structure of this interface implies that there is only one visible > >> requester ID, but the whole point of this patch is that a transaction from > >> @requestee may appear with one of several requester IDs. So which one will > >> this return? > > > > I thought the point of this patch was to have an integrated interface > > for finding the requester ID and doing something across all devices with > > that requester ID > > Oh, here's the difference in our understanding. "Doing something > across all devices with that requester ID" sounds like identifying the > set of devices that have to be handled as a group by the IOMMU. > That's certainly an issue, but I wasn't considering it at all. > > I was only concerned with the question of a single device that > requires multiple IOMMU mappings because DMA requests might use any of > several source IDs. This is mentioned in sec 3.6.1.1 of the VT-d > spec, i.e., "requests arriving with the source-id in the original > PCI-X transaction or the source-id provided by the bridge. ... > software must program multiple context entries, each corresponding to > the possible set of source-ids." > > My thought was that the IOMMU driver would use > pcie_for_each_requester_id() and have the callback set up a mapping > for one requester ID each time it was called. Which is exactly what pcie_for_each_requester() does, but I try to solve the problem of a single device that needs multiple context entries the same as a bridge that masks multiple devices. > > and thereby remove pci_find_upstream_pcie_bridge(), > > provide a way to quirk broken PCIe-to-PCI bridge and quirk dma_ops for > > devices that use the wrong requester ID. In what case does a device > > appear to have multiple requester IDs? 
> > We have cases where devices use > > the wrong requester ID, but AFAIK they only use a single wrong requester > > ID. > > I think the example in sec 3.6.1.1 answers this, doesn't it? It does, in summary we have to be prepared that a transaction from a device behind a PCIe-to-PCI/X bridge may use either the device requester ID (bus|devfn) or the bridge (subordinate|0) requester ID. This is why pcie_for_each_requester() makes sense to me, we iterate across all requesters that may use the same requester ID. If we had a pcie_for_each_requester_id() how would we handle that? Do we call it once for the (bus|devfn) requester ID of the device we're trying to map and then call it again for the (subordinate|0) requester ID of the bridge? That's almost how pci_find_upstream_pcie_bridge() is used and it ends up putting a lot of the logic into the driver. How do we write a pcie_for_each_requester_id() function that can uniquely identify a PCI hierarchy to traverse in the presence of multiple domains? I have a hard time seeing how we get away from returning a {pci_dev, requester ID} pair if we want to be descriptive. Maybe we create a:

struct pci_requester_id {
	struct pci_dev *dev;
	u16 id;
};

and return that if this is a matter of semantics. Thanks, Alex
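[A sketch of what the signature might look like built around the pair Alex proposes above. This is hypothetical, purely to pin down the semantics under discussion; it is not from either posted version.]

    /* Hypothetical variant returning the {pci_dev, requester ID} pair
     * instead of a pci_dev plus an out-parameter. */
    struct pci_requester_id {
    	struct pci_dev *dev;	/* device most closely tied to the ID */
    	u16 id;			/* requester ID visible to the bridge */
    };

    struct pci_requester_id
    pci_get_visible_pcie_requester(struct pci_dev *requestee,
    			       struct pci_dev *bridge);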
On 07/23/2013 06:35 PM, Bjorn Helgaas wrote: > On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: >> This provides interfaces for drivers to discover the visible PCIe >> requester ID for a device, for things like IOMMU setup, and iterate > > IDs (plural) > a single device does not have multiple requester id's; can have multiple tag-id's (that are ignored in this situation, but can be used by switches for ordering purposes), but there's only 1/fcn (except for those quirker pdevs!). >> over the device chain from requestee to requester, including DMA >> quirks at each step. > > "requestee" doesn't make sense to me. The "-ee" suffix added to a verb > normally makes a noun that refers to the object of the action. So > "requestee" sounds like it means something like "target" or "responder," > but that's not what you mean here. > sorry, I didn't follow the requester/requestee terminology either... >> Suggested-by: Bjorn Helgaas<bhelgaas@google.com> >> Signed-off-by: Alex Williamson<alex.williamson@redhat.com> >> --- >> drivers/pci/search.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++ >> include/linux/pci.h | 7 ++ >> 2 files changed, 205 insertions(+) >> >> diff --git a/drivers/pci/search.c b/drivers/pci/search.c >> index d0627fa..4759c02 100644 >> --- a/drivers/pci/search.c >> +++ b/drivers/pci/search.c >> @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem); >> EXPORT_SYMBOL_GPL(pci_bus_sem); >> >> /* >> + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID >> + * @dev: device to test >> + */ >> +static bool pci_has_pcie_requester_id(struct pci_dev *dev) >> +{ >> + /* >> + * XXX There's no indicator of the bus type, conventional PCI vs >> + * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe >> + * requester ID is a native PCIe based system (such as VT-d or >> + * AMD-Vi). It's common that PCIe root complex devices do not could the above comment be x86-iommu-neutral? by definition of PCIe, all devices have a requester id (min. to accept cfg cycles); req'd if source of read/write requests, read completions. >> + * include a PCIe capability, but we can assume they are PCIe >> + * devices based on their topology. >> + */ >> + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus)) >> + return true; >> + >> + /* >> + * PCI-X devices have a requester ID, but the bridge may still take >> + * ownership of transactions and create a requester ID. We therefore >> + * assume that the PCI-X requester ID is not the same one used on PCIe. >> + */ >> + >> +#ifdef CONFIG_PCI_QUIRKS >> + /* >> + * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability. >> + * If the device is a bridge, look to the next device upstream of it. >> + * If that device is PCIe and not a PCIe-to-PCI bridge, then by >> + * deduction, the device must be PCIe and therefore has a requester ID. >> + */ >> + if (dev->subordinate) { >> + struct pci_dev *parent = dev->bus->self; >> + >> + if (pci_is_pcie(parent)&& >> + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE) >> + return true; >> + } >> +#endif >> + >> + return false; >> +} >> + >> +/* >> + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID? >> + * @dev: requester device >> + * @bridge: upstream bridge (or NULL for root bus) >> + */ >> +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev, >> + struct pci_dev *bridge) >> +{ >> + /* >> + * The entire path must be tested, if any step does not have a >> + * requester ID, the chain is broken. 
This allows us to support >> + * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe >> + */ >> + while (dev != bridge) { >> + if (!pci_has_pcie_requester_id(dev)) >> + return false; >> + >> + if (pci_is_root_bus(dev->bus)) >> + return !bridge; /* false if we don't hit @bridge */ >> + >> + dev = dev->bus->self; >> + } >> + >> + return true; >> +} >> + >> +/* >> + * Legacy PCI bridges within a root complex (ex. Intel 82801) report >> + * a different requester ID than a standard PCIe-to-PCI bridge. Instead First, I'm assuming you mean that devices behind a Legacy PCI bridge within a root complex get assigned IDs different than std PCIe-to-PCI bridges (as quoted below). >> + * of using (subordinate<< 8 | 0) the use (bus<< 8 | devfn), like a > > s/the/they/ > well, the PPBs should inject their (secondary << 8 | 0), not subordinate. From the PCI Express to PCI/PCI-X Bridge spec, v1.0: The Tag is reassigned by the bridge according to the rules outlined in the following sections. If the bridge generates a new Requester ID for a transaction forwarded from the secondary interface to the primary interface, the bridge assigns the PCI Express Requester ID using its secondary interface’s ^^^^^^^^^ Bus Number and sets both the Device Number and Function Number fields to zero. ^^^^^^^^^^ As for the 82801, looks like they took this part of the PCIe spec to heart: (PCIe v3 spec, section 2.2.6.2 Transaction Descriptor - Transaction ID Field): Exception: The assignment of Bus and Device Numbers to the Devices within a Root Complex, and the Device Numbers to the Downstream Ports within a Switch, may be done in an implementation specific way. Obviously, you're missing the 'implementation-specific way' compiler... ;-) aw: which 'bus' do you mean above in '(bus<< 8 | devfn)' ? > Did you learn about this empirically? Intel spec? I wonder if there's > some way to derive this from the PCIe specs. > >> + * standard PCIe endpoint. This function detects them. >> + * >> + * XXX Is this Intel vendor ID specific? >> + */ >> +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge) >> +{ >> + if (!pci_is_pcie(bridge)&& pci_is_root_bus(bridge->bus)) >> + return true; >> + >> + return false; >> +} >> + >> +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number<< 8) | (dev)->devfn) >> +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number<< 8) >> + >> +/* >> + * pci_get_visible_pcie_requester - Get requester and requester ID for >> + * @requestee below @bridge >> + * @requestee: requester device >> + * @bridge: upstream bridge (or NULL for root bus) >> + * @requester_id: location to store requester ID or NULL >> + */ >> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, >> + struct pci_dev *bridge, >> + u16 *requester_id) > > I'm not sure it makes sense to return a struct pci_dev here because > there's no requirement that a requester ID correspond to an actual > pci_dev. > well, I would expect the only callers would be for subsys (iommu's) searching to find requester-id for a pdev, b/c if a pdev doesn't exist, then the device (and requester-id) doesn't exist... :-/ >> +{ >> + struct pci_dev *requester = requestee; >> + >> + while (requester != bridge) { >> + requester = pci_get_dma_source(requester); >> + pci_dev_put(requester); /* XXX skip ref cnt */ >> + >> + if (pci_has_visible_pcie_requester_id(requester, bridge)) > > If we acquire the "requester" pointer via a ref-counting interface, > it's illegal to use the pointer after dropping the reference, isn't it? 
> Maybe that's what you mean by the "XXX" comment. > >> + break; >> + >> + if (pci_is_root_bus(requester->bus)) >> + return NULL; /* @bridge not parent to @requestee */ >> + >> + requester = requester->bus->self; >> + } >> + >> + if (requester_id) { >> + if (requester->bus != requestee->bus&& >> + !pci_bridge_uses_endpoint_requester(requester)) >> + *requester_id = PCI_BRIDGE_REQUESTER_ID(requester); >> + else >> + *requester_id = PCI_REQUESTER_ID(requester); >> + } >> + >> + return requester; >> +} >> + >> +static int pci_do_requester_callback(struct pci_dev **dev, >> + int (*fn)(struct pci_dev *, >> + u16 id, void *), >> + void *data) >> +{ >> + struct pci_dev *dma_dev; >> + int ret; >> + >> + ret = fn(*dev, PCI_REQUESTER_ID(*dev), data); >> + if (ret) >> + return ret; >> + >> + dma_dev = pci_get_dma_source(*dev); >> + pci_dev_put(dma_dev); /* XXX skip ref cnt */ >> + if (dma_dev == *dev) > > Same ref count question as above. > >> + return 0; >> + >> + ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data); >> + if (ret) >> + return ret; >> + >> + *dev = dma_dev; >> + return 0; >> +} >> + >> +/* >> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source >> + * from @requestee to the PCIe requester ID visible >> + * to @bridge. > > Transactions from a device may appear with one of several requester IDs, > but there's not necessarily an actual pci_dev for each ID, so I think the ditto above; have to have a pdev for each id.... > caller reads better if it's "...for_each_requester_id()" > > The "Call X on each devices and DMA source from Y to the requester ID" > part doesn't quite make a sentence. > >> + * @requestee: Starting device >> + * @bridge: upstream bridge (or NULL for root bus) > > You should say something about the significance of @bridge. I think the > point is to call @fn for every possible requester ID @bridge could see for > transactions from @requestee. This is a way to learn the requester IDs an > IOMMU at @bridge needs to be prepared for. > >> + * @fn: callback function >> + * @data: data to pass to callback >> + */ >> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, >> + int (*fn)(struct pci_dev *, u16 id, void *), >> + void *data) >> +{ >> + struct pci_dev *requester; >> + struct pci_dev *dev = requestee; >> + int ret = 0; >> + >> + requester = pci_get_visible_pcie_requester(requestee, bridge, NULL); >> + if (!requester) >> + return -EINVAL; >> + >> + do { >> + ret = pci_do_requester_callback(&dev, fn, data); >> + if (ret) >> + return ret; >> + >> + if (dev == requester) >> + return 0; >> + >> + /* >> + * We always consider root bus devices to have a visible >> + * requester ID, therefore this should never be true. >> + */ >> + BUG_ON(pci_is_root_bus(dev->bus)); > > What are we going to do if somebody hits this BUG_ON()? If it's impossible > to hit, we should just remove it. If it's possible to hit it in some weird > topology you didn't consider, we should see IOMMU faults for any requester > ID we neglected to map, and that fault would be a better debugging hint > than a BUG_ON() here. > according to spec, all pdev's have a requester-id, even RC ones, albeit "implementation specific"... >> + >> + dev = dev->bus->self; >> + >> + } while (dev != requester); >> + >> + /* >> + * If we've made it here, @requester is a bridge upstream from >> + * @requestee. 
>> + */ >> + if (pci_bridge_uses_endpoint_requester(requester)) >> + return pci_do_requester_callback(&requester, fn, data); >> + >> + return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data); >> +} >> + >> +/* >> * find the upstream PCIe-to-PCI bridge of a PCI device >> * if the device is PCIE, return NULL >> * if the device isn't connected to a PCIe bridge (that is its parent is a >> diff --git a/include/linux/pci.h b/include/linux/pci.h >> index 3a24e4f..94e81d1 100644 >> --- a/include/linux/pci.h >> +++ b/include/linux/pci.h >> @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) >> } >> #endif >> >> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, >> + struct pci_dev *bridge, >> + u16 *requester_id); > > The structure of this interface implies that there is only one visible > requester ID, but the whole point of this patch is that a transaction from > @requestee may appear with one of several requester IDs. So which one will > this return? > Are there devices that use multiple requester id's? I know we have ones that use the wrong id. If we want to handle the multiple requester-id's per pdev, we could pass in a ptr to an initial requester-id; if null, give me first; if !null, give me next one; would also need a flag returned to indicate there is more than 1. null-rtn when pass in !null ref, means the end. ... sledge-hammer + nail ??? >> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, >> + int (*fn)(struct pci_dev *, u16 id, void *), >> + void *data); >> + >> /** >> * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device >> * @pdev: the PCI device >> -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
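[One detail worth pinning down behind Don's "which bus" question, sketched against the macro from the posted patch; Alex gives the same clarification in his reply below. The macro is from the patch; the commentary is editorial.]

    /* From the posted patch: */
    #define PCI_BRIDGE_REQUESTER_ID(dev)	((dev)->subordinate->number << 8)

    /* For a bridge pdev, pdev->subordinate is the struct pci_bus directly
     * below the bridge -- its secondary interface -- so despite the field
     * name this computes (secondary bus << 8 | devfn 0), matching the
     * PCIe-to-PCI bridge spec rule quoted above, not the config-space
     * "subordinate bus number" (the highest bus number below the bridge). */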
On Wed, 2013-07-24 at 16:42 -0400, Don Dutile wrote: > On 07/23/2013 06:35 PM, Bjorn Helgaas wrote: > > On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > >> This provides interfaces for drivers to discover the visible PCIe > >> requester ID for a device, for things like IOMMU setup, and iterate > > > > IDs (plural) > > > a single device does not have multiple requester id's; > can have multiple tag-id's (that are ignored in this situation, but > can be used by switches for ordering purposes), but there's only 1/fcn > (except for those quirker pdevs!). > > >> over the device chain from requestee to requester, including DMA > >> quirks at each step. > > > > "requestee" doesn't make sense to me. The "-ee" suffix added to a verb > > normally makes a noun that refers to the object of the action. So > > "requestee" sounds like it means something like "target" or "responder," > > but that's not what you mean here. > > > sorry, I didn't follow the requester/requestee terminology either... > > >> Suggested-by: Bjorn Helgaas<bhelgaas@google.com> > >> Signed-off-by: Alex Williamson<alex.williamson@redhat.com> > >> --- > >> drivers/pci/search.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++ > >> include/linux/pci.h | 7 ++ > >> 2 files changed, 205 insertions(+) > >> > >> diff --git a/drivers/pci/search.c b/drivers/pci/search.c > >> index d0627fa..4759c02 100644 > >> --- a/drivers/pci/search.c > >> +++ b/drivers/pci/search.c > >> @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem); > >> EXPORT_SYMBOL_GPL(pci_bus_sem); > >> > >> /* > >> + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID > >> + * @dev: device to test > >> + */ > >> +static bool pci_has_pcie_requester_id(struct pci_dev *dev) > >> +{ > >> + /* > >> + * XXX There's no indicator of the bus type, conventional PCI vs > >> + * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe > >> + * requester ID is a native PCIe based system (such as VT-d or > >> + * AMD-Vi). It's common that PCIe root complex devices do not > could the above comment be x86-iommu-neutral? > by definition of PCIe, all devices have a requester id (min. to accept cfg cycles); > req'd if source of read/write requests, read completions. I agree completely, the question is whether we have a PCIe root complex or a conventional PCI host bus. I don't think we have any way to tell, so I'm assuming pci_is_root_bus() indicates we're on a PCIe root complex and therefore have requester IDs. If there's some way to determine this let me know and we can avoid any kind of assumption. > >> + * include a PCIe capability, but we can assume they are PCIe > >> + * devices based on their topology. > >> + */ > >> + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus)) > >> + return true; > >> + > >> + /* > >> + * PCI-X devices have a requester ID, but the bridge may still take > >> + * ownership of transactions and create a requester ID. We therefore > >> + * assume that the PCI-X requester ID is not the same one used on PCIe. > >> + */ > >> + > >> +#ifdef CONFIG_PCI_QUIRKS > >> + /* > >> + * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability. > >> + * If the device is a bridge, look to the next device upstream of it. > >> + * If that device is PCIe and not a PCIe-to-PCI bridge, then by > >> + * deduction, the device must be PCIe and therefore has a requester ID. 
> >> + */ > >> + if (dev->subordinate) { > >> + struct pci_dev *parent = dev->bus->self; > >> + > >> + if (pci_is_pcie(parent)&& > >> + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE) > >> + return true; > >> + } > >> +#endif > >> + > >> + return false; > >> +} > >> + > >> +/* > >> + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID? > >> + * @dev: requester device > >> + * @bridge: upstream bridge (or NULL for root bus) > >> + */ > >> +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev, > >> + struct pci_dev *bridge) > >> +{ > >> + /* > >> + * The entire path must be tested, if any step does not have a > >> + * requester ID, the chain is broken. This allows us to support > >> + * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe > >> + */ > >> + while (dev != bridge) { > >> + if (!pci_has_pcie_requester_id(dev)) > >> + return false; > >> + > >> + if (pci_is_root_bus(dev->bus)) > >> + return !bridge; /* false if we don't hit @bridge */ > >> + > >> + dev = dev->bus->self; > >> + } > >> + > >> + return true; > >> +} > >> + > >> +/* > >> + * Legacy PCI bridges within a root complex (ex. Intel 82801) report > >> + * a different requester ID than a standard PCIe-to-PCI bridge. Instead > First, I'm assuming you mean that devices behind a Legacy PCI bridge within a root complex > get assigned IDs different than std PCIe-to-PCI bridges (as quoted below). Yes > >> + * of using (subordinate<< 8 | 0) the use (bus<< 8 | devfn), like a > > > > s/the/they/ > > > well, the PPBs should inject their (secondary << 8 | 0), not subordinate. > From the PCI Express to PCI/PCI-X Bridge spec, v1.0: > The Tag is reassigned by the bridge according to the rules outlined in the following sections. If the > bridge generates a new Requester ID for a transaction forwarded from the secondary interface to the > primary interface, the bridge assigns the PCI Express Requester ID using its secondary interface’s > ^^^^^^^^^ > Bus Number and sets both the Device Number and Function Number fields to zero. > ^^^^^^^^^^ I'm referring to (pdev->subordinate->number << 8 | 0) vs (pdev->bus->number << 8 | pdev->devfn). The subordinate struct pci_bus is the secondary interface bus. > As for the 82801, looks like they took this part of the PCIe spec to heart: > (PCIe v3 spec, section 2.2.6.2 Transaction Descriptor - Transaction ID Field): > Exception: The assignment of Bus and Device Numbers to the Devices within a Root Complex, > and the Device Numbers to the Downstream Ports within a Switch, may be done in an > implementation specific way. > Obviously, you're missing the 'implementation-specific way' compiler... ;-) So this is potentially Intel specific. How can we tell it's an Intel root complex? > aw: which 'bus' do you mean above in '(bus<< 8 | devfn)' ? (pdev->bus->number << 8 | pdev->devfn) > > Did you learn about this empirically? Intel spec? I wonder if there's > > some way to derive this from the PCIe specs. > > > >> + * standard PCIe endpoint. This function detects them. > >> + * > >> + * XXX Is this Intel vendor ID specific? 
> >> + */ > >> +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge) > >> +{ > >> + if (!pci_is_pcie(bridge)&& pci_is_root_bus(bridge->bus)) > >> + return true; > >> + > >> + return false; > >> +} > >> + > >> +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number<< 8) | (dev)->devfn) > >> +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number<< 8) > >> + > >> +/* > >> + * pci_get_visible_pcie_requester - Get requester and requester ID for > >> + * @requestee below @bridge > >> + * @requestee: requester device > >> + * @bridge: upstream bridge (or NULL for root bus) > >> + * @requester_id: location to store requester ID or NULL > >> + */ > >> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > >> + struct pci_dev *bridge, > >> + u16 *requester_id) > > > > I'm not sure it makes sense to return a struct pci_dev here because > > there's no requirement that a requester ID correspond to an actual > > pci_dev. > > > well, I would expect the only callers would be for subsys (iommu's) > searching to find requester-id for a pdev, b/c if a pdev doesn't exist, > then the device (and requester-id) doesn't exist... :-/ One of the cases Bjorn is referring to is probably the simple case of a PCIe-to-PCI bridge. The requester ID is (bridge->subordinate->number << 8 | 0), which is not an actual device. As coded here, the function returns bridge, but requester_id is (bridge->subordinate->number << 8 | 0). > >> +{ > >> + struct pci_dev *requester = requestee; > >> + > >> + while (requester != bridge) { > >> + requester = pci_get_dma_source(requester); > >> + pci_dev_put(requester); /* XXX skip ref cnt */ > >> + > >> + if (pci_has_visible_pcie_requester_id(requester, bridge)) > > > > If we acquire the "requester" pointer via a ref-counting interface, > > it's illegal to use the pointer after dropping the reference, isn't it? > > Maybe that's what you mean by the "XXX" comment. > > > >> + break; > >> + > >> + if (pci_is_root_bus(requester->bus)) > >> + return NULL; /* @bridge not parent to @requestee */ > >> + > >> + requester = requester->bus->self; > >> + } > >> + > >> + if (requester_id) { > >> + if (requester->bus != requestee->bus&& > >> + !pci_bridge_uses_endpoint_requester(requester)) > >> + *requester_id = PCI_BRIDGE_REQUESTER_ID(requester); > >> + else > >> + *requester_id = PCI_REQUESTER_ID(requester); > >> + } > >> + > >> + return requester; > >> +} > >> + > >> +static int pci_do_requester_callback(struct pci_dev **dev, > >> + int (*fn)(struct pci_dev *, > >> + u16 id, void *), > >> + void *data) > >> +{ > >> + struct pci_dev *dma_dev; > >> + int ret; > >> + > >> + ret = fn(*dev, PCI_REQUESTER_ID(*dev), data); > >> + if (ret) > >> + return ret; > >> + > >> + dma_dev = pci_get_dma_source(*dev); > >> + pci_dev_put(dma_dev); /* XXX skip ref cnt */ > >> + if (dma_dev == *dev) > > > > Same ref count question as above. > > > >> + return 0; > >> + > >> + ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data); > >> + if (ret) > >> + return ret; > >> + > >> + *dev = dma_dev; > >> + return 0; > >> +} > >> + > >> +/* > >> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source > >> + * from @requestee to the PCIe requester ID visible > >> + * to @bridge. > > > > Transactions from a device may appear with one of several requester IDs, > > but there's not necessarily an actual pci_dev for each ID, so I think the > ditto above; have to have a pdev for each id.... 
> > > caller reads better if it's "...for_each_requester_id()" > > > > The "Call X on each devices and DMA source from Y to the requester ID" > > part doesn't quite make a sentence. > > > >> + * @requestee: Starting device > >> + * @bridge: upstream bridge (or NULL for root bus) > > > > You should say something about the significance of @bridge. I think the > > point is to call @fn for every possible requester ID @bridge could see for > > transactions from @requestee. This is a way to learn the requester IDs an > > IOMMU at @bridge needs to be prepared for. > > > >> + * @fn: callback function > >> + * @data: data to pass to callback > >> + */ > >> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, > >> + int (*fn)(struct pci_dev *, u16 id, void *), > >> + void *data) > >> +{ > >> + struct pci_dev *requester; > >> + struct pci_dev *dev = requestee; > >> + int ret = 0; > >> + > >> + requester = pci_get_visible_pcie_requester(requestee, bridge, NULL); > >> + if (!requester) > >> + return -EINVAL; > >> + > >> + do { > >> + ret = pci_do_requester_callback(&dev, fn, data); > >> + if (ret) > >> + return ret; > >> + > >> + if (dev == requester) > >> + return 0; > >> + > >> + /* > >> + * We always consider root bus devices to have a visible > >> + * requester ID, therefore this should never be true. > >> + */ > >> + BUG_ON(pci_is_root_bus(dev->bus)); > > > > What are we going to do if somebody hits this BUG_ON()? If it's impossible > > to hit, we should just remove it. If it's possible to hit it in some weird > > topology you didn't consider, we should see IOMMU faults for any requester > > ID we neglected to map, and that fault would be a better debugging hint > > than a BUG_ON() here. > > > according to spec, all pdev's have a requester-id, even RC ones, albeit > "implementation specific"... > > >> + > >> + dev = dev->bus->self; > >> + > >> + } while (dev != requester); > >> + > >> + /* > >> + * If we've made it here, @requester is a bridge upstream from > >> + * @requestee. > >> + */ > >> + if (pci_bridge_uses_endpoint_requester(requester)) > >> + return pci_do_requester_callback(&requester, fn, data); > >> + > >> + return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data); > >> +} > >> + > >> +/* > >> * find the upstream PCIe-to-PCI bridge of a PCI device > >> * if the device is PCIE, return NULL > >> * if the device isn't connected to a PCIe bridge (that is its parent is a > >> diff --git a/include/linux/pci.h b/include/linux/pci.h > >> index 3a24e4f..94e81d1 100644 > >> --- a/include/linux/pci.h > >> +++ b/include/linux/pci.h > >> @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) > >> } > >> #endif > >> > >> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > >> + struct pci_dev *bridge, > >> + u16 *requester_id); > > > > The structure of this interface implies that there is only one visible > > requester ID, but the whole point of this patch is that a transaction from > > @requestee may appear with one of several requester IDs. So which one will > > this return? > > > Are there devices that use multiple requester id's? > I know we have ones that use the wrong id. > If we want to handle the multiple requester-id's per pdev, > we could pass in a ptr to an initial requester-id; if null, give me first; > if !null, give me next one; would also need a flag returned to indicate > there is more than 1. > null-rtn when pass in !null ref, means the end. > ... sledge-hammer + nail ??? 
That does not sound safe. Again, this is why I think the pcie_for_each_requester() works. Given a requester, call fn() on every device with the same requester ID. That means when you have a bridge, we call it for the bridge and every device and every device quirk behind the bridge, fully populating context entries for all of them. Thanks, Alex > >> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, > >> + int (*fn)(struct pci_dev *, u16 id, void *), > >> + void *data); > >> + > >> /** > >> * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device > >> * @pdev: the PCI device
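[As an illustration of the usage Alex describes, an IOMMU-side callback might look like the following. The context-entry helper is hypothetical, standing in for intel-iommu internals; only the for-each interface is from the posted patch.]

    /* Program one context entry per requester ID handed to the callback. */
    static int map_context(struct pci_dev *dev, u16 requester_id, void *data)
    {
    	struct dmar_domain *domain = data;

    	/* hypothetical helper; stands in for the real
    	 * domain_context_mapping() work in intel-iommu */
    	return domain_context_map_one(domain, requester_id);
    }

    /* ...and then: pcie_for_each_requester(pdev, NULL, map_context, domain); */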
On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote: > On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote: > > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson > > <alex.williamson@redhat.com> wrote: > > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: > > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > > As the DMA > > transaction propagates through the fabric, it may be tagged by bridges > > with different requester IDs. > > > > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm > > not sure the intermediate pci_devs are. > > A u16 requester ID doesn't mean much on it's own though, it's not > necessarily even unique. A requester ID associated with the context of > a pci_dev is unique and gives us a reference point if we need to perform > another operation on that requester ID. A u16 requester ID better mean something to an IOMMU -- it's all the IOMMU can use to look up the correct mapping. That's why we have to give the iterator something to define the scope to iterate over. The same requester ID could mean something totally unrelated in a different scope, e.g., below a different IOMMU. > > Returning both a pci_dev and a requester ID makes it more complicated. > > At the hardware level, transactions contain only a requester ID, and > > bridges can either drop it, pass it unchanged, or assign a new one. I > > think the code will be simpler if we just model that. > > I'm not convinced. Patch 2/2 makes use of both the returned pci_dev and > the returned requester ID and it's a huge simplification overall. The IOMMU driver makes mappings so DMA from device A can reach a buffer. It needs to know about device A, and it needs to know what source IDs might appear on those DMA transactions. Why would it need to know anything about any bridges between A and the IOMMU? > > My thought was to pass a pci_dev (the originator of the DMA, which I > > would call the "requester") and a callback. The callback would accept > > the requester pci_dev (always the same requester device) and a > > requester ID. > > > > This would call @fn for each possible requester ID for transactions > > from the device. IOMMU drivers should only need the requester ID to > > manage mappings; they shouldn't need a pci_dev corresponding to any > > intermediate bridges. > > This implementation is almost the same with the only change being that > the pci_dev passed to the callback is the one most closely associated > with the requester ID. For the IOMMU driver, it doesn't matter since > it's only using the requester ID, but why would the callback care about > the original requester? If it needed to do something device specific, > it's going to care about the closest device to the requester ID. If this can be done without passing a pci_dev at all to the callback, that would be even better. If a caller needed the original pci_dev, it could always pass it via the "void *data". I don't see the point of passing a "device closest to the requester ID." What would the IOMMU do with that? As far as the IOMMU is concerned, the requester ID could be an arbitrary number completely unrelated to a pci_dev. > > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > > >> > + struct pci_dev *bridge, > > >> > + u16 *requester_id); > > >> > > >> The structure of this interface implies that there is only one visible > > >> requester ID, but the whole point of this patch is that a transaction from > > >> @requestee may appear with one of several requester IDs. 
So which one will > > >> this return? > > > > > > I thought the point of this patch was to have an integrated interface > > > for finding the requester ID and doing something across all devices with > > > that requester ID > > > > Oh, here's the difference in our understanding. "Doing something > > across all devices with that requester ID" sounds like identifying the > > set of devices that have to be handled as a group by the IOMMU. > > That's certainly an issue, but I wasn't considering it at all. > > > > I was only concerned with the question of a single device that > > requires multiple IOMMU mappings because DMA requests might use any of > > several source IDs. This is mentioned in sec 3.6.1.1 of the VT-d > > spec, i.e., "requests arriving with the source-id in the original > > PCI-X transaction or the source-id provided by the bridge. ... > > software must program multiple context entries, each corresponding to > > the possible set of source-ids." > > > > My thought was that the IOMMU driver would use > > pcie_for_each_requester_id() and have the callback set up a mapping > > for one requester ID each time it was called. > > Which is exactly what pcie_for_each_requester() does, but I try to solve > the problem of a single device that needs multiple context entries the > same as a bridge that masks multiple devices. I'm not sure I understand your point yet, but I think it's better to think of this as computing the set of requester IDs we might see from a given device. If we think of it as finding the set of devices that use a given requester ID, then we have to worry about that set being modified by hotplug. > > > and thereby remove pci_find_upstream_pcie_bridge(), > > > provide a way to quirk broken PCIe-to-PCI bridge and quirk dma_ops for > > > devices that use the wrong requester ID. In what case does a device > > > appear to have multiple requester IDs? We have cases where devices use > > > the wrong requester ID, but AFAIK they only use a single wrong requester > > > ID. > > > > I think the example in sec 3.6.1.1 answers this, doesn't it? > > It does, in summary we have to be prepared that a transaction from a > device behind a PCIe-to-PCI/X bridge may use either the device requester > ID (bus|devfn) or the bridge (subordinate|0) requester ID. This is why > pcie_for_each_requester() makes sense to me, we iterate across all > requesters that may use the same requester ID. Let's say device A and B use the same requester ID because they're behind a bridge. What happens if we map a buffer for A and then hot-remove B? Here's what I think your proposal would do:

  map buffer for A
    callback for A
    callback for B
  hot-remove B
  unmap buffer for A
    callback for A

We never do an unmap callback for B, so if we did anything while mapping, it will never be undone.
> How do we write > a pcie_for_each_requester_id() function that can uniquely identify a PCI > hierarchy to traverse in the presence of multiple domains? I must be missing something in your question. If we give pcie_for_each_requester_id() the requester pci_dev, that uniquely identifies the PCI hierarchy. Then it's just a question of how far up the hierarchy we need to go, and the bridge (or bus) argument tells us that. Bjorn
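[For reference, the upstream walk both proposals rely on reduces to something like the following sketch: simplified, no quirk handling, helper name hypothetical. The dev->bus->self stepping is the same pattern used in the posted patch.]

    /* Walk from @dev toward the root, stopping at @bridge (or at the root
     * bus if @bridge is NULL). Each bridge crossed is a point where the
     * requester ID may be re-assigned. */
    static struct pci_dev *pci_walk_to_bridge(struct pci_dev *dev,
    					  struct pci_dev *bridge)
    {
    	while (dev != bridge && !pci_is_root_bus(dev->bus))
    		dev = dev->bus->self;	/* step to the upstream bridge */

    	return dev;
    }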
On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote: > On 07/23/2013 06:35 PM, Bjorn Helgaas wrote: > >On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > >>This provides interfaces for drivers to discover the visible PCIe > >>requester ID for a device, for things like IOMMU setup, and iterate > > > >IDs (plural) > > > a single device does not have multiple requester id's; > can have multiple tag-id's (that are ignored in this situation, but > can be used by switches for ordering purposes), but there's only 1/fcn > (except for those quirker pdevs!). Generally a device does not have multiple requester IDs, but the IOMMU may see one of several requester IDs for DMAs from a given device because bridges may take ownership of those transactions (sec 3.6.1.1 of the VT-d spec). Just to be clear, I envision this whole interface as being specifically for use by IOMMU drivers, so I'm only trying to provide what's necessary to build IOMMU mappings. > >>+ * pci_get_visible_pcie_requester - Get requester and requester ID for > >>+ * @requestee below @bridge > >>+ * @requestee: requester device > >>+ * @bridge: upstream bridge (or NULL for root bus) > >>+ * @requester_id: location to store requester ID or NULL > >>+ */ > >>+struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > >>+ struct pci_dev *bridge, > >>+ u16 *requester_id) > > > >I'm not sure it makes sense to return a struct pci_dev here because > >there's no requirement that a requester ID correspond to an actual > >pci_dev. > > > well, I would expect the only callers would be for subsys (iommu's) > searching to find requester-id for a pdev, b/c if a pdev doesn't exist, > then the device (and requester-id) doesn't exist... :-/ > >>+ * pcie_for_each_requester - Call callback @fn on each devices and DMA source > >>+ * from @requestee to the PCIe requester ID visible > >>+ * to @bridge. > > > >Transactions from a device may appear with one of several requester IDs, > >but there's not necessarily an actual pci_dev for each ID, so I think the > ditto above; have to have a pdev for each id.... This *might* be true, but I don't think we should rely on it. For example: 00:1c.0 PCIe to PCI bridge to [bus 01] 01:01.0 PCI endpoint The bridge will take ownership of DMA transactions from the 01:01.0 endpoint. An IOMMU on bus 00 will see a bridge-assigned requester ID of 01:00.0 (subordinate bus number, devfn zero), but there is no 01:00.0 device. Maybe the rules of conventional PCI require a device zero (I don't remember), but even if they do, it's ugly to rely on that here because I don't think device 01:00.0 is relevant to mappings for device 01:01.0. Obviously we also have to be aware that 01:00.0 and 01:01.0 can't be isolated from each other, but I think that issue is separate from the question of what requester IDs have to be mapped to make 01:01.0 work. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
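[Bjorn's example works out as follows with the macros from the posted patch; the numeric values are just his 00:1c.0/01:01.0 topology plugged in.]

    #define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
    #define PCI_BRIDGE_REQUESTER_ID(dev)	((dev)->subordinate->number << 8)

    /* Endpoint 01:01.0 behind the PCIe-to-PCI bridge 00:1c.0:
     *   PCI_REQUESTER_ID(endpoint)      == 0x0108  (01:01.0 itself)
     *   PCI_BRIDGE_REQUESTER_ID(bridge) == 0x0100  (01:00.0, bridge-owned)
     * The IOMMU may see either ID, and 01:00.0 exists only as an ID --
     * there is no pci_dev at that address. */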
On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote: > On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote: > > On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote: > > > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson > > > <alex.williamson@redhat.com> wrote: > > > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: > > > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > > > As the DMA > > > transaction propagates through the fabric, it may be tagged by bridges > > > with different requester IDs. > > > > > > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm > > > not sure the intermediate pci_devs are. > > > > A u16 requester ID doesn't mean much on its own though, it's not > > necessarily even unique. A requester ID associated with the context of > > a pci_dev is unique and gives us a reference point if we need to perform > > another operation on that requester ID. > > A u16 requester ID better mean something to an IOMMU -- it's all the > IOMMU can use to look up the correct mapping. That's why we have to > give the iterator something to define the scope to iterate over. The > same requester ID could mean something totally unrelated in a > different scope, e.g., below a different IOMMU. The point I'm trying to make is that a requester ID depends on its context (minimally, the PCI segment). The caller can assume the context based on the calling parameters or we can provide context in the form of an associated pci_dev. I chose the latter path because I prefer explicit interfaces and it has some usefulness in the intel-iommu implementation. For instance, get_domain_for_dev() first looks to see if a pci_dev already has a domain. If it doesn't, we want to look to see if there's an upstream device that would use the same requester ID that already has a domain. If our get-requester-ID-for-device function returns only the requester ID, we don't know if that requester ID is the device we're searching from or some upstream device. Therefore we potentially do an unnecessary search for the domain. The other user is intel_iommu_add_device() where we're trying to build IOMMU groups. Visibility is the first requirement of an IOMMU group. If the IOMMU cannot distinguish between devices, they must be part of the same IOMMU group. Here we want to find the pci_dev that hosts the requester ID. I don't even know how we'd implement this with a function that only returned the requester ID. Perhaps we'd have to walk upstream from the device calling the get-requester-ID-for-device function at each step and noticing when it changed. That moves significant logic back into the caller code. > > > Returning both a pci_dev and a requester ID makes it more complicated. > > > At the hardware level, transactions contain only a requester ID, and > > > bridges can either drop it, pass it unchanged, or assign a new one. I > > > think the code will be simpler if we just model that. > > > > I'm not convinced. Patch 2/2 makes use of both the returned pci_dev and > > the returned requester ID and it's a huge simplification overall. > > The IOMMU driver makes mappings so DMA from device A can reach a > buffer. It needs to know about device A, and it needs to know what > source IDs might appear on those DMA transactions. Why would it need > to know anything about any bridges between A and the IOMMU?
Maybe it doesn't, but it might want to know about the top level bridge for the examples above and I prefer that we be consistent in providing both a requester ID and context device. It's not currently used in the callback function, but I could imagine it could be. For instance, what if there are multiple endpoints downstream of a bridge and the IOMMU wants to use the bridge pci_dev to track how many routes it's managing through this bridge. I think the latter is actually a current bug in intel-iommu as noted by the XXX in patch 2/2. iommu_detach_dependent_devices() will tear down all context entries along a path, but there's no reference counting as to how many devices are using that path. Should that reference counting be associated with the pci_dev? I suppose the IOMMU could already have data structures per requester ID or could create them and use an idr for lookup, but does this interface want to require that? > > > My thought was to pass a pci_dev (the originator of the DMA, which I > > > would call the "requester") and a callback. The callback would accept > > > the requester pci_dev (always the same requester device) and a > > > requester ID. > > > > > > This would call @fn for each possible requester ID for transactions > > > from the device. IOMMU drivers should only need the requester ID to > > > manage mappings; they shouldn't need a pci_dev corresponding to any > > > intermediate bridges. > > > > This implementation is almost the same with the only change being that > > the pci_dev passed to the callback is the one most closely associated > > with the requester ID. For the IOMMU driver, it doesn't matter since > > it's only using the requester ID, but why would the callback care about > > the original requester? If it needed to do something device specific, > > it's going to care about the closest device to the requester ID. > > If this can be done without passing a pci_dev at all to the callback, > that would be even better. If a caller needed the original pci_dev, > it could always pass it via the "void *data". Agreed, the original "device A" is the caller's problem if it wants to know about it, but I think providing full context of a requester ID (which means a pci_dev) makes sense too. > I don't see the point of passing a "device closest to the requester > ID." What would the IOMMU do with that? As far as the IOMMU is > concerned, the requester ID could be an arbitrary number completely > unrelated to a pci_dev. Do you have an example of a case where a requester ID doesn't have some association to a pci_dev? > > > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, > > > >> > + struct pci_dev *bridge, > > > >> > + u16 *requester_id); > > > >> > > > >> The structure of this interface implies that there is only one visible > > > >> requester ID, but the whole point of this patch is that a transaction from > > > >> @requestee may appear with one of several requester IDs. So which one will > > > >> this return? > > > > > > > > I thought the point of this patch was to have an integrated interface > > > > for finding the requester ID and doing something across all devices with > > > > that requester ID > > > > > > Oh, here's the difference in our understanding. "Doing something > > > across all devices with that requester ID" sounds like identifying the > > > set of devices that have to be handled as a group by the IOMMU. > > > That's certainly an issue, but I wasn't considering it at all. 
> > > > > > I was only concerned with the question of a single device that > > > requires multiple IOMMU mappings because DMA requests might use any of > > > several source IDs. This is mentioned in sec 3.6.1.1 of the VT-d > > > spec, i.e., "requests arriving with the source-id in the original > > > PCI-X transaction or the source-id provided by the bridge. ... > > > software must program multiple context entries, each corresponding to > > > the possible set of source-ids." > > > > > > My thought was that the IOMMU driver would use > > > pcie_for_each_requester_id() and have the callback set up a mapping > > > for one requester ID each time it was called. > > > > Which is exactly what pcie_for_each_requester() does, but I try to solve > > the problem of a single device that needs multiple context entries the > > same as a bridge that masks multiple devices. > > I'm not sure I understand your point yet, but I think it's better to > think of this as computing the set of requester IDs we might see from > a given device. If we think of it as finding the set of devices that > use a given requester ID, then we have to worry about that set being > modified by hotplug. Perhaps I'm explaining it wrong, I've had to refresh myself on how this all works since I posted it. Looking at pcie_for_each_requester() we first find the requester ID and associated device for the requested device (yet another user of the associated device part of the interface). We then walk up from the requested device to the associated device, calling the callback on each bridge or dma quirk in between. So, I was wrong to indicate that pcie_for_each_requester() does anything across all devices with the same requester ID. It traverses all devices and quirks of devices along the path from the requested device to the device associated with the requester ID. > > > > and thereby remove pci_find_upstream_pcie_bridge(), > > > > provide a way to quirk broken PCIe-to-PCI bridge and quirk dma_ops for > > > > devices that use the wrong requester ID. In what case does a device > > > > appear to have multiple requester IDs? We have cases where devices use > > > > the wrong requester ID, but AFAIK they only use a single wrong requester > > > > ID. > > > > > > I think the example in sec 3.6.1.1 answers this, doesn't it? > > > > It does, in summary we have to be prepared that a transaction from a > > device behind a PCIe-to-PCI/X bridge may use either the device requester > > ID (bus|devfn) or the bridge (subordinate|0) requester ID. This is why > > pcie_for_each_requester() makes sense to me, we iterate across all > > requesters that may use the same requester ID. > > Let's say device A and B use the same requester ID because they're > behind a bridge. What happens if we map a buffer for A and then > hot-remove B? Here's what I think your proposal would do:
>
>   map buffer for A
>     callback for A
>     callback for B
>   hot-remove B
>   unmap buffer for A
>     callback for A
>
> We never do an unmap callback for B, so if we did anything while > mapping, it will never be undone. So I think I've again lost my context since I posted this, apologies. pcie_for_each_requester() is implemented to handle the path, not the subtree.
I believe what pcie_for_each_requester() as written would do is: map buffer for A callback for A hot-remove B unmap buffer for A callback for A Furthermore, if we have: -- A / X--Y \ -- B Where X is the associated device for the requester ID, we'd do: map buffer for A callback for A callback for Y callback for X hot-remove B unmap buffer for A callback for A callback for Y callback for X That hot-remove though is where I think intel-iommu has a bug as the set of dependent devices for B includes Y & X, which would then have their context entries cleared. That's yet another intel-iommu bug though, not anything that requires a change to this interface. > > If we had a > > pcie_for_each_requester_id() how would we handle that? Do we call it > > once for the (bus|devfn) requester ID of the device we're trying to map > > and then call it again for the (subordinate|0) requester ID of the > > bridge? That's almost how pci_find_upstream_pcie_bridge() is used and > > it ends up putting a lot of the logic into the driver. > > That *is* what I'm proposing (calling the callback once for each > possible requester ID that could be seen on DMA from the device). In > my opinion, the problem with pci_find_upstream_pcie_bridge() is that > you only get one device back from it, and the driver has to roll its > own looping. I'm not sure if my argument above is still relevant, but what I was trying to suggest was that the caller would need to make multiple calls, which puts us in the same boat as the roll-your-own loop with find_upstream_pcie_bridge. Of course the callback needs to be called for each requester ID. Let me go back to the X-Y-A|B example above to see if I can explain why pcie_for_each_requester_id() doesn't make sense to me. Generally a for_each_foo function should iterate across all things with the same foo. So a for_each_requester_id should iterate across all things with the same requester ID. If we look at an endpoint device like A, only A has A's requester ID. Therefore, why would for_each_requester_id(A) traverse up to Y? Y can take ownership and become the requester for A, but if we were to call for_each_requester_id(Y), wouldn't you expect the callback to happen on {Y, A, B} since all of those can use that requester ID? That makes it a downstream-looking for_each callback, which is not really what we want. As you indicate above, we want an upstream-looking function. If I call for_each_requester(A), A is obviously a requester for A. Y can take ownership of a transaction from A, therefore we must assume Y is a requester for A. Same with X. I believe this is what we want and this is what the code does. > > How do we write > > a pcie_for_each_requester_id() function that can uniquely identify a PCI > > hierarchy to traverse in the presence of multiple domains? > > I must be missing something in your question. If we give > pcie_for_each_requester_id() the requester pci_dev, that uniquely > identifies the PCI hierarchy. Then it's just a question of how far up > the hierarchy we need to go, and the bridge (or bus) argument tells us > that. I think the code actually does what you want, so we're only in disagreement about the naming and usefulness (appropriateness) of passing a pci_dev associated with a requester ID. Is that correct? There are several examples above about the associated device being useful to the implementation. It may not be required, but it's only going to make the caller implement more code on their side not to provide it.
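To put that walk in code form, here is a minimal sketch of what I'm describing, using the X--Y--A|B topology above (simplified; the real patch also handles the DMA source quirks at each step, and pci_get_visible_pcie_requester()/PCI_REQUESTER_ID are the names from this series):

  static int for_each_requester_sketch(struct pci_dev *dev,
                                       struct pci_dev *bridge,
                                       int (*fn)(struct pci_dev *, u16, void *),
                                       void *data)
  {
          struct pci_dev *requester;      /* X in the example above */
          u16 requester_id;
          int ret;

          requester = pci_get_visible_pcie_requester(dev, bridge,
                                                     &requester_id);
          if (!requester)
                  return -EINVAL;

          while (dev != requester) {      /* visits A, then Y */
                  ret = fn(dev, PCI_REQUESTER_ID(dev), data);
                  if (ret)
                          return ret;
                  dev = dev->bus->self;
          }

          return fn(requester, requester_id, data);       /* finally X */
  }

So for_each_requester(A) visits A, Y and X; B is never visited because it isn't on A's upstream path.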
The for_each_requester_id vs for_each_requester is a matter of personal taste, but I think for_each_requester actually makes more sense given the parameters and the upstream vs downstream nature of the call. Sorry again for the confusion; I can barely identify my own code after a couple weeks. Thanks, Alex
On 07/25/2013 01:19 PM, Bjorn Helgaas wrote: > On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote: >> On 07/23/2013 06:35 PM, Bjorn Helgaas wrote: >>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: >>>> This provides interfaces for drivers to discover the visible PCIe >>>> requester ID for a device, for things like IOMMU setup, and iterate >>> >>> IDs (plural) >>> >> a single device does not have multiple requester id's; >> can have multiple tag-id's (that are ignored in this situation, but >> can be used by switches for ordering purposes), but there's only 1/fcn >> (except for those quirker pdevs!). > > Generally a device does not have multiple requester IDs, but the > IOMMU may see one of several requester IDs for DMAs from a given > device because bridges may take ownership of those transactions (sec > 3.6.1.1 of the VT-d spec). > > Just to be clear, I envision this whole interface as being > specifically for use by IOMMU drivers, so I'm only trying to provide > what's necessary to build IOMMU mappings. > This is all just PCIe-PCI bridge spec; the code Alex is proposing handles this case. >>>> + * pci_get_visible_pcie_requester - Get requester and requester ID for >>>> + * @requestee below @bridge >>>> + * @requestee: requester device >>>> + * @bridge: upstream bridge (or NULL for root bus) >>>> + * @requester_id: location to store requester ID or NULL >>>> + */ >>>> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, >>>> + struct pci_dev *bridge, >>>> + u16 *requester_id) >>> >>> I'm not sure it makes sense to return a struct pci_dev here because >>> there's no requirement that a requester ID correspond to an actual >>> pci_dev. >>> >> well, I would expect the only callers would be for subsys (iommu's) >> searching to find requester-id for a pdev, b/c if a pdev doesn't exist, >> then the device (and requester-id) doesn't exist... :-/ > >>>> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source >>>> + * from @requestee to the PCIe requester ID visible >>>> + * to @bridge. >>> >>> Transactions from a device may appear with one of several requester IDs, >>> but there's not necessarily an actual pci_dev for each ID, so I think the >> ditto above; have to have a pdev for each id.... > > This *might* be true, but I don't think we should rely on it. For > example: > > 00:1c.0 PCIe to PCI bridge to [bus 01] > 01:01.0 PCI endpoint > > The bridge will take ownership of DMA transactions from the 01:01.0 > endpoint. An IOMMU on bus 00 will see a bridge-assigned requester > ID of 01:00.0 (subordinate bus number, devfn zero), but there is no > 01:00.0 device. > Clarification: I meant that each requester-id must have at least 1 PCI device associated with it. When PCIe-PCI(X) bridges are in the path, the iommu-groups code lumps multiple requester-ids into a single group, and all devices behind the bridges are lumped into that group. The only time it's possible for the requester-id for a device behind a PCIe-PCI(X) Bridge to not be <secondary-bus>:0.0 is when the device is PCI-X. I can't find anything in the spec that says such a PPB can tell the sw what requester-id alg it's using (pass PCI-X device's requester-id, or use <sbus>:0.0). The (vtd) spec does say that all devices behind a bridge (PCI-X or PCI) must be in the same domain, which means same iommu group. So, comments in the code or reviewers implied a requester-id could exist w/o a pdev; IMO, the (iommu?) code ought to track/tag req-id -> pdev and vice-versa.
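If the iommu code were to keep such a req-id <-> pdev mapping, I'd imagine something along these lines -- purely illustrative, all the names below are made up, and you'd want one table per PCI segment:

  #include <linux/idr.h>
  #include <linux/pci.h>

  static DEFINE_IDR(reqid_table);         /* hypothetical req-id -> pdev map */

  static int reqid_table_add(u16 req_id, struct pci_dev *pdev)
  {
          /* key the entry by the 16-bit requester ID itself */
          int id = idr_alloc(&reqid_table, pdev, req_id, req_id + 1,
                             GFP_KERNEL);

          return id < 0 ? id : 0;
  }

  static struct pci_dev *reqid_table_find(u16 req_id)
  {
          return idr_find(&reqid_table, req_id);
  }

The reverse direction, pdev -> req-id's, is what the proposed iterator already gives us.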
> Maybe the rules of conventional PCI require a device zero (I don't > remember), but even if they do, it's ugly to rely on that here > because I don't think device 01:00.0 is relevant to mappings for > device 01:01.0. > no requirement for any device-number on a PCI bus. > Obviously we also have to be aware that 01:00.0 and 01:01.0 can't be > isolated from each other, but I think that issue is separate from > the question of what requester IDs have to be mapped to make 01:01.0 > work. > yup. > Bjorn
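To make the 00:1c.0 / 01:01.0 example concrete: the ID the IOMMU has to map is computable from the bridge alone, no pdev lookup required. A sketch using the same arithmetic as the patch's PCI_BRIDGE_REQUESTER_ID macro:

  /*
   * The requester ID a conventional PCI device appears with once its
   * PCIe-to-PCI bridge takes ownership of the transaction. For the
   * bridge to [bus 01] above this yields 0x0100 (01:00.0), whether or
   * not any device actually exists at that address.
   */
  static u16 bridge_owned_requester_id(struct pci_dev *bridge)
  {
          /* bridge->subordinate is the bus on the bridge's secondary side */
          return bridge->subordinate->number << 8;        /* devfn == 0 */
  }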
On 07/24/2013 05:22 PM, Alex Williamson wrote: > On Wed, 2013-07-24 at 16:42 -0400, Don Dutile wrote: >> On 07/23/2013 06:35 PM, Bjorn Helgaas wrote: >>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: >>>> This provides interfaces for drivers to discover the visible PCIe >>>> requester ID for a device, for things like IOMMU setup, and iterate >>> >>> IDs (plural) >>> >> a single device does not have multiple requester id's; >> can have multiple tag-id's (that are ignored in this situation, but >> can be used by switches for ordering purposes), but there's only 1/fcn >> (except for those quirker pdevs!). >> >>>> over the device chain from requestee to requester, including DMA >>>> quirks at each step. >>> >>> "requestee" doesn't make sense to me. The "-ee" suffix added to a verb >>> normally makes a noun that refers to the object of the action. So >>> "requestee" sounds like it means something like "target" or "responder," >>> but that's not what you mean here. >>> >> sorry, I didn't follow the requester/requestee terminology either... >> >>>> Suggested-by: Bjorn Helgaas<bhelgaas@google.com> >>>> Signed-off-by: Alex Williamson<alex.williamson@redhat.com> >>>> --- >>>> drivers/pci/search.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++ >>>> include/linux/pci.h | 7 ++ >>>> 2 files changed, 205 insertions(+) >>>> >>>> diff --git a/drivers/pci/search.c b/drivers/pci/search.c >>>> index d0627fa..4759c02 100644 >>>> --- a/drivers/pci/search.c >>>> +++ b/drivers/pci/search.c >>>> @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem); >>>> EXPORT_SYMBOL_GPL(pci_bus_sem); >>>> >>>> /* >>>> + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID >>>> + * @dev: device to test >>>> + */ >>>> +static bool pci_has_pcie_requester_id(struct pci_dev *dev) >>>> +{ >>>> + /* >>>> + * XXX There's no indicator of the bus type, conventional PCI vs >>>> + * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe >>>> + * requester ID is a native PCIe based system (such as VT-d or >>>> + * AMD-Vi). It's common that PCIe root complex devices do not >> could the above comment be x86-iommu-neutral? >> by definition of PCIe, all devices have a requester id (min. to accept cfg cycles); >> req'd if source of read/write requests, read completions. > > I agree completely, the question is whether we have a PCIe root complex > or a conventional PCI host bus. I don't think we have any way to tell, > so I'm assuming pci_is_root_bus() indicates we're on a PCIe root complex > and therefore have requester IDs. If there's some way to determine this > let me know and we can avoid any kind of assumption. > >>>> + * include a PCIe capability, but we can assume they are PCIe >>>> + * devices based on their topology. >>>> + */ >>>> + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus)) >>>> + return true; >>>> + >>>> + /* >>>> + * PCI-X devices have a requester ID, but the bridge may still take >>>> + * ownership of transactions and create a requester ID. We therefore >>>> + * assume that the PCI-X requester ID is not the same one used on PCIe. >>>> + */ >>>> + >>>> +#ifdef CONFIG_PCI_QUIRKS >>>> + /* >>>> + * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability. >>>> + * If the device is a bridge, look to the next device upstream of it. >>>> + * If that device is PCIe and not a PCIe-to-PCI bridge, then by >>>> + * deduction, the device must be PCIe and therefore has a requester ID. 
>>>> + */ >>>> + if (dev->subordinate) { >>>> + struct pci_dev *parent = dev->bus->self; >>>> + >>>> + if (pci_is_pcie(parent) && >>>> + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE) >>>> + return true; >>>> + } >>>> +#endif >>>> + >>>> + return false; >>>> +} >>>> + >>>> +/* >>>> + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID? >>>> + * @dev: requester device >>>> + * @bridge: upstream bridge (or NULL for root bus) >>>> + */ >>>> +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev, >>>> + struct pci_dev *bridge) >>>> +{ >>>> + /* >>>> + * The entire path must be tested, if any step does not have a >>>> + * requester ID, the chain is broken. This allows us to support >>>> + * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe >>>> + */ >>>> + while (dev != bridge) { >>>> + if (!pci_has_pcie_requester_id(dev)) >>>> + return false; >>>> + >>>> + if (pci_is_root_bus(dev->bus)) >>>> + return !bridge; /* false if we don't hit @bridge */ >>>> + >>>> + dev = dev->bus->self; >>>> + } >>>> + >>>> + return true; >>>> +} >>>> + >>>> +/* >>>> + * Legacy PCI bridges within a root complex (ex. Intel 82801) report >>>> + * a different requester ID than a standard PCIe-to-PCI bridge. Instead >> First, I'm assuming you mean that devices behind a Legacy PCI bridge within a root complex >> get assigned IDs different than std PCIe-to-PCI bridges (as quoted below). > > Yes > >>>> + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a >>> >>> s/the/they/ >>> >> well, the PPBs should inject their (secondary << 8 | 0), not subordinate. >> From the PCI Express to PCI/PCI-X Bridge spec, v1.0: >> The Tag is reassigned by the bridge according to the rules outlined in the following sections. If the >> bridge generates a new Requester ID for a transaction forwarded from the secondary interface to the >> primary interface, the bridge assigns the PCI Express Requester ID using its secondary interface’s >> ^^^^^^^^^ >> Bus Number and sets both the Device Number and Function Number fields to zero. >> ^^^^^^^^^^ > > I'm referring to (pdev->subordinate->number << 8 | 0) vs > (pdev->bus->number << 8 | pdev->devfn). The subordinate struct pci_bus > is the secondary interface bus. > >> As for the 82801, looks like they took this part of the PCIe spec to heart: >> (PCIe v3 spec, section 2.2.6.2 Transaction Descriptor - Transaction ID Field): >> Exception: The assignment of Bus and Device Numbers to the Devices within a Root Complex, >> and the Device Numbers to the Downstream Ports within a Switch, may be done in an >> implementation specific way. >> Obviously, you're missing the 'implementation-specific way' compiler... ;-) > > So this is potentially Intel specific. How can we tell it's an Intel > root complex? > Good question. ... another question to ask Intel.... :( >> aw: which 'bus' do you mean above in '(bus << 8 | devfn)'? > > (pdev->bus->number << 8 | pdev->devfn) > k. >>> Did you learn about this empirically? Intel spec? I wonder if there's >>> some way to derive this from the PCIe specs. >>> >>>> + * standard PCIe endpoint. This function detects them. >>>> + * >>>> + * XXX Is this Intel vendor ID specific?
>>>> + */ >>>> +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge) >>>> +{ >>>> + if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus)) >>>> + return true; >>>> + >>>> + return false; >>>> +} >>>> + >>>> +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) | (dev)->devfn) >>>> +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number << 8) >>>> + >>>> +/* >>>> + * pci_get_visible_pcie_requester - Get requester and requester ID for >>>> + * @requestee below @bridge >>>> + * @requestee: requester device >>>> + * @bridge: upstream bridge (or NULL for root bus) >>>> + * @requester_id: location to store requester ID or NULL >>>> + */ >>>> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, >>>> + struct pci_dev *bridge, >>>> + u16 *requester_id) >>> >>> I'm not sure it makes sense to return a struct pci_dev here because >>> there's no requirement that a requester ID correspond to an actual >>> pci_dev. >>> >> well, I would expect the only callers would be for subsys (iommu's) >> searching to find requester-id for a pdev, b/c if a pdev doesn't exist, >> then the device (and requester-id) doesn't exist... :-/ > > One of the cases Bjorn is referring to is probably the simple case of a > PCIe-to-PCI bridge. The requester ID is (bridge->subordinate->number << 8 | 0), which is not an actual device. As coded here, the function > returns bridge, but requester_id is (bridge->subordinate->number << 8 | 0). > right, that requester-id doesn't map (most likely) to a pdev w/matching BDF. >>>> +{ >>>> + struct pci_dev *requester = requestee; >>>> + >>>> + while (requester != bridge) { >>>> + requester = pci_get_dma_source(requester); >>>> + pci_dev_put(requester); /* XXX skip ref cnt */ >>>> + >>>> + if (pci_has_visible_pcie_requester_id(requester, bridge)) >>> >>> If we acquire the "requester" pointer via a ref-counting interface, >>> it's illegal to use the pointer after dropping the reference, isn't it? >>> Maybe that's what you mean by the "XXX" comment. >>> >>>> + break; >>>> + >>>> + if (pci_is_root_bus(requester->bus)) >>>> + return NULL; /* @bridge not parent to @requestee */ >>>> + >>>> + requester = requester->bus->self; >>>> + } >>>> + >>>> + if (requester_id) { >>>> + if (requester->bus != requestee->bus && >>>> + !pci_bridge_uses_endpoint_requester(requester)) >>>> + *requester_id = PCI_BRIDGE_REQUESTER_ID(requester); >>>> + else >>>> + *requester_id = PCI_REQUESTER_ID(requester); >>>> + } >>>> + >>>> + return requester; >>>> +} >>>> + >>>> +static int pci_do_requester_callback(struct pci_dev **dev, >>>> + int (*fn)(struct pci_dev *, >>>> + u16 id, void *), >>>> + void *data) >>>> +{ >>>> + struct pci_dev *dma_dev; >>>> + int ret; >>>> + >>>> + ret = fn(*dev, PCI_REQUESTER_ID(*dev), data); >>>> + if (ret) >>>> + return ret; >>>> + >>>> + dma_dev = pci_get_dma_source(*dev); >>>> + pci_dev_put(dma_dev); /* XXX skip ref cnt */ >>>> + if (dma_dev == *dev) >>> >>> Same ref count question as above. >>> >>>> + return 0; >>>> + >>>> + ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data); >>>> + if (ret) >>>> + return ret; >>>> + >>>> + *dev = dma_dev; >>>> + return 0; >>>> +} >>>> + >>>> +/* >>>> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source >>>> + * from @requestee to the PCIe requester ID visible >>>> + * to @bridge.
>>> >>> Transactions from a device may appear with one of several requester IDs, >>> but there's not necessarily an actual pci_dev for each ID, so I think the >> ditto above; have to have a pdev for each id.... >> >>> caller reads better if it's "...for_each_requester_id()" >>> >>> The "Call X on each devices and DMA source from Y to the requester ID" >>> part doesn't quite make a sentence. >>> >>>> + * @requestee: Starting device >>>> + * @bridge: upstream bridge (or NULL for root bus) >>> >>> You should say something about the significance of @bridge. I think the >>> point is to call @fn for every possible requester ID @bridge could see for >>> transactions from @requestee. This is a way to learn the requester IDs an >>> IOMMU at @bridge needs to be prepared for. >>> >>>> + * @fn: callback function >>>> + * @data: data to pass to callback >>>> + */ >>>> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, >>>> + int (*fn)(struct pci_dev *, u16 id, void *), >>>> + void *data) >>>> +{ >>>> + struct pci_dev *requester; >>>> + struct pci_dev *dev = requestee; >>>> + int ret = 0; >>>> + >>>> + requester = pci_get_visible_pcie_requester(requestee, bridge, NULL); >>>> + if (!requester) >>>> + return -EINVAL; >>>> + >>>> + do { >>>> + ret = pci_do_requester_callback(&dev, fn, data); >>>> + if (ret) >>>> + return ret; >>>> + >>>> + if (dev == requester) >>>> + return 0; >>>> + >>>> + /* >>>> + * We always consider root bus devices to have a visible >>>> + * requester ID, therefore this should never be true. >>>> + */ >>>> + BUG_ON(pci_is_root_bus(dev->bus)); >>> >>> What are we going to do if somebody hits this BUG_ON()? If it's impossible >>> to hit, we should just remove it. If it's possible to hit it in some weird >>> topology you didn't consider, we should see IOMMU faults for any requester >>> ID we neglected to map, and that fault would be a better debugging hint >>> than a BUG_ON() here. >>> >> according to spec, all pdev's have a requester-id, even RC ones, albeit >> "implementation specific"... >> >>>> + >>>> + dev = dev->bus->self; >>>> + >>>> + } while (dev != requester); >>>> + >>>> + /* >>>> + * If we've made it here, @requester is a bridge upstream from >>>> + * @requestee. >>>> + */ >>>> + if (pci_bridge_uses_endpoint_requester(requester)) >>>> + return pci_do_requester_callback(&requester, fn, data); >>>> + >>>> + return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data); >>>> +} >>>> + >>>> +/* >>>> * find the upstream PCIe-to-PCI bridge of a PCI device >>>> * if the device is PCIE, return NULL >>>> * if the device isn't connected to a PCIe bridge (that is its parent is a >>>> diff --git a/include/linux/pci.h b/include/linux/pci.h >>>> index 3a24e4f..94e81d1 100644 >>>> --- a/include/linux/pci.h >>>> +++ b/include/linux/pci.h >>>> @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) >>>> } >>>> #endif >>>> >>>> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, >>>> + struct pci_dev *bridge, >>>> + u16 *requester_id); >>> >>> The structure of this interface implies that there is only one visible >>> requester ID, but the whole point of this patch is that a transaction from >>> @requestee may appear with one of several requester IDs. So which one will >>> this return? >>> >> Are there devices that use multiple requester id's? >> I know we have ones that use the wrong id. 
>> If we want to handle the multiple requester-id's per pdev, >> we could pass in a ptr to an initial requester-id; if null, give me first; >> if !null, give me next one; would also need a flag returned to indicate >> there is more than 1. >> null-rtn when pass in !null ref, means the end. >> ... sledge-hammer + nail ??? > > That does not sound safe. Again, this is why I think the > pcie_for_each_requester() works. Given a requester, call fn() on every > device with the same requester ID. That means when you have a bridge, > we call it for the bridge and every device and every device quirk behind > the bridge, fully populating context entries for all of them. Thanks, > > Alex > ok. >>>> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, >>>> + int (*fn)(struct pci_dev *, u16 id, void *), >>>> + void *data); >>>> + >>>> /** >>>> * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device >>>> * @pdev: the PCI device
On Thu, Jul 25, 2013 at 02:25:04PM -0400, Don Dutile wrote: > On 07/25/2013 01:19 PM, Bjorn Helgaas wrote: > >On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote: > >>On 07/23/2013 06:35 PM, Bjorn Helgaas wrote: > >>>On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > >>>>+ * pcie_for_each_requester - Call callback @fn on each devices and DMA source > >>>>+ * from @requestee to the PCIe requester ID visible > >>>>+ * to @bridge. > >>> > >>>Transactions from a device may appear with one of several requester IDs, > >>>but there's not necessarily an actual pci_dev for each ID, so I think the > >>ditto above; have to have a pdev for each id.... > > > >This *might* be true, but I don't think we should rely on it. For > >example: > > > > 00:1c.0 PCIe to PCI bridge to [bus 01] > > 01:01.0 PCI endpoint > > > >The bridge will take ownership of DMA transactions from the 01:01.0 > >endpoint. An IOMMU on bus 00 will see a bridge-assigned requester > >ID of 01:00.0 (subordinate bus number, devfn zero), but there is no > >01:00.0 device. > > > Clarification: > I meant that each requester-id must have at least 1 PCI device associated > with it. I don't think that's true, as in the example above. Requester ID 0x0100 has no pci_dev associated with it. What am I missing? Maybe you mean that requester ID 0x0100 is associated with pci_dev 01:01.0 in the sense that DMAs from 01:01.0 appear with that ID? That's true, but I can't think of a reason why we would start with ID 0x0100 and try to look up 01:01.0 from it. And of course, if you *did* try to look up the device, there could be several of them. Bjorn
On 07/26/2013 03:48 PM, Bjorn Helgaas wrote: > On Thu, Jul 25, 2013 at 02:25:04PM -0400, Don Dutile wrote: >> On 07/25/2013 01:19 PM, Bjorn Helgaas wrote: >>> On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote: >>>> On 07/23/2013 06:35 PM, Bjorn Helgaas wrote: >>>>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: >>>>>> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source >>>>>> + * from @requestee to the PCIe requester ID visible >>>>>> + * to @bridge. >>>>> >>>>> Transactions from a device may appear with one of several requester IDs, >>>>> but there's not necessarily an actual pci_dev for each ID, so I think the >>>> ditto above; have to have a pdev for each id.... >>> >>> This *might* be true, but I don't think we should rely on it. For >>> example: >>> >>> 00:1c.0 PCIe to PCI bridge to [bus 01] >>> 01:01.0 PCI endpoint >>> >>> The bridge will take ownership of DMA transactions from the 01:01.0 >>> endpoint. An IOMMU on bus 00 will see a bridge-assigned requester >>> ID of 01:00.0 (subordinate bus number, devfn zero), but there is no >>> 01:00.0 device. >>> >> Clarification: >> I meant that each requester-id must have at least 1 PCI device associated >> with it. > > I don't think that's true, as in the example above. Requester ID > 0x0100 has no pci_dev associated with it. What am I missing? > > Maybe you mean that requester ID 0x0100 is associated with pci_dev > 01:01.0 in the sense that DMAs from 01:01.0 appear with that ID? yes. > That's true, but I can't think of a reason why we would start with > ID 0x0100 and try to look up 01:01.0 from it. And of course, if you other than to figure out errant device behavior, or a bug in sw. ;) > *did* try to look up the device, there could be several of them. > > Bjorn What's most important for the consumers (IOMMUs) is to get all the requester-id's related to a (p)dev for mapping & unmapping setup & teardown, resp.
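For illustration, the consumer code I'd expect the iommu's to end up with looks something like this -- pcie_for_each_requester() is the interface from the patch; the callback and list type here are invented:

  struct reqid_list {
          int n;
          u16 ids[8];             /* arbitrary bound for the sketch */
  };

  /* collect every requester ID that DMA from a device may arrive with */
  static int collect_id(struct pci_dev *dev, u16 requester_id, void *data)
  {
          struct reqid_list *list = data;

          if (list->n < 8)
                  list->ids[list->n++] = requester_id;
          return 0;
  }

  ...
          struct reqid_list list = { 0 };
          int err;

          /* NULL bridge: IDs as seen at the root bus, i.e. by the IOMMU */
          err = pcie_for_each_requester(pdev, NULL, collect_id, &list);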
On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote: > On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote: > > On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote: > > > On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote: > > > > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson > > > > <alex.williamson@redhat.com> wrote: > > > > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: > > > > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > > > > As the DMA > > > > transaction propagates through the fabric, it may be tagged by bridges > > > > with different requester IDs. > > > > > > > > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm > > > > not sure the intermediate pci_devs are. > > > > > > A u16 requester ID doesn't mean much on it's own though, it's not > > > necessarily even unique. A requester ID associated with the context of > > > a pci_dev is unique and gives us a reference point if we need to perform > > > another operation on that requester ID. > > > > A u16 requester ID better mean something to an IOMMU -- it's all the > > IOMMU can use to look up the correct mapping. That's why we have to > > give the iterator something to define the scope to iterate over. The > > same requester ID could mean something totally unrelated in a > > different scope, e.g., below a different IOMMU. > > The point I'm trying to make is that a requester ID depends on it's > context (minimally, the PCI segment). The caller can assume the context > based on the calling parameters or we can provide context in the form of > an associated pci_dev. I chose the latter path because I prefer > explicit interfaces and it has some usefulness in the intel-iommu > implementation. > > For instance, get_domain_for_dev() first looks to see if a pci_dev > already has a domain. If it doesn't, we want to look to see if there's > an upstream device that would use the same requester ID that already has > a domain. If our get-requester-ID-for-device function returns only the > requester ID, we don't know if that requester ID is the device we're > searching from or some upstream device. Therefore we potentially do an > unnecessary search for the domain. > > The other user is intel_iommu_add_device() where we're trying to build > IOMMU groups. Visibility is the first requirement of an IOMMU group. > If the IOMMU cannot distinguish between devices, they must be part of > the same IOMMU group. Here we want to find the pci_dev that hosts the > requester ID. I don't even know how we'd implement this with a function > that only returned the requester ID. Perhaps we'd have to walk upstream > from the device calling the get-requester-ID-for-device function at each > step and noticing when it changed. That moves significant logic back > into the caller code. > ... > > I don't see the point of passing a "device closest to the requester > > ID." What would the IOMMU do with that? As far as the IOMMU is > > concerned, the requester ID could be an arbitrary number completely > > unrelated to a pci_dev. > > Do you have an example of a case where a requester ID doesn't have some > association to a pci_dev? I think our confusion here is the same as what Don & I have been hashing out -- I'm saying a requester ID fabricated by a bridge need not correspond to a specific pci_dev, and you probably mean that every requester ID is by definition the result of *some* PCI device making a DMA request. > ... > Furthermore, if we have: > > -- A > / > X--Y > \ > -- B > ... 
> Let me go back to the X-Y-A|B example above to see if I can explain why > pcie_for_each_requester_id() doesn't make sense to me. Generally a > for_each_foo function should iterate across all things with the same > foo. So a for_each_requester_id should iterate across all things with > the same requester ID. Hm, that's not the way I think of for_each_FOO() interfaces. I think of it as "execute the body (or callback) for every possible FOO", where FOO is different each time. for_each_pci_dev(), pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc., work like that. But the more important question is what arguments we give to the callback. My proposal was to map {pci_dev -> {requester-ID-A, requester-ID-B, ...}} Yours is to map {pci_dev -> {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}} i.e., your callback gets both a pci_dev and a requester-ID. I'm trying to figure out how you handle cases where requester-id-A doesn't have a corresponding pci_dev-A. I guess you pass the device most closely associated with requester-id-A. The "most closely associated device" idea seems confusing to describe and think about. I think you want to use it as part of grouping devices into domains. But I think we should attack that problem separately. For grouping or isolation questions, we have to pay attention to things like ACS, which are not relevant for mapping. > If we look at an endpoint device like A, only A > has A's requester ID. Therefore, why would for_each_requester_id(A) > traverse up to Y? Even if A is a PCIe device, we have to traverse upwards to find any bridges that might drop A's requester ID or take ownership, e.g., if we have this: 00:1c.0 PCIe-to-PCI bridge to [bus 01-02] 01:00.0 PCI-to-PCIe bridge to [bus 02] 02:00.0 PCIe endpoint A the IOMMU has to look for requester-ID 0100. > Y can take ownership and become the requester for A, > but if we were to call for_each_requester_id(Y), wouldn't you expect the > callback to happen on {Y, A, B} since all of those can use that > requester ID? No. If I call for_each_requester_id(Y), I'm expecting the callback to happen for each requester ID that could be used for transactions originated by *Y*. I'm trying to make an IOMMU mapping for use by Y, so any devices downstream of Y, e.g., A and B, are irrelevant. I think a lot of the confusion here is because we're trying to solve two questions at once: (1) what requester-IDs need to be mapped to handle DMA from a device, and (2) what devices can't be isolated from each other and must be in the same domain. I think things will be simpler if we can deal with those separately. Even if it ends up being more code, at least each piece will be easier to understand by itself. Bjorn
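To spell the two shapes out as prototypes: the first is a hypothetical rendering of my proposal, the second is the signature from the patch as posted:

  /* callback sees only the requester IDs; dev/bridge give the scope */
  int pci_dev_for_each_requester_id(struct pci_dev *dev,
                                    struct pci_dev *bridge,
                                    int (*fn)(u16 requester_id, void *data),
                                    void *data);

  /* callback also gets the pci_dev each ID is most closely derived from */
  int pcie_for_each_requester(struct pci_dev *requestee,
                              struct pci_dev *bridge,
                              int (*fn)(struct pci_dev *, u16 id, void *),
                              void *data);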
On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote: > On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote: > > On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote: > > > On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote: > > > > On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote: > > > > > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson > > > > > <alex.williamson@redhat.com> wrote: > > > > > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: > > > > > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > > > > > As the DMA > > > > > transaction propagates through the fabric, it may be tagged by bridges > > > > > with different requester IDs. > > > > > > > > > > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm > > > > > not sure the intermediate pci_devs are. > > > > > > > > A u16 requester ID doesn't mean much on it's own though, it's not > > > > necessarily even unique. A requester ID associated with the context of > > > > a pci_dev is unique and gives us a reference point if we need to perform > > > > another operation on that requester ID. > > > > > > A u16 requester ID better mean something to an IOMMU -- it's all the > > > IOMMU can use to look up the correct mapping. That's why we have to > > > give the iterator something to define the scope to iterate over. The > > > same requester ID could mean something totally unrelated in a > > > different scope, e.g., below a different IOMMU. > > > > The point I'm trying to make is that a requester ID depends on it's > > context (minimally, the PCI segment). The caller can assume the context > > based on the calling parameters or we can provide context in the form of > > an associated pci_dev. I chose the latter path because I prefer > > explicit interfaces and it has some usefulness in the intel-iommu > > implementation. > > > > For instance, get_domain_for_dev() first looks to see if a pci_dev > > already has a domain. If it doesn't, we want to look to see if there's > > an upstream device that would use the same requester ID that already has > > a domain. If our get-requester-ID-for-device function returns only the > > requester ID, we don't know if that requester ID is the device we're > > searching from or some upstream device. Therefore we potentially do an > > unnecessary search for the domain. > > > > The other user is intel_iommu_add_device() where we're trying to build > > IOMMU groups. Visibility is the first requirement of an IOMMU group. > > If the IOMMU cannot distinguish between devices, they must be part of > > the same IOMMU group. Here we want to find the pci_dev that hosts the > > requester ID. I don't even know how we'd implement this with a function > > that only returned the requester ID. Perhaps we'd have to walk upstream > > from the device calling the get-requester-ID-for-device function at each > > step and noticing when it changed. That moves significant logic back > > into the caller code. > > ... > > > > I don't see the point of passing a "device closest to the requester > > > ID." What would the IOMMU do with that? As far as the IOMMU is > > > concerned, the requester ID could be an arbitrary number completely > > > unrelated to a pci_dev. > > > > Do you have an example of a case where a requester ID doesn't have some > > association to a pci_dev? 
> > I think our confusion here is the same as what Don & I have been > hashing out -- I'm saying a requester ID fabricated by a bridge > need not correspond to a specific pci_dev, and you probably mean > that every requester ID is by definition the result of *some* PCI > device making a DMA request. Yes > > ... > > Furthermore, if we have: > > > > -- A > > / > > X--Y > > \ > > -- B > > ... > > > Let me go back to the X-Y-A|B example above to see if I can explain why > > pcie_for_each_requester_id() doesn't make sense to me. Generally a > > for_each_foo function should iterate across all things with the same > > foo. So a for_each_requester_id should iterate across all things with > > the same requester ID. > > Hm, that's not the way I think of for_each_FOO() interfaces. I > think of it as "execute the body (or callback) for every possible > FOO", where FOO is different each time. for_each_pci_dev(), > pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc., > work like that. Most of these aren't relevant because they iterate over everything. I think they only show that if we had a pci_for_each_requester_id() that it should iterate over every possible PCI requester ID in the system and the same could be said for pci_for_each_requester(). pci_bus_for_each_resource() is the only one we can build from; for all resources on a bus. We want all requester IDs for a pci_dev. Does that perhaps mean it should be called pci_dev_for_each_requester_id()? I'd be ok with that name, but I think it even more implies that a pci_dev is associated with a requester ID. > But the more important question is what arguments we give to the > callback. My proposal was to map > > {pci_dev -> {requester-ID-A, requester-ID-B, ...}} > > Yours is to map > > {pci_dev -> {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}} > > i.e., your callback gets both a pci_dev and a requester-ID. I'm > trying to figure out how you handle cases where requester-id-A > doesn't have a corresponding pci_dev-A. I guess you pass the device > most closely associated with requester-id-A. Yes > The "most closely associated device" idea seems confusing to > describe and think about. I think you want to use it as part of > grouping devices into domains. But I think we should attack that > problem separately. For grouping or isolation questions, we have > to pay attention to things like ACS, which are not relevant for > mapping. We only touch isolation insofar as providing an interface for a driver to determine the point in the PCI topology where the requester ID is rooted. Yes, grouping can make use of that, but I object to the idea that we're approaching some slippery slope of rolling multiple concepts into this patch. What I'm proposing here is strictly a requester ID interface. I believe that providing a way for a caller to determine that two devices have a requester ID rooted at the same point in the PCI topology is directly relevant to that interface. pci_find_upstream_pcie_bridge() attempts to do this same thing. Unfortunately, the interface is convoluted, it doesn't handle quirks, and it requires a great deal of work by the caller to then walk the topology and create requester IDs at each step. This also indicates that at each step, the requester ID is associated with some pci_dev. Providing a pci_dev is simply showing our work and providing context to the requester ID (i.e., here's the requester ID and the step along the path from which it was derived.
Here's your top level requester ID and the point in the topology where it's based). > > If we look at an endpoint device like A, only A > > has A's requester ID. Therefore, why would for_each_requester_id(A) > > traverse up to Y? > > Even if A is a PCIe device, we have to traverse upwards to find any > bridges that might drop A's requester ID or take ownership, e.g., if > we have this: > > 00:1c.0 PCIe-to-PCI bridge to [bus 01-02] > 01:00.0 PCI-to-PCIe bridge to [bus 02] > 02:00.0 PCIe endpoint A > > the IOMMU has to look for requester-ID 0100. And I believe this patch handles this case correctly; I mentioned this exact example in the v2 RFC cover letter. This is another example where pci_find_upstream_pcie_bridge() will currently fail. > > Y can take ownership and become the requester for A, > > but if we were to call for_each_requester_id(Y), wouldn't you expect the > > callback to happen on {Y, A, B} since all of those can use that > > requester ID? > > No. If I call for_each_requester_id(Y), I'm expecting the callback > to happen for each requester ID that could be used for transactions > originated by *Y*. I'm trying to make an IOMMU mapping for use by > Y, so any devices downstream of Y, e.g., A and B, are irrelevant. Ok, you think of for_each_requester_id() the same as I think of for_each_requester(). Can we split the difference and call it pci_dev_for_each_requester_id()? > I think a lot of the confusion here is because we're trying to solve > two questions at once: (1) what requester-IDs need to be mapped > to handle DMA from a device, and (2) what devices can't be isolated > from each other and must be in the same domain. I think things will > be simpler if we can deal with those separately. Even if it ends up > being more code, at least each piece will be easier to understand by > itself. Don't we already have this split in the code? (1) pcie_for_each_requester (or pci_dev_for_each_requester_id) (2) pci_get_visible_pcie_requester (or pci_get_visible_pcie_requester_id) Note however that (2) does not impose anything about domains or isolation; it is strictly based on PCI topology. It's left to the caller to determine how that translates to IOMMU domains, but the typical case is trivial. Thanks, Alex
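As a footnote to the "typical case is trivial" point above -- ignoring ACS, which is the separate isolation question, and the refcounting XXX -- two devices can only be isolated by the IOMMU if (2) roots their requester IDs at different devices. A sketch using the interface from this patch:

  /*
   * Sketch: an IOMMU that only sees requester IDs cannot isolate @a
   * from @b if both have their visible requester ID rooted at the
   * same device in the PCI topology.
   */
  static bool same_requester_root(struct pci_dev *a, struct pci_dev *b)
  {
          u16 id_a, id_b;

          return pci_get_visible_pcie_requester(a, NULL, &id_a) ==
                 pci_get_visible_pcie_requester(b, NULL, &id_b);
  }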
On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson <alex.williamson@redhat.com> wrote: > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote: >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote: >> The "most closely associated device" idea seems confusing to >> describe and think about. I think you want to use it as part of >> grouping devices into domains. But I think we should attack that >> problem separately. For grouping or isolation questions, we have >> to pay attention to things like ACS, which are not relevant for >> mapping. > > We only touch isolation insofar as providing an interface for a > driver to determine the point in the PCI topology where the requester ID > is rooted. Yes, grouping can make use of that, but I object to the idea > that we're approaching some slippery slope of rolling multiple concepts > into this patch. What I'm proposing here is strictly a requester ID > interface. I believe that providing a way for a caller to determine > that two devices have a requester ID rooted at the same point in the PCI > topology is directly relevant to that interface. > > pci_find_upstream_pcie_bridge() attempts to do this same thing. > Unfortunately, the interface is convoluted, it doesn't handle quirks, > and it requires a great deal of work by the caller to then walk the > topology and create requester IDs at each step. This also indicates > that at each step, the requester ID is associated with some pci_dev. > Providing a pci_dev is simply showing our work and providing context to > the requester ID (i.e., here's the requester ID and the step along the > path from which it was derived. Here's your top level requester ID and > the point in the topology where it's based). What does the driver *do* with the pci_dev? If it's not necessary, then showing our work and providing context just complicates the interface and creates opportunities for mistakes. If we're creating IOMMU mappings, only the requester ID is needed. I think you used get_domain_for_dev() and intel_iommu_add_device() as examples, but as far as I can tell, they use the pci_dev as a way to learn about isolation. For that purpose, I don't think you want an iterator -- you only want to know about the single pci_dev that's the root of the isolation domain, and requester IDs really aren't relevant. >> > If we look at an endpoint device like A, only A >> > has A's requester ID. Therefore, why would for_each_requester_id(A) >> > traverse up to Y? >> >> Even if A is a PCIe device, we have to traverse upwards to find any >> bridges that might drop A's requester ID or take ownership, e.g., if >> we have this: >> >> 00:1c.0 PCIe-to-PCI bridge to [bus 01-02] >> 01:00.0 PCI-to-PCIe bridge to [bus 02] >> 02:00.0 PCIe endpoint A >> >> the IOMMU has to look for requester-ID 0100. > > And I believe this patch handles this case correctly; I mentioned this > exact example in the v2 RFC cover letter. This is another example where > pci_find_upstream_pcie_bridge() will currently fail. OK, I was just trying to answer your question "why we would need to traverse up to Y." But apparently we agree about that. >> > Y can take ownership and become the requester for A, >> > but if we were to call for_each_requester_id(Y), wouldn't you expect the >> > callback to happen on {Y, A, B} since all of those can use that >> > requester ID? >> >> No. If I call for_each_requester_id(Y), I'm expecting the callback >> to happen for each requester ID that could be used for transactions >> originated by *Y*.
I'm trying to make an IOMMU mapping for use by >> Y, so any devices downstream of Y, e.g., A and B, are irrelevant. > > Ok, you think of for_each_requester_id() the same as I think of > for_each_requester(). Can we split the difference and call it > pci_dev_for_each_requester_id()? Sure. >> I think a lot of the confusion here is because we're trying to solve >> two questions at once: (1) what requester-IDs need to be mapped >> to handle DMA from a device, and (2) what devices can't be isolated >> from each other and must be in the same domain. ... > > Don't we already have this split in the code? > > (1) pcie_for_each_requester > (or pci_dev_for_each_requester_id) > > (2) pci_get_visible_pcie_requester > (or pci_get_visible_pcie_requester_id) > > Note however that (2) does not impose anything about domains or > isolation; it is strictly based on PCI topology. It's left to the > caller to determine how that translates to IOMMU domains, but the > typical case is trivial. Can you expand on this a little bit? What's involved in translating the pci_get_visible_pcie_requester() result to an IOMMU domain? Is there something IOMMU-specific about this? I assume that for any given PCI device A, there's a subtree of devices that the IOMMU can't distinguish from A, and that this subtree is not IOMMU-specific so it's reasonable to have a PCI interface to compute it. An IOMMU driver might want to expand the domain to include more devices, of course, and *that* seems like something to leave to the driver. I'm also assuming that a pci_dev (the root of the subtree) is a good way to identify this set of indistinguishable devices because it's easy to test whether a device B is part of it. But the requester ID seems redundant in this case because you can just associate an IOMMU domain with the pci_dev. Where does ACS come into the picture? Bjorn
On 07/29/2013 12:06 PM, Alex Williamson wrote: > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote: >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote: >>> On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote: >>>> On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote: >>>>> On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote: >>>>>> On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson >>>>>> <alex.williamson@redhat.com> wrote: >>>>>>> On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: >>>>>>>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: >>>>>> As the DMA >>>>>> transaction propagates through the fabric, it may be tagged by bridges >>>>>> with different requester IDs. >>>>>> >>>>>> The requester IDs are needed outside PCI (by IOMMU drivers), but I'm >>>>>> not sure the intermediate pci_devs are. >>>>> >>>>> A u16 requester ID doesn't mean much on it's own though, it's not >>>>> necessarily even unique. A requester ID associated with the context of >>>>> a pci_dev is unique and gives us a reference point if we need to perform >>>>> another operation on that requester ID. >>>> >>>> A u16 requester ID better mean something to an IOMMU -- it's all the >>>> IOMMU can use to look up the correct mapping. That's why we have to >>>> give the iterator something to define the scope to iterate over. The >>>> same requester ID could mean something totally unrelated in a >>>> different scope, e.g., below a different IOMMU. >>> >>> The point I'm trying to make is that a requester ID depends on it's >>> context (minimally, the PCI segment). The caller can assume the context >>> based on the calling parameters or we can provide context in the form of >>> an associated pci_dev. I chose the latter path because I prefer >>> explicit interfaces and it has some usefulness in the intel-iommu >>> implementation. >>> >>> For instance, get_domain_for_dev() first looks to see if a pci_dev >>> already has a domain. If it doesn't, we want to look to see if there's >>> an upstream device that would use the same requester ID that already has >>> a domain. If our get-requester-ID-for-device function returns only the >>> requester ID, we don't know if that requester ID is the device we're >>> searching from or some upstream device. Therefore we potentially do an >>> unnecessary search for the domain. >>> >>> The other user is intel_iommu_add_device() where we're trying to build >>> IOMMU groups. Visibility is the first requirement of an IOMMU group. >>> If the IOMMU cannot distinguish between devices, they must be part of >>> the same IOMMU group. Here we want to find the pci_dev that hosts the >>> requester ID. I don't even know how we'd implement this with a function >>> that only returned the requester ID. Perhaps we'd have to walk upstream >>> from the device calling the get-requester-ID-for-device function at each >>> step and noticing when it changed. That moves significant logic back >>> into the caller code. >>> ... >> >>>> I don't see the point of passing a "device closest to the requester >>>> ID." What would the IOMMU do with that? As far as the IOMMU is >>>> concerned, the requester ID could be an arbitrary number completely >>>> unrelated to a pci_dev. >>> >>> Do you have an example of a case where a requester ID doesn't have some >>> association to a pci_dev? 
>> >> I think our confusion here is the same as what Don & I have been >> hashing out -- I'm saying a requester ID fabricated by a bridge >> need not correspond to a specific pci_dev, and you probably mean >> that every requester ID is by definition the result of *some* PCI >> device making a DMA request. > > Yes > >>> ... >>> Furthermore, if we have: >>> >>> -- A >>> / >>> X--Y >>> \ >>> -- B >>> ... >> >>> Let me go back to the X-Y-A|B example above to see if I can explain why >>> pcie_for_each_requester_id() doesn't make sense to me. Generally a >>> for_each_foo function should iterate across all things with the same >>> foo. So a for_each_requester_id should iterate across all things with >>> the same requester ID. >> >> Hm, that's not the way I think of for_each_FOO() interfaces. I >> think of it as "execute the body (or callback) for every possible >> FOO", where FOO is different each time. for_each_pci_dev(), >> pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc., >> work like that. > > Most of these aren't relevant because they iterate over everything. I > think they only show that if we had a pci_for_each_requester_id() that > it should iterate over every possible PCI requester ID in the system and > the same could be said for pci_for_each_requester(). > Lost me here.... I thought the functions took in a pdev, so it'll only iterate on the segment that pdev is associated with. > pci_bus_for_each_resource() is the only one we can build from; for all > resources on a bus. We want all requester IDs for a pci_dev. Does that > perhaps mean it should be called pci_dev_for_each_requester_id()? I'd > be ok with that name, but I think it even more implies that a pci_dev is > associated with a requester ID. > and the fcn call name changes have equally lost me here. AIUI, IOMMUs want to call a PCI core function with a pdev, asking for the req-id's that the pdev may generate to the IOMMU when performing DMA. That doesn't strike me as a for-each-requester-id() but a 'get-requester-id()' operation. >> But the more important question is what arguments we give to the >> callback. My proposal was to map >> >> {pci_dev -> {requester-ID-A, requester-ID-B, ...}} >> >> Yours is to map >> >> {pci_dev -> {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}} >> >> i.e., your callback gets both a pci_dev and a requester-ID. I'm >> trying to figure out how you handle cases where requester-id-A >> doesn't have a corresponding pci_dev-A. I guess you pass the device >> most closely associated with requester-id-A. > > Yes > >> The "most closely associated device" idea seems confusing to >> describe and think about. I think you want to use it as part of >> grouping devices into domains. But I think we should attack that >> problem separately. For grouping or isolation questions, we have >> to pay attention to things like ACS, which are not relevant for >> mapping. > > We only touch isolation insofar as providing an interface for a > driver to determine the point in the PCI topology where the requester ID > is rooted. Yes, grouping can make use of that, but I object to the idea > that we're approaching some slippery slope of rolling multiple concepts > into this patch. What I'm proposing here is strictly a requester ID > interface. I believe that providing a way for a caller to determine > that two devices have a requester ID rooted at the same point in the PCI > topology is directly relevant to that interface. > I strongly agree here.
Providing the pdev's associated with each req-id rtn'd cannot hurt, and in fact, I believe it is an improvement, avoiding a potential set of more interfaces that may be needed to do (segment, req-id)->pdev mapping/matching/get-ing. > pci_find_upstream_pcie_bridge() attempts to do this same thing. > Unfortunately, the interface is convoluted, it doesn't handle quirks, > and it requires a great deal of work by the caller to then walk the > topology and create requester IDs at each step. This also indicates > that at each step, the requester ID is associated with some pci_dev. > Providing a pci_dev is simply showing our work and providing context to > the requester ID (i.e., here's the requester ID and the step along the > path from which it was derived. Here's your top level requester ID and > the point in the topology where it's based). > IMO, getting rid of pci_find_upstream_pcie_bridge() gets non-PCI code out of the biz of knowing too much about PCI topology and the idiosyncrasies around req-id's, and the proposed interface cleans up the IOMMU (PCI) support. Having the IOMMU api get the pdev rtn'd with a req-id, and then passing it back to the core (for release/free) seems like the proper get/put logic btwn the PCI core and another kernel subsystem. >>> If we look at an endpoint device like A, only A >>> has A's requester ID. Therefore, why would for_each_requester_id(A) >>> traverse up to Y? >> >> Even if A is a PCIe device, we have to traverse upwards to find any >> bridges that might drop A's requester ID or take ownership, e.g., if >> we have this: >> >> 00:1c.0 PCIe-to-PCI bridge to [bus 01-02] >> 01:00.0 PCI-to-PCIe bridge to [bus 02] >> 02:00.0 PCIe endpoint A >> >> the IOMMU has to look for requester-ID 0100. > > And I believe this patch handles this case correctly; I mentioned this > exact example in the v2 RFC cover letter. This is another example where > pci_find_upstream_pcie_bridge() will currently fail. > +1 >>> Y can take ownership and become the requester for A, >>> but if we were to call for_each_requester_id(Y), wouldn't you expect the >>> callback to happen on {Y, A, B} since all of those can use that >>> requester ID? >> >> No. If I call for_each_requester_id(Y), I'm expecting the callback >> to happen for each requester ID that could be used for transactions >> originated by *Y*. I'm trying to make an IOMMU mapping for use by >> Y, so any devices downstream of Y, e.g., A and B, are irrelevant. > > Ok, you think of for_each_requester_id() the same as I think of > for_each_requester(). Can we split the difference and call it > pci_dev_for_each_requester_id()? > I can only ditto my sentiment above. Can I toss in 'pci_get_requester_id()'?!? ... ;-) >> I think a lot of the confusion here is because we're trying to solve >> two questions at once: (1) what requester-IDs need to be mapped >> to handle DMA from a device, and (2) what devices can't be isolated >> from each other and must be in the same domain. I think things will >> be simpler if we can deal with those separately. Even if it ends up >> being more code, at least each piece will be easier to understand by >> itself. > > Don't we already have this split in the code? > > (1) pcie_for_each_requester > (or pci_dev_for_each_requester_id) > > (2) pci_get_visible_pcie_requester > (or pci_get_visible_pcie_requester_id) > again, unless you understand all of the mapped/coerced/modified request-id of PCIe, PCIe-to-PCI(x) bridges, and the potential (RC, other) quirks, these interface names are confusing. ...
that could be me, and I need to go back to look at patches and the description of the functions, to see if they help to clarify their uses. > Note however that (2) does not impose anything about domains or > isolation, it is strictly based on PCI topology. It's left to the > caller to determine how that translates to IOMMU domains, but the > typical case is trivial. Thanks, > +1, again. > Alex > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
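For readers following the requester ID arithmetic in this exchange, the two ID forms under debate reduce to a few lines of kernel-style C. The sketch below is illustrative only; the helper names are invented here and do not exist in the patch or in mainline:

	#include <linux/pci.h>

	/* Endpoint form: (bus << 8 | devfn), how a PCIe function tags its own DMA */
	static u16 requester_id_of(struct pci_dev *dev)
	{
		return (dev->bus->number << 8) | dev->devfn;
	}

	/*
	 * Ownership form: a conventional PCIe-to-PCI bridge that takes
	 * ownership of its subordinate devices' transactions tags them with
	 * (secondary bus << 8 | 0), i.e. devfn 0.
	 */
	static u16 bridge_owned_requester_id(struct pci_dev *bridge)
	{
		return bridge->subordinate->number << 8;
	}

In the 00:1c.0 -> 01:00.0 -> 02:00.0 topology discussed above, bridge_owned_requester_id() applied to the 00:1c.0 bridge yields 0x0100, the requester ID the IOMMU has to look for.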
On Mon, 2013-07-29 at 15:02 -0600, Bjorn Helgaas wrote: > On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote: > >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote: > > >> The "most closely associated device" idea seems confusing to > >> describe and think about. I think you want to use it as part of > >> grouping devices into domains. But I think we should attack that > >> problem separately. For grouping or isolation questions, we have > >> to pay attention to things like ACS, which are not relevant for > >> mapping. > > > > We are only touch isolation insofar as providing an interface for a > > driver to determine the point in the PCI topology where the requester ID > > is rooted. Yes, grouping can make use of that, but I object to the idea > > that we're approaching some slippery slope of rolling multiple concepts > > into this patch. What I'm proposing here is strictly a requester ID > > interface. I believe that providing a way for a caller to determine > > that two devices have a requester ID rooted at the same point in the PCI > > topology is directly relevant to that interface. > > > > pci_find_upstream_pcie_bridge() attempts to do this same thing. > > Unfortunately, the interface is convoluted, it doesn't handle quirks, > > and it requires a great deal of work by the caller to then walk the > > topology and create requester IDs at each step. This also indicates > > that at each step, the requester ID is associated with some pci_dev. > > Providing a pci_dev is simply showing our work and providing context to > > the requester ID (ie. here's the requester ID and the step along the > > path from which it was derived. Here's your top level requester ID and > > the point in the topology where it's based). > > What does the driver *do* with the pci_dev? If it's not necessary, > then showing our work and providing context just complicates the > interface and creates opportunities for mistakes. If we're creating > IOMMU mappings, only the requester ID is needed. It's true, I don't have a use for the pci_dev in pci_dev_for_each_requester_id() today. But I think providing the context for a requester ID is valuable information. Otherwise we have to make assumptions about the requester ID. For example, if I have devices in different PCI segments with the same requester ID the callback function only knows that those are actually different requester IDs from information the caller provides itself in the opaque pointer. This is the case with intel-iommu, but why would we design an API that requires the caller to provide that kind of context? I also provide the pci_dev because I think both the pci_dev_for_each_requester_id() and pci_get_visible_pcie_requester() should provide similar APIs. There's an association of a requester ID to a pci_dev. Why hide that? > I think you used get_domain_for_dev() and intel_iommu_add_device() as > examples, but as far as I can tell, they use the pci_dev as a way to > learn about isolation. For that purpose, I don't think you want an > iterator -- you only want to know about the single pci_dev that's the > root of the isolation domain, and requester IDs really aren't > relevant. See get_domain_for_dev() in patch 2/2. It uses the returned pci_dev to know whether the requester ID is rooted upstream or at the device itself. If upstream, it then uses the requester ID to search for an existing domain. The requester ID is relevant here. 
If the returned pci_dev is the device itself, it proceeds to allocate a domain because the code path has already checked whether the device itself belongs to a domain. The use in intel_iommu_add_device() doesn't use the requester ID, it only wants the pci_dev since it needs to then search above that for additional isolation capabilities. That interface only wants to know the point in the topology where we have bus level granularity of devices, so it doesn't care about the actual requester ID. > >> > If we look at an endpoint device like A, only A > >> > has A's requester ID. Therefore, why would for_each_requester_id(A) > >> > traverse up to Y? > >> > >> Even if A is a PCIe device, we have to traverse upwards to find any > >> bridges that might drop A's requester ID or take ownership, e.g., if > >> we have this: > >> > >> 00:1c.0 PCIe-to-PCI bridge to [bus 01-02] > >> 01:00.0 PCI-to-PCIe bridge to [bus 02] > >> 02:00.0 PCIe endpoint A > >> > >> the IOMMU has to look for requester-ID 0100. > > > > And I believe this patch handles this case correctly; I mentioned this > > exact example in the v2 RFC cover letter. This is another example where > > pci_find_upstream_pcie_bridge() will currently fail. > > OK, I was just trying to answer your question "why we would need to > traverse up to Y." But apparently we agree about that. > > >> > Y can take ownership and become the requester for A, > >> > but if we were to call for_each_requester_id(Y), wouldn't you expect the > >> > callback to happen on {Y, A, B} since all of those can use that > >> > requester ID? > >> > >> No. If I call for_each_requester_id(Y), I'm expecting the callback > >> to happen for each requester ID that could be used for transactions > >> originated by *Y*. I'm trying to make an IOMMU mapping for use by > >> Y, so any devices downstream of Y, e.g., A and B, are irrelevant. > > > > Ok, you think of for_each_requester_id() the same as I think of > > for_each_requester(). Can we split the difference and call it > > pci_dev_for_each_requester_id()? > > Sure. > > >> I think a lot of the confusion here is because we're trying to solve > >> both two questions at once: (1) what requester-IDs need to be mapped > >> to handle DMA from a device, and (2) what devices can't be isolated > >> from each other and must be in the same domain. ... > > > > Don't we already have this split in the code? > > > > (1) pcie_for_each_requester > > (or pci_dev_for_each_requester_id) > > > > (2) pci_get_visible_pcie_requester > > (or pci_get_visible_pcie_requester_id) > > > > Note however that (2) does not impose anything about domains or > > isolation, it is strictly based on PCI topology. It's left to the > > caller to determine how that translates to IOMMU domains, but the > > typical case is trivial. > > Can you expand on this a little bit? What's involved in translating > the pci_get_visible_pcie_requester() result to an IOMMU domain? Is > there something IOMMU-specific about this? I try not to make any assumptions about what's required to get from a pci_dev/requester ID to a domain. The IOMMU may only be able to filter a subset of the requester ID, for instance what if it has no function level visibility, or perhaps only bus level visibility. intel-iommu has full BDF granularity, so it's perhaps the simplest case. > I assume that for any given PCI device A, there's a subtree of devices > that the IOMMU can't distinguish from A, and that this subtree is not > IOMMU-specific so it's reasonable to to have a PCI interface to > compute it. 
An IOMMU driver might want to expand the domain to > include more devices, of course, and *that* seems like something to > leave to the driver. Yes, I agree. We're only trying to determine the PCI bus level visibility, anything else is left to the IOMMU driver. > I'm also assuming that a pci_dev (the root of the subtree) is a good > way to identify this set of indistinguishable devices because it's > easy to test whether a device B is part of it. But the requester ID > seems redundant in this case because you can just associate an IOMMU > domain with the pci_dev. I don't think we want to assume that an IOMMU driver bases a domain on a pci_dev or a BDF. That's specific to the driver. The requester ID is not redundant though since it depends on topology and quirks. A given pci_dev may have its own legitimate requester ID or a quirk, where the quirk may be specific to the device or architected, like a bridge using the (bridge->subordinate->number << 8 | 0) requester ID. > Where does ACS come into the picture? ACS determines whether a transaction must necessarily reach the IOMMU without being redirected. Therefore once we figure out the point at which we have bus level granularity for a transaction, we look upstream to determine if a transaction must necessarily reach the IOMMU. In the case of intel-iommu, IOMMU groups are only dependent on the visibility provided by the PCI topology and the isolation guaranteed through ACS, but it's up to the IOMMU driver to determine whether there are additional constraints. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
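As a concrete illustration of the ACS test described above: the upstream walk might look roughly like the sketch below. It leans on the existing pci_acs_enabled() helper; the function itself and the choice of which ACS flags to require are hypothetical and deliberately left to the IOMMU driver. (Mainline's pci_acs_path_enabled() already performs a similar walk between two devices.)

	#include <linux/pci.h>

	/*
	 * Sketch: starting from the device where bus-level visibility is
	 * rooted, every bridge on the way to the root must have the
	 * required ACS controls enabled, otherwise a peer could redirect
	 * the DMA before it reaches the IOMMU.
	 */
	static bool path_must_reach_iommu(struct pci_dev *dev, u16 acs_flags)
	{
		while (!pci_is_root_bus(dev->bus)) {
			dev = dev->bus->self;
			if (!pci_acs_enabled(dev, acs_flags))
				return false;
		}
		return true;
	}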
On Mon, 2013-07-29 at 17:03 -0400, Don Dutile wrote: > On 07/29/2013 12:06 PM, Alex Williamson wrote: > > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote: > >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote: > >>> On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote: > >>>> On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote: > >>>>> On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote: > >>>>>> On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson > >>>>>> <alex.williamson@redhat.com> wrote: > >>>>>>> On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote: > >>>>>>>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote: > >>>>>> As the DMA > >>>>>> transaction propagates through the fabric, it may be tagged by bridges > >>>>>> with different requester IDs. > >>>>>> > >>>>>> The requester IDs are needed outside PCI (by IOMMU drivers), but I'm > >>>>>> not sure the intermediate pci_devs are. > >>>>> > >>>>> A u16 requester ID doesn't mean much on it's own though, it's not > >>>>> necessarily even unique. A requester ID associated with the context of > >>>>> a pci_dev is unique and gives us a reference point if we need to perform > >>>>> another operation on that requester ID. > >>>> > >>>> A u16 requester ID better mean something to an IOMMU -- it's all the > >>>> IOMMU can use to look up the correct mapping. That's why we have to > >>>> give the iterator something to define the scope to iterate over. The > >>>> same requester ID could mean something totally unrelated in a > >>>> different scope, e.g., below a different IOMMU. > >>> > >>> The point I'm trying to make is that a requester ID depends on it's > >>> context (minimally, the PCI segment). The caller can assume the context > >>> based on the calling parameters or we can provide context in the form of > >>> an associated pci_dev. I chose the latter path because I prefer > >>> explicit interfaces and it has some usefulness in the intel-iommu > >>> implementation. > >>> > >>> For instance, get_domain_for_dev() first looks to see if a pci_dev > >>> already has a domain. If it doesn't, we want to look to see if there's > >>> an upstream device that would use the same requester ID that already has > >>> a domain. If our get-requester-ID-for-device function returns only the > >>> requester ID, we don't know if that requester ID is the device we're > >>> searching from or some upstream device. Therefore we potentially do an > >>> unnecessary search for the domain. > >>> > >>> The other user is intel_iommu_add_device() where we're trying to build > >>> IOMMU groups. Visibility is the first requirement of an IOMMU group. > >>> If the IOMMU cannot distinguish between devices, they must be part of > >>> the same IOMMU group. Here we want to find the pci_dev that hosts the > >>> requester ID. I don't even know how we'd implement this with a function > >>> that only returned the requester ID. Perhaps we'd have to walk upstream > >>> from the device calling the get-requester-ID-for-device function at each > >>> step and noticing when it changed. That moves significant logic back > >>> into the caller code. > >>> ... > >> > >>>> I don't see the point of passing a "device closest to the requester > >>>> ID." What would the IOMMU do with that? As far as the IOMMU is > >>>> concerned, the requester ID could be an arbitrary number completely > >>>> unrelated to a pci_dev. > >>> > >>> Do you have an example of a case where a requester ID doesn't have some > >>> association to a pci_dev? 
> >> > >> I think our confusion here is the same as what Don& I have been > >> hashing out -- I'm saying a requester ID fabricated by a bridge > >> need not correspond to a specific pci_dev, and you probably mean > >> that every requester ID is by definition the result of *some* PCI > >> device making a DMA request. > > > > Yes > > > >>> ... > >>> Furthermore, if we have: > >>> > >>> -- A > >>> / > >>> X--Y > >>> \ > >>> -- B > >>> ... > >> > >>> Let me go back to the X-Y-A|B example above to see if I can explain why > >>> pcie_for_each_requester_id() doesn't make sense to me. Generally a > >>> for_each_foo function should iterate across all things with the same > >>> foo. So a for_each_requester_id should iterate across all things with > >>> the same requester ID. > >> > >> Hm, that's not the way I think of for_each_FOO() interfaces. I > >> think of it as "execute the body (or callback) for every possible > >> FOO", where FOO is different each time. for_each_pci_dev(), > >> pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc., > >> work like that. > > > > Most of these aren't relevant because they iterate over everything. I > > think they only show that if we had a pci_for_each_requester_id() that > > it should iterate over every possible PCI requester ID in the system and > > the same could be said for pci_for_each_requester(). > > > Lost me here.... I thought the functions took in a pdev, so it'll only > iterate on the segment that pdev is associated with. Right, but how does the callback function know the segment we're traversing? Without a pci_dev a requester ID could belong to any PCI segment. It would be up to the caller to provide context in the opaque pointer. > > pci_bus_for_each_resource() is the only one we can build from; for all > > resources on a bus. We want all requester IDs for a pci_dev. Does that > > perhaps mean it should be called pci_dev_for_each_requester_id()? I'd > > be ok with that name, but I think it even more implies that a pci_dev is > > associated with a requester ID. > > > and the fcn call name changes have equally lost me here. > AIUI, IOMMUs want to call a PCI core function with a pdev, asking for > the req-id's that the pdev may generate to the IOMMU when performing DMA. > that doesn't strike me as a for-each-requester-id() but a 'get-requester-id()' > operation. But we have to support multiple requester IDs per endpoint. As Bjorn pointed out in the spec, a PCI-X bridge may pass the requester ID or take ownership of the transaction. Therefore we need to add both the "endpoint" and the (bridge->secondary->number << 8 | 0) requester IDs. We could do a get-requester-id() interface, but then the caller must walk upstream through each device that may possibly take ownership of the transactions. That leaves a lot of logic in the caller versus a for-each-requester-id() operation. > >> But the more important question is what arguments we give to the > >> callback. My proposal was to map > >> > >> {pci_dev -> {requester-ID-A, requester-ID-B, ...}} > >> > >> Yours is to map > >> > >> {pci_dev -> {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}} > >> > >> i.e., your callback gets both a pci_dev and a requester-ID. I'm > >> trying to figure out how you handle cases where requester-id-A > >> doesn't have a corresponding pci_dev-A. I guess you pass the device > >> most closely associated with requester-id-A > > > > Yes > > > >> The "most closely associated device" idea seems confusing to > >> describe and think about. 
I think you want to use it as part of > >> grouping devices into domains. But I think we should attack that > >> problem separately. For grouping or isolation questions, we have > >> to pay attention to things like ACS, which are not relevant for > >> mapping. > > > > We are only touch isolation insofar as providing an interface for a > > driver to determine the point in the PCI topology where the requester ID > > is rooted. Yes, grouping can make use of that, but I object to the idea > > that we're approaching some slippery slope of rolling multiple concepts > > into this patch. What I'm proposing here is strictly a requester ID > > interface. I believe that providing a way for a caller to determine > > that two devices have a requester ID rooted at the same point in the PCI > > topology is directly relevant to that interface. > > > I strongly agree here. providing the pdev's associated with each req-id rtn'd > cannot hurt, and in fact, I believe it is an improvement, avoiding a potential > set of more interfaces that may be needed to do (segment, req-id)->pdev mapping/matching/get-ing. > > > > pci_find_upstream_pcie_bridge() attempts to do this same thing. > > Unfortunately, the interface is convoluted, it doesn't handle quirks, > > and it requires a great deal of work by the caller to then walk the > > topology and create requester IDs at each step. This also indicates > > that at each step, the requester ID is associated with some pci_dev. > > Providing a pci_dev is simply showing our work and providing context to > > the requester ID (ie. here's the requester ID and the step along the > > path from which it was derived. Here's your top level requester ID and > > the point in the topology where it's based). > > > IMO, getting rid of pci_find_upstream_pcie_bridge() gets non-PCI code > out of the biz of knowing too much about PCI topology and the idiosyncrasies > around req-id's, and the proposed interface cleans up the IOMMU (PCI) support. > Having the IOMMU API get the pdev rtn'd with a req-id, and then passing it > back to the core (for release/free) seems like the proper get/put logic > btwn the PCI core and another kernel subsystem. > > >>> If we look at an endpoint device like A, only A > >>> has A's requester ID. Therefore, why would for_each_requester_id(A) > >>> traverse up to Y? > >> > >> Even if A is a PCIe device, we have to traverse upwards to find any > >> bridges that might drop A's requester ID or take ownership, e.g., if > >> we have this: > >> > >> 00:1c.0 PCIe-to-PCI bridge to [bus 01-02] > >> 01:00.0 PCI-to-PCIe bridge to [bus 02] > >> 02:00.0 PCIe endpoint A > >> > >> the IOMMU has to look for requester-ID 0100. > > > > And I believe this patch handles this case correctly; I mentioned this > > exact example in the v2 RFC cover letter. This is another example where > > pci_find_upstream_pcie_bridge() will currently fail. > > > +1 > > >>> Y can take ownership and become the requester for A, > >>> but if we were to call for_each_requester_id(Y), wouldn't you expect the > >>> callback to happen on {Y, A, B} since all of those can use that > >>> requester ID? > >> > >> No. If I call for_each_requester_id(Y), I'm expecting the callback > >> to happen for each requester ID that could be used for transactions > >> originated by *Y*. I'm trying to make an IOMMU mapping for use by > >> Y, so any devices downstream of Y, e.g., A and B, are irrelevant. > > > > Ok, you think of for_each_requester_id() the same as I think of > > for_each_requester().
Can we split the difference and call it > > pci_dev_for_each_requester_id()? > > > I can only ditto my sentiment above. > Can I toss in 'pci_get_requester_id()'?!? ... ;-) I assume pci_get_requester_id() would return just the requester ID for a given device. The problem with that is that it doesn't have much meaning. What's the requester ID of a bridge? It depends on whether the bridge is mastering the transaction (bridge->bus->number << 8 | bridge->devfn) or whether it's taking ownership of a transaction for a subordinate device (bridge->secondary->number << 8 | 0). So we really want the for-each interface because then we have a set of devices and a vector through them. > >> I think a lot of the confusion here is because we're trying to solve > >> both two questions at once: (1) what requester-IDs need to be mapped > >> to handle DMA from a device, and (2) what devices can't be isolated > >> from each other and must be in the same domain. I think things will > >> be simpler if we can deal with those separately. Even if it ends up > >> being more code, at least each piece will be easier to understand by > >> itself. > > > > Don't we already have this split in the code? > > > > (1) pcie_for_each_requester > > (or pci_dev_for_each_requester_id) > > > > (2) pci_get_visible_pcie_requester > > (or pci_get_visible_pcie_requester_id) > > > again, unless you understand all of the mapped/coerced/modified request-id > of PCIe, PCIe-to-PCI(x) bridges, and the potential (RC, other) quirks, > these interface names are confusing. .... > ... that could be me, and I need to go back to look at patches and > the description of the functions, to see if they help to clarify their uses. Perhaps it's time for a v3, but I'm not sure what all we've learned from v2 yet or what direction a v3 should take. We've learned that root complex PCIe-to-PCI bridges use implementation specific requester IDs and we know that Intel uses the (bus|devfn) requester ID instead of the standard (secondary|0). This should probably be handled as a quirk. We've renamed pcie_for_each_requester() to pci_dev_for_each_requester_id(). I don't know if we've renamed pci_get_visible_pcie_requester() yet, I was just trying to be consistent with naming by adding _id to the end. Thanks, Alex > > Note however that (2) does not impose anything about domains or > > isolation, it is strictly based on PCI topology. It's left to the > > caller to determine how that translates to IOMMU domains, but the > > typical case is trivial. Thanks, > > > +1, again. > > > Alex > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
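To illustrate the context argument made earlier in this exchange (how a callback learns which segment a requester ID belongs to): a callback that receives the pci_dev can recover the segment itself instead of having it smuggled through the opaque pointer. A hypothetical callback matching the proposed pci_dev_for_each_requester_id() signature:

	#include <linux/pci.h>

	static int note_requester_id(struct pci_dev *pdev, u16 requester_id,
				     void *data)
	{
		/* the segment comes from the pci_dev, not from @data */
		int segment = pci_domain_nr(pdev->bus);

		dev_info(&pdev->dev, "requester ID %04x on segment %04x\n",
			 requester_id, segment);
		return 0;	/* returning non-zero would stop the walk */
	}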
On Mon, Jul 29, 2013 at 4:32 PM, Alex Williamson <alex.williamson@redhat.com> wrote: > On Mon, 2013-07-29 at 15:02 -0600, Bjorn Helgaas wrote: >> On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson >> <alex.williamson@redhat.com> wrote: >> > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote: >> >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote: >> >> >> The "most closely associated device" idea seems confusing to >> >> describe and think about. I think you want to use it as part of >> >> grouping devices into domains. But I think we should attack that >> >> problem separately. For grouping or isolation questions, we have >> >> to pay attention to things like ACS, which are not relevant for >> >> mapping. >> > >> > We are only touch isolation insofar as providing an interface for a >> > driver to determine the point in the PCI topology where the requester ID >> > is rooted. Yes, grouping can make use of that, but I object to the idea >> > that we're approaching some slippery slope of rolling multiple concepts >> > into this patch. What I'm proposing here is strictly a requester ID >> > interface. I believe that providing a way for a caller to determine >> > that two devices have a requester ID rooted at the same point in the PCI >> > topology is directly relevant to that interface. >> > >> > pci_find_upstream_pcie_bridge() attempts to do this same thing. >> > Unfortunately, the interface is convoluted, it doesn't handle quirks, >> > and it requires a great deal of work by the caller to then walk the >> > topology and create requester IDs at each step. This also indicates >> > that at each step, the requester ID is associated with some pci_dev. >> > Providing a pci_dev is simply showing our work and providing context to >> > the requester ID (ie. here's the requester ID and the step along the >> > path from which it was derived. Here's your top level requester ID and >> > the point in the topology where it's based). >> >> What does the driver *do* with the pci_dev? If it's not necessary, >> then showing our work and providing context just complicates the >> interface and creates opportunities for mistakes. If we're creating >> IOMMU mappings, only the requester ID is needed. > > It's true, I don't have a use for the pci_dev in > pci_dev_for_each_requester_id() today. But I think providing the > context for a requester ID is valuable information. Otherwise we have > to make assumptions about the requester ID. For example, if I have > devices in different PCI segments with the same requester ID the > callback function only knows that those are actually different requester > IDs from information the caller provides itself in the opaque pointer. > This is the case with intel-iommu, but why would we design an API that > requires the caller to provide that kind of context? The caller already has to provide context, e.g., the domain in which to create a mapping, anyway via the opaque pointer. So I would argue that it's pointless to supply context twice in different ways. We only have one caller of pci_dev_for_each_requester_id() anyway (intel-iommu.c). That seems really strange to me. All I can assume is that other IOMMU drivers *should* be doing something like this, but aren't yet. Anyway, since we only have one user, why not just provide the minimum (only the requester ID), and add the pci_dev later if a requirement for it turns up? 
Anyway, since we only have one user, why not just provide the minimum (only the requester ID), and add the pci_dev later if a requirement for it turns up?
> I also provide the > pci_dev because I think both the pci_dev_for_each_requester_id() and > pci_get_visible_pcie_requester() should provide similar APIs. There's > an association of a requester ID to a pci_dev. Why hide that? Information hiding is a basic idea of abstraction and encapsulation. If we don't hide unnecessary information, we end up with unnecessary dependencies. >> I think you used get_domain_for_dev() and intel_iommu_add_device() as >> examples, but as far as I can tell, they use the pci_dev as a way to >> learn about isolation. For that purpose, I don't think you want an >> iterator -- you only want to know about the single pci_dev that's the >> root of the isolation domain, and requester IDs really aren't >> relevant. > > See get_domain_for_dev() in patch 2/2. It uses the returned pci_dev to > know whether the requester ID is rooted upstream or at the device > itself. If upstream, it then uses the requester ID to search for an > existing domain. The requester ID is relevant here. If the returned > pci_dev is the device itself, it proceeds to allocate a domain because > the code path has already checked whether the device itself belongs to a > domain. I guess I got myself confused here. get_domain_for_dev() uses pci_get_visible_pcie_requester(), not pci_dev_for_each_requester_id(). >> > (1) pcie_for_each_requester >> > (or pci_dev_for_each_requester_id) >> > >> > (2) pci_get_visible_pcie_requester >> > (or pci_get_visible_pcie_requester_id) >> > >> > Note however that (2) does not impose anything about domains or >> > isolation, it is strictly based on PCI topology. It's left to the >> > caller to determine how that translates to IOMMU domains, but the >> > typical case is trivial. >> >> Can you expand on this a little bit? What's involved in translating >> the pci_get_visible_pcie_requester() result to an IOMMU domain? Is >> there something IOMMU-specific about this? > > I try not to make any assumptions about what's required to get from a > pci_dev/requester ID to a domain. The IOMMU may only be able to filter > a subset of the requester ID, for instance what if it has no function > level visibility, or perhaps only bus level visibility. intel-iommu has > full BDF granularity, so it's perhaps the simplest case. > >> I assume that for any given PCI device A, there's a subtree of devices >> that the IOMMU can't distinguish from A, and that this subtree is not >> IOMMU-specific so it's reasonable to to have a PCI interface to >> compute it. An IOMMU driver might want to expand the domain to >> include more devices, of course, and *that* seems like something to >> leave to the driver. > > Yes, I agree. We're only trying to determine the PCI bus level >> visibility, anything else is left to the IOMMU driver. > >> I'm also assuming that a pci_dev (the root of the subtree) is a good >> way to identify this set of indistinguishable devices because it's >> easy to test whether a device B is part of it. But the requester ID >> seems redundant in this case because you can just associate an IOMMU >> domain with the pci_dev. > > I don't think we want to assume that an IOMMU driver bases a domain on a > pci_dev or a BDF. That's specific to the driver. The requester ID is > not redundant though since it depends on topology and quirks. A given > pci_dev may have its own legitimate requester ID or a quirk, where the > quirk may be specific to the device or architected, like a bridge using > the (bridge->subordinate->number << 8 | 0) requester ID.
> >> Where does ACS come into the picture? > > ACS determines whether a transaction must necessarily reach the IOMMU > without being redirected. Therefore once we figure out the point at > which we have bus level granularity for a transaction, we look upstream > to determine if a transaction must necessarily reach the IOMMU. In the > case of intel-iommu, IOMMU groups are only dependent on the visibility > provided by the PCI topology and the isolation guaranteed through ACS, > but it's up to the IOMMU driver to determine whether there are > additional constraints. Thanks, What code will look for ACS support and enablement? Do you envision that being in the IOMMU code or somewhere in PCI? I was hoping that the isolation/domain identification stuff could be fairly generic, but it sounds like that's not the case so I'm guessing the interesting parts of it will be in IOMMU code. So I think I'm willing to go along with pci_get_visible_pcie_requester() returning both a pci_dev and a requester ID. Is a single one of each sufficient? I guess it is -- I don't see any loops around where you're calling it. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2013-08-01 at 16:08 -0600, Bjorn Helgaas wrote: > On Mon, Jul 29, 2013 at 4:32 PM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > On Mon, 2013-07-29 at 15:02 -0600, Bjorn Helgaas wrote: > >> On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson > >> <alex.williamson@redhat.com> wrote: > >> > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote: > >> >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote: > >> > >> >> The "most closely associated device" idea seems confusing to > >> >> describe and think about. I think you want to use it as part of > >> >> grouping devices into domains. But I think we should attack that > >> >> problem separately. For grouping or isolation questions, we have > >> >> to pay attention to things like ACS, which are not relevant for > >> >> mapping. > >> > > >> > We are only touch isolation insofar as providing an interface for a > >> > driver to determine the point in the PCI topology where the requester ID > >> > is rooted. Yes, grouping can make use of that, but I object to the idea > >> > that we're approaching some slippery slope of rolling multiple concepts > >> > into this patch. What I'm proposing here is strictly a requester ID > >> > interface. I believe that providing a way for a caller to determine > >> > that two devices have a requester ID rooted at the same point in the PCI > >> > topology is directly relevant to that interface. > >> > > >> > pci_find_upstream_pcie_bridge() attempts to do this same thing. > >> > Unfortunately, the interface is convoluted, it doesn't handle quirks, > >> > and it requires a great deal of work by the caller to then walk the > >> > topology and create requester IDs at each step. This also indicates > >> > that at each step, the requester ID is associated with some pci_dev. > >> > Providing a pci_dev is simply showing our work and providing context to > >> > the requester ID (ie. here's the requester ID and the step along the > >> > path from which it was derived. Here's your top level requester ID and > >> > the point in the topology where it's based). > >> > >> What does the driver *do* with the pci_dev? If it's not necessary, > >> then showing our work and providing context just complicates the > >> interface and creates opportunities for mistakes. If we're creating > >> IOMMU mappings, only the requester ID is needed. > > > > It's true, I don't have a use for the pci_dev in > > pci_dev_for_each_requester_id() today. But I think providing the > > context for a requester ID is valuable information. Otherwise we have > > to make assumptions about the requester ID. For example, if I have > > devices in different PCI segments with the same requester ID the > > callback function only knows that those are actually different requester > > IDs from information the caller provides itself in the opaque pointer. > > This is the case with intel-iommu, but why would we design an API that > > requires the caller to provide that kind of context? > > The caller already has to provide context, e.g., the domain in which > to create a mapping, anyway via the opaque pointer. So I would argue > that it's pointless to supply context twice in different ways. > > We only have one caller of pci_dev_for_each_requester_id() anyway > (intel-iommu.c). That seems really strange to me. All I can assume > is that other IOMMU drivers *should* be doing something like this, but > aren't yet. 
Anyway, since we only have one user, why not just provide > the minimum (only the requester ID), and add the pci_dev later if a > requirement for it turns up? Ok, I can do that. I think we only have one user because intel-iommu is the only in-tree user of pci_find_upstream_pcie_bridge(). I know the powerpc folks have been using similar code though. amd_iommu bypasses this because they generate an alias table based on information from firmware. That replaces some of this manual grep'ing through the topology. Perhaps Joerg could comment about whether this is a useful interface for amd_iommu. > > I also provide the > > pci_dev because I think both the pci_dev_for_each_requester_id() and > > pci_get_visible_pcie_requester() should provide similar APIs. There's > > an association of a requester ID to a pci_dev. Why hide that? > > Information hiding is a basic idea of abstraction and encapsulation. > If we don't hide unnecessary information, we end up with unnecessary > dependencies. > > >> I think you used get_domain_for_dev() and intel_iommu_add_device() as > >> examples, but as far as I can tell, they use the pci_dev as a way to > >> learn about isolation. For that purpose, I don't think you want an > >> iterator -- you only want to know about the single pci_dev that's the > >> root of the isolation domain, and requester IDs really aren't > >> relevant. > > > > See get_domain_for_dev() in patch 2/2. It uses the returned pci_dev to > > know whether the requester ID is rooted upstream or at the device > > itself. If upstream, it then uses the requester ID to search for an > > existing domain. The requester ID is relevant here. If the returned > > pci_dev is the device itself, it proceeds to allocate a domain because > > the code path has already checked whether the device itself belongs to a > > domain. > > I guess I got myself confused here. get_domain_for_dev() uses > pci_get_visible_pcie_requester(), not pci_dev_for_each_requester_id(). > > >> > (1) pcie_for_each_requester > >> > (or pci_dev_for_each_requester_id) > >> > > >> > (2) pci_get_visible_pcie_requester > >> > (or pci_get_visible_pcie_requester_id) > >> > > >> > Note however that (2) does not impose anything about domains or > >> > isolation, it is strictly based on PCI topology. It's left to the > >> > caller to determine how that translates to IOMMU domains, but the > >> > typical case is trivial. > >> > >> Can you expand on this a little bit? What's involved in translating > >> the pci_get_visible_pcie_requester() result to an IOMMU domain? Is > >> there something IOMMU-specific about this? > > > > I try not to make any assumptions about what's required to get from a > > pci_dev/requester ID to a domain. The IOMMU may only be able to filter > > a subset of the requester ID, for instance what if it has no function > > level visibility, or perhaps only bus level visibility. intel-iommu has > > full BDF granularity, so it's perhaps the simplest case. > > > >> I assume that for any given PCI device A, there's a subtree of devices > >> that the IOMMU can't distinguish from A, and that this subtree is not > >> IOMMU-specific so it's reasonable to to have a PCI interface to > >> compute it. An IOMMU driver might want to expand the domain to > >> include more devices, of course, and *that* seems like something to > >> leave to the driver. > > > > Yes, I agree. We're only trying to determine the PCI bus level > > visibility, anything else is left to the IOMMU driver.
> >> I'm also assuming that a pci_dev (the root of the subtree) is a good > >> way to identify this set of indistinguishable devices because it's > >> easy to test whether a device B is part of it. But the requester ID > >> seems redundant in this case because you can just associate an IOMMU > >> domain with the pci_dev. > > > > I don't think we want to assume that an IOMMU driver bases a domain on a > > pci_dev or a BDF. That's specific to the driver. The requester ID is > > not redundant though since it depends on topology and quirks. A given > > pci_dev may have its own legitimate requester ID or a quirk, where the > > quirk may be specific to the device or architected, like a bridge using > > the (bridge->subordinate->number << 8 | 0) requester ID. > > > >> Where does ACS come into the picture? > > > > ACS determines whether a transaction must necessarily reach the IOMMU > > without being redirected. Therefore once we figure out the point at > > which we have bus level granularity for a transaction, we look upstream > > to determine if a transaction must necessarily reach the IOMMU. In the > > case of intel-iommu, IOMMU groups are only dependent on the visibility > > provided by the PCI topology and the isolation guaranteed through ACS, > > but it's up to the IOMMU driver to determine whether there are > > additional constraints. Thanks, > > What code will look for ACS support and enablement? Do you envision > that being in the IOMMU code or somewhere in PCI? I was hoping that > the isolation/domain identification stuff could be fairly generic, but > it sounds like that's not the case so I'm guessing the interesting > parts of it will be in IOMMU code. ACS support is currently in the IOMMUs and it's pretty much a mirror image in both intel-iommu and amd_iommu (and soon powerpc), so it's another good opportunity to create something to facilitate it. The specific IOMMU drivers are definitely more involved in determining device visibility since that directly translates into their page table entries, but ACS determines isolation, which is generic and determined exclusively by device capabilities and quirks. > So I think I'm willing to go along with > pci_get_visible_pcie_requester() returning both a pci_dev and a > requester ID. Is a single one of each sufficient? I guess it is -- I > don't see any loops around where you're calling it. Great! I'm still trying to figure out how to handle the quirk around Intel PCI-to-PCI bridge on the root complex as just another quirk. I'll respin another version once I have that worked out. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
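One possible shape for the root complex bridge quirk mentioned above, sketched purely for illustration; mainline has no such table, and the 82801 device ID shown is only an example of the family the patch comment names:

	#include <linux/pci.h>
	#include <linux/pci_ids.h>

	/* Root complex PCI bridges known to use an endpoint-style requester ID */
	static const struct pci_device_id endpoint_rid_bridges[] = {
		{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x244e) }, /* 82801 PCI bridge */
		{ 0, }
	};

	static bool bridge_uses_endpoint_requester_quirk(struct pci_dev *bridge)
	{
		return pci_match_id(endpoint_rid_bridges, bridge) != NULL;
	}

This would replace the patch's topology-based test (!pci_is_pcie() && pci_is_root_bus()) with an explicit device match, at the cost of needing an entry per affected chipset.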
[+cc George] On Fri, Aug 2, 2013 at 10:59 AM, Alex Williamson <alex.williamson@redhat.com> wrote: > ... > Great! I'm still trying to figure out how to handle the quirk around > Intel PCI-to-PCI bridge on the root complex as just another quirk. I'll > respin another version once I have that worked out. Thanks, Is anything happening here? These buggy IOMMU/DMA source problems, e.g., [1], have been lingering a long time, and I don't know if we're stuck because I haven't been giving them enough attention, or if we don't really have a good solution yet. [1] https://bugzilla.kernel.org/show_bug.cgi?id=42679 -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2014-04-03 at 15:48 -0600, Bjorn Helgaas wrote: > [+cc George] > > On Fri, Aug 2, 2013 at 10:59 AM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > ... > > Great! I'm still trying to figure out how to handle the quirk around > > Intel PCI-to-PCI bridge on the root complex as just another quirk. I'll > > respin another version once I have that worked out. Thanks, > > Is anything happening here? These buggy IOMMU/DMA source problems, > e.g., [1], have been lingering a long time, and I don't know if we're > stuck because I haven't been giving them enough attention, or if we > don't really have a good solution yet. > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=42679 Sorry, no. This has completely dropped off my radar. I'll try to resurrect it and figure out how to move forward. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 3, 2014 at 8:51 PM, Alex Williamson <alex.williamson@redhat.com> wrote: > On Thu, 2014-04-03 at 15:48 -0600, Bjorn Helgaas wrote: >> [+cc George] >> >> On Fri, Aug 2, 2013 at 10:59 AM, Alex Williamson >> <alex.williamson@redhat.com> wrote: >> > ... >> > Great! I'm still trying to figure out how to handle the quirk around >> > Intel PCI-to-PCI bridge on the root complex as just another quirk. I'll >> > respin another version once I have that worked out. Thanks, >> >> Is anything happening here? These buggy IOMMU/DMA source problems, >> e.g., [1], have been lingering a long time, and I don't know if we're >> stuck because I haven't been giving them enough attention, or if we >> don't really have a good solution yet. >> >> [1] https://bugzilla.kernel.org/show_bug.cgi?id=42679 > > Sorry, no. This has completely dropped off my radar. I'll try to > resurrect it and figure out how to move forward. Thanks, Not a problem; I'm just way behind on processing patches, so I'm trying to clean up my backlog and take care of things that are waiting on me. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/search.c b/drivers/pci/search.c index d0627fa..4759c02 100644 --- a/drivers/pci/search.c +++ b/drivers/pci/search.c @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem); EXPORT_SYMBOL_GPL(pci_bus_sem); /* + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID + * @dev: device to test + */ +static bool pci_has_pcie_requester_id(struct pci_dev *dev) +{ + /* + * XXX There's no indicator of the bus type, conventional PCI vs + * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe + * requester ID is a native PCIe based system (such as VT-d or + * AMD-Vi). It's common that PCIe root complex devices do not + * include a PCIe capability, but we can assume they are PCIe + * devices based on their topology. + */ + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus)) + return true; + + /* + * PCI-X devices have a requester ID, but the bridge may still take + * ownership of transactions and create a requester ID. We therefore + * assume that the PCI-X requester ID is not the same one used on PCIe. + */ + +#ifdef CONFIG_PCI_QUIRKS + /* + * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability. + * If the device is a bridge, look to the next device upstream of it. + * If that device is PCIe and not a PCIe-to-PCI bridge, then by + * deduction, the device must be PCIe and therefore has a requester ID. + */ + if (dev->subordinate) { + struct pci_dev *parent = dev->bus->self; + + if (pci_is_pcie(parent) && + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE) + return true; + } +#endif + + return false; +} + +/* + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID? + * @dev: requester device + * @bridge: upstream bridge (or NULL for root bus) + */ +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev, + struct pci_dev *bridge) +{ + /* + * The entire path must be tested, if any step does not have a + * requester ID, the chain is broken. This allows us to support + * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe + */ + while (dev != bridge) { + if (!pci_has_pcie_requester_id(dev)) + return false; + + if (pci_is_root_bus(dev->bus)) + return !bridge; /* false if we don't hit @bridge */ + + dev = dev->bus->self; + } + + return true; +} + +/* + * Legacy PCI bridges within a root complex (ex. Intel 82801) report + * a different requester ID than a standard PCIe-to-PCI bridge. Instead + * of using (subordinate << 8 | 0) they use (bus << 8 | devfn), like a + * standard PCIe endpoint. This function detects them. + * + * XXX Is this Intel vendor ID specific?
+ */ +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge) +{ + if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus)) + return true; + + return false; +} + +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) | (dev)->devfn) +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number << 8) + +/* + * pci_get_visible_pcie_requester - Get requester and requester ID for + * @requestee below @bridge + * @requestee: requester device + * @bridge: upstream bridge (or NULL for root bus) + * @requester_id: location to store requester ID or NULL + */ +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, + struct pci_dev *bridge, + u16 *requester_id) +{ + struct pci_dev *requester = requestee; + + while (requester != bridge) { + requester = pci_get_dma_source(requester); + pci_dev_put(requester); /* XXX skip ref cnt */ + + if (pci_has_visible_pcie_requester_id(requester, bridge)) + break; + + if (pci_is_root_bus(requester->bus)) + return NULL; /* @bridge not parent to @requestee */ + + requester = requester->bus->self; + } + + if (requester_id) { + if (requester->bus != requestee->bus && + !pci_bridge_uses_endpoint_requester(requester)) + *requester_id = PCI_BRIDGE_REQUESTER_ID(requester); + else + *requester_id = PCI_REQUESTER_ID(requester); + } + + return requester; +} + +static int pci_do_requester_callback(struct pci_dev **dev, + int (*fn)(struct pci_dev *, + u16 id, void *), + void *data) +{ + struct pci_dev *dma_dev; + int ret; + + ret = fn(*dev, PCI_REQUESTER_ID(*dev), data); + if (ret) + return ret; + + dma_dev = pci_get_dma_source(*dev); + pci_dev_put(dma_dev); /* XXX skip ref cnt */ + if (dma_dev == *dev) + return 0; + + ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data); + if (ret) + return ret; + + *dev = dma_dev; + return 0; +} + +/* + * pcie_for_each_requester - Call callback @fn on each devices and DMA source + * from @requestee to the PCIe requester ID visible + * to @bridge. + * @requestee: Starting device + * @bridge: upstream bridge (or NULL for root bus) + * @fn: callback function + * @data: data to pass to callback + */ +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, + int (*fn)(struct pci_dev *, u16 id, void *), + void *data) +{ + struct pci_dev *requester; + struct pci_dev *dev = requestee; + int ret = 0; + + requester = pci_get_visible_pcie_requester(requestee, bridge, NULL); + if (!requester) + return -EINVAL; + + do { + ret = pci_do_requester_callback(&dev, fn, data); + if (ret) + return ret; + + if (dev == requester) + return 0; + + /* + * We always consider root bus devices to have a visible + * requester ID, therefore this should never be true. + */ + BUG_ON(pci_is_root_bus(dev->bus)); + + dev = dev->bus->self; + + } while (dev != requester); + + /* + * If we've made it here, @requester is a bridge upstream from + * @requestee. 
+ */ + if (pci_bridge_uses_endpoint_requester(requester)) + return pci_do_requester_callback(&requester, fn, data); + + return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data); +} + +/* * find the upstream PCIe-to-PCI bridge of a PCI device * if the device is PCIE, return NULL * if the device isn't connected to a PCIe bridge (that is its parent is a diff --git a/include/linux/pci.h b/include/linux/pci.h index 3a24e4f..94e81d1 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) } #endif +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee, + struct pci_dev *bridge, + u16 *requester_id); +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge, + int (*fn)(struct pci_dev *, u16 id, void *), + void *data); + /** * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device * @pdev: the PCI device
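For context, a hypothetical IOMMU-side caller of the two interfaces exported above might look like the sketch below. Only the two PCI interfaces come from the patch; the domain type and surrounding function names are invented here, and the real callers live in patch 2/2:

	#include <linux/pci.h>

	struct my_domain;		/* stand-in for a driver's domain type */

	static int map_one_requester_id(struct pci_dev *pdev, u16 requester_id,
					void *data)
	{
		struct my_domain *domain = data;

		/* program an IOMMU context entry for @requester_id in @domain */
		return 0;
	}

	static int my_attach_device(struct my_domain *domain, struct pci_dev *pdev)
	{
		struct pci_dev *requester;
		u16 rid;

		requester = pci_get_visible_pcie_requester(pdev, NULL, &rid);
		if (!requester)
			return -ENODEV;

		if (requester != pdev) {
			/* ID is owned upstream; an existing domain may cover @rid */
		}

		return pcie_for_each_requester(pdev, NULL, map_one_requester_id,
					       domain);
	}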
This provides interfaces for drivers to discover the visible PCIe requester ID for a device, for things like IOMMU setup, and iterate over the device chain from requestee to requester, including DMA quirks at each step. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- drivers/pci/search.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/pci.h | 7 ++ 2 files changed, 205 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html