
[RFC,v2,1/2] pci: Create PCIe requester ID interface

Message ID 20130711210326.1701.56478.stgit@bling.home (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Alex Williamson July 11, 2013, 9:03 p.m. UTC
This provides interfaces for drivers to discover the visible PCIe
requester ID for a device, for things like IOMMU setup, and iterate
over the device chain from requestee to requester, including DMA
quirks at each step.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/pci/search.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |    7 ++
 2 files changed, 205 insertions(+)



Comments

Bjorn Helgaas July 23, 2013, 10:35 p.m. UTC | #1
On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> This provides interfaces for drivers to discover the visible PCIe
> requester ID for a device, for things like IOMMU setup, and iterate

IDs (plural)

> over the device chain from requestee to requester, including DMA
> quirks at each step.

"requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
normally makes a noun that refers to the object of the action.  So
"requestee" sounds like it means something like "target" or "responder,"
but that's not what you mean here.

> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/search.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |    7 ++
>  2 files changed, 205 insertions(+)
> 
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index d0627fa..4759c02 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem);
>  EXPORT_SYMBOL_GPL(pci_bus_sem);
>  
>  /*
> + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
> + * @dev: device to test
> + */
> +static bool pci_has_pcie_requester_id(struct pci_dev *dev)
> +{
> +	/*
> +	 * XXX There's no indicator of the bus type, conventional PCI vs
> +	 * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe
> +	 * requester ID is a native PCIe based system (such as VT-d or
> +	 * AMD-Vi).  It's common that PCIe root complex devices do not
> +	 * include a PCIe capability, but we can assume they are PCIe
> +	 * devices based on their topology.
> +	 */
> +	if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
> +		return true;
> +
> +	/*
> +	 * PCI-X devices have a requester ID, but the bridge may still take
> +	 * ownership of transactions and create a requester ID.  We therefore
> +	 * assume that the PCI-X requester ID is not the same one used on PCIe.
> +	 */
> +
> +#ifdef CONFIG_PCI_QUIRKS
> +	/*
> +	 * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
> +	 * If the device is a bridge, look to the next device upstream of it.
> +	 * If that device is PCIe and not a PCIe-to-PCI bridge, then by
> +	 * deduction, the device must be PCIe and therefore has a requester ID.
> +	 */
> +	if (dev->subordinate) {
> +		struct pci_dev *parent = dev->bus->self;
> +
> +		if (pci_is_pcie(parent) &&
> +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> +			return true;
> +	}
> +#endif
> +
> +	return false;
> +}
> +
> +/*
> + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
> + * @dev: requester device
> + * @bridge: upstream bridge (or NULL for root bus)
> + */
> +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
> +					      struct pci_dev *bridge)
> +{
> +	/*
> +	 * The entire path must be tested, if any step does not have a
> +	 * requester ID, the chain is broken.  This allows us to support
> +	 * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
> +	 */
> +	while (dev != bridge) {
> +		if (!pci_has_pcie_requester_id(dev))
> +			return false;
> +
> +		if (pci_is_root_bus(dev->bus))
> +			return !bridge; /* false if we don't hit @bridge */
> +
> +		dev = dev->bus->self;
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * Legacy PCI bridges within a root complex (ex. Intel 82801) report
> + * a different requester ID than a standard PCIe-to-PCI bridge.  Instead
> + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a

s/the/they/

Did you learn about this empirically?  Intel spec?  I wonder if there's
some way to derive this from the PCIe specs.

> + * standard PCIe endpoint.  This function detects them.
> + *
> + * XXX Is this Intel vendor ID specific?
> + */
> +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
> +{
> +	if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
> +		return true;
> +
> +	return false;
> +}
> +
> +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> +#define PCI_BRIDGE_REQUESTER_ID(dev)	((dev)->subordinate->number << 8)
> +
> +/*
> + * pci_get_visible_pcie_requester - Get requester and requester ID for
> + *                                  @requestee below @bridge
> + * @requestee: requester device
> + * @bridge: upstream bridge (or NULL for root bus)
> + * @requester_id: location to store requester ID or NULL
> + */
> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> +					       struct pci_dev *bridge,
> +					       u16 *requester_id)

I'm not sure it makes sense to return a struct pci_dev here because
there's no requirement that a requester ID correspond to an actual
pci_dev.

> +{
> +	struct pci_dev *requester = requestee;
> +
> +	while (requester != bridge) {
> +		requester = pci_get_dma_source(requester);
> +		pci_dev_put(requester); /* XXX skip ref cnt */
> +
> +		if (pci_has_visible_pcie_requester_id(requester, bridge))

If we acquire the "requester" pointer via a ref-counting interface,
it's illegal to use the pointer after dropping the reference, isn't it?
Maybe that's what you mean by the "XXX" comment.

> +			break;
> +
> +		if (pci_is_root_bus(requester->bus))
> +			return NULL; /* @bridge not parent to @requestee */
> +
> +		requester = requester->bus->self;
> +	}
> +
> +	if (requester_id) {
> +		if (requester->bus != requestee->bus &&
> +		    !pci_bridge_uses_endpoint_requester(requester))
> +			*requester_id = PCI_BRIDGE_REQUESTER_ID(requester);
> +		else
> +			*requester_id = PCI_REQUESTER_ID(requester);
> +	}
> +
> +	return requester;
> +}
> +
> +static int pci_do_requester_callback(struct pci_dev **dev,
> +				     int (*fn)(struct pci_dev *,
> +					       u16 id, void *),
> +				     void *data)
> +{
> +	struct pci_dev *dma_dev;
> +	int ret;
> +
> +	ret = fn(*dev, PCI_REQUESTER_ID(*dev), data);
> +	if (ret)
> +		return ret;
> +
> +	dma_dev = pci_get_dma_source(*dev);
> +	pci_dev_put(dma_dev); /* XXX skip ref cnt */
> +	if (dma_dev == *dev)

Same ref count question as above.

> +		return 0;
> +
> +	ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data);
> +	if (ret)
> +		return ret;
> +
> +	*dev = dma_dev;
> +	return 0;
> +}
> +
> +/*
> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> + *                           from @requestee to the PCIe requester ID visible
> + *                           to @bridge.

Transactions from a device may appear with one of several requester IDs,
but there's not necessarily an actual pci_dev for each ID, so I think the
caller reads better if it's "...for_each_requester_id()"

The "Call X on each devices and DMA source from Y to the requester ID"
part doesn't quite make a sentence.

> + * @requestee: Starting device
> + * @bridge: upstream bridge (or NULL for root bus)

You should say something about the significance of @bridge.  I think the
point is to call @fn for every possible requester ID @bridge could see for
transactions from @requestee.  This is a way to learn the requester IDs an
IOMMU at @bridge needs to be prepared for.
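
For illustration, a minimal sketch of that usage -- the map_requester_id()
callback, my_iommu_domain, and stub_iommu_map_context() below are
hypothetical, not part of this patch:

	static int map_requester_id(struct pci_dev *dev, u16 id, void *data)
	{
		struct my_iommu_domain *domain = data;	/* hypothetical */

		/* @id is the (bus << 8 | devfn) the bridge would see */
		return stub_iommu_map_context(domain, id);
	}

	/* prepare the IOMMU at @iommu_bridge for every possible ID */
	ret = pcie_for_each_requester(pdev, iommu_bridge,
				      map_requester_id, domain);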

> + * @fn: callback function
> + * @data: data to pass to callback
> + */
> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> +			    int (*fn)(struct pci_dev *, u16 id, void *),
> +			    void *data)
> +{
> +	struct pci_dev *requester;
> +	struct pci_dev *dev = requestee;
> +	int ret = 0;
> +
> +	requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
> +	if (!requester)
> +		return -EINVAL;
> +
> +	do {
> +		ret = pci_do_requester_callback(&dev, fn, data);
> +		if (ret)
> +			return ret;
> +
> +		if (dev == requester)
> +			return 0;
> +
> +		/*
> +		 * We always consider root bus devices to have a visible
> +		 * requester ID, therefore this should never be true.
> +		 */
> +		BUG_ON(pci_is_root_bus(dev->bus));

What are we going to do if somebody hits this BUG_ON()?  If it's impossible
to hit, we should just remove it.  If it's possible to hit it in some weird
topology you didn't consider, we should see IOMMU faults for any requester
ID we neglected to map, and that fault would be a better debugging hint
than a BUG_ON() here.

> +
> +		dev = dev->bus->self;
> +
> +	} while (dev != requester);
> +
> +	/*
> +	 * If we've made it here, @requester is a bridge upstream from
> +	 * @requestee.
> +	 */
> +	if (pci_bridge_uses_endpoint_requester(requester))
> +		return pci_do_requester_callback(&requester, fn, data);
> +
> +	return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data);
> +}
> +
> +/*
>   * find the upstream PCIe-to-PCI bridge of a PCI device
>   * if the device is PCIE, return NULL
>   * if the device isn't connected to a PCIe bridge (that is its parent is a
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3a24e4f..94e81d1 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
>  }
>  #endif
>  
> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> +					       struct pci_dev *bridge,
> +					       u16 *requester_id);

The structure of this interface implies that there is only one visible
requester ID, but the whole point of this patch is that a transaction from
@requestee may appear with one of several requester IDs.  So which one will
this return?

> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> +			    int (*fn)(struct pci_dev *, u16 id, void *),
> +			    void *data);
> +
>  /**
>   * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
>   * @pdev: the PCI device
> 
Alex Williamson July 23, 2013, 11:21 p.m. UTC | #2
On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> > This provides interfaces for drivers to discover the visible PCIe
> > requester ID for a device, for things like IOMMU setup, and iterate
> 
> IDs (plural)

How can a device have multiple requester IDs?  Reading below, I'm
not sure we're on the same page for the purpose of this patch.

> > over the device chain from requestee to requester, including DMA
> > quirks at each step.
> 
> "requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
> normally makes a noun that refers to the object of the action.  So
> "requestee" sounds like it means something like "target" or "responder,"
> but that's not what you mean here.

Hmm, ok.  I figured a request-er makes a request on behalf of a
request-ee.  Suggestions?

> > Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/pci/search.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci.h  |    7 ++
> >  2 files changed, 205 insertions(+)
> > 
> > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> > index d0627fa..4759c02 100644
> > --- a/drivers/pci/search.c
> > +++ b/drivers/pci/search.c
> > @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem);
> >  EXPORT_SYMBOL_GPL(pci_bus_sem);
> >  
> >  /*
> > + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
> > + * @dev: device to test
> > + */
> > +static bool pci_has_pcie_requester_id(struct pci_dev *dev)
> > +{
> > +	/*
> > +	 * XXX There's no indicator of the bus type, conventional PCI vs
> > +	 * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe
> > +	 * requester ID is a native PCIe based system (such as VT-d or
> > +	 * AMD-Vi).  It's common that PCIe root complex devices do not
> > +	 * include a PCIe capability, but we can assume they are PCIe
> > +	 * devices based on their topology.
> > +	 */
> > +	if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
> > +		return true;
> > +
> > +	/*
> > +	 * PCI-X devices have a requester ID, but the bridge may still take
> > +	 * ownership of transactions and create a requester ID.  We therefore
> > +	 * assume that the PCI-X requester ID is not the same one used on PCIe.
> > +	 */
> > +
> > +#ifdef CONFIG_PCI_QUIRKS
> > +	/*
> > +	 * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
> > +	 * If the device is a bridge, look to the next device upstream of it.
> > +	 * If that device is PCIe and not a PCIe-to-PCI bridge, then by
> > +	 * deduction, the device must be PCIe and therefore has a requester ID.
> > +	 */
> > +	if (dev->subordinate) {
> > +		struct pci_dev *parent = dev->bus->self;
> > +
> > +		if (pci_is_pcie(parent) &&
> > +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> > +			return true;
> > +	}
> > +#endif
> > +
> > +	return false;
> > +}
> > +
> > +/*
> > + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
> > + * @dev: requester device
> > + * @bridge: upstream bridge (or NULL for root bus)
> > + */
> > +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
> > +					      struct pci_dev *bridge)
> > +{
> > +	/*
> > +	 * The entire path must be tested, if any step does not have a
> > +	 * requester ID, the chain is broken.  This allows us to support
> > +	 * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
> > +	 */
> > +	while (dev != bridge) {
> > +		if (!pci_has_pcie_requester_id(dev))
> > +			return false;
> > +
> > +		if (pci_is_root_bus(dev->bus))
> > +			return !bridge; /* false if we don't hit @bridge */
> > +
> > +		dev = dev->bus->self;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +/*
> > + * Legacy PCI bridges within a root complex (ex. Intel 82801) report
> > + * a different requester ID than a standard PCIe-to-PCI bridge.  Instead
> > + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a
> 
> s/the/they/
> 
> Did you learn about this empirically?  Intel spec?  I wonder if there's
> some way to derive this from the PCIe specs.

It's in the current intel-iommu logic: pretty much anywhere it uses
pci_find_upstream_pcie_bridge() it follows that up with a pci_is_pcie()
check.  If it's PCIe, it uses a traditional PCIe-to-PCI bridge ID.  If
it's a legacy PCI bridge it uses the bridge ID itself.

static int
domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
                        int translation)
{
...
        /* dependent device mapping */
        tmp = pci_find_upstream_pcie_bridge(pdev);
        if (!tmp)
                return 0;
...
        if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
                return domain_context_mapping_one(domain,
                                        pci_domain_nr(tmp->subordinate),
                                        tmp->subordinate->number, 0,
                                        translation);
        else /* this is a legacy PCI bridge */
                return domain_context_mapping_one(domain,
                                                  pci_domain_nr(tmp->bus),
                                                  tmp->bus->number,
                                                  tmp->devfn,
                                                  translation);

The 82801 reference is from looking at when this would actually happen
on one of my own systems.


> > + * standard PCIe endpoint.  This function detects them.
> > + *
> > + * XXX Is this Intel vendor ID specific?
> > + */
> > +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
> > +{
> > +	if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> > +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> > +#define PCI_BRIDGE_REQUESTER_ID(dev)	((dev)->subordinate->number << 8)
> > +
> > +/*
> > + * pci_get_visible_pcie_requester - Get requester and requester ID for
> > + *                                  @requestee below @bridge
> > + * @requestee: requester device
> > + * @bridge: upstream bridge (or NULL for root bus)
> > + * @requester_id: location to store requester ID or NULL
> > + */
> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> > +					       struct pci_dev *bridge,
> > +					       u16 *requester_id)
> 
> I'm not sure it makes sense to return a struct pci_dev here because
> there's no requirement that a requester ID correspond to an actual
> pci_dev.

That's why this function is named get_.._requester instead of requester
ID.  I believe there still has to be a struct pci_dev that does the
request, but the requester ID for that device may not actually match.
So I return both.  In a PCIe-to-PCI bridge case, the pci_dev is the
bridge, but the requester ID is either the bridge bus|devfn or
subordinate|0 depending on the topology.  If we want to support "ghost
functions", we can return the real pci_dev and a ghost requester ID.

I think if we used just a requester ID, it ends up being extremely
difficult to pass that into anything else since we then have to search
again for where that requester ID is rooted.
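
As a usage sketch (illustrative only; error handling elided):

	u16 id;
	struct pci_dev *requester;

	/* NULL bridge: ask for the requester visible at the root bus */
	requester = pci_get_visible_pcie_requester(pdev, NULL, &id);
	if (!requester)
		return -EINVAL;

	/*
	 * @requester is the pci_dev that issues the request; @id is the
	 * (bus << 8 | devfn) value it appears with, which may not match
	 * requester's own bus/devfn.
	 */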

> > +{
> > +	struct pci_dev *requester = requestee;
> > +
> > +	while (requester != bridge) {
> > +		requester = pci_get_dma_source(requester);
> > +		pci_dev_put(requester); /* XXX skip ref cnt */
> > +
> > +		if (pci_has_visible_pcie_requester_id(requester, bridge))
> 
> If we acquire the "requester" pointer via a ref-counting interface,
> it's illegal to use the pointer after dropping the reference, isn't it?
> Maybe that's what you mean by the "XXX" comment.


Yes, I was just following your RFC lead and didn't want to deal with
reference counting until this approach had enough traction.


> > +			break;
> > +
> > +		if (pci_is_root_bus(requester->bus))
> > +			return NULL; /* @bridge not parent to @requestee */
> > +
> > +		requester = requester->bus->self;
> > +	}
> > +
> > +	if (requester_id) {
> > +		if (requester->bus != requestee->bus &&
> > +		    !pci_bridge_uses_endpoint_requester(requester))
> > +			*requester_id = PCI_BRIDGE_REQUESTER_ID(requester);
> > +		else
> > +			*requester_id = PCI_REQUESTER_ID(requester);
> > +	}
> > +
> > +	return requester;
> > +}
> > +
> > +static int pci_do_requester_callback(struct pci_dev **dev,
> > +				     int (*fn)(struct pci_dev *,
> > +					       u16 id, void *),
> > +				     void *data)
> > +{
> > +	struct pci_dev *dma_dev;
> > +	int ret;
> > +
> > +	ret = fn(*dev, PCI_REQUESTER_ID(*dev), data);
> > +	if (ret)
> > +		return ret;
> > +
> > +	dma_dev = pci_get_dma_source(*dev);
> > +	pci_dev_put(dma_dev); /* XXX skip ref cnt */
> > +	if (dma_dev == *dev)
> 
> Same ref count question as above.
> 
> > +		return 0;
> > +
> > +	ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data);
> > +	if (ret)
> > +		return ret;
> > +
> > +	*dev = dma_dev;
> > +	return 0;
> > +}
> > +
> > +/*
> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> > + *                           from @requestee to the PCIe requester ID visible
> > + *                           to @bridge.
> 
> Transactions from a device may appear with one of several requester IDs,
> but there's not necessarily an actual pci_dev for each ID, so I think the
> caller reads better if it's "...for_each_requester_id()"

Wouldn't you expect to pass a requester ID into a function with that
name?  I'm pretty sure I had it named that at one point but thought the
parameters made more sense this way.  I'll see if I can think of a
better name.

> The "Call X on each devices and DMA source from Y to the requester ID"
> part doesn't quite make a sentence.


Ok

> > + * @requestee: Starting device
> > + * @bridge: upstream bridge (or NULL for root bus)
> 
> You should say something about the significance of @bridge.  I think the
> point is to call @fn for every possible requester ID @bridge could see for
> transactions from @requestee.  This is a way to learn the requester IDs an
> IOMMU at @bridge needs to be prepared for.

I can add something.  @bridge is supposed to be for bridge-based IOMMUs.
Essentially it's just a stopping point.  There might be PCI topology
upstream of @bridge, but if you only want the requester ID seen by the
bridge, you don't care.

> > + * @fn: callback function
> > + * @data: data to pass to callback
> > + */
> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> > +			    int (*fn)(struct pci_dev *, u16 id, void *),
> > +			    void *data)
> > +{
> > +	struct pci_dev *requester;
> > +	struct pci_dev *dev = requestee;
> > +	int ret = 0;
> > +
> > +	requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
> > +	if (!requester)
> > +		return -EINVAL;
> > +
> > +	do {
> > +		ret = pci_do_requester_callback(&dev, fn, data);
> > +		if (ret)
> > +			return ret;
> > +
> > +		if (dev == requester)
> > +			return 0;
> > +
> > +		/*
> > +		 * We always consider root bus devices to have a visible
> > +		 * requester ID, therefore this should never be true.
> > +		 */
> > +		BUG_ON(pci_is_root_bus(dev->bus));
> 
> What are we going to do if somebody hits this BUG_ON()?  If it's impossible
> to hit, we should just remove it.  If it's possible to hit it in some weird
> topology you didn't consider, we should see IOMMU faults for any requester
> ID we neglected to map, and that fault would be a better debugging hint
> than a BUG_ON() here.

It's mostly for readability.  I've learned that we never want to look at
dev->bus->self without first testing pci_is_root_bus(dev->bus), so if I
was reading this code I'd stumble when I got here and spend too long
looking around for the assumption that makes it not needed.  I suppose I
could just make it a comment, but I thought why not make it official w/
a BUG.

> > +
> > +		dev = dev->bus->self;
> > +
> > +	} while (dev != requester);
> > +
> > +	/*
> > +	 * If we've made it here, @requester is a bridge upstream from
> > +	 * @requestee.
> > +	 */
> > +	if (pci_bridge_uses_endpoint_requester(requester))
> > +		return pci_do_requester_callback(&requester, fn, data);
> > +
> > +	return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data);
> > +}
> > +
> > +/*
> >   * find the upstream PCIe-to-PCI bridge of a PCI device
> >   * if the device is PCIE, return NULL
> >   * if the device isn't connected to a PCIe bridge (that is its parent is a
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 3a24e4f..94e81d1 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
> >  }
> >  #endif
> >  
> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> > +					       struct pci_dev *bridge,
> > +					       u16 *requester_id);
> 
> The structure of this interface implies that there is only one visible
> requester ID, but the whole point of this patch is that a transaction from
> @requestee may appear with one of several requester IDs.  So which one will
> this return?

I thought the point of this patch was to have an integrated interface
for finding the requester ID and doing something across all devices with
that requester ID and thereby remove pci_find_upstream_pcie_bridge(),
provide a way to quirk broken PCIe-to-PCI bridges and quirk dma_ops for
devices that use the wrong requester ID.  In what case does a device
appear to have multiple requester IDs?  We have cases where devices use
the wrong requester ID, but AFAIK they only use a single wrong requester
ID.  Thanks,

Alex

> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> > +			    int (*fn)(struct pci_dev *, u16 id, void *),
> > +			    void *data);
> > +
> >  /**
> >   * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
> >   * @pdev: the PCI device
> > 



Andrew Cooks July 24, 2013, 3:03 p.m. UTC | #3
On Wed, Jul 24, 2013 at 7:21 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
>> > This provides interfaces for drivers to discover the visible PCIe
>> > requester ID for a device, for things like IOMMU setup, and iterate
>>
>> IDs (plural)
>
> How can a device have multiple requester IDs?  Reading below, I'm
> not sure we're on the same page for the purpose of this patch.
>
>> > over the device chain from requestee to requester, including DMA
>> > quirks at each step.
>>
>> "requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
>> normally makes a noun that refers to the object of the action.  So
>> "requestee" sounds like it means something like "target" or "responder,"
>> but that's not what you mean here.
>
> Hmm, ok.  I figured a request-er makes a request on behalf of a
> request-ee.  Suggestions?
>
>> > Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
>> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>> > ---
>> >  drivers/pci/search.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> >  include/linux/pci.h  |    7 ++
>> >  2 files changed, 205 insertions(+)
>> >
>> > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
>> > index d0627fa..4759c02 100644
>> > --- a/drivers/pci/search.c
>> > +++ b/drivers/pci/search.c
>> > @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem);
>> >  EXPORT_SYMBOL_GPL(pci_bus_sem);
>> >
>> >  /*
>> > + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
>> > + * @dev: device to test
>> > + */
>> > +static bool pci_has_pcie_requester_id(struct pci_dev *dev)
>> > +{
>> > +   /*
>> > +    * XXX There's no indicator of the bus type, conventional PCI vs
>> > +    * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe
>> > +    * requester ID is a native PCIe based system (such as VT-d or
>> > +    * AMD-Vi).  It's common that PCIe root complex devices do not
>> > +    * include a PCIe capability, but we can assume they are PCIe
>> > +    * devices based on their topology.
>> > +    */
>> > +   if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
>> > +           return true;
>> > +
>> > +   /*
>> > +    * PCI-X devices have a requester ID, but the bridge may still take
>> > +    * ownership of transactions and create a requester ID.  We therefore
>> > +    * assume that the PCI-X requester ID is not the same one used on PCIe.
>> > +    */
>> > +
>> > +#ifdef CONFIG_PCI_QUIRKS
>> > +   /*
>> > +    * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
>> > +    * If the device is a bridge, look to the next device upstream of it.
>> > +    * If that device is PCIe and not a PCIe-to-PCI bridge, then by
>> > +    * deduction, the device must be PCIe and therefore has a requester ID.
>> > +    */
>> > +   if (dev->subordinate) {
>> > +           struct pci_dev *parent = dev->bus->self;
>> > +
>> > +           if (pci_is_pcie(parent) &&
>> > +               pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
>> > +                   return true;
>> > +   }
>> > +#endif
>> > +
>> > +   return false;
>> > +}
>> > +
>> > +/*
>> > + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
>> > + * @dev: requester device
>> > + * @bridge: upstream bridge (or NULL for root bus)
>> > + */
>> > +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
>> > +                                         struct pci_dev *bridge)
>> > +{
>> > +   /*
>> > +    * The entire path must be tested, if any step does not have a
>> > +    * requester ID, the chain is broken.  This allows us to support
>> > +    * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
>> > +    */
>> > +   while (dev != bridge) {
>> > +           if (!pci_has_pcie_requester_id(dev))
>> > +                   return false;
>> > +
>> > +           if (pci_is_root_bus(dev->bus))
>> > +                   return !bridge; /* false if we don't hit @bridge */
>> > +
>> > +           dev = dev->bus->self;
>> > +   }
>> > +
>> > +   return true;
>> > +}
>> > +
>> > +/*
>> > + * Legacy PCI bridges within a root complex (ex. Intel 82801) report
>> > + * a different requester ID than a standard PCIe-to-PCI bridge.  Instead
>> > + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a
>>
>> s/the/they/
>>
>> Did you learn about this empirically?  Intel spec?  I wonder if there's
>> some way to derive this from the PCIe specs.
>
> It's in the current intel-iommu logic: pretty much anywhere it uses
> pci_find_upstream_pcie_bridge() it follows that up with a pci_is_pcie()
> check.  If it's PCIe, it uses a traditional PCIe-to-PCI bridge ID.  If
> it's a legacy PCI bridge it uses the bridge ID itself.
>
> static int
> domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
>                         int translation)
> {
> ...
>         /* dependent device mapping */
>         tmp = pci_find_upstream_pcie_bridge(pdev);
>         if (!tmp)
>                 return 0;
> ...
>         if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
>                 return domain_context_mapping_one(domain,
>                                         pci_domain_nr(tmp->subordinate),
>                                         tmp->subordinate->number, 0,
>                                         translation);
>         else /* this is a legacy PCI bridge */
>                 return domain_context_mapping_one(domain,
>                                                   pci_domain_nr(tmp->bus),
>                                                   tmp->bus->number,
>                                                   tmp->devfn,
>                                                   translation);
>
> The 82801 reference is from looking at when this would actually happen
> on one of my own systems.
>
>
>> > + * standard PCIe endpoint.  This function detects them.
>> > + *
>> > + * XXX Is this Intel vendor ID specific?
>> > + */
>> > +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
>> > +{
>> > +   if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
>> > +           return true;
>> > +
>> > +   return false;
>> > +}
>> > +
>> > +#define PCI_REQUESTER_ID(dev)      (((dev)->bus->number << 8) | (dev)->devfn)
>> > +#define PCI_BRIDGE_REQUESTER_ID(dev)       ((dev)->subordinate->number << 8)
>> > +
>> > +/*
>> > + * pci_get_visible_pcie_requester - Get requester and requester ID for
>> > + *                                  @requestee below @bridge
>> > + * @requestee: requester device
>> > + * @bridge: upstream bridge (or NULL for root bus)
>> > + * @requester_id: location to store requester ID or NULL
>> > + */
>> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>> > +                                          struct pci_dev *bridge,
>> > +                                          u16 *requester_id)
>>
>> I'm not sure it makes sense to return a struct pci_dev here because
>> there's no requirement that a requester ID correspond to an actual
>> pci_dev.
>
> That's why this function is named get_.._requester instead of requester
> ID.  I believe there still has to be a struct pci_dev that does the
> request, but the requester ID for that device may not actually match.
> So I return both.  In a PCIe-to-PCI bridge case, the pci_dev is the
> bridge, but the requester ID is either the bridge bus|devfn or
> subordinate|0 depending on the topology.  If we want to support "ghost
> functions", we can return the real pci_dev and a ghost requester ID.
>
> I think if we used just a requester ID, it ends up being extremely
> difficult to pass that into anything else since we then have to search
> again for where that requester ID is rooted.
>
>> > +{
>> > +   struct pci_dev *requester = requestee;
>> > +
>> > +   while (requester != bridge) {
>> > +           requester = pci_get_dma_source(requester);
>> > +           pci_dev_put(requester); /* XXX skip ref cnt */
>> > +
>> > +           if (pci_has_visible_pcie_requester_id(requester, bridge))
>>
>> If we acquire the "requester" pointer via a ref-counting interface,
>> it's illegal to use the pointer after dropping the reference, isn't it?
>> Maybe that's what you mean by the "XXX" comment.
>
>
> Yes, I was just following your RFC lead and didn't want to deal with
> reference counting until this approach had enough traction.
>
>
>> > +                   break;
>> > +
>> > +           if (pci_is_root_bus(requester->bus))
>> > +                   return NULL; /* @bridge not parent to @requestee */
>> > +
>> > +           requester = requester->bus->self;
>> > +   }
>> > +
>> > +   if (requester_id) {
>> > +           if (requester->bus != requestee->bus &&
>> > +               !pci_bridge_uses_endpoint_requester(requester))
>> > +                   *requester_id = PCI_BRIDGE_REQUESTER_ID(requester);
>> > +           else
>> > +                   *requester_id = PCI_REQUESTER_ID(requester);
>> > +   }
>> > +
>> > +   return requester;
>> > +}
>> > +
>> > +static int pci_do_requester_callback(struct pci_dev **dev,
>> > +                                int (*fn)(struct pci_dev *,
>> > +                                          u16 id, void *),
>> > +                                void *data)
>> > +{
>> > +   struct pci_dev *dma_dev;
>> > +   int ret;
>> > +
>> > +   ret = fn(*dev, PCI_REQUESTER_ID(*dev), data);
>> > +   if (ret)
>> > +           return ret;
>> > +
>> > +   dma_dev = pci_get_dma_source(*dev);
>> > +   pci_dev_put(dma_dev); /* XXX skip ref cnt */
>> > +   if (dma_dev == *dev)
>>
>> Same ref count question as above.
>>
>> > +           return 0;
>> > +
>> > +   ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data);
>> > +   if (ret)
>> > +           return ret;
>> > +
>> > +   *dev = dma_dev;
>> > +   return 0;
>> > +}
>> > +
>> > +/*
>> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
>> > + *                           from @requestee to the PCIe requester ID visible
>> > + *                           to @bridge.
>>
>> Transactions from a device may appear with one of several requester IDs,
>> but there's not necessarily an actual pci_dev for each ID, so I think the
>> caller reads better if it's "...for_each_requester_id()"
>
> Wouldn't you expect to pass a requester ID into a function with that
> name?  I'm pretty sure I had it named that at one point but thought the
> parameters made more sense this way.  I'll see if I can think of a
> better name.
>
>> The "Call X on each devices and DMA source from Y to the requester ID"
>> part doesn't quite make a sentence.
>
>
> Ok
>
>> > + * @requestee: Starting device
>> > + * @bridge: upstream bridge (or NULL for root bus)
>>
>> You should say something about the significance of @bridge.  I think the
>> point is to call @fn for every possible requester ID @bridge could see for
>> transactions from @requestee.  This is a way to learn the requester IDs an
>> IOMMU at @bridge needs to be prepared for.
>
> I can add something.  @bridge is supposed to be for bridge-based IOMMUs.
> Essentially it's just a stopping point.  There might be PCI topology
> upstream of @bridge, but if you only want the requester ID seen by the
> bridge, you don't care.
>
>> > + * @fn: callback function
>> > + * @data: data to pass to callback
>> > + */
>> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
>> > +                       int (*fn)(struct pci_dev *, u16 id, void *),
>> > +                       void *data)
>> > +{
>> > +   struct pci_dev *requester;
>> > +   struct pci_dev *dev = requestee;
>> > +   int ret = 0;
>> > +
>> > +   requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
>> > +   if (!requester)
>> > +           return -EINVAL;
>> > +
>> > +   do {
>> > +           ret = pci_do_requester_callback(&dev, fn, data);
>> > +           if (ret)
>> > +                   return ret;
>> > +
>> > +           if (dev == requester)
>> > +                   return 0;
>> > +
>> > +           /*
>> > +            * We always consider root bus devices to have a visible
>> > +            * requester ID, therefore this should never be true.
>> > +            */
>> > +           BUG_ON(pci_is_root_bus(dev->bus));
>>
>> What are we going to do if somebody hits this BUG_ON()?  If it's impossible
>> to hit, we should just remove it.  If it's possible to hit it in some weird
>> topology you didn't consider, we should see IOMMU faults for any requester
>> ID we neglected to map, and that fault would be a better debugging hint
>> than a BUG_ON() here.
>
> It's mostly for readability.  I've learned that we never want to look at
> dev->bus->self without first testing pci_is_root_bus(dev->bus), so if I
> was reading this code I'd stumble when I got here and spend too long
> looking around for the assumption that makes it not needed.  I suppose I
> could just make it a comment, but I thought why not make it official w/
> a BUG.
>
>> > +
>> > +           dev = dev->bus->self;
>> > +
>> > +   } while (dev != requester);
>> > +
>> > +   /*
>> > +    * If we've made it here, @requester is a bridge upstream from
>> > +    * @requestee.
>> > +    */
>> > +   if (pci_bridge_uses_endpoint_requester(requester))
>> > +           return pci_do_requester_callback(&requester, fn, data);
>> > +
>> > +   return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data);
>> > +}
>> > +
>> > +/*
>> >   * find the upstream PCIe-to-PCI bridge of a PCI device
>> >   * if the device is PCIE, return NULL
>> >   * if the device isn't connected to a PCIe bridge (that is its parent is a
>> > diff --git a/include/linux/pci.h b/include/linux/pci.h
>> > index 3a24e4f..94e81d1 100644
>> > --- a/include/linux/pci.h
>> > +++ b/include/linux/pci.h
>> > @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
>> >  }
>> >  #endif
>> >
>> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>> > +                                          struct pci_dev *bridge,
>> > +                                          u16 *requester_id);
>>
>> The structure of this interface implies that there is only one visible
>> requester ID, but the whole point of this patch is that a transaction from
>> @requestee may appear with one of several requester IDs.  So which one will
>> this return?
>
> I thought the point of this patch was to have an integrated interface
> for finding the requester ID and doing something across all devices with
> that requester ID and thereby remove pci_find_upstream_pcie_bridge(),
> provide a way to quirk broken PCIe-to-PCI bridges and quirk dma_ops for
> devices that use the wrong requester ID.  In what case does a device
> appear to have multiple requester IDs?  We have cases where devices use
> the wrong requester ID, but AFAIK they only use a single wrong requester
> ID.  Thanks,
>

The cases I've found where multiple requester IDs were used are all related
to Marvell SATA controllers.

Here's a table of affected devices with links to the
bug reports. In each case both functions 0 and 1 are used.

static const struct pci_dev_dma_multi_source_map {
       u16 vendor;
       u16 device;
} pci_dev_dma_multi_source_map[] = {
        /* Reported by Patrick Bregman
         * https://bugzilla.redhat.com/show_bug.cgi?id=863653 */
       { PCI_VENDOR_ID_MARVELL_EXT, 0x9120},

       /* Reported by Paweł Żak, Korneliusz Jarzębski, Daniel Mayer
        * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by
        * Justin Piszcz  https://lkml.org/lkml/2012/11/24/94 */
       { PCI_VENDOR_ID_MARVELL_EXT, 0x9123},

       /* Reported by Robert Cicconetti
        * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by
        * Fernando https://bugzilla.redhat.com/show_bug.cgi?id=757166 */
       { PCI_VENDOR_ID_MARVELL_EXT, 0x9128},

       /* Reported by Stijn Tintel
        * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */
       { PCI_VENDOR_ID_MARVELL_EXT, 0x9130},

       /* Reported by Gaudenz Steinlin
        * https://lkml.org/lkml/2013/3/5/288 */
       { PCI_VENDOR_ID_MARVELL_EXT, 0x9143},

       /* Reported by Andrew Cooks
        * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */
       { PCI_VENDOR_ID_MARVELL_EXT, 0x9172}
};


> Alex
>
>> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
>> > +                       int (*fn)(struct pci_dev *, u16 id, void *),
>> > +                       void *data);
>> > +
>> >  /**
>> >   * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
>> >   * @pdev: the PCI device
>> >
>
>
>
Alex Williamson July 24, 2013, 3:50 p.m. UTC | #4
On Wed, 2013-07-24 at 23:03 +0800, Andrew Cooks wrote:
> On Wed, Jul 24, 2013 at 7:21 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >> > This provides interfaces for drivers to discover the visible PCIe
> >> > requester ID for a device, for things like IOMMU setup, and iterate
> >>
> >> IDs (plural)
> >
> > How can a device have multiple requester IDs?  Reading below, I'm
> > not sure we're on the same page for the purpose of this patch.
> >
> >> > over the device chain from requestee to requester, including DMA
> >> > quirks at each step.
> >>
> >> "requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
> >> normally makes a noun that refers to the object of the action.  So
> >> "requestee" sounds like it means something like "target" or "responder,"
> >> but that's not what you mean here.
> >
> > Hmm, ok.  I figured a request-er makes a request on behalf of a
> > request-ee.  Suggestions?
> >
> >> > Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> >> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >> > ---
> >> >  drivers/pci/search.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >  include/linux/pci.h  |    7 ++
> >> >  2 files changed, 205 insertions(+)
> >> >
> >> > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> >> > index d0627fa..4759c02 100644
> >> > --- a/drivers/pci/search.c
> >> > +++ b/drivers/pci/search.c
> >> > @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem);
> >> >  EXPORT_SYMBOL_GPL(pci_bus_sem);
> >> >
> >> >  /*
> >> > + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
> >> > + * @dev: device to test
> >> > + */
> >> > +static bool pci_has_pcie_requester_id(struct pci_dev *dev)
> >> > +{
> >> > +   /*
> >> > +    * XXX There's no indicator of the bus type, conventional PCI vs
> >> > +    * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe
> >> > +    * requester ID is a native PCIe based system (such as VT-d or
> >> > +    * AMD-Vi).  It's common that PCIe root complex devices do not
> >> > +    * include a PCIe capability, but we can assume they are PCIe
> >> > +    * devices based on their topology.
> >> > +    */
> >> > +   if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
> >> > +           return true;
> >> > +
> >> > +   /*
> >> > +    * PCI-X devices have a requester ID, but the bridge may still take
> >> > +    * ownership of transactions and create a requester ID.  We therefore
> >> > +    * assume that the PCI-X requester ID is not the same one used on PCIe.
> >> > +    */
> >> > +
> >> > +#ifdef CONFIG_PCI_QUIRKS
> >> > +   /*
> >> > +    * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
> >> > +    * If the device is a bridge, look to the next device upstream of it.
> >> > +    * If that device is PCIe and not a PCIe-to-PCI bridge, then by
> >> > +    * deduction, the device must be PCIe and therefore has a requester ID.
> >> > +    */
> >> > +   if (dev->subordinate) {
> >> > +           struct pci_dev *parent = dev->bus->self;
> >> > +
> >> > +           if (pci_is_pcie(parent) &&
> >> > +               pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> >> > +                   return true;
> >> > +   }
> >> > +#endif
> >> > +
> >> > +   return false;
> >> > +}
> >> > +
> >> > +/*
> >> > + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
> >> > + * @dev: requester device
> >> > + * @bridge: upstream bridge (or NULL for root bus)
> >> > + */
> >> > +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
> >> > +                                         struct pci_dev *bridge)
> >> > +{
> >> > +   /*
> >> > +    * The entire path must be tested, if any step does not have a
> >> > +    * requester ID, the chain is broken.  This allows us to support
> >> > +    * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
> >> > +    */
> >> > +   while (dev != bridge) {
> >> > +           if (!pci_has_pcie_requester_id(dev))
> >> > +                   return false;
> >> > +
> >> > +           if (pci_is_root_bus(dev->bus))
> >> > +                   return !bridge; /* false if we don't hit @bridge */
> >> > +
> >> > +           dev = dev->bus->self;
> >> > +   }
> >> > +
> >> > +   return true;
> >> > +}
> >> > +
> >> > +/*
> >> > + * Legacy PCI bridges within a root complex (ex. Intel 82801) report
> >> > + * a different requester ID than a standard PCIe-to-PCI bridge.  Instead
> >> > + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a
> >>
> >> s/the/they/
> >>
> >> Did you learn about this empirically?  Intel spec?  I wonder if there's
> >> some way to derive this from the PCIe specs.
> >
> > It's in the current intel-iommu logic: pretty much anywhere it uses
> > pci_find_upstream_pcie_bridge() it follows that up with a pci_is_pcie()
> > check.  If it's PCIe, it uses a traditional PCIe-to-PCI bridge ID.  If
> > it's a legacy PCI bridge it uses the bridge ID itself.
> >
> > static int
> > domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
> >                         int translation)
> > {
> > ...
> >         /* dependent device mapping */
> >         tmp = pci_find_upstream_pcie_bridge(pdev);
> >         if (!tmp)
> >                 return 0;
> > ...
> >         if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> >                 return domain_context_mapping_one(domain,
> >                                         pci_domain_nr(tmp->subordinate),
> >                                         tmp->subordinate->number, 0,
> >                                         translation);
> >         else /* this is a legacy PCI bridge */
> >                 return domain_context_mapping_one(domain,
> >                                                   pci_domain_nr(tmp->bus),
> >                                                   tmp->bus->number,
> >                                                   tmp->devfn,
> >                                                   translation);
> >
> > The 82801 reference is from looking at when this would actually happen
> > on one of my own systems.
> >
> >
> >> > + * standard PCIe endpoint.  This function detects them.
> >> > + *
> >> > + * XXX Is this Intel vendor ID specific?
> >> > + */
> >> > +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
> >> > +{
> >> > +   if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
> >> > +           return true;
> >> > +
> >> > +   return false;
> >> > +}
> >> > +
> >> > +#define PCI_REQUESTER_ID(dev)      (((dev)->bus->number << 8) | (dev)->devfn)
> >> > +#define PCI_BRIDGE_REQUESTER_ID(dev)       ((dev)->subordinate->number << 8)
> >> > +
> >> > +/*
> >> > + * pci_get_visible_pcie_requester - Get requester and requester ID for
> >> > + *                                  @requestee below @bridge
> >> > + * @requestee: requester device
> >> > + * @bridge: upstream bridge (or NULL for root bus)
> >> > + * @requester_id: location to store requester ID or NULL
> >> > + */
> >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >> > +                                          struct pci_dev *bridge,
> >> > +                                          u16 *requester_id)
> >>
> >> I'm not sure it makes sense to return a struct pci_dev here because
> >> there's no requirement that a requester ID correspond to an actual
> >> pci_dev.
> >
> > That's why this function is named get_.._requester instead of requester
> > ID.  I believe there still has to be a struct pci_dev that does the
> > request, but the requester ID for that device may not actually match.
> > So I return both.  In a PCIe-to-PCI bridge case, the pci_dev is the
> > bridge, but the requester ID is either the bridge bus|devfn or
> > subordinate|0 depending on the topology.  If we want to support "ghost
> > functions", we can return the real pci_dev and a ghost requester ID.
> >
> > I think if we used just a requester ID, it ends up being extremely
> > difficult to pass that into anything else since we then have to search
> > again for where that requester ID is rooted.
> >
> >> > +{
> >> > +   struct pci_dev *requester = requestee;
> >> > +
> >> > +   while (requester != bridge) {
> >> > +           requester = pci_get_dma_source(requester);
> >> > +           pci_dev_put(requester); /* XXX skip ref cnt */
> >> > +
> >> > +           if (pci_has_visible_pcie_requester_id(requester, bridge))
> >>
> >> If we acquire the "requester" pointer via a ref-counting interface,
> >> it's illegal to use the pointer after dropping the reference, isn't it?
> >> Maybe that's what you mean by the "XXX" comment.
> >
> >
> > Yes, I was just following your RFC lead and didn't want to deal with
> > reference counting until this approach had enough traction.
> >
> >
> >> > +                   break;
> >> > +
> >> > +           if (pci_is_root_bus(requester->bus))
> >> > +                   return NULL; /* @bridge not parent to @requestee */
> >> > +
> >> > +           requester = requester->bus->self;
> >> > +   }
> >> > +
> >> > +   if (requester_id) {
> >> > +           if (requester->bus != requestee->bus &&
> >> > +               !pci_bridge_uses_endpoint_requester(requester))
> >> > +                   *requester_id = PCI_BRIDGE_REQUESTER_ID(requester);
> >> > +           else
> >> > +                   *requester_id = PCI_REQUESTER_ID(requester);
> >> > +   }
> >> > +
> >> > +   return requester;
> >> > +}
> >> > +
> >> > +static int pci_do_requester_callback(struct pci_dev **dev,
> >> > +                                int (*fn)(struct pci_dev *,
> >> > +                                          u16 id, void *),
> >> > +                                void *data)
> >> > +{
> >> > +   struct pci_dev *dma_dev;
> >> > +   int ret;
> >> > +
> >> > +   ret = fn(*dev, PCI_REQUESTER_ID(*dev), data);
> >> > +   if (ret)
> >> > +           return ret;
> >> > +
> >> > +   dma_dev = pci_get_dma_source(*dev);
> >> > +   pci_dev_put(dma_dev); /* XXX skip ref cnt */
> >> > +   if (dma_dev == *dev)
> >>
> >> Same ref count question as above.
> >>
> >> > +           return 0;
> >> > +
> >> > +   ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data);
> >> > +   if (ret)
> >> > +           return ret;
> >> > +
> >> > +   *dev = dma_dev;
> >> > +   return 0;
> >> > +}
> >> > +
> >> > +/*
> >> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> >> > + *                           from @requestee to the PCIe requester ID visible
> >> > + *                           to @bridge.
> >>
> >> Transactions from a device may appear with one of several requester IDs,
> >> but there's not necessarily an actual pci_dev for each ID, so I think the
> >> caller reads better if it's "...for_each_requester_id()"
> >
> > Wouldn't you expect to pass a requester ID into a function with that
> > name?  I'm pretty sure I had it named that at one point but thought the
> > parameters made more sense this way.  I'll see if I can think of a
> > better name.
> >
> >> The "Call X on each devices and DMA source from Y to the requester ID"
> >> part doesn't quite make a sentence.
> >
> >
> > Ok
> >
> >> > + * @requestee: Starting device
> >> > + * @bridge: upstream bridge (or NULL for root bus)
> >>
> >> You should say something about the significance of @bridge.  I think the
> >> point is to call @fn for every possible requester ID @bridge could see for
> >> transactions from @requestee.  This is a way to learn the requester IDs an
> >> IOMMU at @bridge needs to be prepared for.
> >
> > I can add something.  @bridge is supposed to be for bridge-based IOMMUs.
> > Essentially it's just a stopping point.  There might be PCI topology
> > upstream of @bridge, but if you only want the requester ID seen by the
> > bridge, you don't care.
> >
> >> > + * @fn: callback function
> >> > + * @data: data to pass to callback
> >> > + */
> >> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> >> > +                       int (*fn)(struct pci_dev *, u16 id, void *),
> >> > +                       void *data)
> >> > +{
> >> > +   struct pci_dev *requester;
> >> > +   struct pci_dev *dev = requestee;
> >> > +   int ret = 0;
> >> > +
> >> > +   requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
> >> > +   if (!requester)
> >> > +           return -EINVAL;
> >> > +
> >> > +   do {
> >> > +           ret = pci_do_requester_callback(&dev, fn, data);
> >> > +           if (ret)
> >> > +                   return ret;
> >> > +
> >> > +           if (dev == requester)
> >> > +                   return 0;
> >> > +
> >> > +           /*
> >> > +            * We always consider root bus devices to have a visible
> >> > +            * requester ID, therefore this should never be true.
> >> > +            */
> >> > +           BUG_ON(pci_is_root_bus(dev->bus));
> >>
> >> What are we going to do if somebody hits this BUG_ON()?  If it's impossible
> >> to hit, we should just remove it.  If it's possible to hit it in some weird
> >> topology you didn't consider, we should see IOMMU faults for any requester
> >> ID we neglected to map, and that fault would be a better debugging hint
> >> than a BUG_ON() here.
> >
> > It's mostly for readability.  I've learned that we never want to look at
> > dev->bus->self without first testing pci_is_root_bus(dev->bus), so if I
> > was reading this code I'd stumble when I got here and spend too long
> > looking around for the assumption that makes it not needed.  I suppose I
> > could just make it a comment, but I thought why not make it official w/
> > a BUG.
> >
> >> > +
> >> > +           dev = dev->bus->self;
> >> > +
> >> > +   } while (dev != requester);
> >> > +
> >> > +   /*
> >> > +    * If we've made it here, @requester is a bridge upstream from
> >> > +    * @requestee.
> >> > +    */
> >> > +   if (pci_bridge_uses_endpoint_requester(requester))
> >> > +           return pci_do_requester_callback(&requester, fn, data);
> >> > +
> >> > +   return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data);
> >> > +}
> >> > +
> >> > +/*
> >> >   * find the upstream PCIe-to-PCI bridge of a PCI device
> >> >   * if the device is PCIE, return NULL
> >> >   * if the device isn't connected to a PCIe bridge (that is its parent is a
> >> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> > index 3a24e4f..94e81d1 100644
> >> > --- a/include/linux/pci.h
> >> > +++ b/include/linux/pci.h
> >> > @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
> >> >  }
> >> >  #endif
> >> >
> >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >> > +                                          struct pci_dev *bridge,
> >> > +                                          u16 *requester_id);
> >>
> >> The structure of this interface implies that there is only one visible
> >> requester ID, but the whole point of this patch is that a transaction from
> >> @requestee may appear with one of several requester IDs.  So which one will
> >> this return?
> >
> > I thought the point of this patch was to have an integrated interface
> > for finding the requester ID and doing something across all devices with
> > that requester ID and thereby remove pci_find_upstream_pcie_bridge(),
> > provide a way to quirk broken PCIe-to-PCI bridges and quirk dma_ops for
> > devices that use the wrong requester ID.  In what case does a device
> > appear to have multiple requester IDs?  We have cases where devices use
> > the wrong requester ID, but AFAIK they only use a single wrong requester
> > ID.  Thanks,
> >
> 
> The cases I've found where multiple requester IDs were used are all related
> to Marvell Sata controllers.
> 
> Here's a table of affected devices with links to the
> bug reports. In each case both functions 0 and 1 are used.

I'm confused.  There's only one requester ID for a given transaction.
Therefore on a per transaction level there's a 1:1 relationship between
a requester ID and a device.  The requester ID may be incorrect, but
there cannot be more than one per transaction.  Are you suggesting that
a given device (PCI function) will alternate using one requester ID for
some transactions and a different requester ID for others?  I'm thinking
of cases like this bz:

https://bugzilla.redhat.com/show_bug.cgi?id=757166#c16

In that case pci_get_visible_pcie_requester() (assuming no bridges are
involved) should return the device and the requester ID that it would
use if it worked correctly.  pcie_for_each_requester() should then call
the callback function using both the correct requester ID and the
quirked requester ID.  The reason I mentioned that this doesn't
currently handle ghost functions is because it uses the
pci_get_dma_source() quirk, which only handles actual devices.  We'd
need to add a pci_get_dma_alias() or something that would give us a list
of ghost requester IDs to handle these broken Marvell devices.  Thanks,

Alex

> static const struct pci_dev_dma_multi_source_map {
>        u16 vendor;
>        u16 device;
> } pci_dev_dma_multi_source_map[] = {
>         /* Reported by Patrick Bregman
>          * https://bugzilla.redhat.com/show_bug.cgi?id=863653 */
>        { PCI_VENDOR_ID_MARVELL_EXT, 0x9120},
> 
>        /* Reported by Paweł Żak, Korneliusz Jarzębski, Daniel Mayer
>         * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by
>         * Justin Piszcz  https://lkml.org/lkml/2012/11/24/94 */
>        { PCI_VENDOR_ID_MARVELL_EXT, 0x9123},
> 
>        /* Reported by Robert Cicconetti
>         * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by
>         * Fernando https://bugzilla.redhat.com/show_bug.cgi?id=757166 */
>        { PCI_VENDOR_ID_MARVELL_EXT, 0x9128},
> 
>        /* Reported by Stijn Tintel
>         * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */
>        { PCI_VENDOR_ID_MARVELL_EXT, 0x9130},
> 
>        /* Reported by Gaudenz Steinlin
>         * https://lkml.org/lkml/2013/3/5/288 */
>        { PCI_VENDOR_ID_MARVELL_EXT, 0x9143},
> 
>        /* Reported by Andrew Cooks
>         * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */
>        { PCI_VENDOR_ID_MARVELL_EXT, 0x9172}
> };
> 
> 
> > Alex
> >
> >> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> >> > +                       int (*fn)(struct pci_dev *, u16 id, void *),
> >> > +                       void *data);
> >> > +
> >> >  /**
> >> >   * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
> >> >   * @pdev: the PCI device
> >> >
> >
> >
> >



Bjorn Helgaas July 24, 2013, 4:47 p.m. UTC | #5
On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
>> > This provides interfaces for drivers to discover the visible PCIe
>> > requester ID for a device, for things like IOMMU setup, and iterate
>>
>> IDs (plural)
>
> How so?  A device can't have multiple requester IDs.  Reading below, I'm
> not sure we're on the same page about the purpose of this patch.

>> "requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
>> normally makes a noun that refers to the object of the action.  So
>> "requestee" sounds like it means something like "target" or "responder,"
>> but that's not what you mean here.
>
> Hmm, ok.  I figured a request-er makes a request on behalf of a
> request-ee.  Suggestions?

I would expect a request-er to make a request *to* a request-ee, just
like a grant-or makes a grant to a grant-ee.  My suggestion is to use
"requester" consistently for only the originator of a DMA transaction.
 Any devices above it are by definition "bridges".  As the DMA
transaction propagates through the fabric, it may be tagged by bridges
with different requester IDs.

The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
not sure the intermediate pci_devs are.

>> > + * pci_get_visible_pcie_requester - Get requester and requester ID for
>> > + *                                  @requestee below @bridge
>> > + * @requestee: requester device
>> > + * @bridge: upstream bridge (or NULL for root bus)
>> > + * @requester_id: location to store requester ID or NULL
>> > + */
>> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>> > +                                          struct pci_dev *bridge,
>> > +                                          u16 *requester_id)
>>
>> I'm not sure it makes sense to return a struct pci_dev here because
>> there's no requirement that a requester ID correspond to an actual
>> pci_dev.
>
> That's why this function is named get_.._requester instead of requester
> ID.  I believe there still has to be a struct pci_dev that does the
> request, but the requester ID for that device may not actually match.
> So I return both.  In a PCIe-to-PCI bridge case, the pci_dev is the
> bridge, but the requester ID is either the bridge bus|devfn or
> subordinate|0 depending on the topology.  If we want to support "ghost
> functions", we can return the real pci_dev and a ghost requester ID.
>
> I think if we used just a requester ID, it ends up being extremely
> difficult to pass that into anything else since we then have to search
> again for where that requester ID is rooted.

Returning both a pci_dev and a requester ID makes it more complicated.
 At the hardware level, transactions contain only a requester ID, and
bridges can either drop it, pass it unchanged, or assign a new one.  I
think the code will be simpler if we just model that.
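
Roughly, the three cases I'd model (a sketch only; names made up):

  /* what a bridge may do to the requester ID of a DMA it forwards */
  enum req_id_action {
          REQ_ID_DROP,    /* conventional PCI: no requester ID at all */
          REQ_ID_PASS,    /* e.g., PCIe switch port: ID forwarded as-is */
          REQ_ID_ASSIGN,  /* bridge takes ownership, e.g.,
                           * (secondary bus << 8 | 0) on PCIe-to-PCI */
  };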

>> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
>> > + *                           from @requestee to the PCIe requester ID visible
>> > + *                           to @bridge.
>>
>> Transactions from a device may appear with one of several requester IDs,
>> but there's not necessarily an actual pci_dev for each ID, so I think the
>> caller reads better if it's "...for_each_requester_id()"
>
> Wouldn't you expect to pass a requester ID into a function with that
> name?  I'm pretty sure I had it named that at one point but thought the
> parameters made more sense this way.  I'll see if I can think of a
> better name.

My thought was to pass a pci_dev (the originator of the DMA, which I
would call the "requester") and a callback.  The callback would accept
the requester pci_dev (always the same requester device) and a
requester ID.

This would call @fn for each possible requester ID for transactions
from the device.  IOMMU drivers should only need the requester ID to
manage mappings; they shouldn't need a pci_dev corresponding to any
intermediate bridges.
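
Concretely, something like this (a sketch, not a final signature; the
name and arguments are illustrative):

  int pcie_for_each_requester_id(struct pci_dev *requester,
                                 struct pci_dev *bridge,
                                 int (*fn)(struct pci_dev *requester,
                                           u16 requester_id, void *data),
                                 void *data);

The callback would always get the same @requester, plus one of the
requester IDs @bridge might see on its transactions.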

>> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>> > +                                          struct pci_dev *bridge,
>> > +                                          u16 *requester_id);
>>
>> The structure of this interface implies that there is only one visible
>> requester ID, but the whole point of this patch is that a transaction from
>> @requestee may appear with one of several requester IDs.  So which one will
>> this return?
>
> I thought the point of this patch was to have an integrated interface
> for finding the requester ID and doing something across all devices with
> that requester ID

Oh, here's the difference in our understanding.  "Doing something
across all devices with that requester ID" sounds like identifying the
set of devices that have to be handled as a group by the IOMMU.
That's certainly an issue, but I wasn't considering it at all.

I was only concerned with the question of a single device that
requires multiple IOMMU mappings because DMA requests might use any of
several source IDs.  This is mentioned in sec 3.6.1.1 of the VT-d
spec, i.e., "requests arriving with the source-id in the original
PCI-X transaction or the source-id provided by the bridge. ...
software must program multiple context entries, each corresponding to
the possible set of source-ids."

My thought was that the IOMMU driver would use
pcie_for_each_requester_id() and have the callback set up a mapping
for one requester ID each time it was called.

> and thereby remove pci_find_upstream_pcie_bridge(),
> > provide a way to quirk broken PCIe-to-PCI bridges and quirk dma_ops for
> devices that use the wrong requester ID.  In what case does a device
> appear to have multiple requester IDs?  We have cases where devices use
> the wrong requester ID, but AFAIK they only use a single wrong requester
> ID.

I think the example in sec 3.6.1.1 answers this, doesn't it?

Bjorn
Alex Williamson July 24, 2013, 6:12 p.m. UTC | #6
On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
> On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >> > This provides interfaces for drivers to discover the visible PCIe
> >> > requester ID for a device, for things like IOMMU setup, and iterate
> >>
> >> IDs (plural)
> >
> > How so?  A device can't have multiple requester IDs.  Reading below, I'm
> > not sure we're on the same page about the purpose of this patch.
> 
> >> "requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
> >> normally makes a noun that refers to the object of the action.  So
> >> "requestee" sounds like it means something like "target" or "responder,"
> >> but that's not what you mean here.
> >
> > Hmm, ok.  I figured a request-er makes a request on behalf of a
> > request-ee.  Suggestions?
> 
> I would expect a request-er to make a request *to* a request-ee, just
> like a grant-or makes a grant to a grant-ee.  My suggestion is to use
> "requester" consistently for only the originator of a DMA transaction.
>  Any devices above it are by definition "bridges".

But there may not be a bridge, so are we then creating just a revised
pci_find_upstream_pcie_bridge() function?  I think we want to abstract
away caring whether there's a bridge or a well behaved device or a
broken device that requires quirks.

>   As the DMA
> transaction propagates through the fabric, it may be tagged by bridges
> with different requester IDs.
> 
> The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
> not sure the intermediate pci_devs are.

A u16 requester ID doesn't mean much on its own though, it's not
necessarily even unique.  A requester ID associated with the context of
a pci_dev is unique and gives us a reference point if we need to perform
another operation on that requester ID.

> >> > + * pci_get_visible_pcie_requester - Get requester and requester ID for
> >> > + *                                  @requestee below @bridge
> >> > + * @requestee: requester device
> >> > + * @bridge: upstream bridge (or NULL for root bus)
> >> > + * @requester_id: location to store requester ID or NULL
> >> > + */
> >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >> > +                                          struct pci_dev *bridge,
> >> > +                                          u16 *requester_id)
> >>
> >> I'm not sure it makes sense to return a struct pci_dev here because
> >> there's no requirement that a requester ID correspond to an actual
> >> pci_dev.
> >
> > That's why this function is named get_.._requester instead of requester
> > ID.  I believe there still has to be a struct pci_dev that does the
> > request, but the requester ID for that device may not actually match.
> > So I return both.  In a PCIe-to-PCI bridge case, the pci_dev is the
> > bridge, but the requester ID is either the bridge bus|devfn or
> > subordinate|0 depending on the topology.  If we want to support "ghost
> > functions", we can return the real pci_dev and a ghost requester ID.
> >
> > I think if we used just a requester ID, it ends up being extremely
> > difficult to pass that into anything else since we then have to search
> > again for where that requester ID is rooted.
> 
> Returning both a pci_dev and a requester ID makes it more complicated.
>  At the hardware level, transactions contain only a requester ID, and
> bridges can either drop it, pass it unchanged, or assign a new one.  I
> think the code will be simpler if we just model that.

I'm not convinced.  Patch 2/2 makes use of both the returned pci_dev and
the returned requester ID and it's a huge simplification overall.

> >> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> >> > + *                           from @requestee to the PCIe requester ID visible
> >> > + *                           to @bridge.
> >>
> >> Transactions from a device may appear with one of several requester IDs,
> >> but there's not necessarily an actual pci_dev for each ID, so I think the
> >> caller reads better if it's "...for_each_requester_id()"
> >
> > Wouldn't you expect to pass a requester ID into a function with that
> > name?  I'm pretty sure I had it named that at one point but thought the
> > parameters made more sense this way.  I'll see if I can think of a
> > better name.
> 
> My thought was to pass a pci_dev (the originator of the DMA, which I
> would call the "requester") and a callback.  The callback would accept
> the requester pci_dev (always the same requester device) and a
> requester ID.
> 
> This would call @fn for each possible requester ID for transactions
> from the device.  IOMMU drivers should only need the requester ID to
> manage mappings; they shouldn't need a pci_dev corresponding to any
> intermediate bridges.

This implementation is almost the same with the only change being that
the pci_dev passed to the callback is the one most closely associated
with the requester ID.  For the IOMMU driver, it doesn't matter since
it's only using the requester ID, but why would the callback care about
the original requester?  If it needed to do something device specific,
it's going to care about the closest device to the requester ID.

> >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >> > +                                          struct pci_dev *bridge,
> >> > +                                          u16 *requester_id);
> >>
> >> The structure of this interface implies that there is only one visible
> >> requester ID, but the whole point of this patch is that a transaction from
> >> @requestee may appear with one of several requester IDs.  So which one will
> >> this return?
> >
> > I thought the point of this patch was to have an integrated interface
> > for finding the requester ID and doing something across all devices with
> > that requester ID
> 
> Oh, here's the difference in our understanding.  "Doing something
> across all devices with that requester ID" sounds like identifying the
> set of devices that have to be handled as a group by the IOMMU.
> That's certainly an issue, but I wasn't considering it at all.
> 
> I was only concerned with the question of a single device that
> requires multiple IOMMU mappings because DMA requests might use any of
> several source IDs.  This is mentioned in sec 3.6.1.1 of the VT-d
> spec, i.e., "requests arriving with the source-id in the original
> PCI-X transaction or the source-id provided by the bridge. ...
> software must program multiple context entries, each corresponding to
> the possible set of source-ids."
> 
> My thought was that the IOMMU driver would use
> pcie_for_each_requester_id() and have the callback set up a mapping
> for one requester ID each time it was called.

Which is exactly what pcie_for_each_requester() does, but I try to solve
the problem of a single device that needs multiple context entries the
same way as a bridge that masks multiple devices.

> > and thereby remove pci_find_upstream_pcie_bridge(),
> > > provide a way to quirk broken PCIe-to-PCI bridges and quirk dma_ops for
> > devices that use the wrong requester ID.  In what case does a device
> > appear to have multiple requester IDs?  We have cases where devices use
> > the wrong requester ID, but AFAIK they only use a single wrong requester
> > ID.
> 
> I think the example in sec 3.6.1.1 answers this, doesn't it?

It does, in summary we have to be prepared that a transaction from a
device behind a PCIe-to-PCI/X bridge may use either the device requester
ID (bus|devfn) or the bridge (subordinate|0) requester ID.  This is why
pcie_for_each_requester() makes sense to me, we iterate across all
requesters that may use the same requester ID.  If we had a
pcie_for_each_requester_id() how would we handle that?  Do we call it
once for the (bus|devfn) requester ID of the device we're trying to map
and then call it again for the (subordinate|0) requester ID of the
bridge?  That's almost how pci_find_upstream_pcie_bridge() is used and
it ends up putting a lot of the logic into the driver.  How do we write
a pcie_for_each_requester_id() function that can uniquely identify a PCI
hierarchy to traverse in the presence of multiple domains?  I have a
hard time seeing how we get away from returning a {pci_dev, requester
ID} pair if we want to be descriptive.  Maybe we create a:

struct pci_requester_id {
	struct pci_dev *dev;
	u16 id;
};

and return that if this is a matter of semantics.  Thanks,

Alex

Donald Dutile July 24, 2013, 8:42 p.m. UTC | #7
On 07/23/2013 06:35 PM, Bjorn Helgaas wrote:
> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
>> This provides interfaces for drivers to discover the visible PCIe
>> requester ID for a device, for things like IOMMU setup, and iterate
>
> IDs (plural)
>
a single device does not have multiple requester IDs;
it can have multiple tag IDs (which are ignored in this situation, but
can be used by switches for ordering purposes), but there's only one per
function (except for those quirky pdevs!).

>> over the device chain from requestee to requester, including DMA
>> quirks at each step.
>
> "requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
> normally makes a noun that refers to the object of the action.  So
> "requestee" sounds like it means something like "target" or "responder,"
> but that's not what you mean here.
>
sorry, I didn't follow the requester/requestee terminology either...

>> Suggested-by: Bjorn Helgaas<bhelgaas@google.com>
>> Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
>> ---
>>   drivers/pci/search.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/pci.h  |    7 ++
>>   2 files changed, 205 insertions(+)
>>
>> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
>> index d0627fa..4759c02 100644
>> --- a/drivers/pci/search.c
>> +++ b/drivers/pci/search.c
>> @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem);
>>   EXPORT_SYMBOL_GPL(pci_bus_sem);
>>
>>   /*
>> + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
>> + * @dev: device to test
>> + */
>> +static bool pci_has_pcie_requester_id(struct pci_dev *dev)
>> +{
>> +	/*
>> +	 * XXX There's no indicator of the bus type, conventional PCI vs
>> +	 * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe
>> +	 * requester ID is a native PCIe based system (such as VT-d or
>> +	 * AMD-Vi).  It's common that PCIe root complex devices do not
could the above comment be x86-iommu-neutral?
By definition of PCIe, all devices have a requester ID (at minimum, to
accept config cycles); one is required if the device is a source of
read/write requests or read completions.

>> +	 * include a PCIe capability, but we can assume they are PCIe
>> +	 * devices based on their topology.
>> +	 */
>> +	if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
>> +		return true;
>> +
>> +	/*
>> +	 * PCI-X devices have a requester ID, but the bridge may still take
>> +	 * ownership of transactions and create a requester ID.  We therefore
>> +	 * assume that the PCI-X requester ID is not the same one used on PCIe.
>> +	 */
>> +
>> +#ifdef CONFIG_PCI_QUIRKS
>> +	/*
>> +	 * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
>> +	 * If the device is a bridge, look to the next device upstream of it.
>> +	 * If that device is PCIe and not a PCIe-to-PCI bridge, then by
>> +	 * deduction, the device must be PCIe and therefore has a requester ID.
>> +	 */
>> +	if (dev->subordinate) {
>> +		struct pci_dev *parent = dev->bus->self;
>> +
>> +		if (pci_is_pcie(parent) &&
>> +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
>> +			return true;
>> +	}
>> +#endif
>> +
>> +	return false;
>> +}
>> +
>> +/*
>> + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
>> + * @dev: requester device
>> + * @bridge: upstream bridge (or NULL for root bus)
>> + */
>> +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
>> +					      struct pci_dev *bridge)
>> +{
>> +	/*
>> +	 * The entire path must be tested, if any step does not have a
>> +	 * requester ID, the chain is broken.  This allows us to support
>> +	 * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
>> +	 */
>> +	while (dev != bridge) {
>> +		if (!pci_has_pcie_requester_id(dev))
>> +			return false;
>> +
>> +		if (pci_is_root_bus(dev->bus))
>> +			return !bridge; /* false if we don't hit @bridge */
>> +
>> +		dev = dev->bus->self;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +/*
>> + * Legacy PCI bridges within a root complex (ex. Intel 82801) report
>> + * a different requester ID than a standard PCIe-to-PCI bridge.  Instead
First, I'm assuming you mean that devices behind a Legacy PCI bridge within a root complex
get assigned IDs different from those behind standard PCIe-to-PCI bridges (as quoted below).

>> + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a
>
> s/the/they/
>
well, the PPBs should inject their (secondary << 8 | 0), not subordinate.
 From the PCI Express to PCI/PCI-X Bridge spec, v1.0:
The Tag is reassigned by the bridge according to the rules outlined in the following sections. If the
bridge generates a new Requester ID for a transaction forwarded from the secondary interface to the
primary interface, the bridge assigns the PCI Express Requester ID using its secondary interface’s
                                                                              ^^^^^^^^^
Bus Number and sets both the Device Number and Function Number fields to zero.
^^^^^^^^^^

As for the 82801, looks like they took this part of the PCIe spec to heart:
(PCIe v3 spec, section 2.2.6.2 Transaction Descriptor - Transaction ID Field):
Exception: The assignment of Bus and Device Numbers to the Devices within a Root Complex,
and the Device Numbers to the Downstream Ports within a Switch, may be done in an
implementation specific way.
Obviously, you're missing the 'implementation-specific way' compiler... ;-)

aw: which 'bus' do you mean above in '(bus << 8 | devfn)'?


> Did you learn about this empirically?  Intel spec?  I wonder if there's
> some way to derive this from the PCIe specs.
>
>> + * standard PCIe endpoint.  This function detects them.
>> + *
>> + * XXX Is this Intel vendor ID specific?
>> + */
>> +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
>> +{
>> +	if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
>> +#define PCI_BRIDGE_REQUESTER_ID(dev)	((dev)->subordinate->number << 8)
>> +
>> +/*
>> + * pci_get_visible_pcie_requester - Get requester and requester ID for
>> + *                                  @requestee below @bridge
>> + * @requestee: requester device
>> + * @bridge: upstream bridge (or NULL for root bus)
>> + * @requester_id: location to store requester ID or NULL
>> + */
>> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>> +					       struct pci_dev *bridge,
>> +					       u16 *requester_id)
>
> I'm not sure it makes sense to return a struct pci_dev here because
> there's no requirement that a requester ID correspond to an actual
> pci_dev.
>
well, I would expect the only callers would be for subsys (iommu's)
searching to find requester-id for a pdev, b/c if a pdev doesn't exist,
then the device (and requester-id) doesn't exist... :-/

>> +{
>> +	struct pci_dev *requester = requestee;
>> +
>> +	while (requester != bridge) {
>> +		requester = pci_get_dma_source(requester);
>> +		pci_dev_put(requester); /* XXX skip ref cnt */
>> +
>> +		if (pci_has_visible_pcie_requester_id(requester, bridge))
>
> If we acquire the "requester" pointer via a ref-counting interface,
> it's illegal to use the pointer after dropping the reference, isn't it?
> Maybe that's what you mean by the "XXX" comment.
>
>> +			break;
>> +
>> +		if (pci_is_root_bus(requester->bus))
>> +			return NULL; /* @bridge not parent to @requestee */
>> +
>> +		requester = requester->bus->self;
>> +	}
>> +
>> +	if (requester_id) {
>> +		if (requester->bus != requestee->bus &&
>> +		    !pci_bridge_uses_endpoint_requester(requester))
>> +			*requester_id = PCI_BRIDGE_REQUESTER_ID(requester);
>> +		else
>> +			*requester_id = PCI_REQUESTER_ID(requester);
>> +	}
>> +
>> +	return requester;
>> +}
>> +
>> +static int pci_do_requester_callback(struct pci_dev **dev,
>> +				     int (*fn)(struct pci_dev *,
>> +					       u16 id, void *),
>> +				     void *data)
>> +{
>> +	struct pci_dev *dma_dev;
>> +	int ret;
>> +
>> +	ret = fn(*dev, PCI_REQUESTER_ID(*dev), data);
>> +	if (ret)
>> +		return ret;
>> +
>> +	dma_dev = pci_get_dma_source(*dev);
>> +	pci_dev_put(dma_dev); /* XXX skip ref cnt */
>> +	if (dma_dev == *dev)
>
> Same ref count question as above.
>
>> +		return 0;
>> +
>> +	ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data);
>> +	if (ret)
>> +		return ret;
>> +
>> +	*dev = dma_dev;
>> +	return 0;
>> +}
>> +
>> +/*
>> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
>> + *                           from @requestee to the PCIe requester ID visible
>> + *                           to @bridge.
>
> Transactions from a device may appear with one of several requester IDs,
> but there's not necessarily an actual pci_dev for each ID, so I think the
ditto above; have to have a pdev for each id....

> caller reads better if it's "...for_each_requester_id()"
>
> The "Call X on each devices and DMA source from Y to the requester ID"
> part doesn't quite make a sentence.
>
>> + * @requestee: Starting device
>> + * @bridge: upstream bridge (or NULL for root bus)
>
> You should say something about the significance of @bridge.  I think the
> point is to call @fn for every possible requester ID @bridge could see for
> transactions from @requestee.  This is a way to learn the requester IDs an
> IOMMU at @bridge needs to be prepared for.
>
>> + * @fn: callback function
>> + * @data: data to pass to callback
>> + */
>> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
>> +			    int (*fn)(struct pci_dev *, u16 id, void *),
>> +			    void *data)
>> +{
>> +	struct pci_dev *requester;
>> +	struct pci_dev *dev = requestee;
>> +	int ret = 0;
>> +
>> +	requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
>> +	if (!requester)
>> +		return -EINVAL;
>> +
>> +	do {
>> +		ret = pci_do_requester_callback(&dev, fn, data);
>> +		if (ret)
>> +			return ret;
>> +
>> +		if (dev == requester)
>> +			return 0;
>> +
>> +		/*
>> +		 * We always consider root bus devices to have a visible
>> +		 * requester ID, therefore this should never be true.
>> +		 */
>> +		BUG_ON(pci_is_root_bus(dev->bus));
>
> What are we going to do if somebody hits this BUG_ON()?  If it's impossible
> to hit, we should just remove it.  If it's possible to hit it in some weird
> topology you didn't consider, we should see IOMMU faults for any requester
> ID we neglected to map, and that fault would be a better debugging hint
> than a BUG_ON() here.
>
according to spec, all pdev's have a requester-id, even RC ones, albeit
"implementation specific"...

>> +
>> +		dev = dev->bus->self;
>> +
>> +	} while (dev != requester);
>> +
>> +	/*
>> +	 * If we've made it here, @requester is a bridge upstream from
>> +	 * @requestee.
>> +	 */
>> +	if (pci_bridge_uses_endpoint_requester(requester))
>> +		return pci_do_requester_callback(&requester, fn, data);
>> +
>> +	return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data);
>> +}
>> +
>> +/*
>>    * find the upstream PCIe-to-PCI bridge of a PCI device
>>    * if the device is PCIE, return NULL
>>    * if the device isn't connected to a PCIe bridge (that is its parent is a
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 3a24e4f..94e81d1 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
>>   }
>>   #endif
>>
>> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>> +					       struct pci_dev *bridge,
>> +					       u16 *requester_id);
>
> The structure of this interface implies that there is only one visible
> requester ID, but the whole point of this patch is that a transaction from
> @requestee may appear with one of several requester IDs.  So which one will
> this return?
>
Are there devices that use multiple requester IDs?
I know we have ones that use the wrong ID.
If we want to handle multiple requester IDs per pdev,
we could pass in a ptr to an initial requester ID: if NULL, give me the first;
if !NULL, give me the next one; we'd also need a flag returned to indicate
there is more than one.
A NULL return when passing in a !NULL ref means the end.
  ... sledge-hammer + nail ???
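
i.e., very roughly (all names hypothetical):

  u16 id;
  bool more;
  u16 *cur = pci_get_requester_id(pdev, NULL, &id, &more);    /* first */

  while (cur) {
          /* use 'id'; 'more' flags that at least one other ID exists */
          cur = pci_get_requester_id(pdev, cur, &id, &more);  /* next */
  }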

>> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
>> +			    int (*fn)(struct pci_dev *, u16 id, void *),
>> +			    void *data);
>> +
>>   /**
>>    * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
>>    * @pdev: the PCI device
>>

Alex Williamson July 24, 2013, 9:22 p.m. UTC | #8
On Wed, 2013-07-24 at 16:42 -0400, Don Dutile wrote:
> On 07/23/2013 06:35 PM, Bjorn Helgaas wrote:
> > On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >> This provides interfaces for drivers to discover the visible PCIe
> >> requester ID for a device, for things like IOMMU setup, and iterate
> >
> > IDs (plural)
> >
> a single device does not have multiple requester IDs;
> it can have multiple tag IDs (which are ignored in this situation, but
> can be used by switches for ordering purposes), but there's only one per
> function (except for those quirky pdevs!).
> 
> >> over the device chain from requestee to requester, including DMA
> >> quirks at each step.
> >
> > "requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
> > normally makes a noun that refers to the object of the action.  So
> > "requestee" sounds like it means something like "target" or "responder,"
> > but that's not what you mean here.
> >
> sorry, I didn't follow the requester/requestee terminology either...
> 
> >> Suggested-by: Bjorn Helgaas<bhelgaas@google.com>
> >> Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
> >> ---
> >>   drivers/pci/search.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   include/linux/pci.h  |    7 ++
> >>   2 files changed, 205 insertions(+)
> >>
> >> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> >> index d0627fa..4759c02 100644
> >> --- a/drivers/pci/search.c
> >> +++ b/drivers/pci/search.c
> >> @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem);
> >>   EXPORT_SYMBOL_GPL(pci_bus_sem);
> >>
> >>   /*
> >> + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
> >> + * @dev: device to test
> >> + */
> >> +static bool pci_has_pcie_requester_id(struct pci_dev *dev)
> >> +{
> >> +	/*
> >> +	 * XXX There's no indicator of the bus type, conventional PCI vs
> >> +	 * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe
> >> +	 * requester ID is a native PCIe based system (such as VT-d or
> >> +	 * AMD-Vi).  It's common that PCIe root complex devices do not
> could the above comment be x86-iommu-neutral?
> By definition of PCIe, all devices have a requester ID (at minimum, to
> accept config cycles); one is required if the device is a source of
> read/write requests or read completions.

I agree completely, the question is whether we have a PCIe root complex
or a conventional PCI host bus.  I don't think we have any way to tell,
so I'm assuming pci_is_root_bus() indicates we're on a PCIe root complex
and therefore have requester IDs.  If there's some way to determine this
let me know and we can avoid any kind of assumption.

> >> +	 * include a PCIe capability, but we can assume they are PCIe
> >> +	 * devices based on their topology.
> >> +	 */
> >> +	if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
> >> +		return true;
> >> +
> >> +	/*
> >> +	 * PCI-X devices have a requester ID, but the bridge may still take
> >> +	 * ownership of transactions and create a requester ID.  We therefore
> >> +	 * assume that the PCI-X requester ID is not the same one used on PCIe.
> >> +	 */
> >> +
> >> +#ifdef CONFIG_PCI_QUIRKS
> >> +	/*
> >> +	 * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
> >> +	 * If the device is a bridge, look to the next device upstream of it.
> >> +	 * If that device is PCIe and not a PCIe-to-PCI bridge, then by
> >> +	 * deduction, the device must be PCIe and therefore has a requester ID.
> >> +	 */
> >> +	if (dev->subordinate) {
> >> +		struct pci_dev *parent = dev->bus->self;
> >> +
> >> +		if (pci_is_pcie(parent) &&
> >> +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> >> +			return true;
> >> +	}
> >> +#endif
> >> +
> >> +	return false;
> >> +}
> >> +
> >> +/*
> >> + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
> >> + * @dev: requester device
> >> + * @bridge: upstream bridge (or NULL for root bus)
> >> + */
> >> +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
> >> +					      struct pci_dev *bridge)
> >> +{
> >> +	/*
> >> +	 * The entire path must be tested, if any step does not have a
> >> +	 * requester ID, the chain is broken.  This allows us to support
> >> +	 * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
> >> +	 */
> >> +	while (dev != bridge) {
> >> +		if (!pci_has_pcie_requester_id(dev))
> >> +			return false;
> >> +
> >> +		if (pci_is_root_bus(dev->bus))
> >> +			return !bridge; /* false if we don't hit @bridge */
> >> +
> >> +		dev = dev->bus->self;
> >> +	}
> >> +
> >> +	return true;
> >> +}
> >> +
> >> +/*
> >> + * Legacy PCI bridges within a root complex (ex. Intel 82801) report
> >> + * a different requester ID than a standard PCIe-to-PCI bridge.  Instead
> First, I'm assuming you mean that devices behind a Legacy PCI bridge within a root complex
> get assigned IDs different from those behind standard PCIe-to-PCI bridges (as quoted below).

Yes

> >> + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a
> >
> > s/the/they/
> >
> well, the PPBs should inject their (secondary << 8 | 0), not subordinate.
>  From the PCI Express to PCI/PCI-X Bridge spec, v1.0:
> The Tag is reassigned by the bridge according to the rules outlined in the following sections. If the
> bridge generates a new Requester ID for a transaction forwarded from the secondary interface to the
> primary interface, the bridge assigns the PCI Express Requester ID using its secondary interface’s
>                                                                               ^^^^^^^^^
> Bus Number and sets both the Device Number and Function Number fields to zero.
> ^^^^^^^^^^

I'm referring to (pdev->subordinate->number << 8 | 0) vs
(pdev->bus->number << 8 | pdev->devfn).  The subordinate struct pci_bus
is the secondary interface bus.

> As for the 82801, looks like they took this part of the PCIe spec to heart:
> (PCIe v3 spec, section 2.2.6.2 Transaction Descriptor - Transaction ID Field):
> Exception: The assignment of Bus and Device Numbers to the Devices within a Root Complex,
> and the Device Numbers to the Downstream Ports within a Switch, may be done in an
> implementation specific way.
> Obviously, you're missing the 'implementation-specific way' compiler... ;-)

So this is potentially Intel specific.  How can we tell it's an Intel
root complex?

> aw: which 'bus' do you mean above in '(bus << 8 | devfn)'?

(pdev->bus->number << 8 | pdev->devfn)

> > Did you learn about this empirically?  Intel spec?  I wonder if there's
> > some way to derive this from the PCIe specs.
> >
> >> + * standard PCIe endpoint.  This function detects them.
> >> + *
> >> + * XXX Is this Intel vendor ID specific?
> >> + */
> >> +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
> >> +{
> >> +	if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
> >> +		return true;
> >> +
> >> +	return false;
> >> +}
> >> +
> >> +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> >> +#define PCI_BRIDGE_REQUESTER_ID(dev)	((dev)->subordinate->number << 8)
> >> +
> >> +/*
> >> + * pci_get_visible_pcie_requester - Get requester and requester ID for
> >> + *                                  @requestee below @bridge
> >> + * @requestee: requester device
> >> + * @bridge: upstream bridge (or NULL for root bus)
> >> + * @requester_id: location to store requester ID or NULL
> >> + */
> >> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >> +					       struct pci_dev *bridge,
> >> +					       u16 *requester_id)
> >
> > I'm not sure it makes sense to return a struct pci_dev here because
> > there's no requirement that a requester ID correspond to an actual
> > pci_dev.
> >
> well, I would expect the only callers would be for subsys (iommu's)
> searching to find requester-id for a pdev, b/c if a pdev doesn't exist,
> then the device (and requester-id) doesn't exist... :-/

One of the cases Bjorn is referring to is probably the simple case of a
PCIe-to-PCI bridge.  The requester ID is (bridge->subordinate->number <<
8 | 0), which is not an actual device.  As coded here, the function
returns bridge, but requester_id is (bridge->subordinate->number << 8 |
0).

> >> +{
> >> +	struct pci_dev *requester = requestee;
> >> +
> >> +	while (requester != bridge) {
> >> +		requester = pci_get_dma_source(requester);
> >> +		pci_dev_put(requester); /* XXX skip ref cnt */
> >> +
> >> +		if (pci_has_visible_pcie_requester_id(requester, bridge))
> >
> > If we acquire the "requester" pointer via a ref-counting interface,
> > it's illegal to use the pointer after dropping the reference, isn't it?
> > Maybe that's what you mean by the "XXX" comment.
> >
> >> +			break;
> >> +
> >> +		if (pci_is_root_bus(requester->bus))
> >> +			return NULL; /* @bridge not parent to @requestee */
> >> +
> >> +		requester = requester->bus->self;
> >> +	}
> >> +
> >> +	if (requester_id) {
> >> +		if (requester->bus != requestee->bus &&
> >> +		    !pci_bridge_uses_endpoint_requester(requester))
> >> +			*requester_id = PCI_BRIDGE_REQUESTER_ID(requester);
> >> +		else
> >> +			*requester_id = PCI_REQUESTER_ID(requester);
> >> +	}
> >> +
> >> +	return requester;
> >> +}
> >> +
> >> +static int pci_do_requester_callback(struct pci_dev **dev,
> >> +				     int (*fn)(struct pci_dev *,
> >> +					       u16 id, void *),
> >> +				     void *data)
> >> +{
> >> +	struct pci_dev *dma_dev;
> >> +	int ret;
> >> +
> >> +	ret = fn(*dev, PCI_REQUESTER_ID(*dev), data);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	dma_dev = pci_get_dma_source(*dev);
> >> +	pci_dev_put(dma_dev); /* XXX skip ref cnt */
> >> +	if (dma_dev == *dev)
> >
> > Same ref count question as above.
> >
> >> +		return 0;
> >> +
> >> +	ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	*dev = dma_dev;
> >> +	return 0;
> >> +}
> >> +
> >> +/*
> >> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> >> + *                           from @requestee to the PCIe requester ID visible
> >> + *                           to @bridge.
> >
> > Transactions from a device may appear with one of several requester IDs,
> > but there's not necessarily an actual pci_dev for each ID, so I think the
> ditto above; have to have a pdev for each id....
> 
> > caller reads better if it's "...for_each_requester_id()"
> >
> > The "Call X on each devices and DMA source from Y to the requester ID"
> > part doesn't quite make a sentence.
> >
> >> + * @requestee: Starting device
> >> + * @bridge: upstream bridge (or NULL for root bus)
> >
> > You should say something about the significance of @bridge.  I think the
> > point is to call @fn for every possible requester ID @bridge could see for
> > transactions from @requestee.  This is a way to learn the requester IDs an
> > IOMMU at @bridge needs to be prepared for.
> >
> >> + * @fn: callback function
> >> + * @data: data to pass to callback
> >> + */
> >> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> >> +			    int (*fn)(struct pci_dev *, u16 id, void *),
> >> +			    void *data)
> >> +{
> >> +	struct pci_dev *requester;
> >> +	struct pci_dev *dev = requestee;
> >> +	int ret = 0;
> >> +
> >> +	requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
> >> +	if (!requester)
> >> +		return -EINVAL;
> >> +
> >> +	do {
> >> +		ret = pci_do_requester_callback(&dev, fn, data);
> >> +		if (ret)
> >> +			return ret;
> >> +
> >> +		if (dev == requester)
> >> +			return 0;
> >> +
> >> +		/*
> >> +		 * We always consider root bus devices to have a visible
> >> +		 * requester ID, therefore this should never be true.
> >> +		 */
> >> +		BUG_ON(pci_is_root_bus(dev->bus));
> >
> > What are we going to do if somebody hits this BUG_ON()?  If it's impossible
> > to hit, we should just remove it.  If it's possible to hit it in some weird
> > topology you didn't consider, we should see IOMMU faults for any requester
> > ID we neglected to map, and that fault would be a better debugging hint
> > than a BUG_ON() here.
> >
> according to spec, all pdev's have a requester-id, even RC ones, albeit
> "implementation specific"...
> 
> >> +
> >> +		dev = dev->bus->self;
> >> +
> >> +	} while (dev != requester);
> >> +
> >> +	/*
> >> +	 * If we've made it here, @requester is a bridge upstream from
> >> +	 * @requestee.
> >> +	 */
> >> +	if (pci_bridge_uses_endpoint_requester(requester))
> >> +		return pci_do_requester_callback(&requester, fn, data);
> >> +
> >> +	return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data);
> >> +}
> >> +
> >> +/*
> >>    * find the upstream PCIe-to-PCI bridge of a PCI device
> >>    * if the device is PCIE, return NULL
> >>    * if the device isn't connected to a PCIe bridge (that is its parent is a
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index 3a24e4f..94e81d1 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
> >>   }
> >>   #endif
> >>
> >> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >> +					       struct pci_dev *bridge,
> >> +					       u16 *requester_id);
> >
> > The structure of this interface implies that there is only one visible
> > requester ID, but the whole point of this patch is that a transaction from
> > @requestee may appear with one of several requester IDs.  So which one will
> > this return?
> >
> Are there devices that use multiple requester IDs?
> I know we have ones that use the wrong ID.
> If we want to handle multiple requester IDs per pdev,
> we could pass in a ptr to an initial requester ID: if NULL, give me the first;
> if !NULL, give me the next one; we'd also need a flag returned to indicate
> there is more than one.
> A NULL return when passing in a !NULL ref means the end.
>   ... sledge-hammer + nail ???

That does not sound safe.  Again, this is why I think the
pcie_for_each_requester() approach works.  Given a requester, call fn() on every
device with the same requester ID.  That means when you have a bridge,
we call it for the bridge and every device and every device quirk behind
the bridge, fully populating context entries for all of them.  Thanks,

Alex

> >> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> >> +			    int (*fn)(struct pci_dev *, u16 id, void *),
> >> +			    void *data);
> >> +
> >>   /**
> >>    * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
> >>    * @pdev: the PCI device
> >>
> 



Bjorn Helgaas July 24, 2013, 11:24 p.m. UTC | #9
On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
> On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
> > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >   As the DMA
> > transaction propagates through the fabric, it may be tagged by bridges
> > with different requester IDs.
> > 
> > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
> > not sure the intermediate pci_devs are.
> 
> A u16 requester ID doesn't mean much on its own though, it's not
> necessarily even unique.  A requester ID associated with the context of
> a pci_dev is unique and gives us a reference point if we need to perform
> another operation on that requester ID.

A u16 requester ID better mean something to an IOMMU -- it's all the
IOMMU can use to look up the correct mapping.  That's why we have to
give the iterator something to define the scope to iterate over.  The
same requester ID could mean something totally unrelated in a
different scope, e.g., below a different IOMMU.

> > Returning both a pci_dev and a requester ID makes it more complicated.
> >  At the hardware level, transactions contain only a requester ID, and
> > bridges can either drop it, pass it unchanged, or assign a new one.  I
> > think the code will be simpler if we just model that.
> 
> I'm not convinced.  Patch 2/2 makes use of both the returned pci_dev and
> the returned requester ID and it's a huge simplification overall.

The IOMMU driver makes mappings so DMA from device A can reach a
buffer.  It needs to know about device A, and it needs to know what
source IDs might appear on those DMA transactions.  Why would it need
to know anything about any bridges between A and the IOMMU?

> > My thought was to pass a pci_dev (the originator of the DMA, which I
> > would call the "requester") and a callback.  The callback would accept
> > the requester pci_dev (always the same requester device) and a
> > requester ID.
> > 
> > This would call @fn for each possible requester ID for transactions
> > from the device.  IOMMU drivers should only need the requester ID to
> > manage mappings; they shouldn't need a pci_dev corresponding to any
> > intermediate bridges.
> 
> This implementation is almost the same with the only change being that
> the pci_dev passed to the callback is the one most closely associated
> with the requester ID.  For the IOMMU driver, it doesn't matter since
> it's only using the requester ID, but why would the callback care about
> the original requester?  If it needed to do something device specific,
> it's going to care about the closest device to the requester ID.

If this can be done without passing a pci_dev at all to the callback,
that would be even better.  If a caller needed the original pci_dev,
it could always pass it via the "void *data".
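
For example (hypothetical names; just a sketch of the shape):

  struct map_ctx {
          struct pci_dev *requester;   /* original device, if needed */
          void *domain;                /* IOMMU-private mapping state */
  };

  static int map_one_id(u16 requester_id, void *data)
  {
          struct map_ctx *ctx = data;

          pr_debug("map %04x for %s\n", requester_id,
                   pci_name(ctx->requester));
          /* program one IOMMU context entry for requester_id here */
          return 0;
  }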

I don't see the point of passing a "device closest to the requester
ID."  What would the IOMMU do with that?  As far as the IOMMU is
concerned, the requester ID could be an arbitrary number completely
unrelated to a pci_dev.

> > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> > >> > +                                          struct pci_dev *bridge,
> > >> > +                                          u16 *requester_id);
> > >>
> > >> The structure of this interface implies that there is only one visible
> > >> requester ID, but the whole point of this patch is that a transaction from
> > >> @requestee may appear with one of several requester IDs.  So which one will
> > >> this return?
> > >
> > > I thought the point of this patch was to have an integrated interface
> > > for finding the requester ID and doing something across all devices with
> > > that requester ID
> > 
> > Oh, here's the difference in our understanding.  "Doing something
> > across all devices with that requester ID" sounds like identifying the
> > set of devices that have to be handled as a group by the IOMMU.
> > That's certainly an issue, but I wasn't considering it at all.
> > 
> > I was only concerned with the question of a single device that
> > requires multiple IOMMU mappings because DMA requests might use any of
> > several source IDs.  This is mentioned in sec 3.6.1.1 of the VT-d
> > spec, i.e., "requests arriving with the source-id in the original
> > PCI-X transaction or the source-id provided by the bridge. ...
> > software must program multiple context entries, each corresponding to
> > the possible set of source-ids."
> > 
> > My thought was that the IOMMU driver would use
> > pcie_for_each_requester_id() and have the callback set up a mapping
> > for one requester ID each time it was called.
> 
> Which is exactly what pcie_for_each_requester() does, but I try to solve
> the problem of a single device that needs multiple context entries the
> same way as a bridge that masks multiple devices.

I'm not sure I understand your point yet, but I think it's better to
think of this as computing the set of requester IDs we might see from
a given device.  If we think of it as finding the set of devices that
use a given requester ID, then we have to worry about that set being
modified by hotplug.

> > > and thereby remove pci_find_upstream_pcie_bridge(),
> > > provide a way to quirk broken PCIe-to-PCI bridges and quirk dma_ops for
> > > devices that use the wrong requester ID.  In what case does a device
> > > appear to have multiple requester IDs?  We have cases where devices use
> > > the wrong requester ID, but AFAIK they only use a single wrong requester
> > > ID.
> > 
> > I think the example in sec 3.6.1.1 answers this, doesn't it?
> 
> It does, in summary we have to be prepared that a transaction from a
> device behind a PCIe-to-PCI/X bridge may use either the device requester
> ID (bus|devfn) or the bridge (subordinate|0) requester ID.  This is why
> pcie_for_each_requester() makes sense to me, we iterate across all
> requesters that may use the same requester ID.

Let's say device A and B use the same requester ID because they're
behind a bridge.  What happens if we map a buffer for A and then
hot-remove B?  Here's what I think your proposal would do:

  map buffer for A
    callback for A
    callback for B
  hot-remove B
  unmap buffer for A
    callback for A

We never do an unmap callback for B, so if we did anything while
mapping, it will never be undone.

> If we had a
> pcie_for_each_requester_id() how would we handle that?  Do we call it
> once for the (bus|devfn) requester ID of the device we're trying to map
> and then call it again for the (subordinate|0) requester ID of the
> bridge?  That's almost how pci_find_upstream_pcie_bridge() is used and
> it ends up putting a lot of the logic into the driver.  

That *is* what I'm proposing (calling the callback once for each
possible requester ID that could be seen on DMA from the device).  In
my opinion, the problem with pci_find_upstream_pcie_bridge() is that
you only get one device back from it, and the driver has to roll its
own looping.

> How do we write
> a pcie_for_each_requester_id() function that can uniquely identify a PCI
> hierarchy to traverse in the presence of multiple domains?  

I must be missing something in your question.  If we give
pcie_for_each_requester_id() the requester pci_dev, that uniquely
identifies the PCI hierarchy.  Then it's just a question of how far up
the hierarchy we need to go, and the bridge (or bus) argument tells us
that.
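
In kernel-doc terms, maybe something like (tentative wording):

 * @bridge: upstream bridge, or NULL for the root bus.  This bounds the
 *          walk: only requester IDs visible at @bridge are reported,
 *          i.e., the IDs an IOMMU sitting at @bridge must be prepared
 *          to see.  Topology upstream of @bridge is irrelevant.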

Bjorn
Bjorn Helgaas July 25, 2013, 5:19 p.m. UTC | #10
On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote:
> On 07/23/2013 06:35 PM, Bjorn Helgaas wrote:
> >On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >>This provides interfaces for drivers to discover the visible PCIe
> >>requester ID for a device, for things like IOMMU setup, and iterate
> >
> >IDs (plural)
> >
> a single device does not have multiple requester IDs;
> it can have multiple tag IDs (which are ignored in this situation, but
> can be used by switches for ordering purposes), but there's only one per
> function (except for those quirky pdevs!).

Generally a device does not have multiple requester IDs, but the
IOMMU may see one of several requester IDs for DMAs from a given
device because bridges may take ownership of those transactions (sec
3.6.1.1 of the VT-d spec).

Just to be clear, I envision this whole interface as being
specifically for use by IOMMU drivers, so I'm only trying to provide
what's necessary to build IOMMU mappings.

> >>+ * pci_get_visible_pcie_requester - Get requester and requester ID for
> >>+ *                                  @requestee below @bridge
> >>+ * @requestee: requester device
> >>+ * @bridge: upstream bridge (or NULL for root bus)
> >>+ * @requester_id: location to store requester ID or NULL
> >>+ */
> >>+struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >>+					       struct pci_dev *bridge,
> >>+					       u16 *requester_id)
> >
> >I'm not sure it makes sense to return a struct pci_dev here because
> >there's no requirement that a requester ID correspond to an actual
> >pci_dev.
> >
> well, I would expect the only callers would be for subsys (iommu's)
> searching to find requester-id for a pdev, b/c if a pdev doesn't exist,
> then the device (and requester-id) doesn't exist... :-/

> >>+ * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> >>+ *                           from @requestee to the PCIe requester ID visible
> >>+ *                           to @bridge.
> >
> >Transactions from a device may appear with one of several requester IDs,
> >but there's not necessarily an actual pci_dev for each ID, so I think the
> ditto above; have to have a pdev for each id....

This *might* be true, but I don't think we should rely on it.  For
example:

  00:1c.0 PCIe to PCI bridge to [bus 01]
  01:01.0 PCI endpoint

The bridge will take ownership of DMA transactions from the 01:01.0
endpoint.  An IOMMU on bus 00 will see a bridge-assigned requester
ID of 01:00.0 (subordinate bus number, devfn zero), but there is no
01:00.0 device.
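
In terms of the macros in this patch, the two IDs in play here are:

  u16 own_id    = (dev->bus->number << 8) | dev->devfn;  /* 01:01.0 */
  u16 bridge_id = bridge->subordinate->number << 8;      /* 01:00.0 */

and only the first necessarily corresponds to an actual pci_dev.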

Maybe the rules of conventional PCI require a device zero (I don't
remember), but even if they do, it's ugly to rely on that here
because I don't think device 01:00.0 is relevant to mappings for
device 01:01.0.

Obviously we also have to be aware that 01:00.0 and 01:01.0 can't be
isolated from each other, but I think that issue is separate from
the question of what requester IDs have to be mapped to make 01:01.0
work.

Bjorn
Alex Williamson July 25, 2013, 5:56 p.m. UTC | #11
On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote:
> On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
> > On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
> > > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
> > > <alex.williamson@redhat.com> wrote:
> > > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> > > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> > >   As the DMA
> > > transaction propagates through the fabric, it may be tagged by bridges
> > > with different requester IDs.
> > > 
> > > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
> > > not sure the intermediate pci_devs are.
> > 
> > A u16 requester ID doesn't mean much on its own though, it's not
> > necessarily even unique.  A requester ID associated with the context of
> > a pci_dev is unique and gives us a reference point if we need to perform
> > another operation on that requester ID.
> 
> A u16 requester ID better mean something to an IOMMU -- it's all the
> IOMMU can use to look up the correct mapping.  That's why we have to
> give the iterator something to define the scope to iterate over.  The
> same requester ID could mean something totally unrelated in a
> different scope, e.g., below a different IOMMU.

The point I'm trying to make is that a requester ID depends on its
context (minimally, the PCI segment).  The caller can assume the context
based on the calling parameters or we can provide context in the form of
an associated pci_dev.  I chose the latter path because I prefer
explicit interfaces and it has some usefulness in the intel-iommu
implementation.

For instance, get_domain_for_dev() first looks to see if a pci_dev
already has a domain.  If it doesn't, we want to look to see if there's
an upstream device that would use the same requester ID that already has
a domain.  If our get-requester-ID-for-device function returns only the
requester ID, we don't know whether that requester ID belongs to the device
we're searching from or to some upstream device.  Therefore we potentially
do an
unnecessary search for the domain.
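
Roughly (an untested sketch; the domain lookup helper is made up):

  struct pci_dev *requester;
  u16 req_id;

  requester = pci_get_visible_pcie_requester(pdev, NULL, &req_id);
  if (requester && requester != pdev) {
          /* ID is rooted upstream; reuse that device's domain */
          struct dmar_domain *domain = find_domain(requester);
          if (domain)
                  return domain;
  }

If the function returned only req_id, we couldn't tell whether it came
from pdev itself or from something upstream.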

The other user is intel_iommu_add_device() where we're trying to build
IOMMU groups.  Visibility is the first requirement of an IOMMU group.
If the IOMMU cannot distinguish between devices, they must be part of
the same IOMMU group.  Here we want to find the pci_dev that hosts the
requester ID.  I don't even know how we'd implement this with a function
that only returned the requester ID.  Perhaps we'd have to walk upstream
from the device calling the get-requester-ID-for-device function at each
step and noticing when it changed.  That moves significant logic back
into the caller code.
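
With the pci_dev returned, it's just (again, only a sketch):

  struct pci_dev *requester =
          pci_get_visible_pcie_requester(pdev, NULL, NULL);

  /* pdev and anything else whose requester ID is rooted at
   * 'requester' lack IOMMU visibility w.r.t. each other and
   * belong in the same group */
  struct iommu_group *group = iommu_group_get(&requester->dev);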

> > > Returning both a pci_dev and a requester ID makes it more complicated.
> > >  At the hardware level, transactions contain only a requester ID, and
> > > bridges can either drop it, pass it unchanged, or assign a new one.  I
> > > think the code will be simpler if we just model that.
> > 
> > I'm not convinced.  Patch 2/2 makes use of both the returned pci_dev and
> > the returned requester ID and it's a huge simplification overall.
> 
> The IOMMU driver makes mappings so DMA from device A can reach a
> buffer.  It needs to know about device A, and it needs to know what
> source IDs might appear on those DMA transactions.  Why would it need
> to know anything about any bridges between A and the IOMMU?

Maybe it doesn't, but it might want to know about the top-level bridge
for the examples above, and I prefer that we be consistent in providing
both a requester ID and a context device.  It's not currently used in the
callback function, but I could imagine it being used.  For instance, what
if there are multiple endpoints downstream of a bridge and the IOMMU
wants to use the bridge pci_dev to track how many routes it's managing
through this bridge?

I think the latter is actually a current bug in intel-iommu as noted by
the XXX in patch 2/2.  iommu_detach_dependent_devices() will tear down
all context entries along a path, but there's no reference counting as
to how many devices are using that path.  Should that reference counting
be associated with the pci_dev?  I suppose the IOMMU could already have
data structures per requester ID or could create them and use an idr for
lookup, but does this interface want to require that?
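
For example, something shaped like this (purely hypothetical):

  struct requester_ref {
          u16 id;         /* idr key: the requester ID */
          int refcnt;     /* devices currently mapped through it */
  };

with idr_find()/idr_alloc() on map, a refcnt decrement on unmap, and
the context entry torn down only when the count reaches zero.  The
IOMMU driver could certainly build all that, but it's machinery this
interface would be forcing on every caller.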

> > > My thought was to pass a pci_dev (the originator of the DMA, which I
> > > would call the "requester") and a callback.  The callback would accept
> > > the requester pci_dev (always the same requester device) and a
> > > requester ID.
> > > 
> > > This would call @fn for each possible requester ID for transactions
> > > from the device.  IOMMU drivers should only need the requester ID to
> > > manage mappings; they shouldn't need a pci_dev corresponding to any
> > > intermediate bridges.
> > 
> > This implementation is almost the same with the only change being that
> > the pci_dev passed to the callback is the one most closely associated
> > with the requester ID.  For the IOMMU driver, it doesn't matter since
> > it's only using the requester ID, but why would the callback care about
> > the original requester?  If it needed to do something device specific,
> > it's going to care about the closest device to the requester ID.
> 
> If this can be done without passing a pci_dev at all to the callback,
> that would be even better.  If a caller needed the original pci_dev,
> it could always pass it via the "void *data".

Agreed, the original "device A" is the caller's problem if it wants to
know about it, but I think providing full context of a requester ID
(which means a pci_dev) makes sense too.

> I don't see the point of passing a "device closest to the requester
> ID."  What would the IOMMU do with that?  As far as the IOMMU is
> concerned, the requester ID could be an arbitrary number completely
> unrelated to a pci_dev.

Do you have an example of a case where a requester ID doesn't have some
association to a pci_dev?

> > > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> > > >> > +                                          struct pci_dev *bridge,
> > > >> > +                                          u16 *requester_id);
> > > >>
> > > >> The structure of this interface implies that there is only one visible
> > > >> requester ID, but the whole point of this patch is that a transaction from
> > > >> @requestee may appear with one of several requester IDs.  So which one will
> > > >> this return?
> > > >
> > > > I thought the point of this patch was to have an integrated interface
> > > > for finding the requester ID and doing something across all devices with
> > > > that requester ID
> > > 
> > > Oh, here's the difference in our understanding.  "Doing something
> > > across all devices with that requester ID" sounds like identifying the
> > > set of devices that have to be handled as a group by the IOMMU.
> > > That's certainly an issue, but I wasn't considering it at all.
> > > 
> > > I was only concerned with the question of a single device that
> > > requires multiple IOMMU mappings because DMA requests might use any of
> > > several source IDs.  This is mentioned in sec 3.6.1.1 of the VT-d
> > > spec, i.e., "requests arriving with the source-id in the original
> > > PCI-X transaction or the source-id provided by the bridge. ...
> > > software must program multiple context entries, each corresponding to
> > > the possible set of source-ids."
> > > 
> > > My thought was that the IOMMU driver would use
> > > pcie_for_each_requester_id() and have the callback set up a mapping
> > > for one requester ID each time it was called.
> > 
> > Which is exactly what pcie_for_each_requester() does, but I try to solve
> > the problem of a single device that needs multiple context entries the
> > same as a bridge that masks multiple devices.
> 
> I'm not sure I understand your point yet, but I think it's better to
> think of this as computing the set of requester IDs we might see from
> a given device.  If we think of it as finding the set of devices that
> use a given requester ID, then we have to worry about that set being
> modified by hotplug.

Perhaps I'm explaining it wrong; I've had to refresh myself on how this all
works since I posted it.  Looking at pcie_for_each_requester() we first
find the requester ID and associated device for the requested device
(yet another user of the associated device part of the interface).  We
then walk up from the requested device to the associated device, calling
the callback on each bridge or dma quirk in between.  So, I was wrong to
indicate that pcie_for_each_requester() does anything across all devices
with the same requester ID.  It traverses all devices and quirks of
devices along the path from the requested device to the device
associated with the requester ID.
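
In simplified form, the traversal in the patch is just:

  requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
  for (dev = requestee; dev != requester; dev = dev->bus->self)
          fn(dev, PCI_REQUESTER_ID(dev), data);   /* plus DMA quirks */
  /* finally, the requester itself with the ID visible to @bridge */
  fn(requester, requester_id, data);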

> > > > and thereby remove pci_find_upstream_pcie_bridge(),
> > > > provide a way to quirk broken PCIe-to-PCI bridge and quirk dma_ops for
> > > > devices that use the wrong requester ID.  In what case does a device
> > > > appear to have multiple requester IDs?  We have cases where devices use
> > > > the wrong requester ID, but AFAIK they only use a single wrong requester
> > > > ID.
> > > 
> > > I think the example in sec 3.6.1.1 answers this, doesn't it?
> > 
> > It does, in summary we have to be prepared that a transaction from a
> > device behind a PCIe-to-PCI/X bridge may use either the device requester
> > ID (bus|devfn) or the bridge (subordinate|0) requester ID.  This is why
> > pcie_for_each_requester() makes sense to me, we iterate across all
> > requesters that may use the same requester ID.
> 
> Let's say device A and B use the same requester ID because they're
> behind a bridge.  What happens if we map a buffer for A and then
> hot-remove B?  Here's what I think your proposal would do:
> 
>   map buffer for A
>     callback for A
>     callback for B
>   hot-remove B
>   unmap buffer for A
>     callback for A
> 
> We never do an unmap callback for B, so if we did anything while
> mapping, it will never be undone.

So I think I've again lost my context since I posted this, apologies.
pcie_for_each_requester() is implemented to handle the path, not the
subtree.  I believe what pcie_for_each_requester() as written would do
is:

   map buffer for A
     callback for A
   hot-remove B
   unmap buffer for A
     callback for A

Furthermore, if we have:

     -- A
    /
X--Y
    \
     -- B

Where X is the associated device for the requester ID, we'd do:

  map buffer for A
    callback for A
    callback for Y
    callback for X
  hot-remove B
  unmap buffer for A
    callback for A
    callback for Y
    callback for X

That hot-remove, though, is where I think intel-iommu has a bug: the set
of dependent devices for B includes Y & X, which would then have their
context entries cleared.  That's yet another intel-iommu bug though, not
anything that requires a change to this interface.

> > If we had a
> > pcie_for_each_requester_id() how would we handle that?  Do we call it
> > once for the (bus|devfn) requester ID of the device we're trying to map
> > and then call it again for the (subordinate|0) requester ID of the
> > bridge?  That's almost how pci_find_upstream_pcie_bridge() is used and
> > it ends up putting a lot of the logic into the driver.  
> 
> That *is* what I'm proposing (calling the callback once for each
> possible requester ID that could be seen on DMA from the device).  In
> my opinion, the problem with pci_find_upstream_pcie_bridge() is that
> you only get one device back from it, and the driver has to roll its
> own looping.

I'm not sure if my argument above is still relevant, but what I was
trying to suggest was that the caller would need to make multiple calls,
which puts us in the same boat as the roll-your-own loop with
find_upstream_pcie_bridge.  Of course the callback needs to be called
for each requester ID.

Let me go back to the X-Y-A|B example above to see if I can explain why
pcie_for_each_requester_id() doesn't make sense to me.  Generally a
for_each_foo function should iterate across all things with the same
foo.  So a for_each_requester_id should iterate across all things with
the same requester ID.  If we look at an endpoint device like A, only A
has A's requester ID.  Therefore, why would for_each_requester_id(A)
traverse up to Y?  Y can take ownership and become the requester for A,
but if we were to call for_each_requester_id(Y), wouldn't you expect the
callback to happen on {Y, A, B} since all of those can use that
requester ID?  That makes it a downstream-looking for_each callback,
which is not really what we want.

As you indicate above, we want an upstream-looking function.  If I call
for_each_requester(A), A is obviously a requester for A.  Y can take
ownership of a transaction from A, therefore we must assume Y is a
requester for A.  Same with X.  I believe this is what we want and this
is what the code does.

> > How do we write
> > a pcie_for_each_requester_id() function that can uniquely identify a PCI
> > hierarchy to traverse in the presence of multiple domains?
> 
> I must be missing something in your question.  If we give
> pcie_for_each_requester_id() the requester pci_dev, that uniquely
> identifies the PCI hierarchy.  Then it's just a question of how far up
> the hierarchy we need to go, and the bridge (or bus) argument tells us
> that.

I think the code actually does what you want, so we're only in
disagreement about the naming and usefulness (appropriateness) of
passing a pci_dev associated with a requester ID.  Is that correct?

There are several examples above about the associated device being
useful to the implementation.  It may not be required, but it's only
going to make the caller implement more code on their side not to
provide it.  The for_each_requester_id vs. for_each_requester naming is a matter
of personal taste, but I think for_each_requester actually makes more
sense given the parameters and the upstream vs downstream nature of the
call.  Sorry again for the confusion, I can barely identify my own code
after a couple weeks.  Thanks,

Alex

Donald Dutile July 25, 2013, 6:25 p.m. UTC | #12
On 07/25/2013 01:19 PM, Bjorn Helgaas wrote:
> On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote:
>> On 07/23/2013 06:35 PM, Bjorn Helgaas wrote:
>>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
>>>> This provides interfaces for drivers to discover the visible PCIe
>>>> requester ID for a device, for things like IOMMU setup, and iterate
>>>
>>> IDs (plural)
>>>
>> a single device does not have multiple requester id's;
>> can have multiple tag-id's (that are ignored in this situation, but
>> can be used by switches for ordering purposes), but there's only 1/fcn
>> (except for those quirky pdevs!).
>
> Generally a device does not have multiple requester IDs, but the
> IOMMU may see one of several requester IDs for DMAs from a given
> device because bridges may take ownership of those transactions (sec
> 3.6.1.1 of the VT-d spec).
>
> Just to be clear, I envision this whole interface as being
> specifically for use by IOMMU drivers, so I'm only trying to provide
> what's necessary to build IOMMU mappings.
>
This is all just PCIe-PCI bridge spec; the code Alex is proposing handles this case.

>>>> + * pci_get_visible_pcie_requester - Get requester and requester ID for
>>>> + *                                  @requestee below @bridge
>>>> + * @requestee: requester device
>>>> + * @bridge: upstream bridge (or NULL for root bus)
>>>> + * @requester_id: location to store requester ID or NULL
>>>> + */
>>>> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>>>> +					       struct pci_dev *bridge,
>>>> +					       u16 *requester_id)
>>>
>>> I'm not sure it makes sense to return a struct pci_dev here because
>>> there's no requirement that a requester ID correspond to an actual
>>> pci_dev.
>>>
>> well, I would expect the only callers would be for subsys (iommu's)
>> searching to find requester-id for a pdev, b/c if a pdev doesn't exist,
>> then the device (and requester-id) doesn't exist... :-/
>
>>>> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
>>>> + *                           from @requestee to the PCIe requester ID visible
>>>> + *                           to @bridge.
>>>
>>> Transactions from a device may appear with one of several requester IDs,
>>> but there's not necessarily an actual pci_dev for each ID, so I think the
>> ditto above; have to have a pdev for each id....
>
> This *might* be true, but I don't think we should rely on it.  For
> example:
>
>    00:1c.0 PCIe to PCI bridge to [bus 01]
>    01:01.0 PCI endpoint
>
> The bridge will take ownership of DMA transactions from the 01:01.0
> endpoint.  An IOMMU on bus 00 will see a bridge-assigned requester
> ID of 01:00.0 (subordinate bus number, devfn zero), but there is no
> 01:00.0 device.
>
Clarification:
I meant that each requester-id must have at least 1 PCI device associated
with it.  When PCIe-PCI(X) bridges are in the path, iommu-groups
lumps multiple requester-ids into a single group, and all devices behind
the bridges are lumped into that group.
The only time it's possible for the requester-id for a device behind a
PCIe-PCI(X) Bridge to not be <secondary-bus>:0.0 is when the device is PCI-X.
I can't find anything in the spec by which such a PPB can tell the sw what
requester-id alg it's using (pass the PCI-X device's requester-id, or use <sbus>:0.0).
The (vtd) spec does say that all devices behind a bridge (PCI-X or PCI) must
be in the same domain, which means the same iommu group.
So, comments in the code or reviewers implied a requester-id could exist
w/o a pdev; IMO, the (iommu?) code ought to track/tag req-id -> pdev and vice-versa.

> Maybe the rules of conventional PCI require a device zero (I don't
> remember), but even if they do, it's ugly to rely on that here
> because I don't think device 01:00.0 is relevant to mappings for
> device 01:01.0.
>
no requirement for any device-number on a PCI bus.

> Obviously we also have to be aware that 01:00.0 and 01:01.0 can't be
> isolated from each other, but I think that issue is separate from
> the question of what requester IDs have to be mapped to make 01:01.0
> work.
>
yup.
> Bjorn

Donald Dutile July 25, 2013, 6:38 p.m. UTC | #13
On 07/24/2013 05:22 PM, Alex Williamson wrote:
> On Wed, 2013-07-24 at 16:42 -0400, Don Dutile wrote:
>> On 07/23/2013 06:35 PM, Bjorn Helgaas wrote:
>>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
>>>> This provides interfaces for drivers to discover the visible PCIe
>>>> requester ID for a device, for things like IOMMU setup, and iterate
>>>
>>> IDs (plural)
>>>
>> a single device does not have multiple requester id's;
>> can have multiple tag-id's (that are ignored in this situation, but
>> can be used by switches for ordering purposes), but there's only 1/fcn
>> (except for those quirky pdevs!).
>>
>>>> over the device chain from requestee to requester, including DMA
>>>> quirks at each step.
>>>
>>> "requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
>>> normally makes a noun that refers to the object of the action.  So
>>> "requestee" sounds like it means something like "target" or "responder,"
>>> but that's not what you mean here.
>>>
>> sorry, I didn't follow the requester/requestee terminology either...
>>
>>>> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
>>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>>> ---
>>>>    drivers/pci/search.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    include/linux/pci.h  |    7 ++
>>>>    2 files changed, 205 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
>>>> index d0627fa..4759c02 100644
>>>> --- a/drivers/pci/search.c
>>>> +++ b/drivers/pci/search.c
>>>> @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem);
>>>>    EXPORT_SYMBOL_GPL(pci_bus_sem);
>>>>
>>>>    /*
>>>> + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
>>>> + * @dev: device to test
>>>> + */
>>>> +static bool pci_has_pcie_requester_id(struct pci_dev *dev)
>>>> +{
>>>> +	/*
>>>> +	 * XXX There's no indicator of the bus type, conventional PCI vs
>>>> +	 * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe
>>>> +	 * requester ID is a native PCIe based system (such as VT-d or
>>>> +	 * AMD-Vi).  It's common that PCIe root complex devices do not
>> could the above comment be x86-iommu-neutral?
>> by definition of PCIe, all devices have a requester id (min. to accept cfg cycles);
>> req'd if source of read/write requests, read completions.
>
> I agree completely, the question is whether we have a PCIe root complex
> or a conventional PCI host bus.  I don't think we have any way to tell,
> so I'm assuming pci_is_root_bus() indicates we're on a PCIe root complex
> and therefore have requester IDs.  If there's some way to determine this
> let me know and we can avoid any kind of assumption.
>
>>>> +	 * include a PCIe capability, but we can assume they are PCIe
>>>> +	 * devices based on their topology.
>>>> +	 */
>>>> +	if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
>>>> +		return true;
>>>> +
>>>> +	/*
>>>> +	 * PCI-X devices have a requester ID, but the bridge may still take
>>>> +	 * ownership of transactions and create a requester ID.  We therefore
>>>> +	 * assume that the PCI-X requester ID is not the same one used on PCIe.
>>>> +	 */
>>>> +
>>>> +#ifdef CONFIG_PCI_QUIRKS
>>>> +	/*
>>>> +	 * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
>>>> +	 * If the device is a bridge, look to the next device upstream of it.
>>>> +	 * If that device is PCIe and not a PCIe-to-PCI bridge, then by
>>>> +	 * deduction, the device must be PCIe and therefore has a requester ID.
>>>> +	 */
>>>> +	if (dev->subordinate) {
>>>> +		struct pci_dev *parent = dev->bus->self;
>>>> +
>>>> +		if (pci_is_pcie(parent) &&
>>>> +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
>>>> +			return true;
>>>> +	}
>>>> +#endif
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>> +/*
>>>> + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
>>>> + * @dev: requester device
>>>> + * @bridge: upstream bridge (or NULL for root bus)
>>>> + */
>>>> +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
>>>> +					      struct pci_dev *bridge)
>>>> +{
>>>> +	/*
>>>> +	 * The entire path must be tested, if any step does not have a
>>>> +	 * requester ID, the chain is broken.  This allows us to support
>>>> +	 * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
>>>> +	 */
>>>> +	while (dev != bridge) {
>>>> +		if (!pci_has_pcie_requester_id(dev))
>>>> +			return false;
>>>> +
>>>> +		if (pci_is_root_bus(dev->bus))
>>>> +			return !bridge; /* false if we don't hit @bridge */
>>>> +
>>>> +		dev = dev->bus->self;
>>>> +	}
>>>> +
>>>> +	return true;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Legacy PCI bridges within a root complex (ex. Intel 82801) report
>>>> + * a different requester ID than a standard PCIe-to-PCI bridge.  Instead
>> First, I'm assuming you mean that devices behind a Legacy PCI bridge within a root complex
>> get assigned IDs different from those behind std PCIe-to-PCI bridges (as quoted below).
>
> Yes
>
>>>> + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a
>>>
>>> s/the/they/
>>>
>> well, the PPBs should inject their (secondary << 8 | 0), not subordinate.
>>   From the PCI Express to PCI/PCI-X Bridge spec, v1.0:
>> The Tag is reassigned by the bridge according to the rules outlined in the following sections. If the
>> bridge generates a new Requester ID for a transaction forwarded from the secondary interface to the
>> primary interface, the bridge assigns the PCI Express Requester ID using its secondary interface’s
>>                                                                                ^^^^^^^^^
>> Bus Number and sets both the Device Number and Function Number fields to zero.
>> ^^^^^^^^^^
>
> I'm referring to (pdev->subordinate->number << 8 | 0) vs
> (pdev->bus->number << 8 | pdev->devfn).  The subordinate struct pci_bus
> is the secondary interface bus.
>
>> As for the 82801, looks like they took this part of the PCIe spec to heart:
>> (PCIe v3 spec, section 2.2.6.2 Transaction Descriptor - Transaction ID Field):
>> Exception: The assignment of Bus and Device Numbers to the Devices within a Root Complex,
>> and the Device Numbers to the Downstream Ports within a Switch, may be done in an
>> implementation specific way.
>> Obviously, you're missing the 'implementation-specific way' compiler... ;-)
>
> So this is potentially Intel specific.  How can we tell it's an Intel
> root complex?
>
Good question. ... another question to ask Intel.... :(

>> aw: which 'bus' do you mean above in '(bus << 8 | devfn)'?
>
> (pdev->bus->number << 8 | pdev->devfn)
>
k.

>>> Did you learn about this empirically?  Intel spec?  I wonder if there's
>>> some way to derive this from the PCIe specs.
>>>
>>>> + * standard PCIe endpoint.  This function detects them.
>>>> + *
>>>> + * XXX Is this Intel vendor ID specific?
>>>> + */
>>>> +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
>>>> +{
>>>> +	if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
>>>> +		return true;
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>> +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
>>>> +#define PCI_BRIDGE_REQUESTER_ID(dev)	((dev)->subordinate->number << 8)
>>>> +
>>>> +/*
>>>> + * pci_get_visible_pcie_requester - Get requester and requester ID for
>>>> + *                                  @requestee below @bridge
>>>> + * @requestee: requester device
>>>> + * @bridge: upstream bridge (or NULL for root bus)
>>>> + * @requester_id: location to store requester ID or NULL
>>>> + */
>>>> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>>>> +					       struct pci_dev *bridge,
>>>> +					       u16 *requester_id)
>>>
>>> I'm not sure it makes sense to return a struct pci_dev here because
>>> there's no requirement that a requester ID correspond to an actual
>>> pci_dev.
>>>
>> well, I would expect the only callers would be for subsys (iommu's)
>> searching to find requester-id for a pdev, b/c if a pdev doesn't exist,
>> then the device (and requester-id) doesn't exist... :-/
>
> One of the cases Bjorn is referring to is probably the simple case of a
> PCIe-to-PCI bridge.  The requester ID is (bridge->subordinate->number
> << 8 | 0), which is not an actual device.  As coded here, the function
> returns bridge, but requester_id is (bridge->subordinate->number << 8 |
> 0).
>
right, that requester-id doesn't map (most likely) to a pdev w/matching BDF.

>>>> +{
>>>> +	struct pci_dev *requester = requestee;
>>>> +
>>>> +	while (requester != bridge) {
>>>> +		requester = pci_get_dma_source(requester);
>>>> +		pci_dev_put(requester); /* XXX skip ref cnt */
>>>> +
>>>> +		if (pci_has_visible_pcie_requester_id(requester, bridge))
>>>
>>> If we acquire the "requester" pointer via a ref-counting interface,
>>> it's illegal to use the pointer after dropping the reference, isn't it?
>>> Maybe that's what you mean by the "XXX" comment.
>>>
>>>> +			break;
>>>> +
>>>> +		if (pci_is_root_bus(requester->bus))
>>>> +			return NULL; /* @bridge not parent to @requestee */
>>>> +
>>>> +		requester = requester->bus->self;
>>>> +	}
>>>> +
>>>> +	if (requester_id) {
>>>> +	if (requester->bus != requestee->bus &&
>>>> +		    !pci_bridge_uses_endpoint_requester(requester))
>>>> +			*requester_id = PCI_BRIDGE_REQUESTER_ID(requester);
>>>> +		else
>>>> +			*requester_id = PCI_REQUESTER_ID(requester);
>>>> +	}
>>>> +
>>>> +	return requester;
>>>> +}
>>>> +
>>>> +static int pci_do_requester_callback(struct pci_dev **dev,
>>>> +				     int (*fn)(struct pci_dev *,
>>>> +					       u16 id, void *),
>>>> +				     void *data)
>>>> +{
>>>> +	struct pci_dev *dma_dev;
>>>> +	int ret;
>>>> +
>>>> +	ret = fn(*dev, PCI_REQUESTER_ID(*dev), data);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	dma_dev = pci_get_dma_source(*dev);
>>>> +	pci_dev_put(dma_dev); /* XXX skip ref cnt */
>>>> +	if (dma_dev == *dev)
>>>
>>> Same ref count question as above.
>>>
>>>> +		return 0;
>>>> +
>>>> +	ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	*dev = dma_dev;
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
>>>> + *                           from @requestee to the PCIe requester ID visible
>>>> + *                           to @bridge.
>>>
>>> Transactions from a device may appear with one of several requester IDs,
>>> but there's not necessarily an actual pci_dev for each ID, so I think the
>> ditto above; have to have a pdev for each id....
>>
>>> caller reads better if it's "...for_each_requester_id()"
>>>
>>> The "Call X on each devices and DMA source from Y to the requester ID"
>>> part doesn't quite make a sentence.
>>>
>>>> + * @requestee: Starting device
>>>> + * @bridge: upstream bridge (or NULL for root bus)
>>>
>>> You should say something about the significance of @bridge.  I think the
>>> point is to call @fn for every possible requester ID @bridge could see for
>>> transactions from @requestee.  This is a way to learn the requester IDs an
>>> IOMMU at @bridge needs to be prepared for.
>>>
>>>> + * @fn: callback function
>>>> + * @data: data to pass to callback
>>>> + */
>>>> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
>>>> +			    int (*fn)(struct pci_dev *, u16 id, void *),
>>>> +			    void *data)
>>>> +{
>>>> +	struct pci_dev *requester;
>>>> +	struct pci_dev *dev = requestee;
>>>> +	int ret = 0;
>>>> +
>>>> +	requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
>>>> +	if (!requester)
>>>> +		return -EINVAL;
>>>> +
>>>> +	do {
>>>> +		ret = pci_do_requester_callback(&dev, fn, data);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +
>>>> +		if (dev == requester)
>>>> +			return 0;
>>>> +
>>>> +		/*
>>>> +		 * We always consider root bus devices to have a visible
>>>> +		 * requester ID, therefore this should never be true.
>>>> +		 */
>>>> +		BUG_ON(pci_is_root_bus(dev->bus));
>>>
>>> What are we going to do if somebody hits this BUG_ON()?  If it's impossible
>>> to hit, we should just remove it.  If it's possible to hit it in some weird
>>> topology you didn't consider, we should see IOMMU faults for any requester
>>> ID we neglected to map, and that fault would be a better debugging hint
>>> than a BUG_ON() here.
>>>
>> according to spec, all pdev's have a requester-id, even RC ones, albeit
>> "implementation specific"...
>>
>>>> +
>>>> +		dev = dev->bus->self;
>>>> +
>>>> +	} while (dev != requester);
>>>> +
>>>> +	/*
>>>> +	 * If we've made it here, @requester is a bridge upstream from
>>>> +	 * @requestee.
>>>> +	 */
>>>> +	if (pci_bridge_uses_endpoint_requester(requester))
>>>> +		return pci_do_requester_callback(&requester, fn, data);
>>>> +
>>>> +	return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data);
>>>> +}
>>>> +
>>>> +/*
>>>>     * find the upstream PCIe-to-PCI bridge of a PCI device
>>>>     * if the device is PCIE, return NULL
>>>>     * if the device isn't connected to a PCIe bridge (that is its parent is a
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index 3a24e4f..94e81d1 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
>>>>    }
>>>>    #endif
>>>>
>>>> +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>>>> +					       struct pci_dev *bridge,
>>>> +					       u16 *requester_id);
>>>
>>> The structure of this interface implies that there is only one visible
>>> requester ID, but the whole point of this patch is that a transaction from
>>> @requestee may appear with one of several requester IDs.  So which one will
>>> this return?
>>>
>> Are there devices that use multiple requester id's?
>> I know we have ones that use the wrong id.
>> If we want to handle the multiple requester-id's per pdev,
>> we could pass in a ptr to an initial requester-id; if null, give me first;
>> if !null, give me next one;  would also need a flag returned to indicate
>> there is more than 1.
>> null-rtn when pass in !null ref, means the end.
>>    ... sledge-hammer + nail ???
>
> That does not sound safe.  Again, this is why I think the
> pcie_for_each_requester() works.  Given a requester, call fn() on every
> device with the same requester ID.  That means when you have a bridge,
> we call it for the bridge and every device and every device quirk behind
> the bridge, fully populating context entries for all of them.  Thanks,
>
> Alex
>
ok.

>>>> +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
>>>> +			    int (*fn)(struct pci_dev *, u16 id, void *),
>>>> +			    void *data);
>>>> +
>>>>    /**
>>>>     * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
>>>>     * @pdev: the PCI device
>>>>
>>
>
>
>

Bjorn Helgaas July 26, 2013, 7:48 p.m. UTC | #14
On Thu, Jul 25, 2013 at 02:25:04PM -0400, Don Dutile wrote:
> On 07/25/2013 01:19 PM, Bjorn Helgaas wrote:
> >On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote:
> >>On 07/23/2013 06:35 PM, Bjorn Helgaas wrote:
> >>>On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >>>>+ * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> >>>>+ *                           from @requestee to the PCIe requester ID visible
> >>>>+ *                           to @bridge.
> >>>
> >>>Transactions from a device may appear with one of several requester IDs,
> >>>but there's not necessarily an actual pci_dev for each ID, so I think the
> >>ditto above; have to have a pdev for each id....
> >
> >This *might* be true, but I don't think we should rely on it.  For
> >example:
> >
> >   00:1c.0 PCIe to PCI bridge to [bus 01]
> >   01:01.0 PCI endpoint
> >
> >The bridge will take ownership of DMA transactions from the 01:01.0
> >endpoint.  An IOMMU on bus 00 will see a bridge-assigned requester
> >ID of 01:00.0 (subordinate bus number, devfn zero), but there is no
> >01:00.0 device.
> >
> Clarification:
> I meant that each requester-id must have at least 1 PCI device associated
> with it.

I don't think that's true, as in the example above.  Requester ID
0x0100 has no pci_dev associated with it.  What am I missing?

Maybe you mean that requester ID 0x0100 is associated with pci_dev
01:01.0 in the sense that DMAs from 01:01.0 appear with that ID?
That's true, but I can't think of a reason why we would start with
ID 0x0100 and try to look up 01:01.0 from it.  And of course, if you
*did* try to look up the device, there could be several of them.

Bjorn
Donald Dutile July 26, 2013, 8:04 p.m. UTC | #15
On 07/26/2013 03:48 PM, Bjorn Helgaas wrote:
> On Thu, Jul 25, 2013 at 02:25:04PM -0400, Don Dutile wrote:
>> On 07/25/2013 01:19 PM, Bjorn Helgaas wrote:
>>> On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote:
>>>> On 07/23/2013 06:35 PM, Bjorn Helgaas wrote:
>>>>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
>>>>>> + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
>>>>>> + *                           from @requestee to the PCIe requester ID visible
>>>>>> + *                           to @bridge.
>>>>>
>>>>> Transactions from a device may appear with one of several requester IDs,
>>>>> but there's not necessarily an actual pci_dev for each ID, so I think the
>>>> ditto above; have to have a pdev for each id....
>>>
>>> This *might* be true, but I don't think we should rely on it.  For
>>> example:
>>>
>>>    00:1c.0 PCIe to PCI bridge to [bus 01]
>>>    01:01.0 PCI endpoint
>>>
>>> The bridge will take ownership of DMA transactions from the 01:01.0
>>> endpoint.  An IOMMU on bus 00 will see a bridge-assigned requester
>>> ID of 01:00.0 (subordinate bus number, devfn zero), but there is no
>>> 01:00.0 device.
>>>
>> Clarification:
>> I meant that each requester-id must have at least 1 PCI device associated
>> with it.
>
> I don't think that's true, as in the example above.  Requester ID
> 0x0100 has no pci_dev associated with it.  What am I missing?
>
> Maybe you mean that requester ID 0x0100 is associated with pci_dev
> 01:01.0 in the sense that DMAs from 01:01.0 appear with that ID?
yes.
> That's true, but I can't think of a reason why we would start with
> ID 0x0100 and try to look up 01:01.0 from it.  And of course, if you
other than to figure out errant device behavior, or a bug in sw. ;)

> *did* try to look up the device, there could be several of them.
>
> Bjorn
What's most important for the consumers (IOMMUs) is to get all the requester-id's
related to a (p)dev for mapping & unmapping setup & teardown, resp.

Bjorn Helgaas July 26, 2013, 9:54 p.m. UTC | #16
On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
> On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote:
> > On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
> > > On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
> > > > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
> > > > <alex.williamson@redhat.com> wrote:
> > > > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> > > > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> > > >   As the DMA
> > > > transaction propagates through the fabric, it may be tagged by bridges
> > > > with different requester IDs.
> > > > 
> > > > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
> > > > not sure the intermediate pci_devs are.
> > > 
> > > A u16 requester ID doesn't mean much on its own though, it's not
> > > necessarily even unique.  A requester ID associated with the context of
> > > a pci_dev is unique and gives us a reference point if we need to perform
> > > another operation on that requester ID.
> > 
> > A u16 requester ID better mean something to an IOMMU -- it's all the
> > IOMMU can use to look up the correct mapping.  That's why we have to
> > give the iterator something to define the scope to iterate over.  The
> > same requester ID could mean something totally unrelated in a
> > different scope, e.g., below a different IOMMU.
> 
> The point I'm trying to make is that a requester ID depends on its
> context (minimally, the PCI segment).  The caller can assume the context
> based on the calling parameters or we can provide context in the form of
> an associated pci_dev.  I chose the latter path because I prefer
> explicit interfaces and it has some usefulness in the intel-iommu
> implementation.
> 
> For instance, get_domain_for_dev() first looks to see if a pci_dev
> already has a domain.  If it doesn't, we want to look to see if there's
> an upstream device that would use the same requester ID that already has
> a domain.  If our get-requester-ID-for-device function returns only the
> requester ID, we don't know whether that requester ID belongs to the device
> we're searching from or to some upstream device.  Therefore we potentially
> do an
> unnecessary search for the domain.
> 
> The other user is intel_iommu_add_device() where we're trying to build
> IOMMU groups.  Visibility is the first requirement of an IOMMU group.
> If the IOMMU cannot distinguish between devices, they must be part of
> the same IOMMU group.  Here we want to find the pci_dev that hosts the
> requester ID.  I don't even know how we'd implement this with a function
> that only returned the requester ID.  Perhaps we'd have to walk upstream
> from the device calling the get-requester-ID-for-device function at each
> step and noticing when it changed.  That moves significant logic back
> into the caller code.
> ...

> > I don't see the point of passing a "device closest to the requester
> > ID."  What would the IOMMU do with that?  As far as the IOMMU is
> > concerned, the requester ID could be an arbitrary number completely
> > unrelated to a pci_dev.
> 
> Do you have an example of a case where a requester ID doesn't have some
> association to a pci_dev?

I think our confusion here is the same as what Don & I have been
hashing out -- I'm saying a requester ID fabricated by a bridge
need not correspond to a specific pci_dev, and you probably mean
that every requester ID is by definition the result of *some* PCI
device making a DMA request.

> ...
> Furthermore, if we have:
> 
>      -- A
>     /
> X--Y
>     \
>      -- B
> ... 

> Let me go back to the X-Y-A|B example above to see if I can explain why
> pcie_for_each_requester_id() doesn't make sense to me.  Generally a
> for_each_foo function should iterate across all things with the same
> foo.  So a for_each_requester_id should iterate across all things with
> the same requester ID.  

Hm, that's not the way I think of for_each_FOO() interfaces.  I
think of it as "execute the body (or callback) for every possible
FOO", where FOO is different each time.  for_each_pci_dev(),
pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc.,
work like that.

But the more important question is what arguments we give to the
callback.  My proposal was to map

  {pci_dev -> {requester-ID-A, requester-ID-B, ...}}

Yours is to map

  {pci_dev -> {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}}

i.e., your callback gets both a pci_dev and a requester-ID.  I'm
trying to figure out how you handle cases where requester-id-A
doesn't have a corresponding pci_dev-A.  I guess you pass the device
most closely associated with requester-id-A.
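
In other words, the callback signature is the same either way:

  int (*fn)(struct pci_dev *dev, u16 requester_id, void *data);

and the difference is only which device gets passed: in my version
@dev is always the originating device, while in yours it's whatever
device is "closest" to @requester_id when no pci_dev matches the ID
exactly.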

The "most closely associated device" idea seems confusing to
describe and think about.  I think you want to use it as part of
grouping devices into domains.  But I think we should attack that
problem separately.  For grouping or isolation questions, we have
to pay attention to things like ACS, which are not relevant for
mapping.

> If we look at an endpoint device like A, only A
> has A's requester ID.  Therefore, why would for_each_requester_id(A)
> traverse up to Y?

Even if A is a PCIe device, we have to traverse upwards to find any
bridges that might drop A's requester ID or take ownership, e.g., if
we have this:

  00:1c.0 PCIe-to-PCI bridge to [bus 01-02]
  01:00.0 PCI-to-PCIe bridge to [bus 02]
  02:00.0 PCIe endpoint A

the IOMMU has to look for requester-ID 0100.

> Y can take ownership and become the requester for A,
> but if we were to call for_each_requester_id(Y), wouldn't you expect the
> callback to happen on {Y, A, B} since all of those can use that
> requester ID?

No.  If I call for_each_requester_id(Y), I'm expecting the callback
to happen for each requester ID that could be used for transactions
originated by *Y*.  I'm trying to make an IOMMU mapping for use by
Y, so any devices downstream of Y, e.g., A and B, are irrelevant.

I think a lot of the confusion here is because we're trying to solve
two questions at once: (1) what requester-IDs need to be mapped
to handle DMA from a device, and (2) what devices can't be isolated
from each other and must be in the same domain.  I think things will
be simpler if we can deal with those separately.  Even if it ends up
being more code, at least each piece will be easier to understand by
itself.

Bjorn
Alex Williamson July 29, 2013, 4:06 p.m. UTC | #17
On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
> > On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote:
> > > On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
> > > > On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
> > > > > <alex.williamson@redhat.com> wrote:
> > > > > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> > > > > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> > > > >   As the DMA
> > > > > transaction propagates through the fabric, it may be tagged by bridges
> > > > > with different requester IDs.
> > > > > 
> > > > > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
> > > > > not sure the intermediate pci_devs are.
> > > > 
> > > > A u16 requester ID doesn't mean much on its own though, it's not
> > > > necessarily even unique.  A requester ID associated with the context of
> > > > a pci_dev is unique and gives us a reference point if we need to perform
> > > > another operation on that requester ID.
> > > 
> > > A u16 requester ID better mean something to an IOMMU -- it's all the
> > > IOMMU can use to look up the correct mapping.  That's why we have to
> > > give the iterator something to define the scope to iterate over.  The
> > > same requester ID could mean something totally unrelated in a
> > > different scope, e.g., below a different IOMMU.
> > 
> > The point I'm trying to make is that a requester ID depends on its
> > context (minimally, the PCI segment).  The caller can assume the context
> > based on the calling parameters or we can provide context in the form of
> > an associated pci_dev.  I chose the latter path because I prefer
> > explicit interfaces and it has some usefulness in the intel-iommu
> > implementation.
> > 
> > For instance, get_domain_for_dev() first looks to see if a pci_dev
> > already has a domain.  If it doesn't, we want to look to see if there's
> > an upstream device that would use the same requester ID that already has
> > a domain.  If our get-requester-ID-for-device function returns only the
> > requester ID, we don't know whether that requester ID belongs to the device
> > we're searching from or to some upstream device.  Therefore we potentially
> > do an
> > unnecessary search for the domain.
> > 
> > The other user is intel_iommu_add_device() where we're trying to build
> > IOMMU groups.  Visibility is the first requirement of an IOMMU group.
> > If the IOMMU cannot distinguish between devices, they must be part of
> > the same IOMMU group.  Here we want to find the pci_dev that hosts the
> > requester ID.  I don't even know how we'd implement this with a function
> > that only returned the requester ID.  Perhaps we'd have to walk upstream
> > from the device calling the get-requester-ID-for-device function at each
> > step and noticing when it changed.  That moves significant logic back
> > into the caller code.
> > ...
> 
> > > I don't see the point of passing a "device closest to the requester
> > > ID."  What would the IOMMU do with that?  As far as the IOMMU is
> > > concerned, the requester ID could be an arbitrary number completely
> > > unrelated to a pci_dev.
> > 
> > Do you have an example of a case where a requester ID doesn't have some
> > association to a pci_dev?
> 
> I think our confusion here is the same as what Don & I have been
> hashing out -- I'm saying a requester ID fabricated by a bridge
> need not correspond to a specific pci_dev, and you probably mean
> that every requester ID is by definition the result of *some* PCI
> device making a DMA request.

Yes

> > ...
> > Furthermore, if we have:
> > 
> >      -- A
> >     /
> > X--Y
> >     \
> >      -- B
> > ... 
> 
> > Let me go back to the X-Y-A|B example above to see if I can explain why
> > pcie_for_each_requester_id() doesn't make sense to me.  Generally a
> > for_each_foo function should iterate across all things with the same
> > foo.  So a for_each_requester_id should iterate across all things with
> > the same requester ID.  
> 
> Hm, that's not the way I think of for_each_FOO() interfaces.  I
> think of it as "execute the body (or callback) for every possible
> FOO", where FOO is different each time.  for_each_pci_dev(),
> pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc.,
> work like that.

Most of these aren't relevant because they iterate over everything.  I
think they only show that if we had a pci_for_each_requester_id(), it
should iterate over every possible PCI requester ID in the system, and
the same could be said for pci_for_each_requester().

pci_bus_for_each_resource() is the only one we can build from: it
iterates over all resources on a bus.  We want all requester IDs for a
pci_dev.  Does that
perhaps mean it should be called pci_dev_for_each_requester_id()?  I'd
be ok with that name, but I think it even more implies that a pci_dev is
associated with a requester ID.
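
i.e., keeping the signature from this patch and only renaming:

  int pci_dev_for_each_requester_id(struct pci_dev *dev,
                                    struct pci_dev *bridge,
                                    int (*fn)(struct pci_dev *, u16 id,
                                              void *data),
                                    void *data);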

> But the more important question is what arguments we give to the
> callback.  My proposal was to map
> 
>   {pci_dev -> {requester-ID-A, requester-ID-B, ...}}
> 
> Yours is to map
> 
>   {pci_dev -> {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}}
> 
> i.e., your callback gets both a pci_dev and a requester-ID.  I'm
> trying to figure out how you handle cases where requester-id-A
> doesn't have a corresponding pci_dev-A.  I guess you pass the device
> most closely associated with requester-id-A.

Yes

> The "most closely associated device" idea seems confusing to
> describe and think about.  I think you want to use it as part of
> grouping devices into domains.  But I think we should attack that
> problem separately.  For grouping or isolation questions, we have
> to pay attention to things like ACS, which are not relevant for
> mapping.

We only touch isolation insofar as we provide an interface for a
driver to determine the point in the PCI topology where the requester ID
is rooted.  Yes, grouping can make use of that, but I object to the idea
that we're approaching some slippery slope of rolling multiple concepts
into this patch.  What I'm proposing here is strictly a requester ID
interface.  I believe that providing a way for a caller to determine
that two devices have a requester ID rooted at the same point in the PCI
topology is directly relevant to that interface.

pci_find_upstream_pcie_bridge() attempts to do this same thing.
Unfortunately, the interface is convoluted, it doesn't handle quirks,
and it requires a great deal of work by the caller to then walk the
topology and create requester IDs at each step.  This also indicates
that at each step, the requester ID is associated with some pci_dev.
Providing a pci_dev is simply showing our work and providing context to
the requester ID (ie. here's the requester ID and the step along the
path from which it was derived.  Here's your top level requester ID and
the point in the topology where it's based).

> > If we look at an endpoint device like A, only A
> > has A's requester ID.  Therefore, why would for_each_requester_id(A)
> > traverse up to Y?
> 
> Even if A is a PCIe device, we have to traverse upwards to find any
> bridges that might drop A's requester ID or take ownership, e.g., if
> we have this:
> 
>   00:1c.0 PCIe-to-PCI bridge to [bus 01-02]
>   01:00.0 PCI-to-PCIe bridge to [bus 02]
>   02:00.0 PCIe endpoint A
> 
> the IOMMU has to look for requester-ID 0100.

And I believe this patch handles this case correctly; I mentioned this
exact example in the v2 RFC cover letter.  This is another example where
pci_find_upstream_pcie_bridge() will currently fail.

> > Y can take ownership and become the requester for A,
> > but if we were to call for_each_requester_id(Y), wouldn't you expect the
> > callback to happen on {Y, A, B} since all of those can use that
> > requester ID?
> 
> No.  If I call for_each_requester_id(Y), I'm expecting the callback
> to happen for each requester ID that could be used for transactions
> originated by *Y*.  I'm trying to make an IOMMU mapping for use by
> Y, so any devices downstream of Y, e.g., A and B, are irrelevant.

Ok, you think of for_each_requester_id() the same as I think of
for_each_requester().  Can we split the difference and call it
pci_dev_for_each_requester_id()?

> I think a lot of the confusion here is because we're trying to solve
> two questions at once: (1) what requester-IDs need to be mapped
> to handle DMA from a device, and (2) what devices can't be isolated
> from each other and must be in the same domain.  I think things will
> be simpler if we can deal with those separately.  Even if it ends up
> being more code, at least each piece will be easier to understand by
> itself.

Don't we already have this split in the code?

(1) pcie_for_each_requester
    (or pci_dev_for_each_requester_id)

(2) pci_get_visible_pcie_requester
    (or pci_get_visible_pcie_requester_id)

Note however that (2) does not impose anything about domains or
isolation, it is strictly based on PCI topology.  It's left to the
caller to determine how that translates to IOMMU domains, but the
typical case is trivial.  Thanks,

Alex

Bjorn Helgaas July 29, 2013, 9:02 p.m. UTC | #18
On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
>> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:

>> The "most closely associated device" idea seems confusing to
>> describe and think about.  I think you want to use it as part of
>> grouping devices into domains.  But I think we should attack that
>> problem separately.  For grouping or isolation questions, we have
>> to pay attention to things like ACS, which are not relevant for
>> mapping.
>
> We only touch isolation insofar as we provide an interface for a
> driver to determine the point in the PCI topology where the requester ID
> is rooted.  Yes, grouping can make use of that, but I object to the idea
> that we're approaching some slippery slope of rolling multiple concepts
> into this patch.  What I'm proposing here is strictly a requester ID
> interface.  I believe that providing a way for a caller to determine
> that two devices have a requester ID rooted at the same point in the PCI
> topology is directly relevant to that interface.
>
> pci_find_upstream_pcie_bridge() attempts to do this same thing.
> Unfortunately, the interface is convoluted, it doesn't handle quirks,
> and it requires a great deal of work by the caller to then walk the
> topology and create requester IDs at each step.  This also indicates
> that at each step, the requester ID is associated with some pci_dev.
> Providing a pci_dev is simply showing our work and providing context to
> the requester ID (ie. here's the requester ID and the step along the
> path from which it was derived.  Here's your top level requester ID and
> the point in the topology where it's based).

What does the driver *do* with the pci_dev?  If it's not necessary,
then showing our work and providing context just complicates the
interface and creates opportunities for mistakes.  If we're creating
IOMMU mappings, only the requester ID is needed.

I think you used get_domain_for_dev() and intel_iommu_add_device() as
examples, but as far as I can tell, they use the pci_dev as a way to
learn about isolation.  For that purpose, I don't think you want an
iterator -- you only want to know about the single pci_dev that's the
root of the isolation domain, and requester IDs really aren't
relevant.

>> > If we look at an endpoint device like A, only A
>> > has A's requester ID.  Therefore, why would for_each_requester_id(A)
>> > traverse up to Y?
>>
>> Even if A is a PCIe device, we have to traverse upwards to find any
>> bridges that might drop A's requester ID or take ownership, e.g., if
>> we have this:
>>
>>   00:1c.0 PCIe-to-PCI bridge to [bus 01-02]
>>   01:00.0 PCI-to-PCIe bridge to [bus 02]
>>   02:00.0 PCIe endpoint A
>>
>> the IOMMU has to look for requester-ID 0100.
>
> And I believe this patch handles this case correctly; I mentioned this
> exact example in the v2 RFC cover letter.  This is another example where
> pci_find_upstream_pcie_bridge() will currently fail.

OK, I was just trying to answer your question "why we would need to
traverse up to Y."  But apparently we agree about that.

>> > Y can take ownership and become the requester for A,
>> > but if we were to call for_each_requester_id(Y), wouldn't you expect the
>> > callback to happen on {Y, A, B} since all of those can use that
>> > requester ID?
>>
>> No.  If I call for_each_requester_id(Y), I'm expecting the callback
>> to happen for each requester ID that could be used for transactions
>> originated by *Y*.  I'm trying to make an IOMMU mapping for use by
>> Y, so any devices downstream of Y, e.g., A and B, are irrelevant.
>
> Ok, you think of for_each_requester_id() the same as I think of
> for_each_requester().  Can we split the difference and call it
> pci_dev_for_each_requester_id()?

Sure.

>> I think a lot of the confusion here is because we're trying to solve
>> two questions at once: (1) what requester-IDs need to be mapped
>> to handle DMA from a device, and (2) what devices can't be isolated
>> from each other and must be in the same domain. ...
>
> Don't we already have this split in the code?
>
> (1) pcie_for_each_requester
>     (or pci_dev_for_each_requester_id)
>
> (2) pci_get_visible_pcie_requester
>     (or pci_get_visible_pcie_requester_id)
>
> Note however that (2) does not impose anything about domains or
> isolation; it is strictly based on PCI topology.  It's left to the
> caller to determine how that translates to IOMMU domains, but the
> typical case is trivial.

Can you expand on this a little bit?  What's involved in translating
the pci_get_visible_pcie_requester() result to an IOMMU domain?  Is
there something IOMMU-specific about this?

I assume that for any given PCI device A, there's a subtree of devices
that the IOMMU can't distinguish from A, and that this subtree is not
IOMMU-specific so it's reasonable to have a PCI interface to
compute it.  An IOMMU driver might want to expand the domain to
include more devices, of course, and *that* seems like something to
leave to the driver.

I'm also assuming that a pci_dev (the root of the subtree) is a good
way to identify this set of indistinguishable devices because it's
easy to test whether a device B is part of it.  But the requester ID
seems redundant in this case because you can just associate an IOMMU
domain with the pci_dev.
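
As a sketch, that membership test only needs an upward walk
(same-segment assumption, illustrative only):

  static bool pci_dev_in_subtree(struct pci_dev *root, struct pci_dev *dev)
  {
          struct pci_bus *bus;

          if (dev == root)
                  return true;

          /* walk dev's bridge chain upward looking for root */
          for (bus = dev->bus; bus; bus = bus->parent)
                  if (bus->self == root)
                          return true;

          return false;
  }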

Where does ACS come into the picture?

Bjorn
Donald Dutile July 29, 2013, 9:03 p.m. UTC | #19
On 07/29/2013 12:06 PM, Alex Williamson wrote:
> On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
>> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
>>> On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote:
>>>> On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
>>>>> On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
>>>>>> On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
>>>>>> <alex.williamson@redhat.com>  wrote:
>>>>>>> On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
>>>>>>>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
>>>>>>    As the DMA
>>>>>> transaction propagates through the fabric, it may be tagged by bridges
>>>>>> with different requester IDs.
>>>>>>
>>>>>> The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
>>>>>> not sure the intermediate pci_devs are.
>>>>>
>>>>> A u16 requester ID doesn't mean much on its own though, it's not
>>>>> necessarily even unique.  A requester ID associated with the context of
>>>>> a pci_dev is unique and gives us a reference point if we need to perform
>>>>> another operation on that requester ID.
>>>>
>>>> A u16 requester ID better mean something to an IOMMU -- it's all the
>>>> IOMMU can use to look up the correct mapping.  That's why we have to
>>>> give the iterator something to define the scope to iterate over.  The
>>>> same requester ID could mean something totally unrelated in a
>>>> different scope, e.g., below a different IOMMU.
>>>
>>> The point I'm trying to make is that a requester ID depends on its
>>> context (minimally, the PCI segment).  The caller can assume the context
>>> based on the calling parameters or we can provide context in the form of
>>> an associated pci_dev.  I chose the latter path because I prefer
>>> explicit interfaces and it has some usefulness in the intel-iommu
>>> implementation.
>>>
>>> For instance, get_domain_for_dev() first looks to see if a pci_dev
>>> already has a domain.  If it doesn't, we want to look to see if there's
>>> an upstream device that would use the same requester ID that already has
>>> a domain.  If our get-requester-ID-for-device function returns only the
>>> requester ID, we don't know if that requester ID is the device we're
>>> searching from or some upstream device.  Therefore we potentially do an
>>> unnecessary search for the domain.
>>>
>>> The other user is intel_iommu_add_device() where we're trying to build
>>> IOMMU groups.  Visibility is the first requirement of an IOMMU group.
>>> If the IOMMU cannot distinguish between devices, they must be part of
>>> the same IOMMU group.  Here we want to find the pci_dev that hosts the
>>> requester ID.  I don't even know how we'd implement this with a function
>>> that only returned the requester ID.  Perhaps we'd have to walk upstream
>>> from the device calling the get-requester-ID-for-device function at each
>>> step and noticing when it changed.  That moves significant logic back
>>> into the caller code.
>>> ...
>>
>>>> I don't see the point of passing a "device closest to the requester
>>>> ID."  What would the IOMMU do with that?  As far as the IOMMU is
>>>> concerned, the requester ID could be an arbitrary number completely
>>>> unrelated to a pci_dev.
>>>
>>> Do you have an example of a case where a requester ID doesn't have some
>>> association to a pci_dev?
>>
>> I think our confusion here is the same as what Don & I have been
>> hashing out -- I'm saying a requester ID fabricated by a bridge
>> need not correspond to a specific pci_dev, and you probably mean
>> that every requester ID is by definition the result of *some* PCI
>> device making a DMA request.
>
> Yes
>
>>> ...
>>> Furthermore, if we have:
>>>
>>>       -- A
>>>      /
>>> X--Y
>>>      \
>>>       -- B
>>> ...
>>
>>> Let me go back to the X-Y-A|B example above to see if I can explain why
>>> pcie_for_each_requester_id() doesn't make sense to me.  Generally a
>>> for_each_foo function should iterate across all things with the same
>>> foo.  So a for_each_requester_id should iterate across all things with
>>> the same requester ID.
>>
>> Hm, that's not the way I think of for_each_FOO() interfaces.  I
>> think of it as "execute the body (or callback) for every possible
>> FOO", where FOO is different each time.  for_each_pci_dev(),
>> pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc.,
>> work like that.
>
> Most of these aren't relevant because they iterate over everything.  I
> think they only show that if we had a pci_for_each_requester_id(),
> it should iterate over every possible PCI requester ID in the system and
> the same could be said for pci_for_each_requester().
>
Lost me here.... I thought the functions took in a pdev, so it'll only
iterate on the segment that pdev is associated with.

> pci_bus_for_each_resource() is the only one we can build from; for all
> resources on a bus.  We want all requester IDs for a pci_dev.  Does that
> perhaps mean it should be called pci_dev_for_each_requester_id()?  I'd
> be ok with that name, but I think it even more implies that a pci_dev is
> associated with a requester ID.
>
and the function call name changes have equally lost me here.
AIUI, IOMMUs want to call a PCI core function with a pdev, asking for
the req-ids that the pdev may generate to the IOMMU when performing DMA.
That doesn't strike me as a for-each-requester-id() but a
'get-requester-id()' operation.

>> But the more important question is what arguments we give to the
>> callback.  My proposal was to map
>>
>>    {pci_dev ->  {requester-ID-A, requester-ID-B, ...}}
>>
>> Yours is to map
>>
>>    {pci_dev ->  {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}}
>>
>> i.e., your callback gets both a pci_dev and a requester-ID.  I'm
>> trying to figure out how you handle cases where requester-id-A
>> doesn't have a corresponding pci_dev-A.  I guess you pass the device
>> most closely associated with requester-id-A
>
> Yes
>
>> The "most closely associated device" idea seems confusing to
>> describe and think about.  I think you want to use it as part of
>> grouping devices into domains.  But I think we should attack that
>> problem separately.  For grouping or isolation questions, we have
>> to pay attention to things like ACS, which are not relevant for
>> mapping.
>
> We are only touching isolation insofar as providing an interface for a
> driver to determine the point in the PCI topology where the requester ID
> is rooted.  Yes, grouping can make use of that, but I object to the idea
> that we're approaching some slippery slope of rolling multiple concepts
> into this patch.  What I'm proposing here is strictly a requester ID
> interface.  I believe that providing a way for a caller to determine
> that two devices have a requester ID rooted at the same point in the PCI
> topology is directly relevant to that interface.
>
I strongly agree here.  Providing the pdevs associated with each req-id
returned cannot hurt, and in fact, I believe it is an improvement,
avoiding a potential set of more interfaces that may be needed to do
(segment, req-id)->pdev mapping/matching/get-ing.


> pci_find_upstream_pcie_bridge() attempts to do this same thing.
> Unfortunately, the interface is convoluted, it doesn't handle quirks,
> and it requires a great deal of work by the caller to then walk the
> topology and create requester IDs at each step.  This also indicates
> that at each step, the requester ID is associated with some pci_dev.
> Providing a pci_dev is simply showing our work and providing context to
> the requester ID (ie. here's the requester ID and the step along the
> path from which it was derived.  Here's your top level requester ID and
> the point in the topology where it's based).
>
IMO, getting rid of pci_find_upstream_pcie_bridge() gets non-PCI code
out of the business of knowing too much about PCI topology and the
idiosyncrasies around req-ids, and the proposed interface cleans up the
IOMMU (PCI) support.  Having the IOMMU API get the pdev returned with a
req-id, and then passing it back to the core (for release/free) seems
like the proper get/put logic between the PCI core and another kernel
subsystem.

>>> If we look at an endpoint device like A, only A
>>> has A's requester ID.  Therefore, why would for_each_requester_id(A)
>>> traverse up to Y?
>>
>> Even if A is a PCIe device, we have to traverse upwards to find any
>> bridges that might drop A's requester ID or take ownership, e.g., if
>> we have this:
>>
>>    00:1c.0 PCIe-to-PCI bridge to [bus 01-02]
>>    01:00.0 PCI-to-PCIe bridge to [bus 02]
>>    02:00.0 PCIe endpoint A
>>
>> the IOMMU has to look for requester-ID 0100.
>
> And I believe this patch handles this case correctly; I mentioned this
> exact example in the v2 RFC cover letter.  This is another example where
> pci_find_upstream_pcie_bridge() will currently fail.
>
+1

>>> Y can take ownership and become the requester for A,
>>> but if we were to call for_each_requester_id(Y), wouldn't you expect the
>>> callback to happen on {Y, A, B} since all of those can use that
>>> requester ID?
>>
>> No.  If I call for_each_requester_id(Y), I'm expecting the callback
>> to happen for each requester ID that could be used for transactions
>> originated by *Y*.  I'm trying to make an IOMMU mapping for use by
>> Y, so any devices downstream of Y, e.g., A and B, are irrelevant.
>
> Ok, you think of for_each_requester_id() the same as I think of
> for_each_requester().  Can we split the difference and call it
> pci_dev_for_each_requester_id()?
>
I can only ditto my sentiment above.
Can I toss in 'pci_get_requester_id()'?!? ... ;-)

>> I think a lot of the confusion here is because we're trying to solve
> >> two questions at once: (1) what requester-IDs need to be mapped
>> to handle DMA from a device, and (2) what devices can't be isolated
>> from each other and must be in the same domain.  I think things will
>> be simpler if we can deal with those separately.  Even if it ends up
>> being more code, at least each piece will be easier to understand by
>> itself.
>
> Don't we already have this split in the code?
>
> (1) pcie_for_each_requester
>      (or pci_dev_for_each_requester_id)
>
> (2) pci_get_visible_pcie_requester
>      (or pci_get_visible_pcie_requester_id)
>
Again, unless you understand all of the mapped/coerced/modified
requester IDs of PCIe, PCIe-to-PCI(-X) bridges, and the potential (RC,
other) quirks, these interface names are confusing ... that could be
me, and I need to go back to look at the patches and the description of
the functions, to see if they help to clarify their uses.

> Note however that (2) does not impose anything about domains or
> isolation; it is strictly based on PCI topology.  It's left to the
> caller to determine how that translates to IOMMU domains, but the
> typical case is trivial.  Thanks,
>
+1, again.

> Alex
>

Alex Williamson July 29, 2013, 10:32 p.m. UTC | #20
On Mon, 2013-07-29 at 15:02 -0600, Bjorn Helgaas wrote:
> On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
> >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
> 
> >> The "most closely associated device" idea seems confusing to
> >> describe and think about.  I think you want to use it as part of
> >> grouping devices into domains.  But I think we should attack that
> >> problem separately.  For grouping or isolation questions, we have
> >> to pay attention to things like ACS, which are not relevant for
> >> mapping.
> >
> > We are only touching isolation insofar as providing an interface for a
> > driver to determine the point in the PCI topology where the requester ID
> > is rooted.  Yes, grouping can make use of that, but I object to the idea
> > that we're approaching some slippery slope of rolling multiple concepts
> > into this patch.  What I'm proposing here is strictly a requester ID
> > interface.  I believe that providing a way for a caller to determine
> > that two devices have a requester ID rooted at the same point in the PCI
> > topology is directly relevant to that interface.
> >
> > pci_find_upstream_pcie_bridge() attempts to do this same thing.
> > Unfortunately, the interface is convoluted, it doesn't handle quirks,
> > and it requires a great deal of work by the caller to then walk the
> > topology and create requester IDs at each step.  This also indicates
> > that at each step, the requester ID is associated with some pci_dev.
> > Providing a pci_dev is simply showing our work and providing context to
> > the requester ID (ie. here's the requester ID and the step along the
> > path from which it was derived.  Here's your top level requester ID and
> > the point in the topology where it's based).
> 
> What does the driver *do* with the pci_dev?  If it's not necessary,
> then showing our work and providing context just complicates the
> interface and creates opportunities for mistakes.  If we're creating
> IOMMU mappings, only the requester ID is needed.

It's true, I don't have a use for the pci_dev in
pci_dev_for_each_requester_id() today.  But I think providing the
context for a requester ID is valuable information.  Otherwise we have
to make assumptions about the requester ID.  For example, if I have
devices in different PCI segments with the same requester ID, the
callback function only knows that those are actually different requester
IDs from information the caller provides itself in the opaque pointer.
This is the case with intel-iommu, but why would we design an API that
requires the caller to provide that kind of context?  I also provide the
pci_dev because I think both the pci_dev_for_each_requester_id() and
pci_get_visible_pcie_requester() should provide similar APIs.  There's
an association of a requester ID to a pci_dev.  Why hide that?
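
To illustrate the segment point above, with the pci_dev in hand the
callback can recover the segment itself (do_mapping() is a stand-in,
not a real function):

  static int map_requester_id(struct pci_dev *dev, u16 requester_id,
                              void *data)
  {
          /* a requester ID is only unique within its segment */
          int segment = pci_domain_nr(dev->bus);

          return do_mapping(segment, requester_id, data);
  }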

> I think you used get_domain_for_dev() and intel_iommu_add_device() as
> examples, but as far as I can tell, they use the pci_dev as a way to
> learn about isolation.  For that purpose, I don't think you want an
> iterator -- you only want to know about the single pci_dev that's the
> root of the isolation domain, and requester IDs really aren't
> relevant.

See get_domain_for_dev() in patch 2/2.  It uses the returned pci_dev to
know whether the requester ID is rooted upstream or at the device
itself.  If upstream, it then uses the requester ID to search for an
existing domain.  The requester ID is relevant here.  If the returned
pci_dev is the device itself, it proceeds to allocate a domain because
the code path has already checked whether the device itself belongs to a
domain.

The use in intel_iommu_add_device() doesn't use the requester ID; it
only wants the pci_dev since it needs to then search above that for
additional isolation capabilities.  That interface only wants to know
the point in the topology where we have bus level granularity of
devices, so it doesn't care about the actual requester ID.
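
Condensed to a sketch, that flow is roughly (the
pci_get_visible_pcie_requester() signature is assumed here, and
find_domain_by_id()/alloc_new_domain() are stand-ins for the
intel-iommu internals):

  static struct dmar_domain *get_domain_sketch(struct pci_dev *pdev)
  {
          struct dmar_domain *domain;
          struct pci_dev *root;
          u16 requester_id;

          root = pci_get_visible_pcie_requester(pdev, &requester_id);
          if (root != pdev) {
                  /* rooted upstream: reuse an existing domain if that
                   * requester ID already has one */
                  domain = find_domain_by_id(pci_domain_nr(root->bus),
                                             requester_id);
                  if (domain)
                          return domain;
          }

          /* pdev itself was already checked for a domain by the caller */
          return alloc_new_domain(pdev);
  }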

> >> > If we look at an endpoint device like A, only A
> >> > has A's requester ID.  Therefore, why would for_each_requester_id(A)
> >> > traverse up to Y?
> >>
> >> Even if A is a PCIe device, we have to traverse upwards to find any
> >> bridges that might drop A's requester ID or take ownership, e.g., if
> >> we have this:
> >>
> >>   00:1c.0 PCIe-to-PCI bridge to [bus 01-02]
> >>   01:00.0 PCI-to-PCIe bridge to [bus 02]
> >>   02:00.0 PCIe endpoint A
> >>
> >> the IOMMU has to look for requester-ID 0100.
> >
> > And I believe this patch handles this case correctly; I mentioned this
> > exact example in the v2 RFC cover letter.  This is another example where
> > pci_find_upstream_pcie_bridge() will currently fail.
> 
> OK, I was just trying to answer your question "why we would need to
> traverse up to Y."  But apparently we agree about that.
> 
> >> > Y can take ownership and become the requester for A,
> >> > but if we were to call for_each_requester_id(Y), wouldn't you expect the
> >> > callback to happen on {Y, A, B} since all of those can use that
> >> > requester ID?
> >>
> >> No.  If I call for_each_requester_id(Y), I'm expecting the callback
> >> to happen for each requester ID that could be used for transactions
> >> originated by *Y*.  I'm trying to make an IOMMU mapping for use by
> >> Y, so any devices downstream of Y, e.g., A and B, are irrelevant.
> >
> > Ok, you think of for_each_requester_id() the same as I think of
> > for_each_requester().  Can we split the difference and call it
> > pci_dev_for_each_requester_id()?
> 
> Sure.
> 
> >> I think a lot of the confusion here is because we're trying to solve
> >> two questions at once: (1) what requester-IDs need to be mapped
> >> to handle DMA from a device, and (2) what devices can't be isolated
> >> from each other and must be in the same domain. ...
> >
> > Don't we already have this split in the code?
> >
> > (1) pcie_for_each_requester
> >     (or pci_dev_for_each_requester_id)
> >
> > (2) pci_get_visible_pcie_requester
> >     (or pci_get_visible_pcie_requester_id)
> >
> > Note however that (2) does not impose anything about domains or
> > isolation; it is strictly based on PCI topology.  It's left to the
> > caller to determine how that translates to IOMMU domains, but the
> > typical case is trivial.
> 
> Can you expand on this a little bit?  What's involved in translating
> the pci_get_visible_pcie_requester() result to an IOMMU domain?  Is
> there something IOMMU-specific about this?

I try not to make any assumptions about what's required to get from a
pci_dev/requester ID to a domain.  The IOMMU may only be able to filter
a subset of the requester ID, for instance if it has no function-level
visibility, or perhaps only bus-level visibility.  intel-iommu has
full BDF granularity, so it's perhaps the simplest case.

> I assume that for any given PCI device A, there's a subtree of devices
> that the IOMMU can't distinguish from A, and that this subtree is not
> IOMMU-specific so it's reasonable to to have a PCI interface to
> compute it.  An IOMMU driver might want to expand the domain to
> include more devices, of course, and *that* seems like something to
> leave to the driver.

Yes, I agree.  We're only trying to determine the PCI bus level
visibility; anything else is left to the IOMMU driver.

> I'm also assuming that a pci_dev (the root of the subtree) is a good
> way to identify this set of indistinguishable devices because it's
> easy to test whether a device B is part of it.  But the requester ID
> seems redundant in this case because you can just associate an IOMMU
> domain with the pci_dev.

I don't think we want to assume that an IOMMU driver bases a domain on a
pci_dev or a BDF.  That's specific to the driver.  The requester ID is
not redundant though since it depends on topology and quirks.  A given
> pci_dev may have its own legitimate requester ID or a quirk, where the
quirk may be specific to the device or architected, like a bridge using
the (bridge->subordinate->number << 8 | 0) requester ID.

> Where does ACS come into the picture?

ACS determines whether a transaction must necessarily reach the IOMMU
without being redirected.  Therefore once we figure out the point at
which we have bus level granularity for a transaction, we look upstream
to determine if a transaction must necessarily reach the IOMMU.  In the
case of intel-iommu, IOMMU groups are only dependent on the visibility
provided by the PCI topology and the isolation guaranteed through ACS,
but it's up to the IOMMU driver to determine whether there are
additional constraints.  Thanks,

Alex

Alex Williamson July 29, 2013, 10:55 p.m. UTC | #21
On Mon, 2013-07-29 at 17:03 -0400, Don Dutile wrote:
> On 07/29/2013 12:06 PM, Alex Williamson wrote:
> > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
> >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
> >>> On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote:
> >>>> On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
> >>>>> On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
> >>>>>> On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
> >>>>>> <alex.williamson@redhat.com>  wrote:
> >>>>>>> On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> >>>>>>>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >>>>>>    As the DMA
> >>>>>> transaction propagates through the fabric, it may be tagged by bridges
> >>>>>> with different requester IDs.
> >>>>>>
> >>>>>> The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
> >>>>>> not sure the intermediate pci_devs are.
> >>>>>
> >>>>> A u16 requester ID doesn't mean much on its own though, it's not
> >>>>> necessarily even unique.  A requester ID associated with the context of
> >>>>> a pci_dev is unique and gives us a reference point if we need to perform
> >>>>> another operation on that requester ID.
> >>>>
> >>>> A u16 requester ID better mean something to an IOMMU -- it's all the
> >>>> IOMMU can use to look up the correct mapping.  That's why we have to
> >>>> give the iterator something to define the scope to iterate over.  The
> >>>> same requester ID could mean something totally unrelated in a
> >>>> different scope, e.g., below a different IOMMU.
> >>>
> >>> The point I'm trying to make is that a requester ID depends on its
> >>> context (minimally, the PCI segment).  The caller can assume the context
> >>> based on the calling parameters or we can provide context in the form of
> >>> an associated pci_dev.  I chose the latter path because I prefer
> >>> explicit interfaces and it has some usefulness in the intel-iommu
> >>> implementation.
> >>>
> >>> For instance, get_domain_for_dev() first looks to see if a pci_dev
> >>> already has a domain.  If it doesn't, we want to look to see if there's
> >>> an upstream device that would use the same requester ID that already has
> >>> a domain.  If our get-requester-ID-for-device function returns only the
> >>> requester ID, we don't know if that requester ID is the device we're
> >>> searching from or some upstream device.  Therefore we potentially do an
> >>> unnecessary search for the domain.
> >>>
> >>> The other user is intel_iommu_add_device() where we're trying to build
> >>> IOMMU groups.  Visibility is the first requirement of an IOMMU group.
> >>> If the IOMMU cannot distinguish between devices, they must be part of
> >>> the same IOMMU group.  Here we want to find the pci_dev that hosts the
> >>> requester ID.  I don't even know how we'd implement this with a function
> >>> that only returned the requester ID.  Perhaps we'd have to walk upstream
> >>> from the device calling the get-requester-ID-for-device function at each
> >>> step and noticing when it changed.  That moves significant logic back
> >>> into the caller code.
> >>> ...
> >>
> >>>> I don't see the point of passing a "device closest to the requester
> >>>> ID."  What would the IOMMU do with that?  As far as the IOMMU is
> >>>> concerned, the requester ID could be an arbitrary number completely
> >>>> unrelated to a pci_dev.
> >>>
> >>> Do you have an example of a case where a requester ID doesn't have some
> >>> association to a pci_dev?
> >>
> >> I think our confusion here is the same as what Don & I have been
> >> hashing out -- I'm saying a requester ID fabricated by a bridge
> >> need not correspond to a specific pci_dev, and you probably mean
> >> that every requester ID is by definition the result of *some* PCI
> >> device making a DMA request.
> >
> > Yes
> >
> >>> ...
> >>> Furthermore, if we have:
> >>>
> >>>       -- A
> >>>      /
> >>> X--Y
> >>>      \
> >>>       -- B
> >>> ...
> >>
> >>> Let me go back to the X-Y-A|B example above to see if I can explain why
> >>> pcie_for_each_requester_id() doesn't make sense to me.  Generally a
> >>> for_each_foo function should iterate across all things with the same
> >>> foo.  So a for_each_requester_id should iterate across all things with
> >>> the same requester ID.
> >>
> >> Hm, that's not the way I think of for_each_FOO() interfaces.  I
> >> think of it as "execute the body (or callback) for every possible
> >> FOO", where FOO is different each time.  for_each_pci_dev(),
> >> pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc.,
> >> work like that.
> >
> > Most of these aren't relevant because they iterate over everything.  I
> > think they only show that if we had a pci_for_each_requester_id(),
> > it should iterate over every possible PCI requester ID in the system and
> > the same could be said for pci_for_each_requester().
> >
> Lost me here.... I thought the functions took in a pdev, so it'll only
> iterate on the segment that pdev is associated with.

Right, but how does the callback function know the segment we're
traversing?  Without a pci_dev a requester ID could belong to any PCI
segment.  It would be up to the caller to provide context in the opaque
pointer.

> > pci_bus_for_each_resource() is the only one we can build from; for all
> > resources on a bus.  We want all requester IDs for a pci_dev.  Does that
> > perhaps mean it should be called pci_dev_for_each_requester_id()?  I'd
> > be ok with that name, but I think it even more implies that a pci_dev is
> > associated with a requester ID.
> >
> and the function call name changes have equally lost me here.
> AIUI, IOMMUs want to call a PCI core function with a pdev, asking for
> the req-ids that the pdev may generate to the IOMMU when performing DMA.
> That doesn't strike me as a for-each-requester-id() but a
> 'get-requester-id()' operation.

But we have to support multiple requester IDs per endpoint.  As Bjorn
pointed out in the spec, a PCI-X bridge may pass the requester ID or
take ownership of the transaction.  Therefore we need to add both the
"endpoint" and the (bridge->secondary->number << 8 | 0) requester IDs.
We could do a get-requester-id() interface, but then the caller must
walk upstream through each device that may possibly take ownership of
the transactions.  That leaves a lot of logic in the caller versus a
for-each-requester-id() operation.
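
For illustration, a rough sketch of the walk a bare get-requester-id()
would force on each caller (ownership rules simplified, quirks omitted
-- exactly the logic the for-each interface is meant to encapsulate):

  static void for_all_requester_ids(struct pci_dev *dev,
                                    void (*fn)(u16 requester_id, void *data),
                                    void *data)
  {
          struct pci_dev *p = dev;

          /* the device's own requester ID */
          fn((p->bus->number << 8) | p->devfn, data);

          while (!pci_is_root_bus(p->bus)) {
                  p = p->bus->self;
                  /* a conventional PCI/PCI-X bridge may take ownership
                   * using (secondary bus << 8 | devfn 0) */
                  if (p->subordinate && !pci_is_pcie(p))
                          fn((p->subordinate->number << 8) | 0, data);
          }
  }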


> >> But the more important question is what arguments we give to the
> >> callback.  My proposal was to map
> >>
> >>    {pci_dev ->  {requester-ID-A, requester-ID-B, ...}}
> >>
> >> Yours is to map
> >>
> >>    {pci_dev ->  {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}}
> >>
> >> i.e., your callback gets both a pci_dev and a requester-ID.  I'm
> >> trying to figure out how you handle cases where requester-id-A
> >> doesn't have a corresponding pci_dev-A.  I guess you pass the device
> >> most closely associated with requester-id-A
> >
> > Yes
> >
> >> The "most closely associated device" idea seems confusing to
> >> describe and think about.  I think you want to use it as part of
> >> grouping devices into domains.  But I think we should attack that
> >> problem separately.  For grouping or isolation questions, we have
> >> to pay attention to things like ACS, which are not relevant for
> >> mapping.
> >
> > We are only touching isolation insofar as providing an interface for a
> > driver to determine the point in the PCI topology where the requester ID
> > is rooted.  Yes, grouping can make use of that, but I object to the idea
> > that we're approaching some slippery slope of rolling multiple concepts
> > into this patch.  What I'm proposing here is strictly a requester ID
> > interface.  I believe that providing a way for a caller to determine
> > that two devices have a requester ID rooted at the same point in the PCI
> > topology is directly relevant to that interface.
> >
> I strongly agree here.  Providing the pdevs associated with each req-id
> returned cannot hurt, and in fact, I believe it is an improvement,
> avoiding a potential set of more interfaces that may be needed to do
> (segment, req-id)->pdev mapping/matching/get-ing.
> 
> 
> > pci_find_upstream_pcie_bridge() attempts to do this same thing.
> > Unfortunately, the interface is convoluted, it doesn't handle quirks,
> > and it requires a great deal of work by the caller to then walk the
> > topology and create requester IDs at each step.  This also indicates
> > that at each step, the requester ID is associated with some pci_dev.
> > Providing a pci_dev is simply showing our work and providing context to
> > the requester ID (ie. here's the requester ID and the step along the
> > path from which it was derived.  Here's your top level requester ID and
> > the point in the topology where it's based).
> >
> IMO, getting rid of pci_find_upstream_pcie_bridge() gets non-PCI code
> out of the business of knowing too much about PCI topology and the
> idiosyncrasies around req-ids, and the proposed interface cleans up the
> IOMMU (PCI) support.  Having the IOMMU API get the pdev returned with a
> req-id, and then passing it back to the core (for release/free) seems
> like the proper get/put logic between the PCI core and another kernel
> subsystem.
> 
> >>> If we look at an endpoint device like A, only A
> >>> has A's requester ID.  Therefore, why would for_each_requester_id(A)
> >>> traverse up to Y?
> >>
> >> Even if A is a PCIe device, we have to traverse upwards to find any
> >> bridges that might drop A's requester ID or take ownership, e.g., if
> >> we have this:
> >>
> >>    00:1c.0 PCIe-to-PCI bridge to [bus 01-02]
> >>    01:00.0 PCI-to-PCIe bridge to [bus 02]
> >>    02:00.0 PCIe endpoint A
> >>
> >> the IOMMU has to look for requester-ID 0100.
> >
> > And I believe this patch handles this case correctly; I mentioned this
> > exact example in the v2 RFC cover letter.  This is another example where
> > pci_find_upstream_pcie_bridge() will currently fail.
> >
> +1
> 
> >>> Y can take ownership and become the requester for A,
> >>> but if we were to call for_each_requester_id(Y), wouldn't you expect the
> >>> callback to happen on {Y, A, B} since all of those can use that
> >>> requester ID?
> >>
> >> No.  If I call for_each_requester_id(Y), I'm expecting the callback
> >> to happen for each requester ID that could be used for transactions
> >> originated by *Y*.  I'm trying to make an IOMMU mapping for use by
> >> Y, so any devices downstream of Y, e.g., A and B, are irrelevant.
> >
> > Ok, you think of for_each_requester_id() the same as I think of
> > for_each_requester().  Can we split the difference and call it
> > pci_dev_for_each_requester_id()?
> >
> I can only ditto my sentiment above.
> Can I toss in 'pci_get_requester_id()'?!? ... ;-)

I assume pci_get_requester_id() would return just the requester ID for a
given device.  The problem with that is that it doesn't have much
meaning.  What's the requester ID of a bridge?  It depends on whether
the bridge is mastering the transaction (bridge->bus->number << 8 |
bridge->devfn) or whether it's taking ownership of a transaction for a
subordinate device (bridge->subordinate->number << 8 | 0).  So we really
want the for-each interface because then we have a set of devices and a
vector through them.
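
Spelled out as a sketch (the same expressions as above):

  /* bridge mastering its own transaction */
  static u16 bridge_own_requester_id(struct pci_dev *bridge)
  {
          return (bridge->bus->number << 8) | bridge->devfn;
  }

  /* bridge taking ownership of a transaction from a subordinate
   * device: secondary bus number, devfn 0 */
  static u16 bridge_taken_requester_id(struct pci_dev *bridge)
  {
          return (bridge->subordinate->number << 8) | 0;
  }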

> >> I think a lot of the confusion here is because we're trying to solve
> >> two questions at once: (1) what requester-IDs need to be mapped
> >> to handle DMA from a device, and (2) what devices can't be isolated
> >> from each other and must be in the same domain.  I think things will
> >> be simpler if we can deal with those separately.  Even if it ends up
> >> being more code, at least each piece will be easier to understand by
> >> itself.
> >
> > Don't we already have this split in the code?
> >
> > (1) pcie_for_each_requester
> >      (or pci_dev_for_each_requester_id)
> >
> > (2) pci_get_visible_pcie_requester
> >      (or pci_get_visible_pcie_requester_id)
> >
> Again, unless you understand all of the mapped/coerced/modified
> requester IDs of PCIe, PCIe-to-PCI(-X) bridges, and the potential (RC,
> other) quirks, these interface names are confusing ... that could be
> me, and I need to go back to look at the patches and the description of
> the functions, to see if they help to clarify their uses.

Perhaps it's time for a v3, but I'm not sure what all we've learned from
v2 yet or what direction a v3 should take.  We've learned that root
complex PCIe-to-PCI bridges use implementation specific requester IDs
and we know that Intel uses the (bus|devfn) requester ID instead of the
standard (secondary|0).  This should probably be handled as a quirk.
We've renamed pcie_for_each_requester() to
pci_dev_for_each_requester_id().  I don't know if we've renamed
pci_get_visible_pcie_requester() yet; I was just trying to be consistent
with naming by adding _id to the end.  Thanks,

Alex

> > Note however that (2) does not impose anything about domains or
> > isolation; it is strictly based on PCI topology.  It's left to the
> > caller to determine how that translates to IOMMU domains, but the
> > typical case is trivial.  Thanks,
> >
> +1, again.
> 
> > Alex
> >
> 



Bjorn Helgaas Aug. 1, 2013, 10:08 p.m. UTC | #22
On Mon, Jul 29, 2013 at 4:32 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 2013-07-29 at 15:02 -0600, Bjorn Helgaas wrote:
>> On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
>> >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
>>
>> >> The "most closely associated device" idea seems confusing to
>> >> describe and think about.  I think you want to use it as part of
>> >> grouping devices into domains.  But I think we should attack that
>> >> problem separately.  For grouping or isolation questions, we have
>> >> to pay attention to things like ACS, which are not relevant for
>> >> mapping.
>> >
>> > We are only touching isolation insofar as providing an interface for a
>> > driver to determine the point in the PCI topology where the requester ID
>> > is rooted.  Yes, grouping can make use of that, but I object to the idea
>> > that we're approaching some slippery slope of rolling multiple concepts
>> > into this patch.  What I'm proposing here is strictly a requester ID
>> > interface.  I believe that providing a way for a caller to determine
>> > that two devices have a requester ID rooted at the same point in the PCI
>> > topology is directly relevant to that interface.
>> >
>> > pci_find_upstream_pcie_bridge() attempts to do this same thing.
>> > Unfortunately, the interface is convoluted, it doesn't handle quirks,
>> > and it requires a great deal of work by the caller to then walk the
>> > topology and create requester IDs at each step.  This also indicates
>> > that at each step, the requester ID is associated with some pci_dev.
>> > Providing a pci_dev is simply showing our work and providing context to
>> > the requester ID (ie. here's the requester ID and the step along the
>> > path from which it was derived.  Here's your top level requester ID and
>> > the point in the topology where it's based).
>>
>> What does the driver *do* with the pci_dev?  If it's not necessary,
>> then showing our work and providing context just complicates the
>> interface and creates opportunities for mistakes.  If we're creating
>> IOMMU mappings, only the requester ID is needed.
>
> It's true, I don't have a use for the pci_dev in
> pci_dev_for_each_requester_id() today.  But I think providing the
> context for a requester ID is valuable information.  Otherwise we have
> to make assumptions about the requester ID.  For example, if I have
> devices in different PCI segments with the same requester ID, the
> callback function only knows that those are actually different requester
> IDs from information the caller provides itself in the opaque pointer.
> This is the case with intel-iommu, but why would we design an API that
> requires the caller to provide that kind of context?

The caller already has to provide context, e.g., the domain in which
to create a mapping, anyway via the opaque pointer.  So I would argue
that it's pointless to supply context twice in different ways.

We only have one caller of pci_dev_for_each_requester_id() anyway
(intel-iommu.c).  That seems really strange to me.  All I can assume
is that other IOMMU drivers *should* be doing something like this, but
aren't yet.  Anyway, since we only have one user, why not just provide
the minimum (only the requester ID), and add the pci_dev later if a
requirement for it turns up?

>  I also provide the
> pci_dev because I think both the pci_dev_for_each_requester_id() and
> pci_get_visible_pcie_requester() should provide similar APIs.  There's
> an association of a requester ID to a pci_dev.  Why hide that?

Information hiding is a basic idea of abstraction and encapsulation.
If we don't hide unnecessary information, we end up with unnecessary
dependencies.

>> I think you used get_domain_for_dev() and intel_iommu_add_device() as
>> examples, but as far as I can tell, they use the pci_dev as a way to
>> learn about isolation.  For that purpose, I don't think you want an
>> iterator -- you only want to know about the single pci_dev that's the
>> root of the isolation domain, and requester IDs really aren't
>> relevant.
>
> See get_domain_for_dev() in patch 2/2.  It uses the returned pci_dev to
> know whether the requester ID is rooted upstream or at the device
> itself.  If upstream, it then uses the requester ID to search for an
> existing domain.  The requester ID is relevant here.  If the returned
> pci_dev is the device itself, it proceeds to allocate a domain because
> the code path has already checked whether the device itself belongs to a
> domain.

I guess I got myself confused here.  get_domain_for_dev() uses
pci_get_visible_pcie_requester(), not pci_dev_for_each_requester_id().

>> > (1) pcie_for_each_requester
>> >     (or pci_dev_for_each_requester_id)
>> >
>> > (2) pci_get_visible_pcie_requester
>> >     (or pci_get_visible_pcie_requester_id)
>> >
>> > Note however that (2) does not impose anything about domains or
>> > isolation, it is strictly based on PCI topology.  It's left to the
>> > caller to determine how that translates to IOMMU domains, but the
>> > typical case is trivial.
>>
>> Can you expand on this a little bit?  What's involved in translating
>> the pci_get_visible_pcie_requester() result to an IOMMU domain?  Is
>> there something IOMMU-specific about this?
>
> I try not to make any assumptions about what's required to get from a
> pci_dev/requester ID to a domain.  The IOMMU may only be able to filter
> a subset of the requester ID, for instance if it has no function-level
> visibility, or perhaps only bus-level visibility.  intel-iommu has
> full BDF granularity, so it's perhaps the simplest case.
>
>> I assume that for any given PCI device A, there's a subtree of devices
>> that the IOMMU can't distinguish from A, and that this subtree is not
>> IOMMU-specific so it's reasonable to have a PCI interface to
>> compute it.  An IOMMU driver might want to expand the domain to
>> include more devices, of course, and *that* seems like something to
>> leave to the driver.
>
> Yes, I agree.  We're only trying to determine the PCI bus level
> visibility; anything else is left to the IOMMU driver.
>
>> I'm also assuming that a pci_dev (the root of the subtree) is a good
>> way to identify this set of indistinguishable devices because it's
>> easy to test whether a device B is part of it.  But the requester ID
>> seems redundant in this case because you can just associate an IOMMU
>> domain with the pci_dev.
>
> I don't think we want to assume that an IOMMU driver bases a domain on a
> pci_dev or a BDF.  That's specific to the driver.  The requester ID is
> not redundant though since it depends on topology and quirks.  A given
> pci_dev may have its own legitimate requester ID or a quirk, where the
> quirk may be specific to the device or architected, like a bridge using
> the (bridge->subordinate->number << 8 | 0) requester ID.
>
>> Where does ACS come into the picture?
>
> ACS determines whether a transaction must necessarily reach the IOMMU
> without being redirected.  Therefore once we figure out the point at
> which we have bus level granularity for a transaction, we look upstream
> to determine if a transaction must necessarily reach the IOMMU.  In the
> case of intel-iommu, IOMMU groups are only dependent on the visibility
> provided by the PCI topology and the isolation guaranteed through ACS,
> but it's up to the IOMMU driver to determine whether there are
> additional constraints.  Thanks,

What code will look for ACS support and enablement?  Do you envision
that being in the IOMMU code or somewhere in PCI?  I was hoping that
the isolation/domain identification stuff could be fairly generic, but
it sounds like that's not the case so I'm guessing the interesting
parts of it will be in IOMMU code.

So I think I'm willing to go along with
pci_get_visible_pcie_requester() returning both a pci_dev and a
requester ID.  Is a single one of each sufficient?  I guess it is -- I
don't see any loops around where you're calling it.

Bjorn
Alex Williamson Aug. 2, 2013, 4:59 p.m. UTC | #23
On Thu, 2013-08-01 at 16:08 -0600, Bjorn Helgaas wrote:
> On Mon, Jul 29, 2013 at 4:32 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Mon, 2013-07-29 at 15:02 -0600, Bjorn Helgaas wrote:
> >> On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson
> >> <alex.williamson@redhat.com> wrote:
> >> > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
> >> >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
> >>
> >> >> The "most closely associated device" idea seems confusing to
> >> >> describe and think about.  I think you want to use it as part of
> >> >> grouping devices into domains.  But I think we should attack that
> >> >> problem separately.  For grouping or isolation questions, we have
> >> >> to pay attention to things like ACS, which are not relevant for
> >> >> mapping.
> >> >
> >> > We are only touching isolation insofar as providing an interface for a
> >> > driver to determine the point in the PCI topology where the requester ID
> >> > is rooted.  Yes, grouping can make use of that, but I object to the idea
> >> > that we're approaching some slippery slope of rolling multiple concepts
> >> > into this patch.  What I'm proposing here is strictly a requester ID
> >> > interface.  I believe that providing a way for a caller to determine
> >> > that two devices have a requester ID rooted at the same point in the PCI
> >> > topology is directly relevant to that interface.
> >> >
> >> > pci_find_upstream_pcie_bridge() attempts to do this same thing.
> >> > Unfortunately, the interface is convoluted, it doesn't handle quirks,
> >> > and it requires a great deal of work by the caller to then walk the
> >> > topology and create requester IDs at each step.  This also indicates
> >> > that at each step, the requester ID is associated with some pci_dev.
> >> > Providing a pci_dev is simply showing our work and providing context to
> >> > the requester ID (ie. here's the requester ID and the step along the
> >> > path from which it was derived.  Here's your top level requester ID and
> >> > the point in the topology where it's based).
> >>
> >> What does the driver *do* with the pci_dev?  If it's not necessary,
> >> then showing our work and providing context just complicates the
> >> interface and creates opportunities for mistakes.  If we're creating
> >> IOMMU mappings, only the requester ID is needed.
> >
> > It's true, I don't have a use for the pci_dev in
> > pci_dev_for_each_requester_id() today.  But I think providing the
> > context for a requester ID is valuable information.  Otherwise we have
> > to make assumptions about the requester ID.  For example, if I have
> > devices in different PCI segments with the same requester ID, the
> > callback function only knows that those are actually different requester
> > IDs from information the caller provides itself in the opaque pointer.
> > This is the case with intel-iommu, but why would we design an API that
> > requires the caller to provide that kind of context?
> 
> The caller already has to provide context, e.g., the domain in which
> to create a mapping, anyway via the opaque pointer.  So I would argue
> that it's pointless to supply context twice in different ways.
> 
> We only have one caller of pci_dev_for_each_requester_id() anyway
> (intel-iommu.c).  That seems really strange to me.  All I can assume
> is that other IOMMU drivers *should* be doing something like this, but
> aren't yet.  Anyway, since we only have one user, why not just provide
> the minimum (only the requester ID), and add the pci_dev later if a
> requirement for it turns up?


Ok, I can do that.  I think we only have one user because intel-iommu is
the only in-tree user of pci_find_upstream_pcie_bridge().  I know the
powerpc folks have been using similar code though.  amd_iommu bypasses
this because it generates an alias table based on information from
firmware.
That replaces some of this manual grep'ing through the topology.
Perhaps Joerg could comment about whether this is a useful interface for
amd_iommu.

> >  I also provide the
> > pci_dev because I think both the pci_dev_for_each_requester_id() and
> > pci_get_visible_pcie_requester() should provide similar APIs.  There's
> > an association of a requester ID to a pci_dev.  Why hide that?
> 
> Information hiding is a basic idea of abstraction and encapsulation.
> If we don't hide unnecessary information, we end up with unnecessary
> dependencies.
> 
> >> I think you used get_domain_for_dev() and intel_iommu_add_device() as
> >> examples, but as far as I can tell, they use the pci_dev as a way to
> >> learn about isolation.  For that purpose, I don't think you want an
> >> iterator -- you only want to know about the single pci_dev that's the
> >> root of the isolation domain, and requester IDs really aren't
> >> relevant.
> >
> > See get_domain_for_dev() in patch 2/2.  It uses the returned pci_dev to
> > know whether the requester ID is rooted upstream or at the device
> > itself.  If upstream, it then uses the requester ID to search for an
> > existing domain.  The requester ID is relevant here.  If the returned
> > pci_dev is the device itself, it proceeds to allocate a domain because
> > the code path has already checked whether the device itself belongs to a
> > domain.
> 
> I guess I got myself confused here.  get_domain_for_dev() uses
> pci_get_visible_pcie_requester(), not pci_dev_for_each_requester_id().
> 
> >> > (1) pcie_for_each_requester
> >> >     (or pci_dev_for_each_requester_id)
> >> >
> >> > (2) pci_get_visible_pcie_requester
> >> >     (or pci_get_visible_pcie_requester_id)
> >> >
> >> > Note however that (2) does not impose anything about domains or
> >> > isolation, it is strictly based on PCI topology.  It's left to the
> >> > caller to determine how that translates to IOMMU domains, but the
> >> > typical case is trivial.
> >>
> >> Can you expand on this a little bit?  What's involved in translating
> >> the pci_get_visible_pcie_requester() result to an IOMMU domain?  Is
> >> there something IOMMU-specific about this?
> >
> > I try not to make any assumptions about what's required to get from a
> > pci_dev/requester ID to a domain.  The IOMMU may only be able to filter
> > a subset of the requester ID, for instance if it has no function-level
> > visibility, or perhaps only bus-level visibility.  intel-iommu has
> > full BDF granularity, so it's perhaps the simplest case.
> >
> >> I assume that for any given PCI device A, there's a subtree of devices
> >> that the IOMMU can't distinguish from A, and that this subtree is not
> >> IOMMU-specific so it's reasonable to have a PCI interface to
> >> compute it.  An IOMMU driver might want to expand the domain to
> >> include more devices, of course, and *that* seems like something to
> >> leave to the driver.
> >
> > Yes, I agree.  We're only trying to determine the PCI bus level
> > visibility; anything else is left to the IOMMU driver.
> >
> >> I'm also assuming that a pci_dev (the root of the subtree) is a good
> >> way to identify this set of indistinguishable devices because it's
> >> easy to test whether a device B is part of it.  But the requester ID
> >> seems redundant in this case because you can just associate an IOMMU
> >> domain with the pci_dev.
> >
> > I don't think we want to assume that an IOMMU driver bases a domain on a
> > pci_dev or a BDF.  That's specific to the driver.  The requester ID is
> > not redundant though since it depends on topology and quirks.  A given
> > pci_dev may have its own legitimate requester ID or a quirk, where the
> > quirk may be specific to the device or architected, like a bridge using
> > the (bridge->subordinate->number << 8 | 0) requester ID.
> >
> >> Where does ACS come into the picture?
> >
> > ACS determines whether a transaction must necessarily reach the IOMMU
> > without being redirected.  Therefore once we figure out the point at
> > which we have bus level granularity for a transaction, we look upstream
> > to determine if a transaction must necessarily reach the IOMMU.  In the
> > case of intel-iommu, IOMMU groups are only dependent on the visibility
> > provided by the PCI topology and the isolation guaranteed through ACS,
> > but it's up to the IOMMU driver to determine whether there are
> > additional constraints.  Thanks,
> 
> What code will look for ACS support and enablement?  Do you envision
> that being in the IOMMU code or somewhere in PCI?  I was hoping that
> the isolation/domain identification stuff could be fairly generic, but
> it sounds like that's not the case so I'm guessing the interesting
> parts of it will be in IOMMU code.

ACS support is currently in the IOMMUs and it's pretty much a mirror
image in both intel-iommu and amd_iommu (and soon powerpc), so it's
another good opportunity to create something to facilitate it.  The
specific IOMMU drivers are definitely more involved in determining
device visibility since that directly translates into their page table
entries, but ACS determines isolation, which is generic and determined
exclusively by device capabilities and quirks.
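
As a sketch, the upstream test can be built on the existing
pci_acs_enabled() helper (the exact flags a driver requires may vary):

  static bool path_reaches_iommu_untouched(struct pci_dev *dev)
  {
          struct pci_dev *p = dev;

          while (!pci_is_root_bus(p->bus)) {
                  p = p->bus->self;
                  /* peer-to-peer redirect must be controlled at
                   * every step upstream */
                  if (!pci_acs_enabled(p, PCI_ACS_RR | PCI_ACS_CR |
                                          PCI_ACS_UF))
                          return false;
          }

          return true;
  }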

> So I think I'm willing to go along with
> pci_get_visible_pcie_requester() returning both a pci_dev and a
> requester ID.  Is a single one of each sufficient?  I guess it is -- I
> don't see any loops around where you're calling it.

Great!  I'm still trying to figure out how to handle the quirk around
the Intel PCI-to-PCI bridge on the root complex as just another quirk.
I'll respin another version once I have that worked out.  Thanks,

Alex

Bjorn Helgaas April 3, 2014, 9:48 p.m. UTC | #24
[+cc George]

On Fri, Aug 2, 2013 at 10:59 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> ...
> Great!  I'm still trying to figure out how to handle the quirk around
> the Intel PCI-to-PCI bridge on the root complex as just another quirk.
> I'll respin another version once I have that worked out.  Thanks,

Is anything happening here?  These buggy IOMMU/DMA source problems,
e.g., [1], have been lingering a long time, and I don't know if we're
stuck because I haven't been giving them enough attention, or if we
don't really have a good solution yet.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=42679
Alex Williamson April 4, 2014, 2:51 a.m. UTC | #25
On Thu, 2014-04-03 at 15:48 -0600, Bjorn Helgaas wrote:
> [+cc George]
> 
> On Fri, Aug 2, 2013 at 10:59 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > ...
> > Great!  I'm still trying to figure out how to handle the Intel
> > PCI-to-PCI bridge on the root complex as just another quirk.  I'll
> > respin another version once I have that worked out.  Thanks,
> 
> Is anything happening here?  These buggy IOMMU/DMA source problems,
> e.g., [1], have been lingering a long time, and I don't know if we're
> stuck because I haven't been giving them enough attention, or if we
> don't really have a good solution yet.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=42679

Sorry, no.  This has completely dropped off my radar.  I'll try to
resurrect it and figure out how to move forward.  Thanks,

Alex

Bjorn Helgaas April 4, 2014, 3 p.m. UTC | #26
On Thu, Apr 3, 2014 at 8:51 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Thu, 2014-04-03 at 15:48 -0600, Bjorn Helgaas wrote:
>> [+cc George]
>>
>> On Fri, Aug 2, 2013 at 10:59 AM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > ...
>> > Great!  I'm still trying to figure out how to handle the Intel
>> > PCI-to-PCI bridge on the root complex as just another quirk.  I'll
>> > respin another version once I have that worked out.  Thanks,
>>
>> Is anything happening here?  These buggy IOMMU/DMA source problems,
>> e.g., [1], have been lingering a long time, and I don't know if we're
>> stuck because I haven't been giving them enough attention, or if we
>> don't really have a good solution yet.
>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=42679
>
> Sorry, no.  This has completely dropped off my radar.  I'll try to
> resurrect it and figure out how to move forward.  Thanks,

Not a problem; I'm just way behind on processing patches, so I'm
trying to clean up my backlog and take care of things that are waiting
on me.

Bjorn
diff mbox

Patch

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index d0627fa..4759c02 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -18,6 +18,204 @@  DECLARE_RWSEM(pci_bus_sem);
 EXPORT_SYMBOL_GPL(pci_bus_sem);
 
 /*
+ * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
+ * @dev: device to test
+ */
+static bool pci_has_pcie_requester_id(struct pci_dev *dev)
+{
+	/*
+	 * XXX There's no indicator of the bus type, conventional PCI vs
+	 * PCI-X vs PCIe, but we assume that a caller looking for a PCIe
+	 * requester ID is on a native PCIe-based system (such as VT-d or
+	 * AMD-Vi).  It's common that PCIe root complex devices do not
+	 * include a PCIe capability, but we can assume they are PCIe
+	 * devices based on their topology.
+	 */
+	if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
+		return true;
+
+	/*
+	 * PCI-X devices have a requester ID, but the bridge may still take
+	 * ownership of transactions and generate its own requester ID.  We
+	 * therefore assume a PCI-X requester ID is not the same as a PCIe one.
+	 */
+
+#ifdef CONFIG_PCI_QUIRKS
+	/*
+	 * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
+	 * If the device is a bridge, look to the next device upstream of it.
+	 * If that device is PCIe and not a PCIe-to-PCI bridge, then by
+	 * deduction, the device must be PCIe and therefore has a requester ID.
+	 */
+	if (dev->subordinate) {
+		struct pci_dev *parent = dev->bus->self;
+
+		if (pci_is_pcie(parent) &&
+		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
+			return true;
+	}
+#endif
+
+	return false;
+}
+
+/*
+ * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
+ * @dev: requester device
+ * @bridge: upstream bridge (or NULL for root bus)
+ */
+static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
+					      struct pci_dev *bridge)
+{
+	/*
+	 * The entire path must be tested; if any step does not have a
+	 * requester ID, the chain is broken.  This allows us to support
+	 * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
+	 */
+	while (dev != bridge) {
+		if (!pci_has_pcie_requester_id(dev))
+			return false;
+
+		if (pci_is_root_bus(dev->bus))
+			return !bridge; /* false if we don't hit @bridge */
+
+		dev = dev->bus->self;
+	}
+
+	return true;
+}
+
+/*
+ * Legacy PCI bridges within a root complex (ex. Intel 82801) report
+ * a different requester ID than a standard PCIe-to-PCI bridge.  Instead
+ * of using (subordinate << 8 | 0), they use (bus << 8 | devfn), like a
+ * standard PCIe endpoint.  This function detects them.
+ *
+ * XXX Is this Intel vendor ID specific?
+ */
+static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
+{
+	if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
+		return true;
+
+	return false;
+}
+
+#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
+#define PCI_BRIDGE_REQUESTER_ID(dev)	((dev)->subordinate->number << 8)
+
+/*
+ * pci_get_visible_pcie_requester - Get requester and requester ID for
+ *                                  @requestee below @bridge
+ * @requestee: requester device
+ * @bridge: upstream bridge (or NULL for root bus)
+ * @requester_id: location to store requester ID or NULL
+ */
+struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
+					       struct pci_dev *bridge,
+					       u16 *requester_id)
+{
+	struct pci_dev *requester = requestee;
+
+	while (requester != bridge) {
+		requester = pci_get_dma_source(requester);
+		pci_dev_put(requester); /* XXX skip ref cnt */
+
+		if (pci_has_visible_pcie_requester_id(requester, bridge))
+			break;
+
+		if (pci_is_root_bus(requester->bus))
+			return NULL; /* @bridge not parent to @requestee */
+
+		requester = requester->bus->self;
+	}
+
+	if (requester_id) {
+		if (requester->bus != requestee->bus &&
+		    !pci_bridge_uses_endpoint_requester(requester))
+			*requester_id = PCI_BRIDGE_REQUESTER_ID(requester);
+		else
+			*requester_id = PCI_REQUESTER_ID(requester);
+	}
+
+	return requester;
+}
+
+static int pci_do_requester_callback(struct pci_dev **dev,
+				     int (*fn)(struct pci_dev *,
+					       u16 id, void *),
+				     void *data)
+{
+	struct pci_dev *dma_dev;
+	int ret;
+
+	ret = fn(*dev, PCI_REQUESTER_ID(*dev), data);
+	if (ret)
+		return ret;
+
+	dma_dev = pci_get_dma_source(*dev);
+	pci_dev_put(dma_dev); /* XXX skip ref cnt */
+	if (dma_dev == *dev)
+		return 0;
+
+	ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data);
+	if (ret)
+		return ret;
+
+	*dev = dma_dev;
+	return 0;
+}
+
+/*
+ * pcie_for_each_requester - Call @fn for each device and DMA source from
+ *                           @requestee up to the PCIe requester visible
+ *                           to @bridge.
+ * @requestee: Starting device
+ * @bridge: upstream bridge (or NULL for root bus)
+ * @fn: callback function
+ * @data: data to pass to callback
+ */
+int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
+			    int (*fn)(struct pci_dev *, u16 id, void *),
+			    void *data)
+{
+	struct pci_dev *requester;
+	struct pci_dev *dev = requestee;
+	int ret = 0;
+
+	requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
+	if (!requester)
+		return -EINVAL;
+
+	do {
+		ret = pci_do_requester_callback(&dev, fn, data);
+		if (ret)
+			return ret;
+
+		if (dev == requester)
+			return 0;
+
+		/*
+		 * We always consider root bus devices to have a visible
+		 * requester ID, so this should never be true.
+		 */
+		BUG_ON(pci_is_root_bus(dev->bus));
+
+		dev = dev->bus->self;
+
+	} while (dev != requester);
+
+	/*
+	 * If we've made it here, @requester is a bridge upstream from
+	 * @requestee.
+	 */
+	if (pci_bridge_uses_endpoint_requester(requester))
+		return pci_do_requester_callback(&requester, fn, data);
+
+	return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data);
+}
+
+/*
  * find the upstream PCIe-to-PCI bridge of a PCI device
  * if the device is PCIE, return NULL
  * if the device isn't connected to a PCIe bridge (that is its parent is a
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3a24e4f..94e81d1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1873,6 +1873,13 @@  static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
 }
 #endif
 
+struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
+					       struct pci_dev *bridge,
+					       u16 *requester_id);
+int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
+			    int (*fn)(struct pci_dev *, u16 id, void *),
+			    void *data);
+
 /**
  * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
  * @pdev: the PCI device
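
For context, an IOMMU driver would consume this interface roughly as
follows.  This is a hypothetical sketch: my_iommu_domain and
my_iommu_context_map() are stand-ins for driver-specific state and
context-entry programming, not proposed API.

#include <linux/pci.h>

struct my_iommu_domain;				/* hypothetical */
int my_iommu_context_map(struct my_iommu_domain *domain,
			 u8 bus, u8 devfn);	/* hypothetical */

/* Called for each device/requester ID pair the IOMMU may observe */
static int map_requester(struct pci_dev *dev, u16 id, void *data)
{
	struct my_iommu_domain *domain = data;

	return my_iommu_context_map(domain, id >> 8, id & 0xff);
}

static int my_iommu_attach_device(struct my_iommu_domain *domain,
				  struct pci_dev *pdev)
{
	/*
	 * Program a context entry for @pdev itself, for any DMA quirk
	 * aliases, and for each bridge that may take ownership of its
	 * transactions; a NULL @bridge walks up to the root bus.
	 */
	return pcie_for_each_requester(pdev, NULL, map_requester, domain);
}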