diff mbox series

[RFC,v3,4/7] bus/cdx: add cdx-MSI domain with gic-its domain as parent

Message ID 20220906134801.4079497-5-nipun.gupta@amd.com (mailing list archive)
State New, archived
Headers show
Series [RFC,v3,1/7] dt-bindings: bus: add CDX bus device tree bindings | expand

Commit Message

Nipun Gupta Sept. 6, 2022, 1:47 p.m. UTC
Devices on cdx bus are dynamically detected and registered using
platform_device_register API. As these devices are not linked to
of node they need a separate MSI domain for handling device ID
to be provided to the GIC ITS domain.

This also introduces APIs to alloc and free IRQs for CDX domain.

Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
---
 drivers/bus/cdx/cdx.c        |  18 +++
 drivers/bus/cdx/cdx.h        |  19 +++
 drivers/bus/cdx/cdx_msi.c    | 236 +++++++++++++++++++++++++++++++++++
 drivers/bus/cdx/mcdi_stubs.c |   1 +
 include/linux/cdx/cdx_bus.h  |  19 +++
 5 files changed, 293 insertions(+)
 create mode 100644 drivers/bus/cdx/cdx_msi.c

Comments

Jason Gunthorpe Sept. 6, 2022, 5:19 p.m. UTC | #1
On Tue, Sep 06, 2022 at 07:17:58PM +0530, Nipun Gupta wrote:

> +static void cdx_msi_write_msg(struct irq_data *irq_data,
> +			      struct msi_msg *msg)
> +{
> +	/*
> +	 * Do nothing as CDX devices have these pre-populated
> +	 * in the hardware itself.
> +	 */
> +}

Huh?

There is no way it can be pre-populated, the addr/data pair,
especially on ARM, is completely under SW control. There is some
commonly used IOVA base in Linux for the ITS page, but no HW should
hardwire that.

Jason
Marc Zyngier Sept. 7, 2022, 11:17 a.m. UTC | #2
On Tue, 06 Sep 2022 18:19:06 +0100,
Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Tue, Sep 06, 2022 at 07:17:58PM +0530, Nipun Gupta wrote:
> 
> > +static void cdx_msi_write_msg(struct irq_data *irq_data,
> > +			      struct msi_msg *msg)
> > +{
> > +	/*
> > +	 * Do nothing as CDX devices have these pre-populated
> > +	 * in the hardware itself.
> > +	 */
> > +}
> 
> Huh?
> 
> There is no way it can be pre-populated, the addr/data pair,
> especially on ARM, is completely under SW control.

There is nothing in the GIC spec that says that.

> There is some commonly used IOVA base in Linux for the ITS page, but
> no HW should hardwire that.

That's not strictly true. It really depends on how this block is
integrated, and there is a number of existing blocks that know *in HW*
how to signal an LPI.

See, as the canonical example, how the mbigen driver doesn't need to
know about the address of GITS_TRANSLATER.

Yes, this messes with translation (the access is downstream of the
SMMU) if you relied on it to have some isolation, and it has a "black
hole" effect as nobody can have an IOVA that overlaps with the
physical address of the GITS_TRANSLATER register.

But is it illegal as per the architecture? No. It's just stupid.

	M.
Robin Murphy Sept. 7, 2022, 11:33 a.m. UTC | #3
On 2022-09-07 12:17, Marc Zyngier wrote:
> On Tue, 06 Sep 2022 18:19:06 +0100,
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> On Tue, Sep 06, 2022 at 07:17:58PM +0530, Nipun Gupta wrote:
>>
>>> +static void cdx_msi_write_msg(struct irq_data *irq_data,
>>> +			      struct msi_msg *msg)
>>> +{
>>> +	/*
>>> +	 * Do nothing as CDX devices have these pre-populated
>>> +	 * in the hardware itself.
>>> +	 */
>>> +}
>>
>> Huh?
>>
>> There is no way it can be pre-populated, the addr/data pair,
>> especially on ARM, is completely under SW control.
> 
> There is nothing in the GIC spec that says that.
> 
>> There is some commonly used IOVA base in Linux for the ITS page, but
>> no HW should hardwire that.
> 
> That's not strictly true. It really depends on how this block is
> integrated, and there is a number of existing blocks that know *in HW*
> how to signal an LPI.
> 
> See, as the canonical example, how the mbigen driver doesn't need to
> know about the address of GITS_TRANSLATER.
> 
> Yes, this messes with translation (the access is downstream of the
> SMMU) if you relied on it to have some isolation, and it has a "black
> hole" effect as nobody can have an IOVA that overlaps with the
> physical address of the GITS_TRANSLATER register.
> 
> But is it illegal as per the architecture? No. It's just stupid.

If that were the case, then we'd also need a platform quirk so the SMMU 
driver knows about it. Yuck.

But even then, are you suggesting there is some way to convince the ITS 
driver to allocate a specific predetermined EventID when a driver 
requests an MSI? Asking for a friend...

Cheers,
Robin.
Radovanovic, Aleksandar Sept. 7, 2022, 11:35 a.m. UTC | #4
[AMD Official Use Only - General]



> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: 07 September 2022 12:17
> To: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Gupta, Nipun <Nipun.Gupta@amd.com>; robh+dt@kernel.org;
> krzysztof.kozlowski+dt@linaro.org; gregkh@linuxfoundation.org;
> rafael@kernel.org; eric.auger@redhat.com; alex.williamson@redhat.com;
> cohuck@redhat.com; Gupta, Puneet (DCG-ENG)
> <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
> mchehab+huawei@kernel.org; f.fainelli@gmail.com;
> jeffrey.l.hugo@gmail.com; saravanak@google.com;
> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> robin.murphy@arm.com; will@kernel.org; joro@8bytes.org;
> masahiroy@kernel.org; ndesaulniers@google.com; linux-arm-
> kernel@lists.infradead.org; linux-kbuild@vger.kernel.org; linux-
> kernel@vger.kernel.org; devicetree@vger.kernel.org; kvm@vger.kernel.org;
> okaya@kernel.org; Anand, Harpreet <harpreet.anand@amd.com>; Agarwal,
> Nikhil <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>;
> Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>; git (AMD-
> Xilinx) <git@amd.com>
> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> [CAUTION: External Email]
> 
> On Tue, 06 Sep 2022 18:19:06 +0100,
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Tue, Sep 06, 2022 at 07:17:58PM +0530, Nipun Gupta wrote:
> >
> > > +static void cdx_msi_write_msg(struct irq_data *irq_data,
> > > +                         struct msi_msg *msg) {
> > > +   /*
> > > +    * Do nothing as CDX devices have these pre-populated
> > > +    * in the hardware itself.
> > > +    */
> > > +}
> >
> > Huh?
> >
> > There is no way it can be pre-populated, the addr/data pair,
> > especially on ARM, is completely under SW control.
> 
> There is nothing in the GIC spec that says that.
> 
> > There is some commonly used IOVA base in Linux for the ITS page, but
> > no HW should hardwire that.
> 
> That's not strictly true. It really depends on how this block is integrated, and
> there is a number of existing blocks that know *in HW* how to signal an LPI.
> 
> See, as the canonical example, how the mbigen driver doesn't need to know
> about the address of GITS_TRANSLATER.
> 
> Yes, this messes with translation (the access is downstream of the
> SMMU) if you relied on it to have some isolation, and it has a "black hole"
> effect as nobody can have an IOVA that overlaps with the physical address of
> the GITS_TRANSLATER register.
> 
> But is it illegal as per the architecture? No. It's just stupid.
> 
>         M.
> 
> --
> Without deviation from the norm, progress is not possible.

To give some context, CDX devices are specific to embedded ARM CPUs on the FPGA and a lot of the CDX hardware core is under the control of the system firmware, not the application CPUs. 

That being said, the MSI address is always going to be the GIC GITS_TRANSLATER, which is known to the system firmware, as it is fixed per FPGA platform. At present, we do not allow the application CPU OS to change this - I believe this is for security reasons, but this may or may not be a good idea in general. As Marc mentions, CDX MSI writes are downstream of the SMMU and, if SMMU does not provide identity mapping for GITS_TRANSLATER, then we have a problem and may need to allow the OS to write the address part. However, even if we did, the CDX hardware is limited in that it can only take one GITS_TRANSLATER register target address per system, not per CDX device, nor per MSI vector.

As for the data part (EventID in GIC parlance), this is always going to be the CDX device-relative vector number - I believe this can't be changed, it is a hardware limitation (but I need to double-check). That should be OK, though, as I believe this is exactly what Linux would write anyway, as each CDX device should be in its own IRQ domain (i.e. have its own ITS device table).

The best I can propose is to pass the addr/data info to firmware here, which will then decide what to do with it. At least, it can assert that the values are what the hardware expects and fail loudly if not, rather than having a silently misconfigured system.

Aleksandar
Marc Zyngier Sept. 7, 2022, 12:14 p.m. UTC | #5
On Wed, 07 Sep 2022 12:33:12 +0100,
Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2022-09-07 12:17, Marc Zyngier wrote:
> > On Tue, 06 Sep 2022 18:19:06 +0100,
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >> 
> >> On Tue, Sep 06, 2022 at 07:17:58PM +0530, Nipun Gupta wrote:
> >> 
> >>> +static void cdx_msi_write_msg(struct irq_data *irq_data,
> >>> +			      struct msi_msg *msg)
> >>> +{
> >>> +	/*
> >>> +	 * Do nothing as CDX devices have these pre-populated
> >>> +	 * in the hardware itself.
> >>> +	 */
> >>> +}
> >> 
> >> Huh?
> >> 
> >> There is no way it can be pre-populated, the addr/data pair,
> >> especially on ARM, is completely under SW control.
> > 
> > There is nothing in the GIC spec that says that.
> > 
> >> There is some commonly used IOVA base in Linux for the ITS page, but
> >> no HW should hardwire that.
> > 
> > That's not strictly true. It really depends on how this block is
> > integrated, and there is a number of existing blocks that know *in HW*
> > how to signal an LPI.
> > 
> > See, as the canonical example, how the mbigen driver doesn't need to
> > know about the address of GITS_TRANSLATER.
> > 
> > Yes, this messes with translation (the access is downstream of the
> > SMMU) if you relied on it to have some isolation, and it has a "black
> > hole" effect as nobody can have an IOVA that overlaps with the
> > physical address of the GITS_TRANSLATER register.
> > 
> > But is it illegal as per the architecture? No. It's just stupid.
> 
> If that were the case, then we'd also need a platform quirk so the
> SMMU driver knows about it. Yuck.

Yup. As I said, this is stupid.

> But even then, are you suggesting there is some way to convince the
> ITS driver to allocate a specific predetermined EventID when a driver
> requests an MSI? Asking for a friend...

Of course not. Whoever did that has decided to hardcode the Linux
behaviour into the HW, because it is well known that SW behaviour
never changes. Nononono.

I am >this< tempted to sneak a change into the allocation scheme to
start at 5 or 13 (alternatively), and to map LPIs top-down. That
should get people thinking.

Cheers,

	M.
Marc Zyngier Sept. 7, 2022, 1:18 p.m. UTC | #6
On Tue, 06 Sep 2022 14:47:58 +0100,
Nipun Gupta <nipun.gupta@amd.com> wrote:
> 
> Devices on cdx bus are dynamically detected and registered using
> platform_device_register API. As these devices are not linked to
> of node they need a separate MSI domain for handling device ID
> to be provided to the GIC ITS domain.
> 
> This also introduces APIs to alloc and free IRQs for CDX domain.
> 
> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
> ---
>  drivers/bus/cdx/cdx.c        |  18 +++
>  drivers/bus/cdx/cdx.h        |  19 +++
>  drivers/bus/cdx/cdx_msi.c    | 236 +++++++++++++++++++++++++++++++++++
>  drivers/bus/cdx/mcdi_stubs.c |   1 +
>  include/linux/cdx/cdx_bus.h  |  19 +++
>  5 files changed, 293 insertions(+)
>  create mode 100644 drivers/bus/cdx/cdx_msi.c
> 
> diff --git a/drivers/bus/cdx/cdx.c b/drivers/bus/cdx/cdx.c
> index fc417c32c59b..02ececce1c84 100644
> --- a/drivers/bus/cdx/cdx.c
> +++ b/drivers/bus/cdx/cdx.c
> @@ -17,6 +17,7 @@
>  #include <linux/dma-map-ops.h>
>  #include <linux/property.h>
>  #include <linux/iommu.h>
> +#include <linux/msi.h>
>  #include <linux/cdx/cdx_bus.h>
>  
>  #include "cdx.h"
> @@ -236,6 +237,7 @@ static int cdx_device_add(struct device *parent,
>  			  struct cdx_dev_params_t *dev_params)
>  {
>  	struct cdx_device *cdx_dev;
> +	struct irq_domain *cdx_msi_domain;
>  	int ret;
>  
>  	cdx_dev = kzalloc(sizeof(*cdx_dev), GFP_KERNEL);
> @@ -252,6 +254,7 @@ static int cdx_device_add(struct device *parent,
>  
>  	/* Populate CDX dev params */
>  	cdx_dev->req_id = dev_params->req_id;
> +	cdx_dev->num_msi = dev_params->num_msi;
>  	cdx_dev->vendor = dev_params->vendor;
>  	cdx_dev->device = dev_params->device;
>  	cdx_dev->bus_id = dev_params->bus_id;
> @@ -269,6 +272,21 @@ static int cdx_device_add(struct device *parent,
>  	dev_set_name(&cdx_dev->dev, "cdx-%02x:%02x", cdx_dev->bus_id,
>  			cdx_dev->func_id);
>  
> +	/* If CDX MSI domain is not created, create one. */
> +	cdx_msi_domain = cdx_find_msi_domain(parent);

Why do we need such a wrapper around find_host_domain()?

> +	if (!cdx_msi_domain) {
> +		cdx_msi_domain = cdx_msi_domain_init(parent);

This is racy. If device are populated in parallel, bad things will
happen.

> +		if (!cdx_msi_domain) {
> +			dev_err(&cdx_dev->dev,
> +				"cdx_msi_domain_init() failed: %d", ret);
> +			kfree(cdx_dev);
> +			return -1;

Use standard error codes.

> +		}
> +	}
> +
> +	/* Set the MSI domain */
> +	dev_set_msi_domain(&cdx_dev->dev, cdx_msi_domain);
> +
>  	ret = device_add(&cdx_dev->dev);
>  	if (ret != 0) {
>  		dev_err(&cdx_dev->dev,
> diff --git a/drivers/bus/cdx/cdx.h b/drivers/bus/cdx/cdx.h
> index db0569431c10..95df440ebd73 100644
> --- a/drivers/bus/cdx/cdx.h
> +++ b/drivers/bus/cdx/cdx.h
> @@ -20,6 +20,7 @@
>   * @res: array of MMIO region entries
>   * @res_count: number of valid MMIO regions
>   * @req_id: Requestor ID associated with CDX device
> + * @num_msi: Number of MSI's supported by the device
>   */
>  struct cdx_dev_params_t {
>  	u16 vendor;
> @@ -29,6 +30,24 @@ struct cdx_dev_params_t {
>  	struct resource res[MAX_CDX_DEV_RESOURCES];
>  	u8 res_count;
>  	u32 req_id;
> +	u32 num_msi;
>  };
>  
> +/**
> + * cdx_msi_domain_init - Init the CDX bus MSI domain.
> + * @dev: Device of the CDX bus controller
> + *
> + * Return CDX MSI domain, NULL on failure
> + */
> +struct irq_domain *cdx_msi_domain_init(struct device *dev);
> +
> +/**
> + * cdx_find_msi_domain - Get the CDX-MSI domain.
> + * @dev: CDX controller generic device
> + *
> + * Return CDX MSI domain, NULL on error or if CDX-MSI domain is
> + *   not yet created.
> + */
> +struct irq_domain *cdx_find_msi_domain(struct device *parent);
> +
>  #endif /* _CDX_H_ */
> diff --git a/drivers/bus/cdx/cdx_msi.c b/drivers/bus/cdx/cdx_msi.c
> new file mode 100644
> index 000000000000..2fb7bac18393
> --- /dev/null
> +++ b/drivers/bus/cdx/cdx_msi.c
> @@ -0,0 +1,236 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * AMD CDX bus driver MSI support
> + *
> + * Copyright (C) 2022, Advanced Micro Devices, Inc.
> + *
> + */
> +
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/of_address.h>
> +#include <linux/of_irq.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/msi.h>
> +#include <linux/cdx/cdx_bus.h>
> +
> +#include "cdx.h"
> +
> +#ifdef GENERIC_MSI_DOMAIN_OPS
> +/*
> + * Generate a unique ID identifying the interrupt (only used within the MSI
> + * irqdomain.  Combine the req_id with the interrupt index.
> + */
> +static irq_hw_number_t cdx_domain_calc_hwirq(struct cdx_device *dev,
> +					     struct msi_desc *desc)
> +{
> +	/*
> +	 * Make the base hwirq value for req_id*10000 so it is readable
> +	 * as a decimal value in /proc/interrupts.
> +	 */
> +	return (irq_hw_number_t)(desc->msi_index + (dev->req_id * 10000));

No, please. Use shifts, and use a script if decimal conversion fails
you. We're not playing these games. And the cast is pointless.

Yes, you have lifted it from the FSL code, bad move. /me makes a note
to go and clean-up this crap.

> +}
> +
> +static void cdx_msi_set_desc(msi_alloc_info_t *arg,
> +			     struct msi_desc *desc)
> +{
> +	arg->desc = desc;
> +	arg->hwirq = cdx_domain_calc_hwirq(to_cdx_device(desc->dev), desc);
> +}
> +#else
> +#define cdx_msi_set_desc NULL

Why the ifdefery? This should *only* be supported with
GENERIC_MSI_DOMAIN_OPS.

> +#endif
> +
> +static void cdx_msi_update_dom_ops(struct msi_domain_info *info)
> +{
> +	struct msi_domain_ops *ops = info->ops;
> +
> +	if (!ops)
> +		return;
> +
> +	/* set_desc should not be set by the caller */
> +	if (!ops->set_desc)
> +		ops->set_desc = cdx_msi_set_desc;

Then why are you allowing this to be overridden?

> +}
> +
> +static void cdx_msi_write_msg(struct irq_data *irq_data,
> +			      struct msi_msg *msg)
> +{
> +	/*
> +	 * Do nothing as CDX devices have these pre-populated
> +	 * in the hardware itself.
> +	 */

We talked about this in a separate thread. This is a major problem.

> +}
> +
> +static void cdx_msi_update_chip_ops(struct msi_domain_info *info)
> +{
> +	struct irq_chip *chip = info->chip;
> +
> +	if (!chip)
> +		return;
> +
> +	/*
> +	 * irq_write_msi_msg should not be set by the caller
> +	 */
> +	if (!chip->irq_write_msi_msg)
> +		chip->irq_write_msi_msg = cdx_msi_write_msg;

Then why the check?

> +}
> +/**
> + * cdx_msi_create_irq_domain - Create a CDX MSI interrupt domain
> + * @fwnode:	Optional firmware node of the interrupt controller
> + * @info:	MSI domain info
> + * @parent:	Parent irq domain
> + *
> + * Updates the domain and chip ops and creates a CDX MSI
> + * interrupt domain.
> + *
> + * Returns:
> + * A domain pointer or NULL in case of failure.
> + */
> +static struct irq_domain *cdx_msi_create_irq_domain(struct fwnode_handle *fwnode,
> +						    struct msi_domain_info *info,
> +						    struct irq_domain *parent)
> +{
> +	if (WARN_ON((info->flags & MSI_FLAG_LEVEL_CAPABLE)))
> +		info->flags &= ~MSI_FLAG_LEVEL_CAPABLE;

No. Just fail the domain creation. We shouldn't paper over these things.

> +	if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS)
> +		cdx_msi_update_dom_ops(info);
> +	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
> +		cdx_msi_update_chip_ops(info);

Under what circumstances would the default ops not be used? The only
caller is in this file and has pre-computed values.

This looks like a copy/paste from platform-msi.c.

> +	info->flags |= MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS | MSI_FLAG_FREE_MSI_DESCS;
> +
> +	return msi_create_irq_domain(fwnode, info, parent);

This whole function makes no sense. You should move everything to the
relevant structures, and simply call msi_create_irq_domain() from the
sole caller of this function.

> +}
> +
> +int cdx_msi_domain_alloc_irqs(struct device *dev, unsigned int irq_count)
> +{
> +	struct irq_domain *msi_domain;
> +	int ret;
> +
> +	msi_domain = dev_get_msi_domain(dev);
> +	if (!msi_domain) {

How can that happen?

> +		dev_err(dev, "msi domain get failed\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = msi_setup_device_data(dev);
> +	if (ret) {
> +		dev_err(dev, "msi setup device failed: %d\n", ret);
> +		return ret;
> +	}
> +
> +	msi_lock_descs(dev);
> +	if (msi_first_desc(dev, MSI_DESC_ALL))
> +		ret = -EINVAL;
> +	msi_unlock_descs(dev);
> +	if (ret) {
> +		dev_err(dev, "msi setup device failed: %d\n", ret);

Same message twice, not very useful. Consider grouping these things at
the end of the function and make use of a (oh Gawd) goto...

> +		return ret;
> +	}
> +
> +	ret = msi_domain_alloc_irqs(msi_domain, dev, irq_count);
> +	if (ret)
> +		dev_err(dev, "Failed to allocate IRQs\n");
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(cdx_msi_domain_alloc_irqs);

EXPORT_SYMBOL_GPL(), please, for all the exports.

> +
> +void cdx_msi_domain_free_irqs(struct device *dev)
> +{
> +	struct irq_domain *msi_domain;
> +
> +	msi_domain = dev_get_msi_domain(dev);
> +	if (!msi_domain)

Again, how can that happen?

> +		return;
> +
> +	msi_domain_free_irqs(msi_domain, dev);
> +}
> +EXPORT_SYMBOL(cdx_msi_domain_free_irqs);

This feels like a very pointless helper, and again a copy/paste from
the FSL code. I'd rather you change msi_domain_free_irqs() to only
take a device and use the implicit MSI domain.

> +
> +static struct irq_chip cdx_msi_irq_chip = {
> +	.name = "CDX-MSI",
> +	.irq_mask = irq_chip_mask_parent,
> +	.irq_unmask = irq_chip_unmask_parent,
> +	.irq_eoi = irq_chip_eoi_parent,
> +	.irq_set_affinity = msi_domain_set_affinity

nit: please align things vertically.

> +};
> +
> +static int cdx_msi_prepare(struct irq_domain *msi_domain,
> +			   struct device *dev,
> +			   int nvec, msi_alloc_info_t *info)
> +{
> +	struct cdx_device *cdx_dev = to_cdx_device(dev);
> +	struct msi_domain_info *msi_info;
> +	struct device *parent = dev->parent;
> +	u32 dev_id;
> +	int ret;
> +
> +	/* Retrieve device ID from requestor ID using parent device */
> +	ret = of_map_id(parent->of_node, cdx_dev->req_id, "msi-map",
> +			"msi-map-mask",	NULL, &dev_id);
> +	if (ret) {
> +		dev_err(dev, "of_map_id failed for MSI: %d\n", ret);
> +		return ret;
> +	}
> +
> +	/* Set the device Id to be passed to the GIC-ITS */
> +	info->scratchpad[0].ul = dev_id;
> +
> +	msi_info = msi_get_domain_info(msi_domain->parent);
> +
> +	/* Allocate at least 32 MSIs, and always as a power of 2 */

Where is this requirement coming from?

> +	nvec = max_t(int, 32, roundup_pow_of_two(nvec));
> +	return msi_info->ops->msi_prepare(msi_domain->parent, dev, nvec, info);
> +}
> +
> +static struct msi_domain_ops cdx_msi_ops __ro_after_init = {
> +	.msi_prepare = cdx_msi_prepare,
> +};
> +
> +static struct msi_domain_info cdx_msi_domain_info = {
> +	.flags	= (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS),
> +	.ops	= &cdx_msi_ops,
> +	.chip	= &cdx_msi_irq_chip,
> +};
> +
> +struct irq_domain *cdx_msi_domain_init(struct device *dev)
> +{
> +	struct irq_domain *parent;
> +	struct irq_domain *cdx_msi_domain;
> +	struct fwnode_handle *fwnode_handle;
> +	struct device_node *parent_node;
> +	struct device_node *np = dev->of_node;
> +
> +	fwnode_handle = of_node_to_fwnode(np);
> +
> +	parent_node = of_parse_phandle(np, "msi-map", 1);

Huh. This only works because you are stuck with a single ITS per system.

> +	if (!parent_node) {
> +		dev_err(dev, "msi-map not present on cdx controller\n");
> +		return NULL;
> +	}
> +
> +	parent = irq_find_matching_fwnode(of_node_to_fwnode(parent_node),
> +			DOMAIN_BUS_NEXUS);
> +	if (!parent || !msi_get_domain_info(parent)) {
> +		dev_err(dev, "unable to locate ITS domain\n");
> +		return NULL;
> +	}
> +
> +	cdx_msi_domain = cdx_msi_create_irq_domain(fwnode_handle,
> +				&cdx_msi_domain_info, parent);
> +	if (!cdx_msi_domain) {
> +		dev_err(dev, "unable to create CDX-MSI domain\n");
> +		return NULL;
> +	}
> +
> +	dev_dbg(dev, "CDX-MSI domain created\n");
> +
> +	return cdx_msi_domain;
> +}
> +
> +struct irq_domain *cdx_find_msi_domain(struct device *parent)
> +{
> +	return irq_find_host(parent->of_node);
> +}
> diff --git a/drivers/bus/cdx/mcdi_stubs.c b/drivers/bus/cdx/mcdi_stubs.c
> index cc9d30fa02f8..2c8db1f5a057 100644
> --- a/drivers/bus/cdx/mcdi_stubs.c
> +++ b/drivers/bus/cdx/mcdi_stubs.c
> @@ -45,6 +45,7 @@ int cdx_mcdi_get_func_config(struct cdx_mcdi_t *cdx_mcdi,
>  	dev_params->res_count = 2;
>  
>  	dev_params->req_id = 0x250;
> +	dev_params->num_msi = 4;

Why the hardcoded 4? Is that part of the firmware emulation stuff?

	M.
Radovanovic, Aleksandar Sept. 7, 2022, 1:18 p.m. UTC | #7
[AMD Official Use Only - General]



> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: 07 September 2022 13:33
> To: Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>; Gupta, Nipun
> <Nipun.Gupta@amd.com>; robh+dt@kernel.org;
> krzysztof.kozlowski+dt@linaro.org; gregkh@linuxfoundation.org;
> rafael@kernel.org; eric.auger@redhat.com; alex.williamson@redhat.com;
> cohuck@redhat.com; Gupta, Puneet (DCG-ENG)
> <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
> mchehab+huawei@kernel.org; f.fainelli@gmail.com;
> jeffrey.l.hugo@gmail.com; saravanak@google.com;
> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> robin.murphy@arm.com; will@kernel.org; joro@8bytes.org;
> masahiroy@kernel.org; ndesaulniers@google.com; linux-arm-
> kernel@lists.infradead.org; linux-kbuild@vger.kernel.org; linux-
> kernel@vger.kernel.org; devicetree@vger.kernel.org; kvm@vger.kernel.org;
> okaya@kernel.org; Anand, Harpreet <harpreet.anand@amd.com>; Agarwal,
> Nikhil <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>;
> git (AMD-Xilinx) <git@amd.com>
> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> [CAUTION: External Email]
> 
> > As Marc mentions, CDX
> > MSI writes are downstream of the SMMU and, if SMMU does not provide
> > identity mapping for GITS_TRANSLATER, then we have a problem and may
> > need to allow the OS to write the address part. However, even if we
> > did, the CDX hardware is limited in that it can only take one
> > GITS_TRANSLATER register target address per system, not per CDX
> > device, nor per MSI vector.
> 
> If the MSI generation is downstream of the SMMU, why should the SMMU
> provide a 1:1 mapping for GITS_TRANSLATER? I don't think it should provide a
> mapping at all in this case. But it looks like I don't really understand how
> these things are placed relative to each other... :-/
> 

Apologies, I got my streams confused. It is _upstream_ of the SMMU, it does go through SMMU mapping.

> >
> > As for the data part (EventID in GIC parlance), this is always going
> > to be the CDX device-relative vector number - I believe this can't be
> > changed, it is a hardware limitation (but I need to double-check).
> > That should be OK, though, as I believe this is exactly what Linux
> > would write anyway, as each CDX device should be in its own IRQ domain
> > (i.e. have its own ITS device table).
> 
> But that's really the worse part. You have hardcoded what is the
> *current* Linux behaviour. Things change. And baking SW behaviour into a
> piece of HW looks incredibly shortsighted...

For posterity, I'm not an RTL designer/architect, so share your sentiment to a certain extent. That said, I expect the decision was not based on Linux or any other SW behaviour, but because it is the most straightforward and least expensive way to do it. Having an EventID register for each and every MSI source just so you can program it in any random order costs flops and all the associated complexity of extra RTL logic (think timing closure, etc.), so trade-offs are made. The fact that it matches current Linux behaviour means we just got lucky... 

Anyway, I'm straying off topic here, I'll check with the system architects if there's anything that can be done here. But I'm not feeling hopeful.

Aleksandar
Radovanovic, Aleksandar Sept. 8, 2022, 9:51 a.m. UTC | #8
[AMD Official Use Only - General]



> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: 08 September 2022 09:08
> To: Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>; Gupta, Nipun
> <Nipun.Gupta@amd.com>; robh+dt@kernel.org;
> krzysztof.kozlowski+dt@linaro.org; gregkh@linuxfoundation.org;
> rafael@kernel.org; eric.auger@redhat.com; alex.williamson@redhat.com;
> cohuck@redhat.com; Gupta, Puneet (DCG-ENG)
> <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
> mchehab+huawei@kernel.org; f.fainelli@gmail.com;
> jeffrey.l.hugo@gmail.com; saravanak@google.com;
> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> robin.murphy@arm.com; will@kernel.org; joro@8bytes.org;
> masahiroy@kernel.org; ndesaulniers@google.com; linux-arm-
> kernel@lists.infradead.org; linux-kbuild@vger.kernel.org; linux-
> kernel@vger.kernel.org; devicetree@vger.kernel.org; kvm@vger.kernel.org;
> okaya@kernel.org; Anand, Harpreet <harpreet.anand@amd.com>; Agarwal,
> Nikhil <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>;
> git (AMD-Xilinx) <git@amd.com>
> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> [CAUTION: External Email]
>  
> OK, so you definitely need a mapping, but it cannot be a translation, and it
> needs to be in all the possible address spaces. OMG.

Could you elaborate why it needs to be in all the possible address spaces? I'm in no way familiar with kernel IOVA allocation, so not sure I understand this requirement. Note that each CDX device will have its own unique StreamID (in general case, equal to DeviceID sent to the GIC), so, from a SMMU perspective, the mapping can be specific to that device. As long as that IOVA is not allocated to any DMA region for _that_ device, things should be OK? But, I appreciate it might not be that simple from a kernel perspective.

> > > > As for the data part (EventID in GIC parlance), this is always
> > > > going to be the CDX device-relative vector number - I believe this
> > > > can't be changed, it is a hardware limitation (but I need to double-
> check).
> > > > That should be OK, though, as I believe this is exactly what Linux
> > > > would write anyway, as each CDX device should be in its own IRQ
> > > > domain (i.e. have its own ITS device table).
> > >
> > > But that's really the worse part. You have hardcoded what is the
> > > *current* Linux behaviour. Things change. And baking SW behaviour
> > > into a piece of HW looks incredibly shortsighted...
> >
> > For posterity, I'm not an RTL designer/architect, so share your
> > sentiment to a certain extent. That said, I expect the decision was
> > not based on Linux or any other SW behaviour, but because it is the
> > most straightforward and least expensive way to do it. Having an
> > EventID register for each and every MSI source just so you can program
> > it in any random order costs flops and all the associated complexity
> > of extra RTL logic (think timing closure, etc.), so trade-offs are
> > made. The fact that it matches current Linux behaviour means we just
> > got lucky...
> 
> Yeah, but that's not the only problem: there is no guarantee that we have
> enough LPIs to allocate for the device, so we'll perform a partial allocation (8
> instead of 32 LPIs, for example).

Why should that be a problem? The driver will know in advance the number of vectors required by the device. I expect it will need to call some equivalent of  platform_msi_domain_alloc_irqs(), shouldn't that fail if not enough IRQs are allocated (and ultimately fail the probe)? Even if not, we can still inform the firmware in write_msg, which will serve as an indication that a particular vector is enabled. The firmware can decide what to do with the device if not all of the vectors are enabled.

Aleksandar
Robin Murphy Sept. 8, 2022, 11:49 a.m. UTC | #9
On 2022-09-08 10:51, Radovanovic, Aleksandar wrote:
> [AMD Official Use Only - General]
> 
> 
> 
>> -----Original Message-----
>> From: Marc Zyngier <maz@kernel.org>
>> Sent: 08 September 2022 09:08
>> To: Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>
>> Cc: Jason Gunthorpe <jgg@nvidia.com>; Gupta, Nipun
>> <Nipun.Gupta@amd.com>; robh+dt@kernel.org;
>> krzysztof.kozlowski+dt@linaro.org; gregkh@linuxfoundation.org;
>> rafael@kernel.org; eric.auger@redhat.com; alex.williamson@redhat.com;
>> cohuck@redhat.com; Gupta, Puneet (DCG-ENG)
>> <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
>> mchehab+huawei@kernel.org; f.fainelli@gmail.com;
>> jeffrey.l.hugo@gmail.com; saravanak@google.com;
>> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
>> robin.murphy@arm.com; will@kernel.org; joro@8bytes.org;
>> masahiroy@kernel.org; ndesaulniers@google.com; linux-arm-
>> kernel@lists.infradead.org; linux-kbuild@vger.kernel.org; linux-
>> kernel@vger.kernel.org; devicetree@vger.kernel.org; kvm@vger.kernel.org;
>> okaya@kernel.org; Anand, Harpreet <harpreet.anand@amd.com>; Agarwal,
>> Nikhil <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>;
>> git (AMD-Xilinx) <git@amd.com>
>> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
>> domain as parent
>>
>> [CAUTION: External Email]
>>   
>> OK, so you definitely need a mapping, but it cannot be a translation, and it
>> needs to be in all the possible address spaces. OMG.
> 
> Could you elaborate why it needs to be in all the possible address spaces? I'm in no way familiar with kernel IOVA allocation, so not sure I understand this requirement. Note that each CDX device will have its own unique StreamID (in general case, equal to DeviceID sent to the GIC), so, from a SMMU perspective, the mapping can be specific to that device. As long as that IOVA is not allocated to any DMA region for _that_ device, things should be OK? But, I appreciate it might not be that simple from a kernel perspective.

That's the point - any device could could have its own mapping, 
therefore that hole has to be punched in *every* mapping that any of 
those devices could use, so that MSI writes don't unexpectedly fault, or 
corrupt memory if that address is free to be used to map a DMA buffer. 
At least the HiSilicon PCI quirk is functionally similar (for slightly 
different underlying reasons) so there's already precedent and an 
example that you can follow to a reasonable degree.

Robin.
Nipun Gupta Sept. 8, 2022, 2:13 p.m. UTC | #10
[AMD Official Use Only - General]



> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: Wednesday, September 7, 2022 6:48 PM
> To: Gupta, Nipun <Nipun.Gupta@amd.com>
> Cc: robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> gregkh@linuxfoundation.org; rafael@kernel.org; eric.auger@redhat.com;
> alex.williamson@redhat.com; cohuck@redhat.com; Gupta, Puneet (DCG-
> ENG) <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
> mchehab+huawei@kernel.org; f.fainelli@gmail.com;
> jeffrey.l.hugo@gmail.com; saravanak@google.com;
> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> jgg@ziepe.ca; jgg@nvidia.com; robin.murphy@arm.com; will@kernel.org;
> joro@8bytes.org; masahiroy@kernel.org; ndesaulniers@google.com; linux-
> arm-kernel@lists.infradead.org; linux-kbuild@vger.kernel.org; linux-
> kernel@vger.kernel.org; devicetree@vger.kernel.org; kvm@vger.kernel.org;
> okaya@kernel.org; Anand, Harpreet <harpreet.anand@amd.com>; Agarwal,
> Nikhil <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>;
> Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>; git (AMD-
> Xilinx) <git@amd.com>
> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> [CAUTION: External Email]
> 
> On Tue, 06 Sep 2022 14:47:58 +0100,
> Nipun Gupta <nipun.gupta@amd.com> wrote:
> >
> > Devices on cdx bus are dynamically detected and registered using
> > platform_device_register API. As these devices are not linked to
> > of node they need a separate MSI domain for handling device ID
> > to be provided to the GIC ITS domain.
> >
> > This also introduces APIs to alloc and free IRQs for CDX domain.
> >
> > Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> > Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
> > ---
> >  drivers/bus/cdx/cdx.c        |  18 +++
> >  drivers/bus/cdx/cdx.h        |  19 +++
> >  drivers/bus/cdx/cdx_msi.c    | 236
> +++++++++++++++++++++++++++++++++++
> >  drivers/bus/cdx/mcdi_stubs.c |   1 +
> >  include/linux/cdx/cdx_bus.h  |  19 +++
> >  5 files changed, 293 insertions(+)
> >  create mode 100644 drivers/bus/cdx/cdx_msi.c
> >

<..>

> > +             return;
> > +
> > +     msi_domain_free_irqs(msi_domain, dev);
> > +}
> > +EXPORT_SYMBOL(cdx_msi_domain_free_irqs);
> 
> This feels like a very pointless helper, and again a copy/paste from
> the FSL code. I'd rather you change msi_domain_free_irqs() to only
> take a device and use the implicit MSI domain.

I agree with other comments except this one.

In current implementation we have an API "cdx_msi_domain_alloc_irqs()",
so having "cdx_msi_domain_free_irqs()" seems legitimate, as the caller
would allocate and free MSI's using a similar APIs (cdx_msi_domain*).

Changing msi_domain_free_irqs() to use implicit msi domain in case
msi_domain is not provided by the caller seems appropriate, Ill change the
same for "msi_domain_alloc_irqs()" too.

<..>

> > diff --git a/drivers/bus/cdx/mcdi_stubs.c b/drivers/bus/cdx/mcdi_stubs.c
> > index cc9d30fa02f8..2c8db1f5a057 100644
> > --- a/drivers/bus/cdx/mcdi_stubs.c
> > +++ b/drivers/bus/cdx/mcdi_stubs.c
> > @@ -45,6 +45,7 @@ int cdx_mcdi_get_func_config(struct cdx_mcdi_t
> *cdx_mcdi,
> >       dev_params->res_count = 2;
> >
> >       dev_params->req_id = 0x250;
> > +     dev_params->num_msi = 4;
> 
> Why the hardcoded 4? Is that part of the firmware emulation stuff?

Yes, this is currently part of emulation, and would change with proper
emulation support.

> 
>         M.
> 
> --
> Without deviation from the norm, progress is not possible.
Nipun Gupta Sept. 9, 2022, 6:32 a.m. UTC | #11
[AMD Official Use Only - General]



> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: Thursday, September 8, 2022 8:00 PM
> To: Gupta, Nipun <Nipun.Gupta@amd.com>
> Cc: robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> gregkh@linuxfoundation.org; rafael@kernel.org; eric.auger@redhat.com;
> alex.williamson@redhat.com; cohuck@redhat.com; Gupta, Puneet (DCG-ENG)
> <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
> mchehab+huawei@kernel.org; f.fainelli@gmail.com; jeffrey.l.hugo@gmail.com;
> saravanak@google.com; Michael.Srba@seznam.cz; mani@kernel.org;
> yishaih@nvidia.com; jgg@ziepe.ca; jgg@nvidia.com; robin.murphy@arm.com;
> will@kernel.org; joro@8bytes.org; masahiroy@kernel.org;
> ndesaulniers@google.com; linux-arm-kernel@lists.infradead.org; linux-
> kbuild@vger.kernel.org; linux-kernel@vger.kernel.org;
> devicetree@vger.kernel.org; kvm@vger.kernel.org; okaya@kernel.org; Anand,
> Harpreet <harpreet.anand@amd.com>; Agarwal, Nikhil
> <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>;
> Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>; git (AMD-Xilinx)
> <git@amd.com>
> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> [CAUTION: External Email]
> 
> On Thu, 08 Sep 2022 15:13:31 +0100,
> "Gupta, Nipun" <Nipun.Gupta@amd.com> wrote:
> >
> >
> > > > +             return;
> > > > +
> > > > +     msi_domain_free_irqs(msi_domain, dev);
> > > > +}
> > > > +EXPORT_SYMBOL(cdx_msi_domain_free_irqs);
> > >
> > > This feels like a very pointless helper, and again a copy/paste from
> > > the FSL code. I'd rather you change msi_domain_free_irqs() to only
> > > take a device and use the implicit MSI domain.
> >
> > I agree with other comments except this one.
> >
> > In current implementation we have an API "cdx_msi_domain_alloc_irqs()",
> > so having "cdx_msi_domain_free_irqs()" seems legitimate, as the caller
> > would allocate and free MSI's using a similar APIs (cdx_msi_domain*).
> 
> Why would that be a problem? Using generic functions when they apply
> should be the default, and "specialised" helpers are only here as a
> reminder that our MSI API still needs serious improvement.

We can remove the wrapper API, rather have a #define to provide same name
convention for alloc and free IRQ APIs for CDX drivers. But both ways if we use
#define or direct use of msi_domain_free_irqs() API, we need
msi_domain_free_irqs() symbol exported I hope having export symbol to this
API would not be a problem.

> 
> > Changing msi_domain_free_irqs() to use implicit msi domain in case
> > msi_domain is not provided by the caller seems appropriate, Ill change the
> > same for "msi_domain_alloc_irqs()" too.
> 
> What I'm asking is that there is no explicit msi_domain anymore. We
> always use the one referenced by the device. And if that can be done
> on the allocation path too, great.

I think it can be removed from both the APIs. Also, API's
msi_domain_alloc_irqs_descs_locked() and msi_domain_free_irqs_descs_locked()
can have similar change.

> 
> > <..>
> >
> > > > diff --git a/drivers/bus/cdx/mcdi_stubs.c b/drivers/bus/cdx/mcdi_stubs.c
> > > > index cc9d30fa02f8..2c8db1f5a057 100644
> > > > --- a/drivers/bus/cdx/mcdi_stubs.c
> > > > +++ b/drivers/bus/cdx/mcdi_stubs.c
> > > > @@ -45,6 +45,7 @@ int cdx_mcdi_get_func_config(struct cdx_mcdi_t
> > > *cdx_mcdi,
> > > >       dev_params->res_count = 2;
> > > >
> > > >       dev_params->req_id = 0x250;
> > > > +     dev_params->num_msi = 4;
> > >
> > > Why the hardcoded 4? Is that part of the firmware emulation stuff?
> >
> > Yes, this is currently part of emulation, and would change with proper
> > emulation support.
> 
> What "proper emulation support"? I expect no emulation at all, but
> instead a well defined probing method.

I meant proper firmware interfacing support for probing.

> 
>         M.
> 
> --
> Without deviation from the norm, progress is not possible.
Nipun Gupta Oct. 12, 2022, 10:04 a.m. UTC | #12
[AMD Official Use Only - General]


<snip>

> 
> > +}
> > +
> > +static void cdx_msi_write_msg(struct irq_data *irq_data,
> > +                           struct msi_msg *msg)
> > +{
> > +     /*
> > +      * Do nothing as CDX devices have these pre-populated
> > +      * in the hardware itself.
> > +      */
> 
> We talked about this in a separate thread. This is a major problem.

We discussed this further with the hardware design team and now have the
correct and complete understanding here. As the CDX devices are FPGA based,
they don't exist yet, so it would be possible to construct them in such a way
that the eventid is programable.

To make it generic for CDX devices, we have added a firmware API which provide
the mappings (MSI vector ID to eventID) to the fabric, that can be referred by the
device while generating the MSI interrupt.

Also, there is an existing table to have GITS_TRANSLATOR iova address (address in
msi_msg) for CDX devices, which can be programmed by the firmware. So, providing
IOVA address to device would also not be a problem here.

We would be rolling out RFC v4 with these changes soon.

Regards,
Nipun
Radovanovic, Aleksandar Oct. 12, 2022, 10:34 a.m. UTC | #13
[AMD Official Use Only - General]

> -----Original Message-----
> From: Gupta, Nipun <Nipun.Gupta@amd.com>
> Sent: 12 October 2022 11:04
> To: Marc Zyngier <maz@kernel.org>; Robin Murphy
> <robin.murphy@arm.com>
> Cc: robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> gregkh@linuxfoundation.org; rafael@kernel.org; eric.auger@redhat.com;
> alex.williamson@redhat.com; cohuck@redhat.com; Gupta, Puneet (DCG-
> ENG) <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
> mchehab+huawei@kernel.org; f.fainelli@gmail.com;
> jeffrey.l.hugo@gmail.com; saravanak@google.com;
> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> jgg@ziepe.ca; jgg@nvidia.com; will@kernel.org; joro@8bytes.org;
> masahiroy@kernel.org; ndesaulniers@google.com; linux-arm-
> kernel@lists.infradead.org; linux-kbuild@vger.kernel.org; linux-
> kernel@vger.kernel.org; devicetree@vger.kernel.org; kvm@vger.kernel.org;
> okaya@kernel.org; Anand, Harpreet <harpreet.anand@amd.com>; Agarwal,
> Nikhil <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>;
> Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>; git (AMD-
> Xilinx) <git@amd.com>
> Subject: RE: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> [AMD Official Use Only - General]
> 
> 
> <snip>
> 
> >
> > > +}
> > > +
> > > +static void cdx_msi_write_msg(struct irq_data *irq_data,
> > > +                           struct msi_msg *msg) {
> > > +     /*
> > > +      * Do nothing as CDX devices have these pre-populated
> > > +      * in the hardware itself.
> > > +      */
> >
> > We talked about this in a separate thread. This is a major problem.
> 
> We discussed this further with the hardware design team and now have the
> correct and complete understanding here. As the CDX devices are FPGA
> based, they don't exist yet, so it would be possible to construct them in such
> a way that the eventid is programable.
> 
> To make it generic for CDX devices, we have added a firmware API which
> provide the mappings (MSI vector ID to eventID) to the fabric, that can be
> referred by the device while generating the MSI interrupt.
> 
> Also, there is an existing table to have GITS_TRANSLATOR iova address
> (address in
> msi_msg) for CDX devices, which can be programmed by the firmware. So,
> providing IOVA address to device would also not be a problem here.
> 
> We would be rolling out RFC v4 with these changes soon.
> 
> Regards,
> Nipun

Just to be clear, there will be some HW limitations with the proposed solution,
so let's just make sure that we're all OK with it.

For the MSI EventID, the HW interrupt logic assumes the MSI write value is 
equal to the MSI vector number. However, the vector number is programmable
for most (all) of the interrupt sources, which isn't exactly the same as saying
EventID is programmable for a vector number, but can be used to emulate the
desired behaviour, with a translation table in firmware. 

The limitation here is that we support at most 16 bits of EventID (and this still
needs to be confirmed for all interrupt sources)

As for GITS_TRANSLATER, we can take up to 4 different IOVAs, which limits us
to 4 CDX devices (should be sufficient for current HW use-cases). Also, it means
that the address part must be the same for all vectors within a single CDX 
device. I'm assuming this is OK as it is going to be a single interrupt and IOMMU
domain anyway.

Thanks,
Aleksandar
Jason Gunthorpe Oct. 12, 2022, 1:02 p.m. UTC | #14
On Wed, Oct 12, 2022 at 10:34:23AM +0000, Radovanovic, Aleksandar wrote:

> For the MSI EventID, the HW interrupt logic assumes the MSI write value is 
> equal to the MSI vector number. However, the vector number is programmable
> for most (all) of the interrupt sources, which isn't exactly the same as saying
> EventID is programmable for a vector number, but can be used to emulate the
> desired behaviour, with a translation table in firmware. 

If you do this stuff wrong you will eventually run into situations
that don't work. Like VFIO/VMs and things.

> As for GITS_TRANSLATER, we can take up to 4 different IOVAs, which limits us
> to 4 CDX devices (should be sufficient for current HW use-cases). Also, it means
> that the address part must be the same for all vectors within a single CDX 
> device. I'm assuming this is OK as it is going to be a single interrupt and IOMMU
> domain anyway.

This is not at all how MSI is supposed to work.

Jason
Radovanovic, Aleksandar Oct. 12, 2022, 1:37 p.m. UTC | #15
[AMD Official Use Only - General]



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: 12 October 2022 14:02
> To: Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>
> Cc: Gupta, Nipun <Nipun.Gupta@amd.com>; Marc Zyngier
> <maz@kernel.org>; Robin Murphy <robin.murphy@arm.com>;
> robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> gregkh@linuxfoundation.org; rafael@kernel.org; eric.auger@redhat.com;
> alex.williamson@redhat.com; cohuck@redhat.com; Gupta, Puneet (DCG-
> ENG) <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
> mchehab+huawei@kernel.org; f.fainelli@gmail.com;
> jeffrey.l.hugo@gmail.com; saravanak@google.com;
> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> will@kernel.org; joro@8bytes.org; masahiroy@kernel.org;
> ndesaulniers@google.com; linux-arm-kernel@lists.infradead.org; linux-
> kbuild@vger.kernel.org; linux-kernel@vger.kernel.org;
> devicetree@vger.kernel.org; kvm@vger.kernel.org; okaya@kernel.org;
> Anand, Harpreet <harpreet.anand@amd.com>; Agarwal, Nikhil
> <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>; git
> (AMD-Xilinx) <git@amd.com>
> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
> 
> 
> On Wed, Oct 12, 2022 at 10:34:23AM +0000, Radovanovic, Aleksandar wrote:
> 
> 
> > As for GITS_TRANSLATER, we can take up to 4 different IOVAs, which
> > limits us to 4 CDX devices (should be sufficient for current HW
> > use-cases). Also, it means that the address part must be the same for
> > all vectors within a single CDX device. I'm assuming this is OK as it
> > is going to be a single interrupt and IOMMU domain anyway.
> 
> This is not at all how MSI is supposed to work.

In the general case, no, they're not. However, this is an embedded device
with a GICv3, so the general case does not really apply. On GIC, the MSI 
target address is always a fixed register in the ITS (GIC_TRANSLATER), 
possibly SMMU translated. As long as the translation is consistent across
a single device (and I see no reason why or how the kernel would do it 
otherwise, given that a single CDX device generates the same StreamID
for all MSI writes), the GIC IOVA should be the same for all vectors of 
the device.

It is worth noting that this limitation is not going to be baked in the 
proposed MSI configuration interface, it will still take both the address
and data parts for each vector. It is just that this particular 
implementation will throw an error if you supply a different target
address across device MSI vectors. It does not preclude some future
device accepting different addresses per vector over the same 
interface.

Thanks,
Aleksandar
Jason Gunthorpe Oct. 12, 2022, 2:38 p.m. UTC | #16
On Wed, Oct 12, 2022 at 01:37:54PM +0000, Radovanovic, Aleksandar wrote:
> > On Wed, Oct 12, 2022 at 10:34:23AM +0000, Radovanovic, Aleksandar wrote:
> > 
> > 
> > > As for GITS_TRANSLATER, we can take up to 4 different IOVAs, which
> > > limits us to 4 CDX devices (should be sufficient for current HW
> > > use-cases). Also, it means that the address part must be the same for
> > > all vectors within a single CDX device. I'm assuming this is OK as it
> > > is going to be a single interrupt and IOMMU domain anyway.
> > 
> > This is not at all how MSI is supposed to work.
> 
> In the general case, no, they're not.

I don't mean that you can hack this to work - I mean that in MSI the
addr/data is supposed to come from the end point itself, not from some
kind of shared structure. This is important because the actual act of
generating the write has to be coherent with the DMA the device is
doing, as the MSI write must push any DMA data to visibility to meet
the "producer / consumer" model.

So it is really weird/wrong to have a HW design where the MSI
infrastructure is shared across many devices.

Jason
Radovanovic, Aleksandar Oct. 12, 2022, 3:09 p.m. UTC | #17
[AMD Official Use Only - General]



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: 12 October 2022 15:38
> To: Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>
> Cc: Gupta, Nipun <Nipun.Gupta@amd.com>; Marc Zyngier
> <maz@kernel.org>; Robin Murphy <robin.murphy@arm.com>;
> robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> gregkh@linuxfoundation.org; rafael@kernel.org; eric.auger@redhat.com;
> alex.williamson@redhat.com; cohuck@redhat.com; Gupta, Puneet (DCG-
> ENG) <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
> mchehab+huawei@kernel.org; f.fainelli@gmail.com;
> jeffrey.l.hugo@gmail.com; saravanak@google.com;
> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> will@kernel.org; joro@8bytes.org; masahiroy@kernel.org;
> ndesaulniers@google.com; linux-arm-kernel@lists.infradead.org; linux-
> kbuild@vger.kernel.org; linux-kernel@vger.kernel.org;
> devicetree@vger.kernel.org; kvm@vger.kernel.org; okaya@kernel.org;
> Anand, Harpreet <harpreet.anand@amd.com>; Agarwal, Nikhil
> <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>; git
> (AMD-Xilinx) <git@amd.com>
> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
> 
> 
> On Wed, Oct 12, 2022 at 01:37:54PM +0000, Radovanovic, Aleksandar wrote:
> > > On Wed, Oct 12, 2022 at 10:34:23AM +0000, Radovanovic, Aleksandar
> wrote:
> > >
> > >
> > > > As for GITS_TRANSLATER, we can take up to 4 different IOVAs, which
> > > > limits us to 4 CDX devices (should be sufficient for current HW
> > > > use-cases). Also, it means that the address part must be the same
> > > > for all vectors within a single CDX device. I'm assuming this is
> > > > OK as it is going to be a single interrupt and IOMMU domain anyway.
> > >
> > > This is not at all how MSI is supposed to work.
> >
> > In the general case, no, they're not.
> 
> I don't mean that you can hack this to work - I mean that in MSI the
> addr/data is supposed to come from the end point itself, not from some kind
> of shared structure. This is important because the actual act of generating
> the write has to be coherent with the DMA the device is doing, as the MSI
> write must push any DMA data to visibility to meet the "producer /
> consumer" model.
> 

I'm not sure I follow your argument, the limitation here is that the MSI
address value is shared between vectors of the same device (requester id
or endpoint, whichever way you prefer to call it), not between devices. 
This in no way implies that it is unordered with respect to device DMA - 
it is ordered and takes the same AXI path into the CPU cluster, so the
producer/consumer semantics are preserved.

Thanks,
Aleksandar
Jason Gunthorpe Oct. 13, 2022, 12:43 p.m. UTC | #18
On Wed, Oct 12, 2022 at 03:09:26PM +0000, Radovanovic, Aleksandar wrote:

> > On Wed, Oct 12, 2022 at 01:37:54PM +0000, Radovanovic, Aleksandar wrote:
> > > > On Wed, Oct 12, 2022 at 10:34:23AM +0000, Radovanovic, Aleksandar
> > wrote:
> > > >
> > > >
> > > > > As for GITS_TRANSLATER, we can take up to 4 different IOVAs, which
> > > > > limits us to 4 CDX devices (should be sufficient for current HW
> > > > > use-cases). Also, it means that the address part must be the same
> > > > > for all vectors within a single CDX device. I'm assuming this is
> > > > > OK as it is going to be a single interrupt and IOMMU domain anyway.
> > > >
> > > > This is not at all how MSI is supposed to work.
> > >
> > > In the general case, no, they're not.
> > 
> > I don't mean that you can hack this to work - I mean that in MSI the
> > addr/data is supposed to come from the end point itself, not from some kind
> > of shared structure. This is important because the actual act of generating
> > the write has to be coherent with the DMA the device is doing, as the MSI
> > write must push any DMA data to visibility to meet the "producer /
> > consumer" model.
> > 
> 
> I'm not sure I follow your argument, the limitation here is that the MSI
> address value is shared between vectors of the same device (requester id
> or endpoint, whichever way you prefer to call it), not between
> devices.

That isn't what you said, you said "we can take up to 4 different
IOVAs, which limits us to 4 CDX devices" - which sounds like HW being
shared across devices??

Jason
Radovanovic, Aleksandar Oct. 14, 2022, 11:18 a.m. UTC | #19
[AMD Official Use Only - General]



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: 13 October 2022 13:43
> To: Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>
> Cc: Gupta, Nipun <Nipun.Gupta@amd.com>; Marc Zyngier
> <maz@kernel.org>; Robin Murphy <robin.murphy@arm.com>;
> robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> gregkh@linuxfoundation.org; rafael@kernel.org; eric.auger@redhat.com;
> alex.williamson@redhat.com; cohuck@redhat.com; Gupta, Puneet (DCG-
> ENG) <puneet.gupta@amd.com>; song.bao.hua@hisilicon.com;
> mchehab+huawei@kernel.org; f.fainelli@gmail.com;
> jeffrey.l.hugo@gmail.com; saravanak@google.com;
> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> will@kernel.org; joro@8bytes.org; masahiroy@kernel.org;
> ndesaulniers@google.com; linux-arm-kernel@lists.infradead.org; linux-
> kbuild@vger.kernel.org; linux-kernel@vger.kernel.org;
> devicetree@vger.kernel.org; kvm@vger.kernel.org; okaya@kernel.org;
> Anand, Harpreet <harpreet.anand@amd.com>; Agarwal, Nikhil
> <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>; git
> (AMD-Xilinx) <git@amd.com>
> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
> 
> 
> On Wed, Oct 12, 2022 at 03:09:26PM +0000, Radovanovic, Aleksandar wrote:
> 
> > > On Wed, Oct 12, 2022 at 01:37:54PM +0000, Radovanovic, Aleksandar
> wrote:
> > > > > On Wed, Oct 12, 2022 at 10:34:23AM +0000, Radovanovic,
> > > > > Aleksandar
> > > wrote:
> > > > >
> > > > >
> > > > > > As for GITS_TRANSLATER, we can take up to 4 different IOVAs,
> > > > > > which limits us to 4 CDX devices (should be sufficient for
> > > > > > current HW use-cases). Also, it means that the address part
> > > > > > must be the same for all vectors within a single CDX device.
> > > > > > I'm assuming this is OK as it is going to be a single interrupt and
> IOMMU domain anyway.
> > > > >
> > > > > This is not at all how MSI is supposed to work.
> > > >
> > > > In the general case, no, they're not.
> > >
> > > I don't mean that you can hack this to work - I mean that in MSI the
> > > addr/data is supposed to come from the end point itself, not from
> > > some kind of shared structure. This is important because the actual
> > > act of generating the write has to be coherent with the DMA the
> > > device is doing, as the MSI write must push any DMA data to
> > > visibility to meet the "producer / consumer" model.
> > >
> >
> > I'm not sure I follow your argument, the limitation here is that the
> > MSI address value is shared between vectors of the same device
> > (requester id or endpoint, whichever way you prefer to call it), not
> > between devices.
> 
> That isn't what you said, you said "we can take up to 4 different IOVAs, which
> limits us to 4 CDX devices" - which sounds like HW being shared across
> devices??

And that still does not imply lack of ordering or sharing of MSI target addresses between devices.

This is a highly programmable IP block, at the core of which is an interconnect interfacing to programmable logic (PL), a number of PCIe controllers (either endpoint or root-port), DMA engines, offload engines, the embedded processor subsystem (PSX), etc. DMA and interrupts can be routed across it in almost any (meaningful) direction. The datapath 'endpoints' request DMA and interrupts, but don't concern themselves with the mechanics of delivering that in the target domain. It is the responsibility of the egress bridges to the target domains to convert the interconnect interrupt transactions to whatever the interrupt delivery mechanism for that domain is. E.g. for PCIe controllers in endpoint mode, that would be through PCIe MSI-X tables internal to the controller (and managed by the PCIe host), for PSX that would be the PSX bridge (partially managed by the PSX OS, mediated through firmware, i.e. through CDX bus driver) and so on. It is the responsibility of the interconnect to maintain transaction ordering (including DMA vs. interrupts). It is the responsibility of the firmware to manage the bridges according to the implemented use-case, so everything works as expected. 

The CDX bus driver manages a single aspect of this and that is endpoints implemented in PL/engines, targeting the PSX.

So, yes, the hardware that translates interrupt transactions to GIC AXI writes is shared between endpoints, but what I said above still applies. And that doesn't necessarily make it weird/wrong, it's just more complex than you might think.

Anyway, I think we're straying off topic here, none of this is visible to the kernel anyway. The question that we still need to answer is, are you OK with the limitations I listed originally?

Thanks,
Aleksandar
Greg KH Oct. 14, 2022, 11:54 a.m. UTC | #20
On Fri, Oct 14, 2022 at 11:18:36AM +0000, Radovanovic, Aleksandar wrote:
> Anyway, I think we're straying off topic here, none of this is visible to the kernel anyway. The question that we still need to answer is, are you OK with the limitations I listed originally?

What original limitations?  This thread is crazy and complex and you
need to use your \n key more :)

thanks,

greg k-h
Radovanovic, Aleksandar Oct. 14, 2022, 12:13 p.m. UTC | #21
[AMD Official Use Only - General]



> -----Original Message-----
> From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>
> Sent: 14 October 2022 12:55
> To: Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>; Gupta, Nipun
> <Nipun.Gupta@amd.com>; Marc Zyngier <maz@kernel.org>; Robin Murphy
> <robin.murphy@arm.com>; robh+dt@kernel.org;
> krzysztof.kozlowski+dt@linaro.org; rafael@kernel.org;
> eric.auger@redhat.com; alex.williamson@redhat.com; cohuck@redhat.com;
> Gupta, Puneet (DCG-ENG) <puneet.gupta@amd.com>;
> song.bao.hua@hisilicon.com; mchehab+huawei@kernel.org;
> f.fainelli@gmail.com; jeffrey.l.hugo@gmail.com; saravanak@google.com;
> Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> will@kernel.org; joro@8bytes.org; masahiroy@kernel.org;
> ndesaulniers@google.com; linux-arm-kernel@lists.infradead.org; linux-
> kbuild@vger.kernel.org; linux-kernel@vger.kernel.org;
> devicetree@vger.kernel.org; kvm@vger.kernel.org; okaya@kernel.org;
> Anand, Harpreet <harpreet.anand@amd.com>; Agarwal, Nikhil
> <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>; git
> (AMD-Xilinx) <git@amd.com>
> Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> domain as parent
> 
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
> 
> 
> On Fri, Oct 14, 2022 at 11:18:36AM +0000, Radovanovic, Aleksandar wrote:
> > Anyway, I think we're straying off topic here, none of this is visible to the
> kernel anyway. The question that we still need to answer is, are you OK with
> the limitations I listed originally?
> 
> What original limitations?  

Limitations with regards to MSI message configuration of a CDX device:

1) MSI write value is at most 16 useable bits

2) MSI address value must be the same across all vectors of a single CDX device
This would be the (potentially IOMMU translated) I/O address of
GITS_TRANSLATER. As long as that IOMMU translation is consistent across a 
single device, I think we should be OK.

> This thread is crazy and complex and you need to
> use your \n key more :)

I do try, but the corporate mail client that we're stuck with seems to just merge
everything back together. Sigh.

> thanks,
> 
> greg k-h
Greg KH Oct. 14, 2022, 1:46 p.m. UTC | #22
On Fri, Oct 14, 2022 at 12:13:45PM +0000, Radovanovic, Aleksandar wrote:
> [AMD Official Use Only - General]
> 
> 
> 
> > -----Original Message-----
> > From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>
> > Sent: 14 October 2022 12:55
> > To: Radovanovic, Aleksandar <aleksandar.radovanovic@amd.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>; Gupta, Nipun
> > <Nipun.Gupta@amd.com>; Marc Zyngier <maz@kernel.org>; Robin Murphy
> > <robin.murphy@arm.com>; robh+dt@kernel.org;
> > krzysztof.kozlowski+dt@linaro.org; rafael@kernel.org;
> > eric.auger@redhat.com; alex.williamson@redhat.com; cohuck@redhat.com;
> > Gupta, Puneet (DCG-ENG) <puneet.gupta@amd.com>;
> > song.bao.hua@hisilicon.com; mchehab+huawei@kernel.org;
> > f.fainelli@gmail.com; jeffrey.l.hugo@gmail.com; saravanak@google.com;
> > Michael.Srba@seznam.cz; mani@kernel.org; yishaih@nvidia.com;
> > will@kernel.org; joro@8bytes.org; masahiroy@kernel.org;
> > ndesaulniers@google.com; linux-arm-kernel@lists.infradead.org; linux-
> > kbuild@vger.kernel.org; linux-kernel@vger.kernel.org;
> > devicetree@vger.kernel.org; kvm@vger.kernel.org; okaya@kernel.org;
> > Anand, Harpreet <harpreet.anand@amd.com>; Agarwal, Nikhil
> > <nikhil.agarwal@amd.com>; Simek, Michal <michal.simek@amd.com>; git
> > (AMD-Xilinx) <git@amd.com>
> > Subject: Re: [RFC PATCH v3 4/7] bus/cdx: add cdx-MSI domain with gic-its
> > domain as parent
> > 
> > Caution: This message originated from an External Source. Use proper
> > caution when opening attachments, clicking links, or responding.
> > 
> > 
> > On Fri, Oct 14, 2022 at 11:18:36AM +0000, Radovanovic, Aleksandar wrote:
> > > Anyway, I think we're straying off topic here, none of this is visible to the
> > kernel anyway. The question that we still need to answer is, are you OK with
> > the limitations I listed originally?
> > 
> > What original limitations?  
> 
> Limitations with regards to MSI message configuration of a CDX device:
> 
> 1) MSI write value is at most 16 useable bits
> 
> 2) MSI address value must be the same across all vectors of a single CDX device
> This would be the (potentially IOMMU translated) I/O address of
> GITS_TRANSLATER. As long as that IOMMU translation is consistent across a 
> single device, I think we should be OK.

It's been a while since I read the PCI spec, but this feels like it does
not follow what MSI is supposed to allow.  Is the "CDX" spec anywhere
that mentions any of this as to what is supposed to be allowed and
supported?

And what is a "single device" here in how the kernel knows about it?  Is
it a PCI device, or some other new structure that is handed to the
kernel from the BIOS?

thanks,

greg k-h
Jason Gunthorpe Oct. 14, 2022, 1:58 p.m. UTC | #23
On Fri, Oct 14, 2022 at 11:18:36AM +0000, Radovanovic, Aleksandar wrote:

> And that still does not imply lack of ordering or sharing of MSI
> target addresses between devices.

Either the end point generates the MSI, and maybe the bridge mangles
it, or it raises a lot of suspicion that this is not right. If the end
point generates the MSI then it raises the question why do we need to
tolerate these limits?

> This is a highly programmable IP block, at the core of which is an
> interconnect interfacing to programmable logic (PL), a number of
> PCIe controllers (either endpoint or root-port), DMA engines,
> offload engines, the embedded processor subsystem (PSX), etc. DMA
> and interrupts can be routed across it in almost any (meaningful)
> direction. The datapath 'endpoints' request DMA and interrupts, but
> don't concern themselves with the mechanics of delivering that in
> the target domain. It is the responsibility of the egress bridges to
> the target domains to convert the interconnect interrupt
> transactions to whatever the interrupt delivery mechanism for that
> domain is. E.g. for PCIe controllers in endpoint mode, that would be
> through PCIe MSI-X tables internal to the controller (and managed by
> the PCIe host), for PSX that would be the PSX bridge (partially
> managed by the PSX OS, mediated through firmware, i.e. through CDX
> bus driver) and so on. It is the responsibility of the interconnect
> to maintain transaction ordering (including DMA vs. interrupts). It
> is the responsibility of the firmware to manage the bridges
> according to the implemented use-case, so everything works as
> expected.

Again, this all just seems wrongly designed. MSI should not be part
of an interconnect bridge. We did that already 20 years ago, it was
called IOAPICs on x86 and I think everyone is happy to see it gone.

If you want to build IOAPICs again, I guess you can, but that is a
slightly different SW setup than the MSI you are trying to use here,
and even that didn't have the same limitations you are proposing.

> So, yes, the hardware that translates interrupt transactions to GIC
> AXI writes is shared between endpoints, but what I said above still
> applies. And that doesn't necessarily make it weird/wrong, it's just
> more complex than you might think.

If it doesn't fit the architecture, then I think it must be considered
wrong. Mis-using platform architected components like MSI in HW is
problematic.

You should design the HW properly so you don't have these
problems. Involving FW in the MSI setup is also a bad idea, POWER did
this and it made a big mess of their arch code :(

Jason
diff mbox series

Patch

diff --git a/drivers/bus/cdx/cdx.c b/drivers/bus/cdx/cdx.c
index fc417c32c59b..02ececce1c84 100644
--- a/drivers/bus/cdx/cdx.c
+++ b/drivers/bus/cdx/cdx.c
@@ -17,6 +17,7 @@ 
 #include <linux/dma-map-ops.h>
 #include <linux/property.h>
 #include <linux/iommu.h>
+#include <linux/msi.h>
 #include <linux/cdx/cdx_bus.h>
 
 #include "cdx.h"
@@ -236,6 +237,7 @@  static int cdx_device_add(struct device *parent,
 			  struct cdx_dev_params_t *dev_params)
 {
 	struct cdx_device *cdx_dev;
+	struct irq_domain *cdx_msi_domain;
 	int ret;
 
 	cdx_dev = kzalloc(sizeof(*cdx_dev), GFP_KERNEL);
@@ -252,6 +254,7 @@  static int cdx_device_add(struct device *parent,
 
 	/* Populate CDX dev params */
 	cdx_dev->req_id = dev_params->req_id;
+	cdx_dev->num_msi = dev_params->num_msi;
 	cdx_dev->vendor = dev_params->vendor;
 	cdx_dev->device = dev_params->device;
 	cdx_dev->bus_id = dev_params->bus_id;
@@ -269,6 +272,21 @@  static int cdx_device_add(struct device *parent,
 	dev_set_name(&cdx_dev->dev, "cdx-%02x:%02x", cdx_dev->bus_id,
 			cdx_dev->func_id);
 
+	/* If CDX MSI domain is not created, create one. */
+	cdx_msi_domain = cdx_find_msi_domain(parent);
+	if (!cdx_msi_domain) {
+		cdx_msi_domain = cdx_msi_domain_init(parent);
+		if (!cdx_msi_domain) {
+			dev_err(&cdx_dev->dev,
+				"cdx_msi_domain_init() failed: %d", ret);
+			kfree(cdx_dev);
+			return -1;
+		}
+	}
+
+	/* Set the MSI domain */
+	dev_set_msi_domain(&cdx_dev->dev, cdx_msi_domain);
+
 	ret = device_add(&cdx_dev->dev);
 	if (ret != 0) {
 		dev_err(&cdx_dev->dev,
diff --git a/drivers/bus/cdx/cdx.h b/drivers/bus/cdx/cdx.h
index db0569431c10..95df440ebd73 100644
--- a/drivers/bus/cdx/cdx.h
+++ b/drivers/bus/cdx/cdx.h
@@ -20,6 +20,7 @@ 
  * @res: array of MMIO region entries
  * @res_count: number of valid MMIO regions
  * @req_id: Requestor ID associated with CDX device
+ * @num_msi: Number of MSI's supported by the device
  */
 struct cdx_dev_params_t {
 	u16 vendor;
@@ -29,6 +30,24 @@  struct cdx_dev_params_t {
 	struct resource res[MAX_CDX_DEV_RESOURCES];
 	u8 res_count;
 	u32 req_id;
+	u32 num_msi;
 };
 
+/**
+ * cdx_msi_domain_init - Init the CDX bus MSI domain.
+ * @dev: Device of the CDX bus controller
+ *
+ * Return CDX MSI domain, NULL on failure
+ */
+struct irq_domain *cdx_msi_domain_init(struct device *dev);
+
+/**
+ * cdx_find_msi_domain - Get the CDX-MSI domain.
+ * @dev: CDX controller generic device
+ *
+ * Return CDX MSI domain, NULL on error or if CDX-MSI domain is
+ *   not yet created.
+ */
+struct irq_domain *cdx_find_msi_domain(struct device *parent);
+
 #endif /* _CDX_H_ */
diff --git a/drivers/bus/cdx/cdx_msi.c b/drivers/bus/cdx/cdx_msi.c
new file mode 100644
index 000000000000..2fb7bac18393
--- /dev/null
+++ b/drivers/bus/cdx/cdx_msi.c
@@ -0,0 +1,236 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD CDX bus driver MSI support
+ *
+ * Copyright (C) 2022, Advanced Micro Devices, Inc.
+ *
+ */
+
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
+#include <linux/cdx/cdx_bus.h>
+
+#include "cdx.h"
+
+#ifdef GENERIC_MSI_DOMAIN_OPS
+/*
+ * Generate a unique ID identifying the interrupt (only used within the MSI
+ * irqdomain.  Combine the req_id with the interrupt index.
+ */
+static irq_hw_number_t cdx_domain_calc_hwirq(struct cdx_device *dev,
+					     struct msi_desc *desc)
+{
+	/*
+	 * Make the base hwirq value for req_id*10000 so it is readable
+	 * as a decimal value in /proc/interrupts.
+	 */
+	return (irq_hw_number_t)(desc->msi_index + (dev->req_id * 10000));
+}
+
+static void cdx_msi_set_desc(msi_alloc_info_t *arg,
+			     struct msi_desc *desc)
+{
+	arg->desc = desc;
+	arg->hwirq = cdx_domain_calc_hwirq(to_cdx_device(desc->dev), desc);
+}
+#else
+#define cdx_msi_set_desc NULL
+#endif
+
+static void cdx_msi_update_dom_ops(struct msi_domain_info *info)
+{
+	struct msi_domain_ops *ops = info->ops;
+
+	if (!ops)
+		return;
+
+	/* set_desc should not be set by the caller */
+	if (!ops->set_desc)
+		ops->set_desc = cdx_msi_set_desc;
+}
+
+static void cdx_msi_write_msg(struct irq_data *irq_data,
+			      struct msi_msg *msg)
+{
+	/*
+	 * Do nothing as CDX devices have these pre-populated
+	 * in the hardware itself.
+	 */
+}
+
+static void cdx_msi_update_chip_ops(struct msi_domain_info *info)
+{
+	struct irq_chip *chip = info->chip;
+
+	if (!chip)
+		return;
+
+	/*
+	 * irq_write_msi_msg should not be set by the caller
+	 */
+	if (!chip->irq_write_msi_msg)
+		chip->irq_write_msi_msg = cdx_msi_write_msg;
+}
+/**
+ * cdx_msi_create_irq_domain - Create a CDX MSI interrupt domain
+ * @fwnode:	Optional firmware node of the interrupt controller
+ * @info:	MSI domain info
+ * @parent:	Parent irq domain
+ *
+ * Updates the domain and chip ops and creates a CDX MSI
+ * interrupt domain.
+ *
+ * Returns:
+ * A domain pointer or NULL in case of failure.
+ */
+static struct irq_domain *cdx_msi_create_irq_domain(struct fwnode_handle *fwnode,
+						    struct msi_domain_info *info,
+						    struct irq_domain *parent)
+{
+	if (WARN_ON((info->flags & MSI_FLAG_LEVEL_CAPABLE)))
+		info->flags &= ~MSI_FLAG_LEVEL_CAPABLE;
+	if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS)
+		cdx_msi_update_dom_ops(info);
+	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
+		cdx_msi_update_chip_ops(info);
+	info->flags |= MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS | MSI_FLAG_FREE_MSI_DESCS;
+
+	return msi_create_irq_domain(fwnode, info, parent);
+}
+
+int cdx_msi_domain_alloc_irqs(struct device *dev, unsigned int irq_count)
+{
+	struct irq_domain *msi_domain;
+	int ret;
+
+	msi_domain = dev_get_msi_domain(dev);
+	if (!msi_domain) {
+		dev_err(dev, "msi domain get failed\n");
+		return -EINVAL;
+	}
+
+	ret = msi_setup_device_data(dev);
+	if (ret) {
+		dev_err(dev, "msi setup device failed: %d\n", ret);
+		return ret;
+	}
+
+	msi_lock_descs(dev);
+	if (msi_first_desc(dev, MSI_DESC_ALL))
+		ret = -EINVAL;
+	msi_unlock_descs(dev);
+	if (ret) {
+		dev_err(dev, "msi setup device failed: %d\n", ret);
+		return ret;
+	}
+
+	ret = msi_domain_alloc_irqs(msi_domain, dev, irq_count);
+	if (ret)
+		dev_err(dev, "Failed to allocate IRQs\n");
+
+	return ret;
+}
+EXPORT_SYMBOL(cdx_msi_domain_alloc_irqs);
+
+void cdx_msi_domain_free_irqs(struct device *dev)
+{
+	struct irq_domain *msi_domain;
+
+	msi_domain = dev_get_msi_domain(dev);
+	if (!msi_domain)
+		return;
+
+	msi_domain_free_irqs(msi_domain, dev);
+}
+EXPORT_SYMBOL(cdx_msi_domain_free_irqs);
+
+static struct irq_chip cdx_msi_irq_chip = {
+	.name = "CDX-MSI",
+	.irq_mask = irq_chip_mask_parent,
+	.irq_unmask = irq_chip_unmask_parent,
+	.irq_eoi = irq_chip_eoi_parent,
+	.irq_set_affinity = msi_domain_set_affinity
+};
+
+static int cdx_msi_prepare(struct irq_domain *msi_domain,
+			   struct device *dev,
+			   int nvec, msi_alloc_info_t *info)
+{
+	struct cdx_device *cdx_dev = to_cdx_device(dev);
+	struct msi_domain_info *msi_info;
+	struct device *parent = dev->parent;
+	u32 dev_id;
+	int ret;
+
+	/* Retrieve device ID from requestor ID using parent device */
+	ret = of_map_id(parent->of_node, cdx_dev->req_id, "msi-map",
+			"msi-map-mask",	NULL, &dev_id);
+	if (ret) {
+		dev_err(dev, "of_map_id failed for MSI: %d\n", ret);
+		return ret;
+	}
+
+	/* Set the device Id to be passed to the GIC-ITS */
+	info->scratchpad[0].ul = dev_id;
+
+	msi_info = msi_get_domain_info(msi_domain->parent);
+
+	/* Allocate at least 32 MSIs, and always as a power of 2 */
+	nvec = max_t(int, 32, roundup_pow_of_two(nvec));
+	return msi_info->ops->msi_prepare(msi_domain->parent, dev, nvec, info);
+}
+
+static struct msi_domain_ops cdx_msi_ops __ro_after_init = {
+	.msi_prepare = cdx_msi_prepare,
+};
+
+static struct msi_domain_info cdx_msi_domain_info = {
+	.flags	= (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS),
+	.ops	= &cdx_msi_ops,
+	.chip	= &cdx_msi_irq_chip,
+};
+
+struct irq_domain *cdx_msi_domain_init(struct device *dev)
+{
+	struct irq_domain *parent;
+	struct irq_domain *cdx_msi_domain;
+	struct fwnode_handle *fwnode_handle;
+	struct device_node *parent_node;
+	struct device_node *np = dev->of_node;
+
+	fwnode_handle = of_node_to_fwnode(np);
+
+	parent_node = of_parse_phandle(np, "msi-map", 1);
+	if (!parent_node) {
+		dev_err(dev, "msi-map not present on cdx controller\n");
+		return NULL;
+	}
+
+	parent = irq_find_matching_fwnode(of_node_to_fwnode(parent_node),
+			DOMAIN_BUS_NEXUS);
+	if (!parent || !msi_get_domain_info(parent)) {
+		dev_err(dev, "unable to locate ITS domain\n");
+		return NULL;
+	}
+
+	cdx_msi_domain = cdx_msi_create_irq_domain(fwnode_handle,
+				&cdx_msi_domain_info, parent);
+	if (!cdx_msi_domain) {
+		dev_err(dev, "unable to create CDX-MSI domain\n");
+		return NULL;
+	}
+
+	dev_dbg(dev, "CDX-MSI domain created\n");
+
+	return cdx_msi_domain;
+}
+
+struct irq_domain *cdx_find_msi_domain(struct device *parent)
+{
+	return irq_find_host(parent->of_node);
+}
diff --git a/drivers/bus/cdx/mcdi_stubs.c b/drivers/bus/cdx/mcdi_stubs.c
index cc9d30fa02f8..2c8db1f5a057 100644
--- a/drivers/bus/cdx/mcdi_stubs.c
+++ b/drivers/bus/cdx/mcdi_stubs.c
@@ -45,6 +45,7 @@  int cdx_mcdi_get_func_config(struct cdx_mcdi_t *cdx_mcdi,
 	dev_params->res_count = 2;
 
 	dev_params->req_id = 0x250;
+	dev_params->num_msi = 4;
 	dev_params->vendor = 0x10ee;
 	dev_params->device = 0x8084;
 	dev_params->bus_id = bus_id;
diff --git a/include/linux/cdx/cdx_bus.h b/include/linux/cdx/cdx_bus.h
index 6e870b2c87d9..bf86024783d2 100644
--- a/include/linux/cdx/cdx_bus.h
+++ b/include/linux/cdx/cdx_bus.h
@@ -25,6 +25,7 @@ 
  * @dma_mask: Default DMA mask
  * @flags: CDX device flags
  * @req_id: Requestor ID associated with CDX device
+ * @num_msi: Number of MSI's supported by the device
  * @driver_override: driver name to force a match; do not set directly,
  *                   because core frees it; use driver_set_override() to
  *                   set or clear it.
@@ -40,6 +41,7 @@  struct cdx_device {
 	u64 dma_mask;
 	u16 flags;
 	u32 req_id;
+	u32 num_msi;
 	const char *driver_override;
 };
 
@@ -90,4 +92,21 @@  void cdx_driver_unregister(struct cdx_driver *driver);
 
 extern struct bus_type cdx_bus_type;
 
+/**
+ * cdx_msi_domain_alloc_irqs - Allocate MSI's for the CDX device
+ * @dev: device pointer
+ * @irq_count: Number of MSI's to be allocated
+ *
+ * Return 0 for success, -errno on failure
+ */
+int cdx_msi_domain_alloc_irqs(struct device *dev, unsigned int irq_count);
+
+/**
+ * cdx_msi_domain_free_irqs - Free MSI's for CDX device
+ * @dev: device pointer
+ *
+ * Return 0 for success, -errno on failure
+ */
+void cdx_msi_domain_free_irqs(struct device *dev);
+
 #endif /* _CDX_BUS_H_ */