diff mbox series

[v3,27/40] cxl/pci: Cache device DVSEC offset

Message ID 164298426273.3018233.9302136088649279124.stgit@dwillia2-desk3.amr.corp.intel.com
State New, archived
Headers show
Series CXL.mem Topology Discovery and Hotplug Support | expand

Commit Message

Dan Williams Jan. 24, 2022, 12:31 a.m. UTC
From: Ben Widawsky <ben.widawsky@intel.com>

The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
be implemented by CXL 2.0 endpoint devices. Since the information
contained within this DVSEC will be critically important, it makes sense
to find the value early, and error out if it cannot be found.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/cxlmem.h |    2 ++
 drivers/cxl/pci.c    |    9 +++++++++
 2 files changed, 11 insertions(+)

Comments

Jonathan Cameron Jan. 31, 2022, 6:19 p.m. UTC | #1
On Sun, 23 Jan 2022 16:31:02 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Ben Widawsky <ben.widawsky@intel.com>
> 
> The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
> be implemented by CXL 2.0 endpoint devices. Since the information
> contained within this DVSEC will be critically important, it makes sense
> to find the value early, and error out if it cannot be found.
> 
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Guess the logic makes sense about checking this early though my cynical
mind says, that if someone is putting in devices that claim to be
CXL ones and this isn't there it is there own problem if they
kernel wastes effort bringing the driver up only to find later
it can't finish doing so...

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

note that I got confused by this one when checking what it was for
as you rename it in the next patch... I'll complain about that there ;)


> ---
>  drivers/cxl/cxlmem.h |    2 ++
>  drivers/cxl/pci.c    |    9 +++++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 90d67fff5bed..cedc6d3c0448 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -98,6 +98,7 @@ struct cxl_mbox_cmd {
>   *
>   * @dev: The device associated with this CXL state
>   * @regs: Parsed register blocks
> + * @device_dvsec: Offset to the PCIe device DVSEC
>   * @payload_size: Size of space for payload
>   *                (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
>   * @lsa_size: Size of Label Storage Area
> @@ -126,6 +127,7 @@ struct cxl_dev_state {
>  	struct device *dev;
>  
>  	struct cxl_regs regs;
> +	int device_dvsec;
>  
>  	size_t payload_size;
>  	size_t lsa_size;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e54dbdf9ac15..76de39b90351 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -408,6 +408,15 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (IS_ERR(cxlds))
>  		return PTR_ERR(cxlds);
>  
> +	cxlds->device_dvsec = pci_find_dvsec_capability(pdev,
> +							PCI_DVSEC_VENDOR_ID_CXL,
> +							CXL_DVSEC_PCIE_DEVICE);
> +	if (!cxlds->device_dvsec) {
> +		dev_err(&pdev->dev,
> +			"Device DVSEC not present. Expect limited functionality.\n");
> +		return -ENXIO;
> +	}
> +
>  	rc = cxl_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
>  	if (rc)
>  		return rc;
>
Ben Widawsky Feb. 1, 2022, 3:24 p.m. UTC | #2
On 22-01-31 18:19:24, Jonathan Cameron wrote:
> On Sun, 23 Jan 2022 16:31:02 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > From: Ben Widawsky <ben.widawsky@intel.com>
> > 
> > The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
> > be implemented by CXL 2.0 endpoint devices. Since the information
> > contained within this DVSEC will be critically important, it makes sense
> > to find the value early, and error out if it cannot be found.
> > 
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Guess the logic makes sense about checking this early though my cynical
> mind says, that if someone is putting in devices that claim to be
> CXL ones and this isn't there it is there own problem if they
> kernel wastes effort bringing the driver up only to find later
> it can't finish doing so...

I don't remember if Dan and I discussed actually failing to bind this early if
the DVSEC isn't there. I think the concern is less about wasted effort and more
about the inability to determine if the device is actively decoding something
and then having the kernel driver tear that out when it takes over the decoder
resources. This was specifically targeted toward the DVSEC range registers
(obviously things would fail later if we couldn't find the MMIO).

I agree with your cynical mind though that it might not be our job to prevent
devices which aren't spec compliant. I'd say if we start seeing bug reports
around this we can revisit.

> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> note that I got confused by this one when checking what it was for
> as you rename it in the next patch... I'll complain about that there ;)
> 
> 
> > ---
> >  drivers/cxl/cxlmem.h |    2 ++
> >  drivers/cxl/pci.c    |    9 +++++++++
> >  2 files changed, 11 insertions(+)
> > 
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 90d67fff5bed..cedc6d3c0448 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -98,6 +98,7 @@ struct cxl_mbox_cmd {
> >   *
> >   * @dev: The device associated with this CXL state
> >   * @regs: Parsed register blocks
> > + * @device_dvsec: Offset to the PCIe device DVSEC
> >   * @payload_size: Size of space for payload
> >   *                (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
> >   * @lsa_size: Size of Label Storage Area
> > @@ -126,6 +127,7 @@ struct cxl_dev_state {
> >  	struct device *dev;
> >  
> >  	struct cxl_regs regs;
> > +	int device_dvsec;
> >  
> >  	size_t payload_size;
> >  	size_t lsa_size;
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index e54dbdf9ac15..76de39b90351 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -408,6 +408,15 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  	if (IS_ERR(cxlds))
> >  		return PTR_ERR(cxlds);
> >  
> > +	cxlds->device_dvsec = pci_find_dvsec_capability(pdev,
> > +							PCI_DVSEC_VENDOR_ID_CXL,
> > +							CXL_DVSEC_PCIE_DEVICE);
> > +	if (!cxlds->device_dvsec) {
> > +		dev_err(&pdev->dev,
> > +			"Device DVSEC not present. Expect limited functionality.\n");
> > +		return -ENXIO;
> > +	}
> > +
> >  	rc = cxl_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
> >  	if (rc)
> >  		return rc;
> > 
>
Dan Williams Feb. 1, 2022, 9:41 p.m. UTC | #3
On Tue, Feb 1, 2022 at 7:24 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 22-01-31 18:19:24, Jonathan Cameron wrote:
> > On Sun, 23 Jan 2022 16:31:02 -0800
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > From: Ben Widawsky <ben.widawsky@intel.com>
> > >
> > > The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
> > > be implemented by CXL 2.0 endpoint devices. Since the information
> > > contained within this DVSEC will be critically important, it makes sense
> > > to find the value early, and error out if it cannot be found.
> > >
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > Guess the logic makes sense about checking this early though my cynical
> > mind says, that if someone is putting in devices that claim to be
> > CXL ones and this isn't there it is there own problem if they
> > kernel wastes effort bringing the driver up only to find later
> > it can't finish doing so...
>
> I don't remember if Dan and I discussed actually failing to bind this early if
> the DVSEC isn't there.

On second look, the error message does not make sense because there is
"no functionality" not "limited functionality" as a result of this
failure because the cxl_pci driver just gives up. This failure should
be limited to cxl_mem, not cxl_pci as there might still be value in
accessing the mailbox on this device.

> I think the concern is less about wasted effort and more
> about the inability to determine if the device is actively decoding something
> and then having the kernel driver tear that out when it takes over the decoder
> resources. This was specifically targeted toward the DVSEC range registers
> (obviously things would fail later if we couldn't find the MMIO).

If there is no CXL DVSEC then cxl_mem should fail, that's it.

> I agree with your cynical mind though that it might not be our job to prevent
> devices which aren't spec compliant. I'd say if we start seeing bug reports
> around this we can revisit.

What would the bug report be, "driver fails to attach to device that
does not implement the spec"?
Ben Widawsky Feb. 1, 2022, 10:11 p.m. UTC | #4
On 22-02-01 13:41:50, Dan Williams wrote:
> On Tue, Feb 1, 2022 at 7:24 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > On 22-01-31 18:19:24, Jonathan Cameron wrote:
> > > On Sun, 23 Jan 2022 16:31:02 -0800
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > > From: Ben Widawsky <ben.widawsky@intel.com>
> > > >
> > > > The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
> > > > be implemented by CXL 2.0 endpoint devices. Since the information
> > > > contained within this DVSEC will be critically important, it makes sense
> > > > to find the value early, and error out if it cannot be found.
> > > >
> > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > Guess the logic makes sense about checking this early though my cynical
> > > mind says, that if someone is putting in devices that claim to be
> > > CXL ones and this isn't there it is there own problem if they
> > > kernel wastes effort bringing the driver up only to find later
> > > it can't finish doing so...
> >
> > I don't remember if Dan and I discussed actually failing to bind this early if
> > the DVSEC isn't there.
> 
> On second look, the error message does not make sense because there is
> "no functionality" not "limited functionality" as a result of this
> failure because the cxl_pci driver just gives up. This failure should
> be limited to cxl_mem, not cxl_pci as there might still be value in
> accessing the mailbox on this device.
> 
> > I think the concern is less about wasted effort and more
> > about the inability to determine if the device is actively decoding something
> > and then having the kernel driver tear that out when it takes over the decoder
> > resources. This was specifically targeted toward the DVSEC range registers
> > (obviously things would fail later if we couldn't find the MMIO).
> 
> If there is no CXL DVSEC then cxl_mem should fail, that's it.
> 

If there is no CXL DVSEC we have no way to find the device's MMIO. You need the
register locator dvsec. Not sure how you intend to do anything with the device
at that point, but if you see something I don't, then by all means, change it.

> > I agree with your cynical mind though that it might not be our job to prevent
> > devices which aren't spec compliant. I'd say if we start seeing bug reports
> > around this we can revisit.
> 
> What would the bug report be, "driver fails to attach to device that
> does not implement the spec"?
Dan Williams Feb. 1, 2022, 10:15 p.m. UTC | #5
On Tue, Feb 1, 2022 at 2:11 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 22-02-01 13:41:50, Dan Williams wrote:
> > On Tue, Feb 1, 2022 at 7:24 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > On 22-01-31 18:19:24, Jonathan Cameron wrote:
> > > > On Sun, 23 Jan 2022 16:31:02 -0800
> > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > > From: Ben Widawsky <ben.widawsky@intel.com>
> > > > >
> > > > > The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
> > > > > be implemented by CXL 2.0 endpoint devices. Since the information
> > > > > contained within this DVSEC will be critically important, it makes sense
> > > > > to find the value early, and error out if it cannot be found.
> > > > >
> > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > > Guess the logic makes sense about checking this early though my cynical
> > > > mind says, that if someone is putting in devices that claim to be
> > > > CXL ones and this isn't there it is there own problem if they
> > > > kernel wastes effort bringing the driver up only to find later
> > > > it can't finish doing so...
> > >
> > > I don't remember if Dan and I discussed actually failing to bind this early if
> > > the DVSEC isn't there.
> >
> > On second look, the error message does not make sense because there is
> > "no functionality" not "limited functionality" as a result of this
> > failure because the cxl_pci driver just gives up. This failure should
> > be limited to cxl_mem, not cxl_pci as there might still be value in
> > accessing the mailbox on this device.
> >
> > > I think the concern is less about wasted effort and more
> > > about the inability to determine if the device is actively decoding something
> > > and then having the kernel driver tear that out when it takes over the decoder
> > > resources. This was specifically targeted toward the DVSEC range registers
> > > (obviously things would fail later if we couldn't find the MMIO).
> >
> > If there is no CXL DVSEC then cxl_mem should fail, that's it.
> >
>
> If there is no CXL DVSEC we have no way to find the device's MMIO. You need the
> register locator dvsec. Not sure how you intend to do anything with the device
> at that point, but if you see something I don't, then by all means, change it.

I see:

pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);

...and:

pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_REG_LOCATOR);

...aren't they independent?
Ben Widawsky Feb. 1, 2022, 10:20 p.m. UTC | #6
On 22-02-01 14:15:22, Dan Williams wrote:
> On Tue, Feb 1, 2022 at 2:11 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > On 22-02-01 13:41:50, Dan Williams wrote:
> > > On Tue, Feb 1, 2022 at 7:24 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > >
> > > > On 22-01-31 18:19:24, Jonathan Cameron wrote:
> > > > > On Sun, 23 Jan 2022 16:31:02 -0800
> > > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > > >
> > > > > > From: Ben Widawsky <ben.widawsky@intel.com>
> > > > > >
> > > > > > The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
> > > > > > be implemented by CXL 2.0 endpoint devices. Since the information
> > > > > > contained within this DVSEC will be critically important, it makes sense
> > > > > > to find the value early, and error out if it cannot be found.
> > > > > >
> > > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > > > Guess the logic makes sense about checking this early though my cynical
> > > > > mind says, that if someone is putting in devices that claim to be
> > > > > CXL ones and this isn't there it is there own problem if they
> > > > > kernel wastes effort bringing the driver up only to find later
> > > > > it can't finish doing so...
> > > >
> > > > I don't remember if Dan and I discussed actually failing to bind this early if
> > > > the DVSEC isn't there.
> > >
> > > On second look, the error message does not make sense because there is
> > > "no functionality" not "limited functionality" as a result of this
> > > failure because the cxl_pci driver just gives up. This failure should
> > > be limited to cxl_mem, not cxl_pci as there might still be value in
> > > accessing the mailbox on this device.
> > >
> > > > I think the concern is less about wasted effort and more
> > > > about the inability to determine if the device is actively decoding something
> > > > and then having the kernel driver tear that out when it takes over the decoder
> > > > resources. This was specifically targeted toward the DVSEC range registers
> > > > (obviously things would fail later if we couldn't find the MMIO).
> > >
> > > If there is no CXL DVSEC then cxl_mem should fail, that's it.
> > >
> >
> > If there is no CXL DVSEC we have no way to find the device's MMIO. You need the
> > register locator dvsec. Not sure how you intend to do anything with the device
> > at that point, but if you see something I don't, then by all means, change it.
> 
> I see:
> 
> pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
> 
> ...and:
> 
> pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_REG_LOCATOR);
> 
> ...aren't they independent?

My mistake. I was thinking of a different patch, "cxl/pci: Retrieve CXL DVSEC
memory info". You're correct, they are independent (both mandatory for type 3
devices).

However, Jonathan was the one who originally suggested it. I had it as a warn
originally.
https://lore.kernel.org/linux-cxl/20211122223430.gvkwj3yeckriffes@intel.com/
Dan Williams Feb. 1, 2022, 10:24 p.m. UTC | #7
On Tue, Feb 1, 2022 at 2:20 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 22-02-01 14:15:22, Dan Williams wrote:
> > On Tue, Feb 1, 2022 at 2:11 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > On 22-02-01 13:41:50, Dan Williams wrote:
> > > > On Tue, Feb 1, 2022 at 7:24 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > > >
> > > > > On 22-01-31 18:19:24, Jonathan Cameron wrote:
> > > > > > On Sun, 23 Jan 2022 16:31:02 -0800
> > > > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > > > >
> > > > > > > From: Ben Widawsky <ben.widawsky@intel.com>
> > > > > > >
> > > > > > > The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
> > > > > > > be implemented by CXL 2.0 endpoint devices. Since the information
> > > > > > > contained within this DVSEC will be critically important, it makes sense
> > > > > > > to find the value early, and error out if it cannot be found.
> > > > > > >
> > > > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > > > > Guess the logic makes sense about checking this early though my cynical
> > > > > > mind says, that if someone is putting in devices that claim to be
> > > > > > CXL ones and this isn't there it is there own problem if they
> > > > > > kernel wastes effort bringing the driver up only to find later
> > > > > > it can't finish doing so...
> > > > >
> > > > > I don't remember if Dan and I discussed actually failing to bind this early if
> > > > > the DVSEC isn't there.
> > > >
> > > > On second look, the error message does not make sense because there is
> > > > "no functionality" not "limited functionality" as a result of this
> > > > failure because the cxl_pci driver just gives up. This failure should
> > > > be limited to cxl_mem, not cxl_pci as there might still be value in
> > > > accessing the mailbox on this device.
> > > >
> > > > > I think the concern is less about wasted effort and more
> > > > > about the inability to determine if the device is actively decoding something
> > > > > and then having the kernel driver tear that out when it takes over the decoder
> > > > > resources. This was specifically targeted toward the DVSEC range registers
> > > > > (obviously things would fail later if we couldn't find the MMIO).
> > > >
> > > > If there is no CXL DVSEC then cxl_mem should fail, that's it.
> > > >
> > >
> > > If there is no CXL DVSEC we have no way to find the device's MMIO. You need the
> > > register locator dvsec. Not sure how you intend to do anything with the device
> > > at that point, but if you see something I don't, then by all means, change it.
> >
> > I see:
> >
> > pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
> >
> > ...and:
> >
> > pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_REG_LOCATOR);
> >
> > ...aren't they independent?
>
> My mistake. I was thinking of a different patch, "cxl/pci: Retrieve CXL DVSEC
> memory info". You're correct, they are independent (both mandatory for type 3
> devices).
>
> However, Jonathan was the one who originally suggested it. I had it as a warn
> originally.
> https://lore.kernel.org/linux-cxl/20211122223430.gvkwj3yeckriffes@intel.com/

At least to the concern of "nothing" working without the base CXL
DVSEC the cxl_mem driver failing to attach catches that case.
Otherwise a device that only implements the mailbox seems not outside
the realm of possibility. Jonathan?
Jonathan Cameron Feb. 2, 2022, 9:36 a.m. UTC | #8
On Tue, 1 Feb 2022 14:24:51 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> On Tue, Feb 1, 2022 at 2:20 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > On 22-02-01 14:15:22, Dan Williams wrote:  
> > > On Tue, Feb 1, 2022 at 2:11 PM Ben Widawsky <ben.widawsky@intel.com> wrote:  
> > > >
> > > > On 22-02-01 13:41:50, Dan Williams wrote:  
> > > > > On Tue, Feb 1, 2022 at 7:24 AM Ben Widawsky <ben.widawsky@intel.com> wrote:  
> > > > > >
> > > > > > On 22-01-31 18:19:24, Jonathan Cameron wrote:  
> > > > > > > On Sun, 23 Jan 2022 16:31:02 -0800
> > > > > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > > > > >  
> > > > > > > > From: Ben Widawsky <ben.widawsky@intel.com>
> > > > > > > >
> > > > > > > > The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
> > > > > > > > be implemented by CXL 2.0 endpoint devices. Since the information
> > > > > > > > contained within this DVSEC will be critically important, it makes sense
> > > > > > > > to find the value early, and error out if it cannot be found.
> > > > > > > >
> > > > > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > > > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>  
> > > > > > > Guess the logic makes sense about checking this early though my cynical
> > > > > > > mind says, that if someone is putting in devices that claim to be
> > > > > > > CXL ones and this isn't there it is there own problem if they
> > > > > > > kernel wastes effort bringing the driver up only to find later
> > > > > > > it can't finish doing so...  
> > > > > >
> > > > > > I don't remember if Dan and I discussed actually failing to bind this early if
> > > > > > the DVSEC isn't there.  
> > > > >
> > > > > On second look, the error message does not make sense because there is
> > > > > "no functionality" not "limited functionality" as a result of this
> > > > > failure because the cxl_pci driver just gives up. This failure should
> > > > > be limited to cxl_mem, not cxl_pci as there might still be value in
> > > > > accessing the mailbox on this device.
> > > > >  
> > > > > > I think the concern is less about wasted effort and more
> > > > > > about the inability to determine if the device is actively decoding something
> > > > > > and then having the kernel driver tear that out when it takes over the decoder
> > > > > > resources. This was specifically targeted toward the DVSEC range registers
> > > > > > (obviously things would fail later if we couldn't find the MMIO).  
> > > > >
> > > > > If there is no CXL DVSEC then cxl_mem should fail, that's it.
> > > > >  
> > > >
> > > > If there is no CXL DVSEC we have no way to find the device's MMIO. You need the
> > > > register locator dvsec. Not sure how you intend to do anything with the device
> > > > at that point, but if you see something I don't, then by all means, change it.  
> > >
> > > I see:
> > >
> > > pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
> > >
> > > ...and:
> > >
> > > pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_REG_LOCATOR);
> > >
> > > ...aren't they independent?  
> >
> > My mistake. I was thinking of a different patch, "cxl/pci: Retrieve CXL DVSEC
> > memory info". You're correct, they are independent (both mandatory for type 3
> > devices).
> >
> > However, Jonathan was the one who originally suggested it. I had it as a warn
> > originally.
> > https://lore.kernel.org/linux-cxl/20211122223430.gvkwj3yeckriffes@intel.com/  
> 
> At least to the concern of "nothing" working without the base CXL
> DVSEC the cxl_mem driver failing to attach catches that case.
> Otherwise a device that only implements the mailbox seems not outside
> the realm of possibility. Jonathan?

I don't really care. To my mind the hardware is broken anyway,
but if you want to try and enable 'some stuff' then I'm fine with
that - mostly I think if it's broken enough that the DVSEC locator
isn't there, then chances the mailbox works are probably low, but
then I'm not implementing the hardware :)

Jonathan
diff mbox series

Patch

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 90d67fff5bed..cedc6d3c0448 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -98,6 +98,7 @@  struct cxl_mbox_cmd {
  *
  * @dev: The device associated with this CXL state
  * @regs: Parsed register blocks
+ * @device_dvsec: Offset to the PCIe device DVSEC
  * @payload_size: Size of space for payload
  *                (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
  * @lsa_size: Size of Label Storage Area
@@ -126,6 +127,7 @@  struct cxl_dev_state {
 	struct device *dev;
 
 	struct cxl_regs regs;
+	int device_dvsec;
 
 	size_t payload_size;
 	size_t lsa_size;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index e54dbdf9ac15..76de39b90351 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -408,6 +408,15 @@  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (IS_ERR(cxlds))
 		return PTR_ERR(cxlds);
 
+	cxlds->device_dvsec = pci_find_dvsec_capability(pdev,
+							PCI_DVSEC_VENDOR_ID_CXL,
+							CXL_DVSEC_PCIE_DEVICE);
+	if (!cxlds->device_dvsec) {
+		dev_err(&pdev->dev,
+			"Device DVSEC not present. Expect limited functionality.\n");
+		return -ENXIO;
+	}
+
 	rc = cxl_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
 	if (rc)
 		return rc;