Message ID | 164730735869.3806189.4032428192652531946.stgit@dwillia2-desk3.amr.corp.intel.com |
---|---|
State | Accepted |
Commit | 36bfc6ad508af38f212cf5a38147d867fb3f80a8 |
Headers | show |
Series | cxl: Handle DVSEC range init failures | expand |
On 22-03-14 18:22:38, Dan Williams wrote: > cxl_dvsec_ranges(), the helper for enumerating the presence of an active > legacy CXL.mem configuration on a CXL 2.0 Memory Expander, is not fatal > for cxl_pci because there is still value to enable mailbox operations > even if CXL.mem operation is disabled. Recall that the reason cxl_pci > does this initialization and not cxl_mem is to preserve the useful > property (for unit testing) that cxl_mem is cxl_memdev + mmio generic, > and does not require access to a 'struct pci_dev' to issue config > cycles. > > Update 'struct cxl_endpoint_dvsec_info' to carry either a positive > number of non-zero size legacy CXL DVSEC ranges, or the negative error > code from __cxl_dvsec_ranges() in its @ranges member. > > Reported-by: Krzysztof Zach <krzysztof.zach@intel.com> > Fixes: 560f78559006 ("cxl/pci: Retrieve CXL DVSEC memory info") > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/cxl/pci.c | 27 ++++++++++++++++++--------- > 1 file changed, 18 insertions(+), 9 deletions(-) > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 257cf735505d..994c79bf6afd 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -463,13 +463,18 @@ static int wait_for_media_ready(struct cxl_dev_state *cxlds) > return 0; > } > > -static int cxl_dvsec_ranges(struct cxl_dev_state *cxlds) > +/* > + * Return positive number of non-zero ranges on success and a negative > + * error code on failure. The cxl_mem driver depends on ranges == 0 to > + * init HDM operation. > + */ It shouldn't depend on 0 ranges, it depends on ranges matching existing HDM decoder ranges. I took a shortcut by just checking global enable originally because we didn't yet have code to enumerate decoders (anything else would be a crazy BIOS bug that's probably not worth working around). > +static int __cxl_dvsec_ranges(struct cxl_dev_state *cxlds, > + struct cxl_endpoint_dvsec_info *info) > { > - struct cxl_endpoint_dvsec_info *info = &cxlds->info; > struct pci_dev *pdev = to_pci_dev(cxlds->dev); > + int hdm_count, rc, i, ranges = 0; > struct device *dev = &pdev->dev; > int d = cxlds->cxl_dvsec; > - int hdm_count, rc, i; > u16 cap, ctrl; > > if (!d) { > @@ -546,10 +551,17 @@ static int cxl_dvsec_ranges(struct cxl_dev_state *cxlds) > }; > > if (size) > - info->ranges++; > + ranges++; > } > > - return 0; > + return ranges; > +} > + > +static void cxl_dvsec_ranges(struct cxl_dev_state *cxlds) > +{ > + struct cxl_endpoint_dvsec_info *info = &cxlds->info; > + > + info->ranges = __cxl_dvsec_ranges(cxlds, info); > } > > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > @@ -618,10 +630,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > if (rc) > return rc; > > - rc = cxl_dvsec_ranges(cxlds); > - if (rc) > - dev_warn(&pdev->dev, > - "Failed to get DVSEC range information (%d)\n", rc); > + cxl_dvsec_ranges(cxlds); > > cxlmd = devm_cxl_add_memdev(cxlds); > if (IS_ERR(cxlmd)) >
On Thu, Mar 17, 2022 at 10:52 AM Ben Widawsky <ben.widawsky@intel.com> wrote: > > On 22-03-14 18:22:38, Dan Williams wrote: > > cxl_dvsec_ranges(), the helper for enumerating the presence of an active > > legacy CXL.mem configuration on a CXL 2.0 Memory Expander, is not fatal > > for cxl_pci because there is still value to enable mailbox operations > > even if CXL.mem operation is disabled. Recall that the reason cxl_pci > > does this initialization and not cxl_mem is to preserve the useful > > property (for unit testing) that cxl_mem is cxl_memdev + mmio generic, > > and does not require access to a 'struct pci_dev' to issue config > > cycles. > > > > Update 'struct cxl_endpoint_dvsec_info' to carry either a positive > > number of non-zero size legacy CXL DVSEC ranges, or the negative error > > code from __cxl_dvsec_ranges() in its @ranges member. > > > > Reported-by: Krzysztof Zach <krzysztof.zach@intel.com> > > Fixes: 560f78559006 ("cxl/pci: Retrieve CXL DVSEC memory info") > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > --- > > drivers/cxl/pci.c | 27 ++++++++++++++++++--------- > > 1 file changed, 18 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > index 257cf735505d..994c79bf6afd 100644 > > --- a/drivers/cxl/pci.c > > +++ b/drivers/cxl/pci.c > > @@ -463,13 +463,18 @@ static int wait_for_media_ready(struct cxl_dev_state *cxlds) > > return 0; > > } > > > > -static int cxl_dvsec_ranges(struct cxl_dev_state *cxlds) > > +/* > > + * Return positive number of non-zero ranges on success and a negative > > + * error code on failure. The cxl_mem driver depends on ranges == 0 to > > + * init HDM operation. > > + */ > > It shouldn't depend on 0 ranges, it depends on ranges matching existing HDM > decoder ranges. I took a shortcut by just checking global enable originally > because we didn't yet have code to enumerate decoders (anything else would be a > crazy BIOS bug that's probably not worth working around). Hmm, I don't see that as a shortcut. If global enable is set then there is no requirement that CXL DVSEC matches the HDM decoder ranges, if global enable is not set then the HDM decoder ranges are not in effect. This is what the new comment in cxl_hdm_decode_init() is trying to clarify. My proposal is that Linux ignores the recommendation which I think is trying to accommodate legacy CXL 1.1 software stacks which Linux never had. /* * Per CXL 2.0 Section 8.1.3.8.3 and 8.1.3.8.4 DVSEC CXL Range 1 Base * [High,Low] when HDM operation is enabled the range register values * are ignored by the device, but the spec also recommends matching the * DVSEC Range 1,2 to HDM Decoder Range 0,1 so, non-zero info->ranges * are expected. */
On 22-03-17 11:20:48, Dan Williams wrote: > On Thu, Mar 17, 2022 at 10:52 AM Ben Widawsky <ben.widawsky@intel.com> wrote: > > > > On 22-03-14 18:22:38, Dan Williams wrote: > > > cxl_dvsec_ranges(), the helper for enumerating the presence of an active > > > legacy CXL.mem configuration on a CXL 2.0 Memory Expander, is not fatal > > > for cxl_pci because there is still value to enable mailbox operations > > > even if CXL.mem operation is disabled. Recall that the reason cxl_pci > > > does this initialization and not cxl_mem is to preserve the useful > > > property (for unit testing) that cxl_mem is cxl_memdev + mmio generic, > > > and does not require access to a 'struct pci_dev' to issue config > > > cycles. > > > > > > Update 'struct cxl_endpoint_dvsec_info' to carry either a positive > > > number of non-zero size legacy CXL DVSEC ranges, or the negative error > > > code from __cxl_dvsec_ranges() in its @ranges member. > > > > > > Reported-by: Krzysztof Zach <krzysztof.zach@intel.com> > > > Fixes: 560f78559006 ("cxl/pci: Retrieve CXL DVSEC memory info") > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > > --- > > > drivers/cxl/pci.c | 27 ++++++++++++++++++--------- > > > 1 file changed, 18 insertions(+), 9 deletions(-) > > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > > index 257cf735505d..994c79bf6afd 100644 > > > --- a/drivers/cxl/pci.c > > > +++ b/drivers/cxl/pci.c > > > @@ -463,13 +463,18 @@ static int wait_for_media_ready(struct cxl_dev_state *cxlds) > > > return 0; > > > } > > > > > > -static int cxl_dvsec_ranges(struct cxl_dev_state *cxlds) > > > +/* > > > + * Return positive number of non-zero ranges on success and a negative > > > + * error code on failure. The cxl_mem driver depends on ranges == 0 to > > > + * init HDM operation. > > > + */ > > > > It shouldn't depend on 0 ranges, it depends on ranges matching existing HDM > > decoder ranges. I took a shortcut by just checking global enable originally > > because we didn't yet have code to enumerate decoders (anything else would be a > > crazy BIOS bug that's probably not worth working around). > > Hmm, I don't see that as a shortcut. If global enable is set then > there is no requirement that CXL DVSEC matches the HDM decoder ranges, > if global enable is not set then the HDM decoder ranges are not in > effect. It's true that there is no requirement that the DVSEC matches HDM decoder ranges, except as you point out before it's going against spec recommendation. > > This is what the new comment in cxl_hdm_decode_init() is trying to > clarify. My proposal is that Linux ignores the recommendation which I > think is trying to accommodate legacy CXL 1.1 software stacks which > Linux never had. > Okay, that's fine. > /* > * Per CXL 2.0 Section 8.1.3.8.3 and 8.1.3.8.4 DVSEC CXL Range 1 Base > * [High,Low] when HDM operation is enabled the range register values > * are ignored by the device, but the spec also recommends matching the > * DVSEC Range 1,2 to HDM Decoder Range 0,1 so, non-zero info->ranges > * are expected. > */ Maybe explicitly say the driver does not attempt to check this recommendation.
On Thu, Mar 17, 2022 at 11:29 AM Ben Widawsky <ben.widawsky@intel.com> wrote: > > On 22-03-17 11:20:48, Dan Williams wrote: > > On Thu, Mar 17, 2022 at 10:52 AM Ben Widawsky <ben.widawsky@intel.com> wrote: > > > > > > On 22-03-14 18:22:38, Dan Williams wrote: > > > > cxl_dvsec_ranges(), the helper for enumerating the presence of an active > > > > legacy CXL.mem configuration on a CXL 2.0 Memory Expander, is not fatal > > > > for cxl_pci because there is still value to enable mailbox operations > > > > even if CXL.mem operation is disabled. Recall that the reason cxl_pci > > > > does this initialization and not cxl_mem is to preserve the useful > > > > property (for unit testing) that cxl_mem is cxl_memdev + mmio generic, > > > > and does not require access to a 'struct pci_dev' to issue config > > > > cycles. > > > > > > > > Update 'struct cxl_endpoint_dvsec_info' to carry either a positive > > > > number of non-zero size legacy CXL DVSEC ranges, or the negative error > > > > code from __cxl_dvsec_ranges() in its @ranges member. > > > > > > > > Reported-by: Krzysztof Zach <krzysztof.zach@intel.com> > > > > Fixes: 560f78559006 ("cxl/pci: Retrieve CXL DVSEC memory info") > > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > > > --- > > > > drivers/cxl/pci.c | 27 ++++++++++++++++++--------- > > > > 1 file changed, 18 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > > > index 257cf735505d..994c79bf6afd 100644 > > > > --- a/drivers/cxl/pci.c > > > > +++ b/drivers/cxl/pci.c > > > > @@ -463,13 +463,18 @@ static int wait_for_media_ready(struct cxl_dev_state *cxlds) > > > > return 0; > > > > } > > > > > > > > -static int cxl_dvsec_ranges(struct cxl_dev_state *cxlds) > > > > +/* > > > > + * Return positive number of non-zero ranges on success and a negative > > > > + * error code on failure. The cxl_mem driver depends on ranges == 0 to > > > > + * init HDM operation. > > > > + */ > > > > > > It shouldn't depend on 0 ranges, it depends on ranges matching existing HDM > > > decoder ranges. I took a shortcut by just checking global enable originally > > > because we didn't yet have code to enumerate decoders (anything else would be a > > > crazy BIOS bug that's probably not worth working around). > > > > Hmm, I don't see that as a shortcut. If global enable is set then > > there is no requirement that CXL DVSEC matches the HDM decoder ranges, > > if global enable is not set then the HDM decoder ranges are not in > > effect. > > It's true that there is no requirement that the DVSEC matches HDM decoder > ranges, except as you point out before it's going against spec recommendation. > > > > > This is what the new comment in cxl_hdm_decode_init() is trying to > > clarify. My proposal is that Linux ignores the recommendation which I > > think is trying to accommodate legacy CXL 1.1 software stacks which > > Linux never had. > > > > Okay, that's fine. > > > /* > > * Per CXL 2.0 Section 8.1.3.8.3 and 8.1.3.8.4 DVSEC CXL Range 1 Base > > * [High,Low] when HDM operation is enabled the range register values > > * are ignored by the device, but the spec also recommends matching the > > * DVSEC Range 1,2 to HDM Decoder Range 0,1 so, non-zero info->ranges > > * are expected. > > */ > > Maybe explicitly say the driver does not attempt to check this recommendation. Sure, I can add that.
On Mon, 14 Mar 2022 18:22:38 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > cxl_dvsec_ranges(), the helper for enumerating the presence of an active > legacy CXL.mem configuration on a CXL 2.0 Memory Expander, is not fatal > for cxl_pci because there is still value to enable mailbox operations > even if CXL.mem operation is disabled. Recall that the reason cxl_pci > does this initialization and not cxl_mem is to preserve the useful > property (for unit testing) that cxl_mem is cxl_memdev + mmio generic, > and does not require access to a 'struct pci_dev' to issue config > cycles. > > Update 'struct cxl_endpoint_dvsec_info' to carry either a positive > number of non-zero size legacy CXL DVSEC ranges, or the negative error > code from __cxl_dvsec_ranges() in its @ranges member. > > Reported-by: Krzysztof Zach <krzysztof.zach@intel.com> > Fixes: 560f78559006 ("cxl/pci: Retrieve CXL DVSEC memory info") > Signed-off-by: Dan Williams <dan.j.williams@intel.com> with comment Ben requested Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > drivers/cxl/pci.c | 27 ++++++++++++++++++--------- > 1 file changed, 18 insertions(+), 9 deletions(-) > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 257cf735505d..994c79bf6afd 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -463,13 +463,18 @@ static int wait_for_media_ready(struct cxl_dev_state *cxlds) > return 0; > } > > -static int cxl_dvsec_ranges(struct cxl_dev_state *cxlds) > +/* > + * Return positive number of non-zero ranges on success and a negative > + * error code on failure. The cxl_mem driver depends on ranges == 0 to > + * init HDM operation. > + */ > +static int __cxl_dvsec_ranges(struct cxl_dev_state *cxlds, > + struct cxl_endpoint_dvsec_info *info) > { > - struct cxl_endpoint_dvsec_info *info = &cxlds->info; > struct pci_dev *pdev = to_pci_dev(cxlds->dev); > + int hdm_count, rc, i, ranges = 0; > struct device *dev = &pdev->dev; > int d = cxlds->cxl_dvsec; > - int hdm_count, rc, i; > u16 cap, ctrl; > > if (!d) { > @@ -546,10 +551,17 @@ static int cxl_dvsec_ranges(struct cxl_dev_state *cxlds) > }; > > if (size) > - info->ranges++; > + ranges++; > } > > - return 0; > + return ranges; > +} > + > +static void cxl_dvsec_ranges(struct cxl_dev_state *cxlds) > +{ > + struct cxl_endpoint_dvsec_info *info = &cxlds->info; > + > + info->ranges = __cxl_dvsec_ranges(cxlds, info); > } > > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > @@ -618,10 +630,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > if (rc) > return rc; > > - rc = cxl_dvsec_ranges(cxlds); > - if (rc) > - dev_warn(&pdev->dev, > - "Failed to get DVSEC range information (%d)\n", rc); > + cxl_dvsec_ranges(cxlds); > > cxlmd = devm_cxl_add_memdev(cxlds); > if (IS_ERR(cxlmd)) >
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 257cf735505d..994c79bf6afd 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -463,13 +463,18 @@ static int wait_for_media_ready(struct cxl_dev_state *cxlds) return 0; } -static int cxl_dvsec_ranges(struct cxl_dev_state *cxlds) +/* + * Return positive number of non-zero ranges on success and a negative + * error code on failure. The cxl_mem driver depends on ranges == 0 to + * init HDM operation. + */ +static int __cxl_dvsec_ranges(struct cxl_dev_state *cxlds, + struct cxl_endpoint_dvsec_info *info) { - struct cxl_endpoint_dvsec_info *info = &cxlds->info; struct pci_dev *pdev = to_pci_dev(cxlds->dev); + int hdm_count, rc, i, ranges = 0; struct device *dev = &pdev->dev; int d = cxlds->cxl_dvsec; - int hdm_count, rc, i; u16 cap, ctrl; if (!d) { @@ -546,10 +551,17 @@ static int cxl_dvsec_ranges(struct cxl_dev_state *cxlds) }; if (size) - info->ranges++; + ranges++; } - return 0; + return ranges; +} + +static void cxl_dvsec_ranges(struct cxl_dev_state *cxlds) +{ + struct cxl_endpoint_dvsec_info *info = &cxlds->info; + + info->ranges = __cxl_dvsec_ranges(cxlds, info); } static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) @@ -618,10 +630,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) return rc; - rc = cxl_dvsec_ranges(cxlds); - if (rc) - dev_warn(&pdev->dev, - "Failed to get DVSEC range information (%d)\n", rc); + cxl_dvsec_ranges(cxlds); cxlmd = devm_cxl_add_memdev(cxlds); if (IS_ERR(cxlmd))
cxl_dvsec_ranges(), the helper for enumerating the presence of an active legacy CXL.mem configuration on a CXL 2.0 Memory Expander, is not fatal for cxl_pci because there is still value to enable mailbox operations even if CXL.mem operation is disabled. Recall that the reason cxl_pci does this initialization and not cxl_mem is to preserve the useful property (for unit testing) that cxl_mem is cxl_memdev + mmio generic, and does not require access to a 'struct pci_dev' to issue config cycles. Update 'struct cxl_endpoint_dvsec_info' to carry either a positive number of non-zero size legacy CXL DVSEC ranges, or the negative error code from __cxl_dvsec_ranges() in its @ranges member. Reported-by: Krzysztof Zach <krzysztof.zach@intel.com> Fixes: 560f78559006 ("cxl/pci: Retrieve CXL DVSEC memory info") Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/cxl/pci.c | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-)