Message ID | 163855974164.1338601.11643774914793606293.stgit@dwillia2-desk3.amr.corp.intel.com |
---|---|
State | New, archived |
Headers | show |
Series | cxl/mailbox: Replace racy error checking with timeouts | expand |
On 21-12-03 11:29:01, Dan Williams wrote: > From: Ben Widawsky <ben.widawsky@intel.com> > > The original driver implementation used the doorbell timeout for the > Mailbox Interface Ready bit to piggy back off of, since the latter does > not have a defined timeout. This functionality, introduced in commit > 8adaf747c9f0 ("cxl/mem: Find device capabilities"), needs improvement as > the recent "Add Mailbox Ready Time" ECN timeout indicates that the > mailbox ready time can be significantly longer that 2 seconds. > > While the specification limits the maximum timeout to 256s, the cxl_pci > driver gives up on the mailbox after 60s. This value corresponds with > important timeout values already present in the kernel. A module > parameter is provided as an emergency override. > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > [djbw: add modparam, drop check_device_status()] > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/cxl/pci.c | 34 ++++++++++++++++++++++++++++++++++ > 1 file changed, 34 insertions(+) > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 8dc91fd3396a..519795432708 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -1,7 +1,9 @@ > // SPDX-License-Identifier: GPL-2.0-only > /* Copyright(c) 2020 Intel Corporation. All rights reserved. */ > #include <linux/io-64-nonatomic-lo-hi.h> > +#include <linux/moduleparam.h> > #include <linux/module.h> > +#include <linux/delay.h> > #include <linux/sizes.h> > #include <linux/mutex.h> > #include <linux/list.h> > @@ -35,6 +37,19 @@ > /* CXL 2.0 - 8.2.8.4 */ > #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ) > > +/* > + * CXL 2.0 ECN "Add Mailbox Ready Time" defines a capability field to > + * dictate how long to wait for the mailbox to become ready. The new > + * field allows the device to tell software the amount of time to wait > + * before mailbox ready. This field allows for up to 255 seconds. 255 > + * seconds is unreasonable long, and longer than other default timeouts s/unreasonable/unreasonably > + * in the OS. Use the more sane, 60 seconds instead. > + */ > +static unsigned short mbox_ready_timeout = 60; > +module_param(mbox_ready_timeout, ushort, 0600); > +MODULE_PARM_DESC(mbox_ready_timeout, > + "seconds to wait for mailbox ready status"); > + It's a bit of a weird thing to set as a modparam since it's not module specific, but device specific. However, I suppose it's better than hardcoded 60s and I can't come up with a better alternative. Perhaps mention this in the commit message? > static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds) > { > const unsigned long start = jiffies; > @@ -281,6 +296,25 @@ static int cxl_pci_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *c > static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds) > { > const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET); > + unsigned long timeout; > + u64 md_status; > + > + timeout = jiffies + mbox_ready_timeout * HZ; > + do { > + md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > + if (md_status & CXLMDEV_MBOX_IF_READY) > + break; > + if (msleep_interruptible(100)) > + break; > + } while (!time_after(jiffies, timeout)); One thing I noticed after I wrote these using time_after... time_before_eq() might be a better fit. I'm slightly annoyed both APIs exist, but... > + > + if (!(md_status & CXLMDEV_MBOX_IF_READY)) { > + dev_err(cxlds->dev, > + "timeout awaiting mailbox ready, device state:%s%s\n", > + md_status & CXLMDEV_DEV_FATAL ? " fatal" : "", > + md_status & CXLMDEV_FW_HALT ? " firmware-halt" : ""); > + return -EIO; > + } > > cxlds->mbox_send = cxl_pci_mbox_send; > cxlds->payload_size = > All in all, LGTM
On Fri, Dec 3, 2021 at 5:37 PM Ben Widawsky <ben.widawsky@intel.com> wrote: > > On 21-12-03 11:29:01, Dan Williams wrote: > > From: Ben Widawsky <ben.widawsky@intel.com> > > > > The original driver implementation used the doorbell timeout for the > > Mailbox Interface Ready bit to piggy back off of, since the latter does > > not have a defined timeout. This functionality, introduced in commit > > 8adaf747c9f0 ("cxl/mem: Find device capabilities"), needs improvement as > > the recent "Add Mailbox Ready Time" ECN timeout indicates that the > > mailbox ready time can be significantly longer that 2 seconds. > > > > While the specification limits the maximum timeout to 256s, the cxl_pci > > driver gives up on the mailbox after 60s. This value corresponds with > > important timeout values already present in the kernel. A module > > parameter is provided as an emergency override. > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > > [djbw: add modparam, drop check_device_status()] > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > --- > > drivers/cxl/pci.c | 34 ++++++++++++++++++++++++++++++++++ > > 1 file changed, 34 insertions(+) > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > index 8dc91fd3396a..519795432708 100644 > > --- a/drivers/cxl/pci.c > > +++ b/drivers/cxl/pci.c > > @@ -1,7 +1,9 @@ > > // SPDX-License-Identifier: GPL-2.0-only > > /* Copyright(c) 2020 Intel Corporation. All rights reserved. */ > > #include <linux/io-64-nonatomic-lo-hi.h> > > +#include <linux/moduleparam.h> > > #include <linux/module.h> > > +#include <linux/delay.h> > > #include <linux/sizes.h> > > #include <linux/mutex.h> > > #include <linux/list.h> > > @@ -35,6 +37,19 @@ > > /* CXL 2.0 - 8.2.8.4 */ > > #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ) > > > > +/* > > + * CXL 2.0 ECN "Add Mailbox Ready Time" defines a capability field to > > + * dictate how long to wait for the mailbox to become ready. The new > > + * field allows the device to tell software the amount of time to wait > > + * before mailbox ready. This field allows for up to 255 seconds. 255 > > + * seconds is unreasonable long, and longer than other default timeouts > > s/unreasonable/unreasonably yup. > > > + * in the OS. Use the more sane, 60 seconds instead. > > + */ > > +static unsigned short mbox_ready_timeout = 60; > > +module_param(mbox_ready_timeout, ushort, 0600); > > +MODULE_PARM_DESC(mbox_ready_timeout, > > + "seconds to wait for mailbox ready status"); > > + > > It's a bit of a weird thing to set as a modparam since it's not module specific, > but device specific. However, I suppose it's better than hardcoded 60s and I > can't come up with a better alternative. Perhaps mention this in the commit > message? The individual device timeout doesn't matter. "This timeout is the singleton 'Linux' policy." Does that clarify? > > > static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds) > > { > > const unsigned long start = jiffies; > > @@ -281,6 +296,25 @@ static int cxl_pci_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *c > > static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds) > > { > > const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET); > > + unsigned long timeout; > > + u64 md_status; > > + > > + timeout = jiffies + mbox_ready_timeout * HZ; > > + do { > > + md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > > + if (md_status & CXLMDEV_MBOX_IF_READY) > > + break; > > + if (msleep_interruptible(100)) > > + break; > > + } while (!time_after(jiffies, timeout)); > > One thing I noticed after I wrote these using time_after... time_before_eq() > might be a better fit. I'm slightly annoyed both APIs exist, but... I don't think an occasional extra 100ms matters. > > > + > > + if (!(md_status & CXLMDEV_MBOX_IF_READY)) { > > + dev_err(cxlds->dev, > > + "timeout awaiting mailbox ready, device state:%s%s\n", > > + md_status & CXLMDEV_DEV_FATAL ? " fatal" : "", > > + md_status & CXLMDEV_FW_HALT ? " firmware-halt" : ""); > > + return -EIO; > > + } > > > > cxlds->mbox_send = cxl_pci_mbox_send; > > cxlds->payload_size = > > > > All in all, LGTM cool.
On Fri, 3 Dec 2021 11:29:01 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > From: Ben Widawsky <ben.widawsky@intel.com> > > The original driver implementation used the doorbell timeout for the > Mailbox Interface Ready bit to piggy back off of, since the latter does > not have a defined timeout. This functionality, introduced in commit > 8adaf747c9f0 ("cxl/mem: Find device capabilities"), needs improvement as > the recent "Add Mailbox Ready Time" ECN timeout indicates that the > mailbox ready time can be significantly longer that 2 seconds. > > While the specification limits the maximum timeout to 256s, the cxl_pci > driver gives up on the mailbox after 60s. This value corresponds with > important timeout values already present in the kernel. A module > parameter is provided as an emergency override. > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > [djbw: add modparam, drop check_device_status()] > Signed-off-by: Dan Williams <dan.j.williams@intel.com> LGTM Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > drivers/cxl/pci.c | 34 ++++++++++++++++++++++++++++++++++ > 1 file changed, 34 insertions(+) > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 8dc91fd3396a..519795432708 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -1,7 +1,9 @@ > // SPDX-License-Identifier: GPL-2.0-only > /* Copyright(c) 2020 Intel Corporation. All rights reserved. */ > #include <linux/io-64-nonatomic-lo-hi.h> > +#include <linux/moduleparam.h> > #include <linux/module.h> > +#include <linux/delay.h> > #include <linux/sizes.h> > #include <linux/mutex.h> > #include <linux/list.h> > @@ -35,6 +37,19 @@ > /* CXL 2.0 - 8.2.8.4 */ > #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ) > > +/* > + * CXL 2.0 ECN "Add Mailbox Ready Time" defines a capability field to > + * dictate how long to wait for the mailbox to become ready. The new > + * field allows the device to tell software the amount of time to wait > + * before mailbox ready. This field allows for up to 255 seconds. 255 > + * seconds is unreasonable long, and longer than other default timeouts > + * in the OS. Use the more sane, 60 seconds instead. > + */ > +static unsigned short mbox_ready_timeout = 60; > +module_param(mbox_ready_timeout, ushort, 0600); > +MODULE_PARM_DESC(mbox_ready_timeout, > + "seconds to wait for mailbox ready status"); > + > static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds) > { > const unsigned long start = jiffies; > @@ -281,6 +296,25 @@ static int cxl_pci_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *c > static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds) > { > const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET); > + unsigned long timeout; > + u64 md_status; > + > + timeout = jiffies + mbox_ready_timeout * HZ; > + do { > + md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > + if (md_status & CXLMDEV_MBOX_IF_READY) > + break; > + if (msleep_interruptible(100)) > + break; > + } while (!time_after(jiffies, timeout)); > + > + if (!(md_status & CXLMDEV_MBOX_IF_READY)) { > + dev_err(cxlds->dev, > + "timeout awaiting mailbox ready, device state:%s%s\n", > + md_status & CXLMDEV_DEV_FATAL ? " fatal" : "", > + md_status & CXLMDEV_FW_HALT ? " firmware-halt" : ""); > + return -EIO; > + } > > cxlds->mbox_send = cxl_pci_mbox_send; > cxlds->payload_size = >
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 8dc91fd3396a..519795432708 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -1,7 +1,9 @@ // SPDX-License-Identifier: GPL-2.0-only /* Copyright(c) 2020 Intel Corporation. All rights reserved. */ #include <linux/io-64-nonatomic-lo-hi.h> +#include <linux/moduleparam.h> #include <linux/module.h> +#include <linux/delay.h> #include <linux/sizes.h> #include <linux/mutex.h> #include <linux/list.h> @@ -35,6 +37,19 @@ /* CXL 2.0 - 8.2.8.4 */ #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ) +/* + * CXL 2.0 ECN "Add Mailbox Ready Time" defines a capability field to + * dictate how long to wait for the mailbox to become ready. The new + * field allows the device to tell software the amount of time to wait + * before mailbox ready. This field allows for up to 255 seconds. 255 + * seconds is unreasonable long, and longer than other default timeouts + * in the OS. Use the more sane, 60 seconds instead. + */ +static unsigned short mbox_ready_timeout = 60; +module_param(mbox_ready_timeout, ushort, 0600); +MODULE_PARM_DESC(mbox_ready_timeout, + "seconds to wait for mailbox ready status"); + static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds) { const unsigned long start = jiffies; @@ -281,6 +296,25 @@ static int cxl_pci_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *c static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds) { const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET); + unsigned long timeout; + u64 md_status; + + timeout = jiffies + mbox_ready_timeout * HZ; + do { + md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); + if (md_status & CXLMDEV_MBOX_IF_READY) + break; + if (msleep_interruptible(100)) + break; + } while (!time_after(jiffies, timeout)); + + if (!(md_status & CXLMDEV_MBOX_IF_READY)) { + dev_err(cxlds->dev, + "timeout awaiting mailbox ready, device state:%s%s\n", + md_status & CXLMDEV_DEV_FATAL ? " fatal" : "", + md_status & CXLMDEV_FW_HALT ? " firmware-halt" : ""); + return -EIO; + } cxlds->mbox_send = cxl_pci_mbox_send; cxlds->payload_size =