Message ID | 172964781104.81806.4277549800082443769.stgit@dwillia2-xfh.jf.intel.com |
---|---|
State | Accepted |
Commit | 15eb0c007224b801bd746b1fd19b6977f5bf6ac4 |
Series | cxl: Initialization and shutdown fixes |
On Tue, Oct 22, 2024 at 06:43:32PM -0700, Dan Williams wrote:
> It turns out since its original introduction, pre-2.6.12,
> bus_rescan_devices() has skipped devices that might be in the process of
> attaching or detaching from their driver. For CXL this behavior is
> unwanted and expects that cxl_bus_rescan() is a probe barrier.
>
> That behavior is simple enough to achieve with bus_for_each_dev() paired
> with call to device_attach(), and it is unclear why bus_rescan_devices()
> took the position of lockless consumption of dev->driver which is racy.
>
> The "Fixes:" but no "Cc: stable" on this patch reflects that the issue
> is merely by inspection since the bug that triggered the discovery of
> this potential problem [1] is fixed by other means. However, a stable
> backport should do no harm.
>
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Tested-by: Gregory Price <gourry@gourry.net>

> ---
>  drivers/cxl/core/port.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index e666ec6a9085..af92c67bc954 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2084,11 +2084,18 @@ static void cxl_bus_remove(struct device *dev)
>
>  static struct workqueue_struct *cxl_bus_wq;
>
> -static void cxl_bus_rescan_queue(struct work_struct *w)
> +static int cxl_rescan_attach(struct device *dev, void *data)
>  {
> -	int rc = bus_rescan_devices(&cxl_bus_type);
> +	int rc = device_attach(dev);
> +
> +	dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached");
>
> -	pr_debug("CXL bus rescan result: %d\n", rc);
> +	return 0;
> +}
> +
> +static void cxl_bus_rescan_queue(struct work_struct *w)
> +{
> +	bus_for_each_dev(&cxl_bus_type, NULL, NULL, cxl_rescan_attach);
>  }
>
>  void cxl_bus_rescan(void)
>
On Tue, 22 Oct 2024 18:43:32 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> It turns out since its original introduction, pre-2.6.12,
> bus_rescan_devices() has skipped devices that might be in the process of
> attaching or detaching from their driver. For CXL this behavior is
> unwanted and expects that cxl_bus_rescan() is a probe barrier.
>
> That behavior is simple enough to achieve with bus_for_each_dev() paired
> with call to device_attach(), and it is unclear why bus_rescan_devices()
> took the position of lockless consumption of dev->driver which is racy.
>
> The "Fixes:" but no "Cc: stable" on this patch reflects that the issue
> is merely by inspection since the bug that triggered the discovery of
> this potential problem [1] is fixed by other means. However, a stable
> backport should do no harm.
>
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Dan Williams wrote:
> It turns out since its original introduction, pre-2.6.12,
> bus_rescan_devices() has skipped devices that might be in the process of
> attaching or detaching from their driver. For CXL this behavior is
> unwanted and expects that cxl_bus_rescan() is a probe barrier.
>
> That behavior is simple enough to achieve with bus_for_each_dev() paired
> with call to device_attach(), and it is unclear why bus_rescan_devices()
> took the position of lockless consumption of dev->driver which is racy.
>
> The "Fixes:" but no "Cc: stable" on this patch reflects that the issue
> is merely by inspection since the bug that triggered the discovery of
> this potential problem [1] is fixed by other means. However, a stable
> backport should do no harm.
>
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index e666ec6a9085..af92c67bc954 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2084,11 +2084,18 @@ static void cxl_bus_remove(struct device *dev)
 
 static struct workqueue_struct *cxl_bus_wq;
 
-static void cxl_bus_rescan_queue(struct work_struct *w)
+static int cxl_rescan_attach(struct device *dev, void *data)
 {
-	int rc = bus_rescan_devices(&cxl_bus_type);
+	int rc = device_attach(dev);
+
+	dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached");
 
-	pr_debug("CXL bus rescan result: %d\n", rc);
+	return 0;
+}
+
+static void cxl_bus_rescan_queue(struct work_struct *w)
+{
+	bus_for_each_dev(&cxl_bus_type, NULL, NULL, cxl_rescan_attach);
 }
 
 void cxl_bus_rescan(void)
It turns out since its original introduction, pre-2.6.12,
bus_rescan_devices() has skipped devices that might be in the process of
attaching or detaching from their driver. For CXL this behavior is
unwanted and expects that cxl_bus_rescan() is a probe barrier.

That behavior is simple enough to achieve with bus_for_each_dev() paired
with call to device_attach(), and it is unclear why bus_rescan_devices()
took the position of lockless consumption of dev->driver which is racy.

The "Fixes:" but no "Cc: stable" on this patch reflects that the issue
is merely by inspection since the bug that triggered the discovery of
this potential problem [1] is fixed by other means. However, a stable
backport should do no harm.

Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/port.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
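
One detail of the patch worth illustrating is why cxl_rescan_attach() returns 0 rather than passing the device_attach() result through: bus_for_each_dev() stops at the first callback that returns nonzero, and device_attach() returns 1 on a successful bind. The following is a minimal userspace sketch of that iteration contract only. Every mock_* name is a hypothetical stand-in invented for this example, not a kernel API, and the sketch does not model device_lock() or any of the racing the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct mock_device {
	const char *name;
	bool bound;	/* stands in for dev->driver being set */
	bool can_bind;	/* assumption: whether a matching driver exists */
};

struct mock_bus {
	struct mock_device *devs;
	size_t ndevs;
};

typedef int (*mock_dev_fn)(struct mock_device *dev, void *data);

/*
 * Mimics the bus_for_each_dev() contract: call fn on each device and
 * stop at the first nonzero return, propagating that value.
 */
static int mock_bus_for_each_dev(struct mock_bus *bus, void *data,
				 mock_dev_fn fn)
{
	int rc = 0;

	assert(fn != NULL);
	for (size_t i = 0; i < bus->ndevs && rc == 0; i++)
		rc = fn(&bus->devs[i], data);
	return rc;
}

/*
 * Mimics the device_attach() return convention: 1 if the device is
 * (or already was) bound, 0 if no matching driver was found.
 */
static int mock_device_attach(struct mock_device *dev)
{
	if (dev->bound)
		return 1;
	if (!dev->can_bind)
		return 0;
	dev->bound = true;
	return 1;
}

/* Shaped like cxl_rescan_attach(): always 0, so every device is visited. */
static int mock_rescan_attach(struct mock_device *dev, void *data)
{
	int *attached = data;

	if (mock_device_attach(dev) > 0)
		(*attached)++;
	return 0;
}

/* Counter-example: propagating the attach result ends the walk early. */
static int mock_rescan_attach_propagate(struct mock_device *dev, void *data)
{
	int *visited = data;

	(*visited)++;
	return mock_device_attach(dev);
}
```

With three mock devices where the first and third can bind, walking the bus with mock_rescan_attach binds both and returns 0, while mock_rescan_attach_propagate stops after the first device because its successful attach returns 1.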