Message ID | 172862484920.2150669.7306809902566347902.stgit@dwillia2-xfh.jf.intel.com |
---|---|
State | Superseded |
Headers | show |
Series | cxl: Initialization and shutdown fixes | expand |
On 2024/10/11 13:34, Dan Williams wrote: > It turns out since its original introduction, pre-2.6.12, > bus_rescan_devices() has skipped devices that might be in the process of > attaching or detaching from their driver. For CXL this behavior is > unwanted and expects that cxl_bus_rescan() is a probe barrier. > is it relevant with bus_rescan_devices()'s bugs fixed by below link https://lore.kernel.org/all/20240913-bus_match_unlikely-v2-2-5c0c3bfda2f6@quicinc.com/ which also checks dev->driver within device_lock()'s protection. > That behavior is simple enough to achieve with bus_for_each_dev() paired > with call to device_attach(), and it is unclear why bus_rescan_devices() > took the positition of lockless consumption of dev->driver which is > racy. > > The "Fixes:" but no "Cc: stable" on this patch reflects that the issue > is merely by inspection since the bug that triggered the discovery of > this potential problem [1] is fixed by other means. However, a stable > backport should do no harm. > > Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") > Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/cxl/core/port.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index e666ec6a9085..b7828b6c7826 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -2084,11 +2084,18 @@ static void cxl_bus_remove(struct device *dev) > > static struct workqueue_struct *cxl_bus_wq; > > -static void cxl_bus_rescan_queue(struct work_struct *w) > +static int attach_device(struct device *dev, void *data) > { > - int rc = bus_rescan_devices(&cxl_bus_type); > + int rc = device_attach(dev); > + > + dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached"); > > - pr_debug("CXL bus rescan result: %d\n", rc); > + return 0; > +} > + > +static void cxl_bus_rescan_queue(struct work_struct *w) > +{ > + bus_for_each_dev(&cxl_bus_type, NULL, NULL, attach_device); > } > > void cxl_bus_rescan(void) > >
Lk Sii wrote: > On 2024/10/11 13:34, Dan Williams wrote: > > It turns out since its original introduction, pre-2.6.12, > > bus_rescan_devices() has skipped devices that might be in the process of > > attaching or detaching from their driver. For CXL this behavior is > > unwanted and expects that cxl_bus_rescan() is a probe barrier. > > > > is it relevant with bus_rescan_devices()'s bugs fixed by below link > https://lore.kernel.org/all/20240913-bus_match_unlikely-v2-2-5c0c3bfda2f6@quicinc.com/ > > which also checks dev->driver within device_lock()'s protection. I considered that, but the complexity of taking the lock combined with the risk of touching the the long legacy of the handful for bus_rescan_devices() users (hid-core and pcmcia), I decided that just using bus_for_each_dev()+device_attach() is more prudent. So I would say, just document that bus_rescan_devices() is a legacy helper with this old locking behavior and direct others to use bus_for_each_dev()+device_attach() for new use cases.
On Thu, 10 Oct 2024 22:34:10 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > It turns out since its original introduction, pre-2.6.12, > bus_rescan_devices() has skipped devices that might be in the process of > attaching or detaching from their driver. For CXL this behavior is > unwanted and expects that cxl_bus_rescan() is a probe barrier. > > That behavior is simple enough to achieve with bus_for_each_dev() paired > with call to device_attach(), and it is unclear why bus_rescan_devices() > took the positition of lockless consumption of dev->driver which is position > racy. Feels like should be +CC a few folk related to the driver core, GregKH etc to get a sanity check on if they can recall why. +CC Greg. > > The "Fixes:" but no "Cc: stable" on this patch reflects that the issue > is merely by inspection since the bug that triggered the discovery of > this potential problem [1] is fixed by other means. However, a stable > backport should do no harm. > > Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") > Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] > Signed-off-by: Dan Williams <dan.j.williams@intel.com> Fix itself looks fine to me as it will check the dev->driver under the device lock. > --- > drivers/cxl/core/port.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index e666ec6a9085..b7828b6c7826 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -2084,11 +2084,18 @@ static void cxl_bus_remove(struct device *dev) > > static struct workqueue_struct *cxl_bus_wq; > > -static void cxl_bus_rescan_queue(struct work_struct *w) > +static int attach_device(struct device *dev, void *data) That naming is never going to be a problem :) I'd prefix this with something more specific > { > - int rc = bus_rescan_devices(&cxl_bus_type); > + int rc = device_attach(dev); > + > + dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached"); > > - pr_debug("CXL bus rescan result: %d\n", rc); > + return 0; > +} > + > +static void cxl_bus_rescan_queue(struct work_struct *w) > +{ > + bus_for_each_dev(&cxl_bus_type, NULL, NULL, attach_device); > } > > void cxl_bus_rescan(void) > >
Jonathan Cameron wrote: > On Thu, 10 Oct 2024 22:34:10 -0700 > Dan Williams <dan.j.williams@intel.com> wrote: > > > It turns out since its original introduction, pre-2.6.12, > > bus_rescan_devices() has skipped devices that might be in the process of > > attaching or detaching from their driver. For CXL this behavior is > > unwanted and expects that cxl_bus_rescan() is a probe barrier. > > > > That behavior is simple enough to achieve with bus_for_each_dev() paired > > with call to device_attach(), and it is unclear why bus_rescan_devices() > > took the positition of lockless consumption of dev->driver which is > > position Thanks, checkpatch spell check miss. > > racy. > > Feels like should be +CC a few folk related to the driver core, GregKH > etc to get a sanity check on if they can recall why. +CC Greg. I went spelunking through the history, and it just seems like "it has always been that way". The callers have never seemed to care that bus_rescan_devices() was not a probe barrier. I considered creating a new helper called bus_rescan_devices_sync() that uses device_attach() internally, but given the limited usage of bus_rescan_devices() throughout the kernel, it did not seem worth it. > > The "Fixes:" but no "Cc: stable" on this patch reflects that the issue > > is merely by inspection since the bug that triggered the discovery of > > this potential problem [1] is fixed by other means. However, a stable > > backport should do no harm. > > > > Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") > > Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > Fix itself looks fine to me as it will check the dev->driver > under the device lock. > > > --- > > drivers/cxl/core/port.c | 13 ++++++++++--- > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > > index e666ec6a9085..b7828b6c7826 100644 > > --- a/drivers/cxl/core/port.c > > +++ b/drivers/cxl/core/port.c > > @@ -2084,11 +2084,18 @@ static void cxl_bus_remove(struct device *dev) > > > > static struct workqueue_struct *cxl_bus_wq; > > > > -static void cxl_bus_rescan_queue(struct work_struct *w) > > +static int attach_device(struct device *dev, void *data) > > That naming is never going to be a problem :) > I'd prefix this with something more specific I don't immediately see a problem since that symbol is private to the cxl core. Any alternate name ideas?
> > > > > > -static void cxl_bus_rescan_queue(struct work_struct *w) > > > +static int attach_device(struct device *dev, void *data) > > > > That naming is never going to be a problem :) > > I'd prefix this with something more specific > > I don't immediately see a problem since that symbol is private to the > cxl core. Any alternate name ideas? Not an immediate problem until another attach_device() shows up that isn't private. Ah well whoever introduces that gets to rename this I guess. Jonathan >
Jonathan Cameron wrote: > > > > > > > > -static void cxl_bus_rescan_queue(struct work_struct *w) > > > > +static int attach_device(struct device *dev, void *data) > > > > > > That naming is never going to be a problem :) > > > I'd prefix this with something more specific > > > > I don't immediately see a problem since that symbol is private to the > > cxl core. Any alternate name ideas? > Not an immediate problem until another attach_device() shows up > that isn't private. > > Ah well whoever introduces that gets to rename this I guess. Since I am going a v2 anyway went ahead and called the helper cxl_rescan_attach().
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index e666ec6a9085..b7828b6c7826 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2084,11 +2084,18 @@ static void cxl_bus_remove(struct device *dev) static struct workqueue_struct *cxl_bus_wq; -static void cxl_bus_rescan_queue(struct work_struct *w) +static int attach_device(struct device *dev, void *data) { - int rc = bus_rescan_devices(&cxl_bus_type); + int rc = device_attach(dev); + + dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached"); - pr_debug("CXL bus rescan result: %d\n", rc); + return 0; +} + +static void cxl_bus_rescan_queue(struct work_struct *w) +{ + bus_for_each_dev(&cxl_bus_type, NULL, NULL, attach_device); } void cxl_bus_rescan(void)
It turns out since its original introduction, pre-2.6.12, bus_rescan_devices() has skipped devices that might be in the process of attaching or detaching from their driver. For CXL this behavior is unwanted and expects that cxl_bus_rescan() is a probe barrier. That behavior is simple enough to achieve with bus_for_each_dev() paired with call to device_attach(), and it is unclear why bus_rescan_devices() took the positition of lockless consumption of dev->driver which is racy. The "Fixes:" but no "Cc: stable" on this patch reflects that the issue is merely by inspection since the bug that triggered the discovery of this potential problem [1] is fixed by other means. However, a stable backport should do no harm. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/cxl/core/port.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)