diff mbox series

[2/5] cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()

Message ID 172862484920.2150669.7306809902566347902.stgit@dwillia2-xfh.jf.intel.com
State Superseded
Headers show
Series cxl: Initialization and shutdown fixes | expand

Commit Message

Dan Williams Oct. 11, 2024, 5:34 a.m. UTC
It turns out since its original introduction, pre-2.6.12,
bus_rescan_devices() has skipped devices that might be in the process of
attaching or detaching from their driver. For CXL this behavior is
unwanted and expects that cxl_bus_rescan() is a probe barrier.

That behavior is simple enough to achieve with bus_for_each_dev() paired
with call to device_attach(), and it is unclear why bus_rescan_devices()
took the positition of lockless consumption of dev->driver which is
racy.

The "Fixes:" but no "Cc: stable" on this patch reflects that the issue
is merely by inspection since the bug that triggered the discovery of
this potential problem [1] is fixed by other means.  However, a stable
backport should do no harm.

Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/port.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Comments

Lk Sii Oct. 11, 2024, 12:27 p.m. UTC | #1
On 2024/10/11 13:34, Dan Williams wrote:
> It turns out since its original introduction, pre-2.6.12,
> bus_rescan_devices() has skipped devices that might be in the process of
> attaching or detaching from their driver. For CXL this behavior is
> unwanted and expects that cxl_bus_rescan() is a probe barrier.
> 

is it relevant with bus_rescan_devices()'s bugs fixed by below link
https://lore.kernel.org/all/20240913-bus_match_unlikely-v2-2-5c0c3bfda2f6@quicinc.com/

which also checks dev->driver within device_lock()'s protection.

> That behavior is simple enough to achieve with bus_for_each_dev() paired
> with call to device_attach(), and it is unclear why bus_rescan_devices()
> took the positition of lockless consumption of dev->driver which is
> racy.
> 
> The "Fixes:" but no "Cc: stable" on this patch reflects that the issue
> is merely by inspection since the bug that triggered the discovery of
> this potential problem [1] is fixed by other means.  However, a stable
> backport should do no harm.
> 
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/port.c |   13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index e666ec6a9085..b7828b6c7826 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2084,11 +2084,18 @@ static void cxl_bus_remove(struct device *dev)
>  
>  static struct workqueue_struct *cxl_bus_wq;
>  
> -static void cxl_bus_rescan_queue(struct work_struct *w)
> +static int attach_device(struct device *dev, void *data)
>  {
> -	int rc = bus_rescan_devices(&cxl_bus_type);
> +	int rc = device_attach(dev);
> +
> +	dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached");
>  
> -	pr_debug("CXL bus rescan result: %d\n", rc);
> +	return 0;
> +}
> +
> +static void cxl_bus_rescan_queue(struct work_struct *w)
> +{
> +	bus_for_each_dev(&cxl_bus_type, NULL, NULL, attach_device);
>  }
>  
>  void cxl_bus_rescan(void)
> 
>
Dan Williams Oct. 11, 2024, 5:52 p.m. UTC | #2
Lk Sii wrote:
> On 2024/10/11 13:34, Dan Williams wrote:
> > It turns out since its original introduction, pre-2.6.12,
> > bus_rescan_devices() has skipped devices that might be in the process of
> > attaching or detaching from their driver. For CXL this behavior is
> > unwanted and expects that cxl_bus_rescan() is a probe barrier.
> > 
> 
> is it relevant with bus_rescan_devices()'s bugs fixed by below link
> https://lore.kernel.org/all/20240913-bus_match_unlikely-v2-2-5c0c3bfda2f6@quicinc.com/
> 
> which also checks dev->driver within device_lock()'s protection.

I considered that, but the complexity of taking the lock combined with
the risk of touching the the long legacy of the handful for
bus_rescan_devices() users (hid-core and pcmcia), I decided that just
using bus_for_each_dev()+device_attach() is more prudent.

So I would say, just document that bus_rescan_devices() is a legacy
helper with this old locking behavior and direct others to use
bus_for_each_dev()+device_attach() for new use cases.
Jonathan Cameron Oct. 15, 2024, 4:36 p.m. UTC | #3
On Thu, 10 Oct 2024 22:34:10 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> It turns out since its original introduction, pre-2.6.12,
> bus_rescan_devices() has skipped devices that might be in the process of
> attaching or detaching from their driver. For CXL this behavior is
> unwanted and expects that cxl_bus_rescan() is a probe barrier.
> 
> That behavior is simple enough to achieve with bus_for_each_dev() paired
> with call to device_attach(), and it is unclear why bus_rescan_devices()
> took the positition of lockless consumption of dev->driver which is

position

> racy.

Feels like should be +CC a few folk related to the driver core, GregKH
etc to get a sanity check on if they can recall why. +CC Greg.

> 
> The "Fixes:" but no "Cc: stable" on this patch reflects that the issue
> is merely by inspection since the bug that triggered the discovery of
> this potential problem [1] is fixed by other means.  However, a stable
> backport should do no harm.
> 
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Fix itself looks fine to me as it will check the dev->driver
under the device lock.

> ---
>  drivers/cxl/core/port.c |   13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index e666ec6a9085..b7828b6c7826 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2084,11 +2084,18 @@ static void cxl_bus_remove(struct device *dev)
>  
>  static struct workqueue_struct *cxl_bus_wq;
>  
> -static void cxl_bus_rescan_queue(struct work_struct *w)
> +static int attach_device(struct device *dev, void *data)

That naming is never going to be a problem :)
I'd prefix this with something more specific

>  {
> -	int rc = bus_rescan_devices(&cxl_bus_type);
> +	int rc = device_attach(dev);
> +
> +	dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached");
>  
> -	pr_debug("CXL bus rescan result: %d\n", rc);
> +	return 0;
> +}
> +
> +static void cxl_bus_rescan_queue(struct work_struct *w)
> +{
> +	bus_for_each_dev(&cxl_bus_type, NULL, NULL, attach_device);
>  }
>  
>  void cxl_bus_rescan(void)
> 
>
Dan Williams Oct. 15, 2024, 5:57 p.m. UTC | #4
Jonathan Cameron wrote:
> On Thu, 10 Oct 2024 22:34:10 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > It turns out since its original introduction, pre-2.6.12,
> > bus_rescan_devices() has skipped devices that might be in the process of
> > attaching or detaching from their driver. For CXL this behavior is
> > unwanted and expects that cxl_bus_rescan() is a probe barrier.
> > 
> > That behavior is simple enough to achieve with bus_for_each_dev() paired
> > with call to device_attach(), and it is unclear why bus_rescan_devices()
> > took the positition of lockless consumption of dev->driver which is
> 
> position

Thanks, checkpatch spell check miss.

> > racy.
> 
> Feels like should be +CC a few folk related to the driver core, GregKH
> etc to get a sanity check on if they can recall why. +CC Greg.

I went spelunking through the history, and it just seems like "it has
always been that way". The callers have never seemed to care that
bus_rescan_devices() was not a probe barrier.

I considered creating a new helper called bus_rescan_devices_sync() that
uses device_attach() internally, but given the limited usage of
bus_rescan_devices() throughout the kernel, it did not seem worth it.

> > The "Fixes:" but no "Cc: stable" on this patch reflects that the issue
> > is merely by inspection since the bug that triggered the discovery of
> > this potential problem [1] is fixed by other means.  However, a stable
> > backport should do no harm.
> > 
> > Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> > Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> Fix itself looks fine to me as it will check the dev->driver
> under the device lock.
> 
> > ---
> >  drivers/cxl/core/port.c |   13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index e666ec6a9085..b7828b6c7826 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -2084,11 +2084,18 @@ static void cxl_bus_remove(struct device *dev)
> >  
> >  static struct workqueue_struct *cxl_bus_wq;
> >  
> > -static void cxl_bus_rescan_queue(struct work_struct *w)
> > +static int attach_device(struct device *dev, void *data)
> 
> That naming is never going to be a problem :)
> I'd prefix this with something more specific

I don't immediately see a problem since that symbol is private to the
cxl core. Any alternate name ideas?
Jonathan Cameron Oct. 16, 2024, 2:51 p.m. UTC | #5
> > >  
> > > -static void cxl_bus_rescan_queue(struct work_struct *w)
> > > +static int attach_device(struct device *dev, void *data)  
> > 
> > That naming is never going to be a problem :)
> > I'd prefix this with something more specific  
> 
> I don't immediately see a problem since that symbol is private to the
> cxl core. Any alternate name ideas?
Not an immediate problem until another attach_device() shows up
that isn't private.

Ah well whoever introduces that gets to rename this I guess.

Jonathan

>
Dan Williams Oct. 23, 2024, 12:33 a.m. UTC | #6
Jonathan Cameron wrote:
> > > >  
> > > > -static void cxl_bus_rescan_queue(struct work_struct *w)
> > > > +static int attach_device(struct device *dev, void *data)  
> > > 
> > > That naming is never going to be a problem :)
> > > I'd prefix this with something more specific  
> > 
> > I don't immediately see a problem since that symbol is private to the
> > cxl core. Any alternate name ideas?
> Not an immediate problem until another attach_device() shows up
> that isn't private.
> 
> Ah well whoever introduces that gets to rename this I guess.

Since I am going a v2 anyway went ahead and called the helper
cxl_rescan_attach().
diff mbox series

Patch

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index e666ec6a9085..b7828b6c7826 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2084,11 +2084,18 @@  static void cxl_bus_remove(struct device *dev)
 
 static struct workqueue_struct *cxl_bus_wq;
 
-static void cxl_bus_rescan_queue(struct work_struct *w)
+static int attach_device(struct device *dev, void *data)
 {
-	int rc = bus_rescan_devices(&cxl_bus_type);
+	int rc = device_attach(dev);
+
+	dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached");
 
-	pr_debug("CXL bus rescan result: %d\n", rc);
+	return 0;
+}
+
+static void cxl_bus_rescan_queue(struct work_struct *w)
+{
+	bus_for_each_dev(&cxl_bus_type, NULL, NULL, attach_device);
 }
 
 void cxl_bus_rescan(void)