Message ID | 165055519869.3745911.10162603933337340370.stgit@dwillia2-desk3.amr.corp.intel.com |
---|---|
State | New, archived |
Headers | show |
Series | device-core: Enable device_lock() lockdep validation | expand |
On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote: > The CXL "root" device, ACPI0017, is an attach point for coordinating > platform level CXL resources and is the parent device for a CXL port > topology tree. As such it has distinct locking rules relative to other > CXL subsystem objects, but because it is an ACPI device the lock class > is established well before it is given to the cxl_acpi driver. > > However, the lockdep API does support changing the lock class "live" for > situations like this. Add a device_lock_set_class() helper that a driver > can use in ->probe() to set a custom lock class, and > device_lock_reset_class() to return to the default "no validate" class > before the custom lock class key goes out of scope after ->remove(). > > Note the helpers are all macros to support dead code elimination in the > CONFIG_PROVE_LOCKING=n case. > > Suggested-by: Peter Zijlstra <peterz@infradead.org> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: "Rafael J. Wysocki" <rafael@kernel.org> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Will Deacon <will@kernel.org> > Cc: Waiman Long <longman@redhat.com> > Cc: Boqun Feng <boqun.feng@gmail.com> > Cc: Alison Schofield <alison.schofield@intel.com> > Cc: Vishal Verma <vishal.l.verma@intel.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Cc: Ben Widawsky <ben.widawsky@intel.com> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/cxl/acpi.c | 15 +++++++++++++++ > include/linux/device.h | 25 +++++++++++++++++++++++++ > 2 files changed, 40 insertions(+) Much simpler, great work. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote: > The CXL "root" device, ACPI0017, is an attach point for coordinating > platform level CXL resources and is the parent device for a CXL port > topology tree. As such it has distinct locking rules relative to other > CXL subsystem objects, but because it is an ACPI device the lock class > is established well before it is given to the cxl_acpi driver. This final sentence gave me pause because it implied that the device lock class was set to something other than no validate. But I don't see that anywhere in the acpi code. So given that it looks to me like ACPI is just using the default no validate class... Reviewed-by: Ira Weiny <ira.weiny@intel.com> > However, the lockdep API does support changing the lock class "live" for > situations like this. Add a device_lock_set_class() helper that a driver > can use in ->probe() to set a custom lock class, and > device_lock_reset_class() to return to the default "no validate" class > before the custom lock class key goes out of scope after ->remove(). > > Note the helpers are all macros to support dead code elimination in the > CONFIG_PROVE_LOCKING=n case. > > Suggested-by: Peter Zijlstra <peterz@infradead.org> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: "Rafael J. Wysocki" <rafael@kernel.org> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Will Deacon <will@kernel.org> > Cc: Waiman Long <longman@redhat.com> > Cc: Boqun Feng <boqun.feng@gmail.com> > Cc: Alison Schofield <alison.schofield@intel.com> > Cc: Vishal Verma <vishal.l.verma@intel.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Cc: Ben Widawsky <ben.widawsky@intel.com> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/cxl/acpi.c | 15 +++++++++++++++ > include/linux/device.h | 25 +++++++++++++++++++++++++ > 2 files changed, 40 insertions(+) > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > index d15a6aec0331..e19cea27387e 100644 > --- a/drivers/cxl/acpi.c > +++ b/drivers/cxl/acpi.c > @@ -275,6 +275,15 @@ static int add_root_nvdimm_bridge(struct device *match, void *data) > return 1; > } > > +static struct lock_class_key cxl_root_key; > + > +static void cxl_acpi_lock_reset_class(void *_dev) > +{ > + struct device *dev = _dev; > + > + device_lock_reset_class(dev); > +} > + > static int cxl_acpi_probe(struct platform_device *pdev) > { > int rc; > @@ -283,6 +292,12 @@ static int cxl_acpi_probe(struct platform_device *pdev) > struct acpi_device *adev = ACPI_COMPANION(host); > struct cxl_cfmws_context ctx; > > + device_lock_set_class(&pdev->dev, &cxl_root_key); > + rc = devm_add_action_or_reset(&pdev->dev, cxl_acpi_lock_reset_class, > + &pdev->dev); > + if (rc) > + return rc; > + > root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL); > if (IS_ERR(root_port)) > return PTR_ERR(root_port); > diff --git a/include/linux/device.h b/include/linux/device.h > index 93459724dcde..82c9d307e7bd 100644 > --- a/include/linux/device.h > +++ b/include/linux/device.h > @@ -850,6 +850,31 @@ static inline bool device_supports_offline(struct device *dev) > return dev->bus && dev->bus->offline && dev->bus->online; > } > > +#define __device_lock_set_class(dev, name, key) \ > + lock_set_class(&(dev)->mutex.dep_map, name, key, 0, _THIS_IP_) > + > +/** > + * device_lock_set_class - Specify a temporary lock class while a device > + * is attached to a driver > + * @dev: device to modify > + * @key: lock class key data > + * > + * This must be called with the device_lock() already held, for example > + * from driver ->probe(). > + */ > +#define device_lock_set_class(dev, key) \ > + __device_lock_set_class(dev, #key, key) > + > +/** > + * device_lock_reset_class - Return a device to the default lockdep novalidate state > + * @dev: device to modify > + * > + * This must be called with the device_lock() already held, for example > + * from driver ->remove(). > + */ > +#define device_lock_reset_class(dev) \ > + device_lock_set_class(dev, &__lockdep_no_validate__) > + > void lock_device_hotplug(void); > void unlock_device_hotplug(void); > int lock_device_hotplug_sysfs(void); >
On Fri, Apr 22, 2022 at 4:58 PM Ira Weiny <ira.weiny@intel.com> wrote: > > On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote: > > The CXL "root" device, ACPI0017, is an attach point for coordinating > > platform level CXL resources and is the parent device for a CXL port > > topology tree. As such it has distinct locking rules relative to other > > CXL subsystem objects, but because it is an ACPI device the lock class > > is established well before it is given to the cxl_acpi driver. > > This final sentence gave me pause because it implied that the device lock class > was set to something other than no validate. But I don't see that anywhere in > the acpi code. So given that it looks to me like ACPI is just using the > default no validate class... Oh, good observation. *If* ACPI had set a custom lock class then cxl_acpi would need to be careful to restore that ACPI-specific class and not reset it to "no validate" on exit, or skip setting its own custom class. However, I think for generic buses like ACPI that feed devices into other subsystems it likely has little reason to set its own class. For safety, since device_lock_set_class() is general purpose, I'll have it emit a debug message and fail if the class is not "no validate" on entry. Thanks Ira!
On Fri, Apr 22, 2022 at 5:08 PM Dan Williams <dan.j.williams@intel.com> wrote: > > On Fri, Apr 22, 2022 at 4:58 PM Ira Weiny <ira.weiny@intel.com> wrote: > > > > On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote: > > > The CXL "root" device, ACPI0017, is an attach point for coordinating > > > platform level CXL resources and is the parent device for a CXL port > > > topology tree. As such it has distinct locking rules relative to other > > > CXL subsystem objects, but because it is an ACPI device the lock class > > > is established well before it is given to the cxl_acpi driver. > > > > This final sentence gave me pause because it implied that the device lock class > > was set to something other than no validate. But I don't see that anywhere in > > the acpi code. So given that it looks to me like ACPI is just using the > > default no validate class... > > Oh, good observation. *If* ACPI had set a custom lock class then > cxl_acpi would need to be careful to restore that ACPI-specific class > and not reset it to "no validate" on exit, or skip setting its own > custom class. However, I think for generic buses like ACPI that feed > devices into other subsystems it likely has little reason to set its > own class. For safety, since device_lock_set_class() is general > purpose, I'll have it emit a debug message and fail if the class is > not "no validate" on entry. > So this turned out way uglier than I expected: drivers/cxl/acpi.c | 4 +++- include/linux/device.h | 33 +++++++++++++++++++++++++-------- 2 files changed, 28 insertions(+), 9 deletions(-) ...so I'm going to drop it and just add a comment about the expectations. As Peter said there's already a multitude of ways to cause false positive / negative results with lockdep so this is just one more area where one needs to be careful and understand the lock context they might be overriding.
On Sat, Apr 23, 2022 at 10:27:52AM -0700, Dan Williams wrote: > ...so I'm going to drop it and just add a comment about the > expectations. As Peter said there's already a multitude of ways to > cause false positive / negative results with lockdep so this is just > one more area where one needs to be careful and understand the lock > context they might be overriding. One safe-guard might be to check the class you're overriding is indeed __no_validate__, and WARN if not. Then the unconditional reset is conistent. Then, if/when, that WARN ever triggers you can revisit all this.
On Mon, Apr 25, 2022 at 3:33 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Sat, Apr 23, 2022 at 10:27:52AM -0700, Dan Williams wrote: > > > ...so I'm going to drop it and just add a comment about the > > expectations. As Peter said there's already a multitude of ways to > > cause false positive / negative results with lockdep so this is just > > one more area where one needs to be careful and understand the lock > > context they might be overriding. > > One safe-guard might be to check the class you're overriding is indeed > __no_validate__, and WARN if not. Then the unconditional reset is > conistent. > > Then, if/when, that WARN ever triggers you can revisit all this. Ok, that does seem to need a dummy definition of lockdep_match_class() in the CONFIG_LOCKDEP=n case, but that seems worth it to me for the sanity check.
On Mon, Apr 25, 2022 at 9:05 AM Dan Williams <dan.j.williams@intel.com> wrote: > > On Mon, Apr 25, 2022 at 3:33 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Sat, Apr 23, 2022 at 10:27:52AM -0700, Dan Williams wrote: > > > > > ...so I'm going to drop it and just add a comment about the > > > expectations. As Peter said there's already a multitude of ways to > > > cause false positive / negative results with lockdep so this is just > > > one more area where one needs to be careful and understand the lock > > > context they might be overriding. > > > > One safe-guard might be to check the class you're overriding is indeed > > __no_validate__, and WARN if not. Then the unconditional reset is > > conistent. > > > > Then, if/when, that WARN ever triggers you can revisit all this. > > Ok, that does seem to need a dummy definition of lockdep_match_class() > in the CONFIG_LOCKDEP=n case, but that seems worth it to me for the > sanity check. Thankfully the comment in lockdep.h to not define a lockdep_match_class() for the CONFIG_LOCKDEP=n stopped me from going that direction.
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index d15a6aec0331..e19cea27387e 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -275,6 +275,15 @@ static int add_root_nvdimm_bridge(struct device *match, void *data) return 1; } +static struct lock_class_key cxl_root_key; + +static void cxl_acpi_lock_reset_class(void *_dev) +{ + struct device *dev = _dev; + + device_lock_reset_class(dev); +} + static int cxl_acpi_probe(struct platform_device *pdev) { int rc; @@ -283,6 +292,12 @@ static int cxl_acpi_probe(struct platform_device *pdev) struct acpi_device *adev = ACPI_COMPANION(host); struct cxl_cfmws_context ctx; + device_lock_set_class(&pdev->dev, &cxl_root_key); + rc = devm_add_action_or_reset(&pdev->dev, cxl_acpi_lock_reset_class, + &pdev->dev); + if (rc) + return rc; + root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL); if (IS_ERR(root_port)) return PTR_ERR(root_port); diff --git a/include/linux/device.h b/include/linux/device.h index 93459724dcde..82c9d307e7bd 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -850,6 +850,31 @@ static inline bool device_supports_offline(struct device *dev) return dev->bus && dev->bus->offline && dev->bus->online; } +#define __device_lock_set_class(dev, name, key) \ + lock_set_class(&(dev)->mutex.dep_map, name, key, 0, _THIS_IP_) + +/** + * device_lock_set_class - Specify a temporary lock class while a device + * is attached to a driver + * @dev: device to modify + * @key: lock class key data + * + * This must be called with the device_lock() already held, for example + * from driver ->probe(). + */ +#define device_lock_set_class(dev, key) \ + __device_lock_set_class(dev, #key, key) + +/** + * device_lock_reset_class - Return a device to the default lockdep novalidate state + * @dev: device to modify + * + * This must be called with the device_lock() already held, for example + * from driver ->remove(). + */ +#define device_lock_reset_class(dev) \ + device_lock_set_class(dev, &__lockdep_no_validate__) + void lock_device_hotplug(void); void unlock_device_hotplug(void); int lock_device_hotplug_sysfs(void);
The CXL "root" device, ACPI0017, is an attach point for coordinating platform level CXL resources and is the parent device for a CXL port topology tree. As such it has distinct locking rules relative to other CXL subsystem objects, but because it is an ACPI device the lock class is established well before it is given to the cxl_acpi driver. However, the lockdep API does support changing the lock class "live" for situations like this. Add a device_lock_set_class() helper that a driver can use in ->probe() to set a custom lock class, and device_lock_reset_class() to return to the default "no validate" class before the custom lock class key goes out of scope after ->remove(). Note the helpers are all macros to support dead code elimination in the CONFIG_PROVE_LOCKING=n case. Suggested-by: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/cxl/acpi.c | 15 +++++++++++++++ include/linux/device.h | 25 +++++++++++++++++++++++++ 2 files changed, 40 insertions(+)