Message ID | YFf2RD931nq3RudJ@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | spi: ensure timely release of driver-allocated resources | expand |
On Sun, Mar 21, 2021 at 06:43:32PM -0700, Dmitry Torokhov wrote: > Note that this is not SPI-specific issue. I already send a similar > patch for I2C and will be sending more. This feels like it might make sense to push up to the driver core level then rather than doing in individual buses?
On Mon, Mar 22, 2021 at 12:37:07PM +0000, Mark Brown wrote: > On Sun, Mar 21, 2021 at 06:43:32PM -0700, Dmitry Torokhov wrote: > > > Note that this is not SPI-specific issue. I already send a similar > > patch for I2C and will be sending more. > > This feels like it might make sense to push up to the driver core level > then rather than doing in individual buses? That is exactly the issue: we can't. Driver core already releases all resources when a device is being unbound but that happens after bus "remove" code is executed and therefore is too late. The device might already be powered down, but various devm release() callbacks will be trying to access it. devm only works when you do not mix manual resources with managed ones, and when bus code allocates resources themselves (attaching a device to a power domain can be viewed as resource acquisition) we violate this principle. We could, of course, to make SPI bus' probe() use devm_add_action_or_reset() to work in removal of the device from the power domain into the stream of devm resources, but that still requires changes at bus code, and I believe will complicate matters if we need to extend SPI bus code to allocate more resources in probe(). So I opted for opening a devm group to separate resources allocated before and after probe() to be able to release them in the right order. Thanks.
On Mon, Mar 22, 2021 at 12:38:15PM -0700, Dmitry Torokhov wrote: > On Mon, Mar 22, 2021 at 12:37:07PM +0000, Mark Brown wrote: > > This feels like it might make sense to push up to the driver core level > > then rather than doing in individual buses? > That is exactly the issue: we can't. Driver core already releases all > resources when a device is being unbound but that happens after bus > "remove" code is executed and therefore is too late. The device might > already be powered down, but various devm release() callbacks will be > trying to access it. Can you provide a concrete example of something that is causing problems here? If something is trying to access the device after remove() has run that sounds like it's abusing devres somehow. It sounded from your commit log like this was something to do with the amount of time it took the driver core to action the frees rather than an ordering issue. > devm only works when you do not mix manual resources with managed ones, > and when bus code allocates resources themselves (attaching a device to > a power domain can be viewed as resource acquisition) we violate this > principle. We could, of course, to make SPI bus' probe() use > devm_add_action_or_reset() to work in removal of the device from the > power domain into the stream of devm resources, but that still requires > changes at bus code, and I believe will complicate matters if we need to > extend SPI bus code to allocate more resources in probe(). So I opted > for opening a devm group to separate resources allocated before and > after probe() to be able to release them in the right order. Sure, these are standard issues that people create with excessive use of devm but the device's remove() callback is surely already a concern by itself here?
On Tue, Mar 23, 2021 at 05:36:06PM +0000, Mark Brown wrote: > On Mon, Mar 22, 2021 at 12:38:15PM -0700, Dmitry Torokhov wrote: > > On Mon, Mar 22, 2021 at 12:37:07PM +0000, Mark Brown wrote: > > > > This feels like it might make sense to push up to the driver core level > > > then rather than doing in individual buses? > > > That is exactly the issue: we can't. Driver core already releases all > > resources when a device is being unbound but that happens after bus > > "remove" code is executed and therefore is too late. The device might > > already be powered down, but various devm release() callbacks will be > > trying to access it. > > Can you provide a concrete example of something that is causing problems > here? If something is trying to access the device after remove() has > run that sounds like it's abusing devres somehow. It sounded from your > commit log like this was something to do with the amount of time it took > the driver core to action the frees rather than an ordering issue. No it is ordering issue. I do not have a proven real-life example for SPI, but we do have one for I2C: https://lore.kernel.org/linux-devicetree/20210305041236.3489-7-jeff@labundy.com/ However, if we consider fairly typical SPI driver, such as drivers/input/touchscreen/ad7877.c, you can see that it uses devm in its probe() and because all resources are managed, it does not define remove() at all. So during proble we have: <driver core allocations> SPI: dev_pm_domain_attach AD7877: devm_kzalloc driver structure AD7877: devm allocation of input device AD7877: devm custom action to disable the chip on removal AD7877: devm IRQ request AD7877: devm sysfs attribute group AD7877: devm input registration <additional devm driver core allocations?> And on remove: SPI: dev_pm_domain_detach !!!!!! <deallocate additional devm driver core allocations?> AD7877: devm input unregistration AD7877: devm sysfs attribute group removal AD7877: devm freeing IRQ AD7877: devm disable the chip AD7877: devm freeing of input device AD7877: devm free driver structure <deallocate driver core allocations> Note how dev_pm_domain_detach() jumped ahead of everything, and strictly speaking past this point we can no longer guarantee that we can access the chip and disable it. > > > devm only works when you do not mix manual resources with managed ones, > > and when bus code allocates resources themselves (attaching a device to > > a power domain can be viewed as resource acquisition) we violate this > > principle. We could, of course, to make SPI bus' probe() use > > devm_add_action_or_reset() to work in removal of the device from the > > power domain into the stream of devm resources, but that still requires > > changes at bus code, and I believe will complicate matters if we need to > > extend SPI bus code to allocate more resources in probe(). So I opted > > for opening a devm group to separate resources allocated before and > > after probe() to be able to release them in the right order. > > Sure, these are standard issues that people create with excessive use of devm is a fact of life and we need to live with it. I am unconvinced if it solved more issues that it brought in, but it is something that driver authors like to use and are pushed towards. > devm but the device's remove() callback is surely already a concern by > itself here? In the example above there is not one, but even if it exists, it is called first, so in some limited cases you could have non-managed resources allocated very last and released first in remove(), and then have devm release the rest. However driver's remove() is not issue here, it is bus' non-trivial remove. Thanks.
On Tue, Mar 23, 2021 at 12:04:34PM -0700, Dmitry Torokhov wrote: > On Tue, Mar 23, 2021 at 05:36:06PM +0000, Mark Brown wrote: > No it is ordering issue. I do not have a proven real-life example for > SPI, but we do have one for I2C: > https://lore.kernel.org/linux-devicetree/20210305041236.3489-7-jeff@labundy.com/ TBH that looks like a fairly standard case where you probably don't want to be using devm for the interrupts in the first place. Leaving the interrupts live after the bus thinks it freed the device doesn't seem like the best idea, I'm not sure I'd expect that to work reliably when the device tries to call into the bus code to interact with the device that the bus thought was already freed anyway. If we want this to work reliably it really feels like we should have two remove callbacks in the driver core doing this rather than open coding in every single bus which is what we'd need to do - this is going to affect any bus that does anything other than just call the device's remove() callback. PCI looks like it might have issues too for example, and platform does as well and those were simply the first two buses I looked at. Possibly we want a driver core callback which is scheduled via devm (remove_devm(), cleanup() or something). We'd still need to move things about in all the buses but it seems preferable to do it that way rather than open coding opening a group and the comments about what's going on and the ordering requirements everywhere, it's a little less error prone going forward. > Note how dev_pm_domain_detach() jumped ahead of everything, and > strictly speaking past this point we can no longer guarantee that we can > access the chip and disable it. Frankly it looks like the PM domain stuff shouldn't be in the probe() and remove() paths at all and this has been bogusly copies from other buses, it should be in the device registration paths. The device is in the domain no matter what's going on with binding it. Given how generic code is I'm not even sure why it's in the buses.
On Wed, Mar 24, 2021 at 09:32:26PM +0000, Mark Brown wrote: > On Tue, Mar 23, 2021 at 12:04:34PM -0700, Dmitry Torokhov wrote: > > On Tue, Mar 23, 2021 at 05:36:06PM +0000, Mark Brown wrote: > > > No it is ordering issue. I do not have a proven real-life example for > > SPI, but we do have one for I2C: > > > https://lore.kernel.org/linux-devicetree/20210305041236.3489-7-jeff@labundy.com/ > > TBH that looks like a fairly standard case where you probably don't want > to be using devm for the interrupts in the first place. Leaving the > interrupts live after the bus thinks it freed the device doesn't seem > like the best idea, I'm not sure I'd expect that to work reliably when > the device tries to call into the bus code to interact with the device > that the bus thought was already freed anyway. That is not an argument really. By that token we should not be using devm for anything but memory, and that is only until we implement some kind of memleak checker that will ensure that driver-allocated memory is released after driver's remove() method exits. If we have devm API we need to make sure it works. You also misread that the issue is limited to interrupts, when i fact in this particular driver it is the input device that is being unregistered too late and fails to timely quiesce the device. Resulting interrupt storm is merely a side effect of this. > > If we want this to work reliably it really feels like we should have two > remove callbacks in the driver core doing this rather than open coding > in every single bus which is what we'd need to do - this is going to > affect any bus that does anything other than just call the device's > remove() callback. PCI looks like it might have issues too for example, > and platform does as well and those were simply the first two buses I > looked at. Possibly we want a driver core callback which is scheduled > via devm (remove_devm(), cleanup() or something). We'd still need to > move things about in all the buses but it seems preferable to do it that > way rather than open coding opening a group and the comments about > what's going on and the ordering requirements everywhere, it's a little > less error prone going forward. From the driver's perspective they expect devm-allocated resources to happen immediately after ->remove() method is run. I do not believe any driver is interested in additional callback, and you'd need to modify a boatload of drivers to fix this issue. I agree with you that manual group handling might be a bit confusing and sprinkling the same comment everywhere is not too nice, so how about we provide: void *devm_mark_driver_resources(struct device *dev) and void devm_release_driver_resources() and keep the commentary there? The question is where to keep driver_res_id field - in generic device structure or put it into bus' device structure as some buses and classes do not need it and we'd be sawing 4-8 bytes per device structure this way. Another way is to force buses to use devm for their resource management (I.e. in case of SPI wrap dev_pm_domain_detach() in devm_add_action_or_release()). It works for buses that have small number of resource allocated, but gets more unwieldy and more expensive the more resources are managed at bus level, that is why I opted for opening a group. > > > Note how dev_pm_domain_detach() jumped ahead of everything, and > > strictly speaking past this point we can no longer guarantee that we can > > access the chip and disable it. > > Frankly it looks like the PM domain stuff shouldn't be in the probe() > and remove() paths at all and this has been bogusly copies from other > buses, it should be in the device registration paths. The device is in > the domain no matter what's going on with binding it. Given how generic > code is I'm not even sure why it's in the buses. Here I will agree with you, bit I think it comes from power domain "duality". In OF power domain represents grouping of devices, and is static as devices do not move around, whereas in ACPI domain means control, and we are putting a device under control of ACPI PM when we bind it to a driver. As part of that control we bring the device into _D0, etc. Yay for mixing concepts, but this is not really material to the question of how to orderly release resources. Thanks.
On Wed, Mar 24, 2021 at 03:27:23PM -0700, Dmitry Torokhov wrote: > On Wed, Mar 24, 2021 at 09:32:26PM +0000, Mark Brown wrote: > > TBH that looks like a fairly standard case where you probably don't want > > to be using devm for the interrupts in the first place. Leaving the > > interrupts live after the bus thinks it freed the device doesn't seem > > like the best idea, I'm not sure I'd expect that to work reliably when > > the device tries to call into the bus code to interact with the device > > that the bus thought was already freed anyway. > > That is not an argument really. By that token we should not be using > devm for anything but memory, and that is only until we implement some > kind of memleak checker that will ensure that driver-allocated memory is > released after driver's remove() method exits. I pretty much agree with that perspective TBH, I rarely see interrupt users that I conside safe. It's the things that actively do stuff that cause issues, interrupts and registration of userspace interfaces both being particularly likely to do so as work comes in asynchronously. > You also misread that the issue is limited to interrupts, when i fact > in this particular driver it is the input device that is being > unregistered too late and fails to timely quiesce the device. Resulting > interrupt storm is merely a side effect of this. My understanding was that the issue is that the driver is generating work because the interrupt is firing, if the interrupt had been unregistered we'd not be getting the interupts delivered (probably they'd have been disabled at the interrupt controller level) and not have the problem of trying to handle them on a partially unwound device. > > looked at. Possibly we want a driver core callback which is scheduled > > via devm (remove_devm(), cleanup() or something). We'd still need to > > move things about in all the buses but it seems preferable to do it that > > way rather than open coding opening a group and the comments about > > what's going on and the ordering requirements everywhere, it's a little > > less error prone going forward. > From the driver's perspective they expect devm-allocated resources to > happen immediately after ->remove() method is run. I do not believe any > driver is interested in additional callback, and you'd need to modify > a boatload of drivers to fix this issue. Not a callback for the device drivers, for the buses. This is essentially about converting the unwinding the bus does to be sequenced for devm. > I agree with you that manual group handling might be a bit confusing > and sprinkling the same comment everywhere is not too nice, so how about > we provide: > void *devm_mark_driver_resources(struct device *dev) > and > void devm_release_driver_resources() > and keep the commentary there? The question is where to keep > driver_res_id field - in generic device structure or put it into bus' > device structure as some buses and classes do not need it and we'd be > sawing 4-8 bytes per device structure this way. I guess bus' device :/ > Another way is to force buses to use devm for their resource management > (I.e. in case of SPI wrap dev_pm_domain_detach() in > devm_add_action_or_release()). It works for buses that have small number > of resource allocated, but gets more unwieldy and more expensive the > more resources are managed at bus level, that is why I opted for opening > a group. If the driver core were doing it and scheduling a single callback the bus provides then that callback could do as much as it likes...
On Wed, Mar 24, 2021 at 11:39:35PM +0000, Mark Brown wrote: > On Wed, Mar 24, 2021 at 03:27:23PM -0700, Dmitry Torokhov wrote: > > On Wed, Mar 24, 2021 at 09:32:26PM +0000, Mark Brown wrote: > > > > TBH that looks like a fairly standard case where you probably don't want > > > to be using devm for the interrupts in the first place. Leaving the > > > interrupts live after the bus thinks it freed the device doesn't seem > > > like the best idea, I'm not sure I'd expect that to work reliably when > > > the device tries to call into the bus code to interact with the device > > > that the bus thought was already freed anyway. > > > > That is not an argument really. By that token we should not be using > > devm for anything but memory, and that is only until we implement some > > kind of memleak checker that will ensure that driver-allocated memory is > > released after driver's remove() method exits. > > I pretty much agree with that perspective TBH, I rarely see interrupt > users that I conside safe. It's the things that actively do stuff that > cause issues, interrupts and registration of userspace interfaces both > being particularly likely to do so as work comes in asynchronously. > > > You also misread that the issue is limited to interrupts, when i fact > > in this particular driver it is the input device that is being > > unregistered too late and fails to timely quiesce the device. Resulting > > interrupt storm is merely a side effect of this. > > My understanding was that the issue is that the driver is generating > work because the interrupt is firing, if the interrupt had been > unregistered we'd not be getting the interupts delivered (probably > they'd have been disabled at the interrupt controller level) and not > have the problem of trying to handle them on a partially unwound device. Yes, but the root of the problem is that we rely that the device will be stopped when input core "closes" it upon device unregistering, so even if we did not have this particular issue with interrupt storm we would still be at risk of accessing half-dead device. What I am trying to say (and I believe we are actually agreeing with each other) is that not only interrupts, but other devm-allocated objects are also can cause problems here. > > > > looked at. Possibly we want a driver core callback which is scheduled > > > via devm (remove_devm(), cleanup() or something). We'd still need to > > > move things about in all the buses but it seems preferable to do it that > > > way rather than open coding opening a group and the comments about > > > what's going on and the ordering requirements everywhere, it's a little > > > less error prone going forward. > > > From the driver's perspective they expect devm-allocated resources to > > happen immediately after ->remove() method is run. I do not believe any > > driver is interested in additional callback, and you'd need to modify > > a boatload of drivers to fix this issue. > > Not a callback for the device drivers, for the buses. This is > essentially about converting the unwinding the bus does to be sequenced > for devm. Ah, OK. > > > I agree with you that manual group handling might be a bit confusing > > and sprinkling the same comment everywhere is not too nice, so how about > > we provide: > > > void *devm_mark_driver_resources(struct device *dev) > > > and > > > void devm_release_driver_resources() > > > and keep the commentary there? The question is where to keep > > driver_res_id field - in generic device structure or put it into bus' > > device structure as some buses and classes do not need it and we'd be > > sawing 4-8 bytes per device structure this way. > > I guess bus' device :/ > > > Another way is to force buses to use devm for their resource management > > (I.e. in case of SPI wrap dev_pm_domain_detach() in > > devm_add_action_or_release()). It works for buses that have small number > > of resource allocated, but gets more unwieldy and more expensive the > > more resources are managed at bus level, that is why I opted for opening > > a group. > > If the driver core were doing it and scheduling a single callback the > bus provides then that callback could do as much as it likes... OK, I can look into it. In the meantime wills something like below a bit easier for you to stomach? If so I'll resubmit formally. diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index c19a09201358..a0db83bcb3d4 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -375,6 +375,11 @@ static int spi_uevent(struct device *dev, struct kobj_uevent_env *env) return add_uevent_var(env, "MODALIAS=%s%s", SPI_MODULE_PREFIX, spi->modalias); } +static void spi_dev_pm_domain_detach(void *_dev) +{ + dev_pm_domain_detach(_dev, true); +} + static int spi_probe(struct device *dev) { const struct spi_driver *sdrv = to_spi_driver(dev->driver); @@ -421,11 +426,12 @@ static int spi_probe(struct device *dev) if (ret) return ret; - if (sdrv->probe) { + ret = devm_add_action_or_reset(dev, spi_dev_pm_domain_detach, dev); + if (ret) + return ret; + + if (sdrv->probe) ret = sdrv->probe(spi); - if (ret) - dev_pm_domain_detach(dev, true); - } return ret; } @@ -444,8 +450,6 @@ static int spi_remove(struct device *dev) ERR_PTR(ret)); } - dev_pm_domain_detach(dev, true); - return 0; }
On Wed, Mar 24, 2021 at 05:17:52PM -0700, Dmitry Torokhov wrote: > OK, I can look into it. In the meantime wills something like below a bit > easier for you to stomach? If so I'll resubmit formally. Well, it's a bit nicer for a bus level bodge so I guess, but really I'm more thinking we need something in the driver core for this.
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index c19a09201358..7c369cebebbd 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -421,29 +421,50 @@ static int spi_probe(struct device *dev) if (ret) return ret; + spi->devres_group_id = devres_open_group(dev, NULL, GFP_KERNEL); + if (!spi->devres_group_id) { + ret = -ENOMEM; + goto err_detach_pm_domain; + } + if (sdrv->probe) { ret = sdrv->probe(spi); if (ret) - dev_pm_domain_detach(dev, true); + goto err_release_driver_resources; } + /* + * Note that we are not closing the devres group opened above so + * even resources that were attached to the device after probe has + * run are released when spi_remove() is executed. This is needed as + * some drivers might allocate additional resources. + */ + + return 0; + +err_release_driver_resources: + devres_release_group(dev, spi->devres_group_id); +err_detach_pm_domain: + dev_pm_domain_detach(dev, true); return ret; } static int spi_remove(struct device *dev) { const struct spi_driver *sdrv = to_spi_driver(dev->driver); + struct spi_device *spi = to_spi_device(dev); if (sdrv->remove) { int ret; - ret = sdrv->remove(to_spi_device(dev)); + ret = sdrv->remove(spi); if (ret) dev_warn(dev, "Failed to unbind driver (%pe), ignoring\n", ERR_PTR(ret)); } + devres_release_group(dev, spi->devres_group_id); dev_pm_domain_detach(dev, true); return 0; diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index aa09fdc8042d..969dd8ccc657 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -144,6 +144,8 @@ extern int spi_delay_exec(struct spi_delay *_delay, struct spi_transfer *xfer); * not using a GPIO line) * @word_delay: delay to be inserted between consecutive * words of a transfer + * @devres_group_id: id of the devres group that will be created for resources + * acquired when probing this device. * * @statistics: statistics for the spi_device * @@ -195,6 +197,8 @@ struct spi_device { struct gpio_desc *cs_gpiod; /* chip select gpio desc */ struct spi_delay word_delay; /* inter-word delay */ + void *devres_group_id; + /* the statistics */ struct spi_statistics statistics;
More and more drivers rely on devres to manage their resources, however if bus' probe() and release() methods are not trivial and control some of resources as well (for example enable or disable clocks, or attach device to a power domain), we need to make sure that driver-allocated resources are released immediately after driver's remove() method returns, and not postponed until driver core gets around to releasing resources. To fix that we open a new devres group before calling driver's probe() and explicitly release it when we return from driver's remove(). Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> --- Note that this is not SPI-specific issue. I already send a similar patch for I2C and will be sending more. drivers/spi/spi.c | 25 +++++++++++++++++++++++-- include/linux/spi/spi.h | 4 ++++ 2 files changed, 27 insertions(+), 2 deletions(-)