Message ID | 20220725160859.1274472-1-rf@opensource.cirrus.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | component: try_module_get() to prevent unloading while in use | expand |
On 25/07/2022 19:09, Greg KH wrote: > On Mon, Jul 25, 2022 at 05:08:59PM +0100, Richard Fitzgerald wrote: >> Call try_module_get() on a component before attempting to call its >> bind() function, to ensure that a loadable module cannot be >> unloaded while we are executing its bind(). > > How can bind be called while the module is unloaded? > I didn't say it could. What I said is "unloaded while we are executing its bind()". Maybe that's already guaranteed to be safe somehow. It's actually the problem below that I was trying to fix but placing the try_module_get() before the bind() rather than after bind() seemed a trivial extra safety. >> If the bind is successful the module_put() is called only after it >> has been unbound. This ensures that the module cannot be unloaded >> while it is in use as an aggregate device. > > That's almost never the correct thing to do, what problem is this > solving? > What I see is that when a loadable module has been made part of an aggregate it is still possible to rmmod'd it. An alternative workaround would be for the parent to softdep to every driver that _might_ provide the aggregated components. Softdeps aren't unusual (we use it in some drivers that are directly related but don't directly link into each other). But to me this feels like a hack when used with the component framework - isn't the idea that the parent doesn't know (or doesn't need to know) which drivers will be aggregated? Wouldn't it be better that when a component driver is bound into an aggregate its module is automatically marked in-use? If there's a better way to mark the module in-use while is it bound into an aggregate, let me know and I'll look at implementing it. > thanks, > > greg k-h
On Tue, Jul 26, 2022 at 11:32:28AM +0100, Richard Fitzgerald wrote: > On 25/07/2022 19:09, Greg KH wrote: > > On Mon, Jul 25, 2022 at 05:08:59PM +0100, Richard Fitzgerald wrote: > > > Call try_module_get() on a component before attempting to call its > > > bind() function, to ensure that a loadable module cannot be > > > unloaded while we are executing its bind(). > > > > How can bind be called while the module is unloaded? > > > > I didn't say it could. What I said is "unloaded while we are executing > its bind()". Maybe that's already guaranteed to be safe somehow. It's > actually the problem below that I was trying to fix but placing the > try_module_get() before the bind() rather than after bind() seemed a > trivial extra safety. It should be safe, bind() can't race with module remove as the driver core locks will handle this. > > > If the bind is successful the module_put() is called only after it > > > has been unbound. This ensures that the module cannot be unloaded > > > while it is in use as an aggregate device. > > > > That's almost never the correct thing to do, what problem is this > > solving? > > > > What I see is that when a loadable module has been made part of an > aggregate it is still possible to rmmod'd it. > > An alternative workaround would be for the parent to softdep to every > driver that _might_ provide the aggregated components. Softdeps aren't > unusual (we use it in some drivers that are directly related but don't > directly link into each other). But to me this feels like a hack when > used with the component framework - isn't the idea that the parent > doesn't know (or doesn't need to know) which drivers will be aggregated? > Wouldn't it be better that when a component driver is bound into an > aggregate its module is automatically marked in-use? > > If there's a better way to mark the module in-use while is it bound > into an aggregate, let me know and I'll look at implementing it. No module references should be incremented if a device is bound to a driver, that's the old (1990's) way of thinking. If a module wants to be unloaded, let it, and clean up everything that it was controlling/talking to before the module remove is finished. That's the way all busses should be working, you don't increment a module count when a driver binds to a device, otherwise how would you unload a module that was being used at all? So just remove the components controlled by the module properly when it is removed and all should be good. Do you have example code in the kernel tree today that does not properly do this? Why not just fix that instead? thanks, greg k-h
On 26/07/2022 18:28, Greg KH wrote: > On Tue, Jul 26, 2022 at 11:32:28AM +0100, Richard Fitzgerald wrote: >> On 25/07/2022 19:09, Greg KH wrote: >>> On Mon, Jul 25, 2022 at 05:08:59PM +0100, Richard Fitzgerald wrote: >>>> Call try_module_get() on a component before attempting to call its >>>> bind() function, to ensure that a loadable module cannot be >>>> unloaded while we are executing its bind(). >>> >>> How can bind be called while the module is unloaded? >>> >> >> I didn't say it could. What I said is "unloaded while we are executing >> its bind()". Maybe that's already guaranteed to be safe somehow. It's >> actually the problem below that I was trying to fix but placing the >> try_module_get() before the bind() rather than after bind() seemed a >> trivial extra safety. > > It should be safe, bind() can't race with module remove as the driver > core locks will handle this. > >>>> If the bind is successful the module_put() is called only after it >>>> has been unbound. This ensures that the module cannot be unloaded >>>> while it is in use as an aggregate device. >>> >>> That's almost never the correct thing to do, what problem is this >>> solving? >>> >> >> What I see is that when a loadable module has been made part of an >> aggregate it is still possible to rmmod'd it. >> >> An alternative workaround would be for the parent to softdep to every >> driver that _might_ provide the aggregated components. Softdeps aren't >> unusual (we use it in some drivers that are directly related but don't >> directly link into each other). But to me this feels like a hack when >> used with the component framework - isn't the idea that the parent >> doesn't know (or doesn't need to know) which drivers will be aggregated? >> Wouldn't it be better that when a component driver is bound into an >> aggregate its module is automatically marked in-use? >> >> If there's a better way to mark the module in-use while is it bound >> into an aggregate, let me know and I'll look at implementing it. > > No module references should be incremented if a device is bound to a > driver, that's the old (1990's) way of thinking. If a module wants to > be unloaded, let it, and clean up everything that it was > controlling/talking to before the module remove is finished. > > That's the way all busses should be working, you don't increment a > module count when a driver binds to a device, otherwise how would you > unload a module that was being used at all? > > So just remove the components controlled by the module properly when it > is removed and all should be good. > > Do you have example code in the kernel tree today that does not properly > do this? Why not just fix that instead? > The actual code I'm working on isn't upstream yet, but it's a derivative of the way these two interoperate: sound/pci/hda/patch_realtek sound/pci/hda/cs35l41_hda.c In these systems the host audio interface is HDA but the amps are not HDA devices. Audio goes through the Realtek HDA codec to the amp but the amp is on a different bus (i2c or spi). The modules in the HDA stack all get marked as in-use, but the amp driver doesn't. So if it's unloaded the audio system is left in a limbo state where ALSA and the HDA stack are still up but the amp driver code has gone. However I realised that my try_module_get() isn't a fix anyway. It's claiming use of the module implementing the component but not of the bus that owns that module. I assume that's what you were referring to by having to deal with an unload instead of trying to prevent the unload. (And yes, I'm aware that in that patch_realtek.c it's missing locking around the shared struct to prevent it being accessed during a bind and unbind.) > thanks, > > greg k-h
On Thu, Jul 28, 2022 at 11:49:57AM +0100, Richard Fitzgerald wrote: > On 26/07/2022 18:28, Greg KH wrote: > > On Tue, Jul 26, 2022 at 11:32:28AM +0100, Richard Fitzgerald wrote: > > > On 25/07/2022 19:09, Greg KH wrote: > > > > On Mon, Jul 25, 2022 at 05:08:59PM +0100, Richard Fitzgerald wrote: > > > > > Call try_module_get() on a component before attempting to call its > > > > > bind() function, to ensure that a loadable module cannot be > > > > > unloaded while we are executing its bind(). > > > > > > > > How can bind be called while the module is unloaded? > > > > > > > > > > I didn't say it could. What I said is "unloaded while we are executing > > > its bind()". Maybe that's already guaranteed to be safe somehow. It's > > > actually the problem below that I was trying to fix but placing the > > > try_module_get() before the bind() rather than after bind() seemed a > > > trivial extra safety. > > > > It should be safe, bind() can't race with module remove as the driver > > core locks will handle this. > > > > > > > If the bind is successful the module_put() is called only after it > > > > > has been unbound. This ensures that the module cannot be unloaded > > > > > while it is in use as an aggregate device. > > > > > > > > That's almost never the correct thing to do, what problem is this > > > > solving? > > > > > > > > > > What I see is that when a loadable module has been made part of an > > > aggregate it is still possible to rmmod'd it. > > > > > > An alternative workaround would be for the parent to softdep to every > > > driver that _might_ provide the aggregated components. Softdeps aren't > > > unusual (we use it in some drivers that are directly related but don't > > > directly link into each other). But to me this feels like a hack when > > > used with the component framework - isn't the idea that the parent > > > doesn't know (or doesn't need to know) which drivers will be aggregated? > > > Wouldn't it be better that when a component driver is bound into an > > > aggregate its module is automatically marked in-use? > > > > > > If there's a better way to mark the module in-use while is it bound > > > into an aggregate, let me know and I'll look at implementing it. > > > > No module references should be incremented if a device is bound to a > > driver, that's the old (1990's) way of thinking. If a module wants to > > be unloaded, let it, and clean up everything that it was > > controlling/talking to before the module remove is finished. > > > > That's the way all busses should be working, you don't increment a > > module count when a driver binds to a device, otherwise how would you > > unload a module that was being used at all? > > > > So just remove the components controlled by the module properly when it > > is removed and all should be good. > > > > Do you have example code in the kernel tree today that does not properly > > do this? Why not just fix that instead? > > > > The actual code I'm working on isn't upstream yet, but it's a derivative > of the way these two interoperate: > sound/pci/hda/patch_realtek > sound/pci/hda/cs35l41_hda.c > > In these systems the host audio interface is HDA but the amps are not > HDA devices. Audio goes through the Realtek HDA codec to the amp but > the amp is on a different bus (i2c or spi). The modules in the HDA stack > all get marked as in-use, but the amp driver doesn't. So if it's > unloaded the audio system is left in a limbo state where ALSA and the > HDA stack are still up but the amp driver code has gone. > > However I realised that my try_module_get() isn't a fix anyway. > It's claiming use of the module implementing the component but not > of the bus that owns that module. I assume that's what you were > referring to by having to deal with an unload instead of trying to > prevent the unload. Yes. Also there is no requirement that a module increment its reference count even if it is being used, as long as it can properly tear things down when it is removed. Look at network modules for examples of that working properly. Odds are other modules should move to that model over time, and we are not going to try to go backwards and add module reference counting to the driver core. module references lock code, you are modifying data here, two different things. thanks, greg k-h
diff --git a/drivers/base/component.c b/drivers/base/component.c index 5eadeac6c532..01dbef4a6187 100644 --- a/drivers/base/component.c +++ b/drivers/base/component.c @@ -580,6 +580,8 @@ static void component_unbind(struct component *component, /* Release all resources claimed in the binding of this component */ devres_release_group(component->dev, component); + + module_put(component->dev->driver->owner); } /** @@ -617,13 +619,18 @@ static int component_bind(struct component *component, struct aggregate_device * { int ret; + if (!try_module_get(component->dev->driver->owner)) + return -ENODEV; + /* * Each component initialises inside its own devres group. * This allows us to roll-back a failed component without * affecting anything else. */ - if (!devres_open_group(adev->parent, NULL, GFP_KERNEL)) + if (!devres_open_group(adev->parent, NULL, GFP_KERNEL)) { + module_put(component->dev->driver->owner); return -ENOMEM; + } /* * Also open a group for the device itself: this allows us @@ -632,6 +639,7 @@ static int component_bind(struct component *component, struct aggregate_device * */ if (!devres_open_group(component->dev, component, GFP_KERNEL)) { devres_release_group(adev->parent, NULL); + module_put(component->dev->driver->owner); return -ENOMEM; }
Call try_module_get() on a component before attempting to call its bind() function, to ensure that a loadable module cannot be unloaded while we are executing its bind(). If the bind is successful the module_put() is called only after it has been unbound. This ensures that the module cannot be unloaded while it is in use as an aggregate device. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> --- drivers/base/component.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)