Message ID | 20241212110221.1509163-1-ckeepax@opensource.cirrus.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | soundwire: intel_auxdevice: Don't disable IRQs before removing children | expand |
On 12/12/24 5:02 AM, Charles Keepax wrote: > Currently the auxiliary device for the link disables IRQs before it > calls sdw_bus_master_delete(). This has the side effect that > none of the devices on the link can access their own registers whilst > their remove functions run, because the IRQs are required for bus > transactions to function. Obviously, devices should be able to access > their own registers during disable to park the device suitably. What does 'park the device suitably' mean really? That sounds scary. What happens if the manager abruptly ceases to operate and yanks the power? Does the device catch on fire? On a related note, the manager should *never* expect to find the device in a 'suitable' state but always proceed with complete re-initialization. It would be good to expand on the issue, introducing new locks is one of those "no good deed goes unpunished" situations. > It would appear the reason for the disabling of the IRQs is that the IRQ > handler iterates through a linked list of all the links, once a link is > removed the memory pointed at by this linked list is freed, but not > removed from the linked_list itself. Add a list_del() for the linked > list item, note whilst the list itself is contained in the intel_init > portion of the code, the list remove needs to be attached to the > auxiliary device for the link, since that owns the memory that the list > points at. Locking is also required to ensure the IRQ handler runs > before or after any additions/removals from the list. that sounds very detailed but the initial reason for all this is still unclear.
On Mon, Dec 16, 2024 at 11:35:23AM -0600, Pierre-Louis Bossart wrote: > On 12/12/24 5:02 AM, Charles Keepax wrote: > > Currently the auxiliary device for the link disables IRQs before it > > calls sdw_bus_master_delete(). This has the side effect that > > none of the devices on the link can access their own registers whilst > > their remove functions run, because the IRQs are required for bus > > transactions to function. Obviously, devices should be able to access > > their own registers during disable to park the device suitably. > > What does 'park the device suitably' mean really? That sounds > scary. What happens if the manager abruptly ceases to operate and > yanks the power? Does the device catch on fire? On a related note, > the manager should *never* expect to find the device in a 'suitable' > state but always proceed with complete re-initialization. > > It would be good to expand on the issue, introducing new locks > is one of those "no good deed goes unpunished" situations. > I would agree that one should never make hardware that needs parked to avoid damage, but people do stupid things. Also, it doesn't have to be as catastrophic as that, it is usually a case of wanting to move the device into its lowest power state, turn off regulators on the device etc. > > It would appear the reason for the disabling of the IRQs is that the IRQ > > handler iterates through a linked list of all the links, once a link is > > removed the memory pointed at by this linked list is freed, but not > > removed from the linked_list itself. Add a list_del() for the linked > > list item, note whilst the list itself is contained in the intel_init > > portion of the code, the list remove needs to be attached to the > > auxiliary device for the link, since that owns the memory that the list > > points at. Locking is also required to ensure the IRQ handler runs > > before or after any additions/removals from the list. > > that sounds very detailed but the initial reason for all this is still > unclear. If you want, I can add the exact reason I am adding this change to the commit message, which is that regmap IRQ sensibly masks IRQs as they are removed, so when the cs42l43 removes one sees a bunch of failed transations, as the bus has broken. But to be honest I feel like it is overly specific, one could construct any number of similar situations, the real problem here is it is completely normal and reasonable to want to communicate with a device in remove and we should support that. Thanks, Charles
On 16/12/2024 5:35 pm, Pierre-Louis Bossart wrote: > On 12/12/24 5:02 AM, Charles Keepax wrote: >> Currently the auxiliary device for the link disables IRQs before it >> calls sdw_bus_master_delete(). This has the side effect that >> none of the devices on the link can access their own registers whilst >> their remove functions run, because the IRQs are required for bus >> transactions to function. Obviously, devices should be able to access >> their own registers during disable to park the device suitably. > > What does 'park the device suitably' mean really? That sounds scary. What happens if the manager abruptly ceases to operate and yanks the power? Does the device catch on fire? On a related note, the manager should *never* expect to find the device in a 'suitable' state but always proceed with complete re-initialization. > > It would be good to expand on the issue, introducing new locks is one of those "no good deed goes unpunished" situations. > >> It would appear the reason for the disabling of the IRQs is that the IRQ >> handler iterates through a linked list of all the links, once a link is >> removed the memory pointed at by this linked list is freed, but not >> removed from the linked_list itself. Add a list_del() for the linked >> list item, note whilst the list itself is contained in the intel_init >> portion of the code, the list remove needs to be attached to the >> auxiliary device for the link, since that owns the memory that the list >> points at. Locking is also required to ensure the IRQ handler runs >> before or after any additions/removals from the list. > > that sounds very detailed but the initial reason for all this is still unclear. > For example, if the bus driver module is unloaded, the kernel will call remove() on all the child drivers. The bus should remain functional while the child drivers deal with this unexpected unload. This could for example be writing some registers to put the peripheral into a low-power state. On ACPI systems the drivers don't have control of regulators so can't just pull power from the peripheral.
On 12/17/24 4:49 AM, Richard Fitzgerald wrote: > On 16/12/2024 5:35 pm, Pierre-Louis Bossart wrote: >> On 12/12/24 5:02 AM, Charles Keepax wrote: >>> Currently the auxiliary device for the link disables IRQs before it >>> calls sdw_bus_master_delete(). This has the side effect that >>> none of the devices on the link can access their own registers whilst >>> their remove functions run, because the IRQs are required for bus >>> transactions to function. Obviously, devices should be able to access >>> their own registers during disable to park the device suitably. >> >> What does 'park the device suitably' mean really? That sounds scary. What happens if the manager abruptly ceases to operate and yanks the power? Does the device catch on fire? On a related note, the manager should *never* expect to find the device in a 'suitable' state but always proceed with complete re-initialization. >> >> It would be good to expand on the issue, introducing new locks is one of those "no good deed goes unpunished" situations. >> >>> It would appear the reason for the disabling of the IRQs is that the IRQ >>> handler iterates through a linked list of all the links, once a link is >>> removed the memory pointed at by this linked list is freed, but not >>> removed from the linked_list itself. Add a list_del() for the linked >>> list item, note whilst the list itself is contained in the intel_init >>> portion of the code, the list remove needs to be attached to the >>> auxiliary device for the link, since that owns the memory that the list >>> points at. Locking is also required to ensure the IRQ handler runs >>> before or after any additions/removals from the list. >> >> that sounds very detailed but the initial reason for all this is still unclear. >> > For example, if the bus driver module is unloaded, the kernel will call > remove() on all the child drivers. The bus should remain functional > while the child drivers deal with this unexpected unload. This could > for example be writing some registers to put the peripheral into a > low-power state. On ACPI systems the drivers don't have control of > regulators so can't just pull power from the peripheral. Answering to the two replies at once: If the bus driver is unloaded, then the SoundWire clock will stop toggling. That's a rather large piece of information for the device to change states - I am pretty sure the SDCA spec even mandates that the state changes to at least PS_2. But to some extent one could argue that a remove() should be more aggressive than a suspend() and the driver could use PS_4 as the lower power state - there is no real requirement to restart interaction with the device with a simple procedure. The other problem I have with the notion of 'link_lock' is that we already have a notion of 'bus_lock'. And in everything we did so far the terms manager, link and bus are interoperable. So adding a new concept that looks very similar to the existing one shouldn't be done with an explanation of what lock is used for what.
On Tue, Dec 17, 2024 at 05:49:17PM -0600, Pierre-Louis Bossart wrote: > On 12/17/24 4:49 AM, Richard Fitzgerald wrote: > > On 16/12/2024 5:35 pm, Pierre-Louis Bossart wrote: > >> On 12/12/24 5:02 AM, Charles Keepax wrote: > > For example, if the bus driver module is unloaded, the kernel will call > > remove() on all the child drivers. The bus should remain functional > > while the child drivers deal with this unexpected unload. This could > > for example be writing some registers to put the peripheral into a > > low-power state. On ACPI systems the drivers don't have control of > > regulators so can't just pull power from the peripheral. > > Answering to the two replies at once: > > If the bus driver is unloaded, then the SoundWire clock will stop > toggling. That's a rather large piece of information for the device > to change states - The clock should only stop toggling after the drivers have been removed, anything else is a bug. > I am pretty sure the SDCA spec even mandates that > the state changes to at least PS_2. This code applies to more than just SDCA devices. > But to some extent one could argue that a remove() should be more > aggressive than a suspend() and the driver could use PS_4 as the > lower power state - there is no real requirement to restart > interaction with the device with a simple procedure. > Not really sure I follow this bit, none of this has anything to do with when one restarts interacting with the device. It is about leaving the device in a nice state when you stop interacting with it. > The other problem I have with the notion of 'link_lock' is that > we already have a notion of 'bus_lock'. And in everything we did so > far the terms manager, link and bus are interoperable. So adding a > new concept that looks very similar to the existing one shouldn't > be done with an explanation of what lock is used for what. I don't see much confusion here, the two locks are at different levels in the stack. If is fairly normal for a framework to have a lock and drivers to have individual locks under that. And the comment with the lock states it is protecting the list. That said I am not attached to this way of solving the problem either, all I am attached to is allowing devices to communicate in their remove functions. I think perhaps the important questions here are do you object to my assertion that a device should be able to communicate in its remove function? Or do you object to the way I have solved that problem? I am certainly open to other solutions, if you have any suggestions? Thanks, Charles
diff --git a/drivers/soundwire/intel.h b/drivers/soundwire/intel.h index dddd293814418..4df582ceaed1a 100644 --- a/drivers/soundwire/intel.h +++ b/drivers/soundwire/intel.h @@ -45,6 +45,7 @@ struct sdw_intel_link_res { u32 link_mask; struct sdw_cdns *cdns; struct list_head list; + struct mutex *link_lock; /* lock protecting list */ struct hdac_bus *hbus; }; diff --git a/drivers/soundwire/intel_auxdevice.c b/drivers/soundwire/intel_auxdevice.c index 599954d927529..5b2bbafef1ac0 100644 --- a/drivers/soundwire/intel_auxdevice.c +++ b/drivers/soundwire/intel_auxdevice.c @@ -498,9 +498,12 @@ static void intel_link_remove(struct auxiliary_device *auxdev) if (!bus->prop.hw_disabled) { sdw_intel_debugfs_exit(sdw); cancel_delayed_work_sync(&cdns->attach_dwork); - sdw_cdns_enable_interrupt(cdns, false); } + sdw_bus_master_delete(bus); + + if (!bus->prop.hw_disabled) + sdw_cdns_enable_interrupt(cdns, false); } int intel_link_process_wakeen_event(struct auxiliary_device *auxdev) diff --git a/drivers/soundwire/intel_init.c b/drivers/soundwire/intel_init.c index 12e7a98f319f8..db49ee3808151 100644 --- a/drivers/soundwire/intel_init.c +++ b/drivers/soundwire/intel_init.c @@ -28,6 +28,15 @@ static void intel_link_dev_release(struct device *dev) kfree(ldev); } +static void intel_link_list_del(void *data) +{ + struct sdw_intel_link_res *link = data; + + mutex_lock(link->link_lock); + list_del(&link->list); + mutex_unlock(link->link_lock); +} + /* alloc, init and add link devices */ static struct sdw_intel_link_dev *intel_link_dev_register(struct sdw_intel_res *res, struct sdw_intel_ctx *ctx, @@ -78,6 +87,7 @@ static struct sdw_intel_link_dev *intel_link_dev_register(struct sdw_intel_res * link->shim_vs = res->mmio_base + SDW_SHIM2_VS_BASE(link_id); link->shim_lock = res->eml_lock; } + link->link_lock = &ctx->link_lock; link->ops = res->ops; link->dev = res->dev; @@ -144,8 +154,10 @@ irqreturn_t sdw_intel_thread(int irq, void *dev_id) struct sdw_intel_ctx *ctx = dev_id; struct sdw_intel_link_res *link; + mutex_lock(&ctx->link_lock); list_for_each_entry(link, &ctx->link_list, list) sdw_cdns_irq(irq, link->cdns); + mutex_unlock(&ctx->link_lock); return IRQ_HANDLED; } @@ -209,6 +221,7 @@ static struct sdw_intel_ctx ctx->link_mask = res->link_mask; ctx->handle = res->handle; mutex_init(&ctx->shim_lock); + mutex_init(&ctx->link_lock); link_mask = ctx->link_mask; @@ -245,7 +258,10 @@ static struct sdw_intel_ctx i++; goto err; } + mutex_lock(&ctx->link_lock); list_add_tail(&link->list, &ctx->link_list); + mutex_unlock(&ctx->link_lock); + devm_add_action_or_reset(&ldev->auxdev.dev, intel_link_list_del, link); bus = &link->cdns->bus; /* Calculate number of slaves */ list_for_each(node, &bus->slaves) diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 580086417e4b0..4444c99aead5f 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -304,6 +304,7 @@ struct sdw_intel_ctx { acpi_handle handle; struct sdw_intel_link_dev **ldev; struct list_head link_list; + struct mutex link_lock; /* lock protecting link_list */ struct mutex shim_lock; /* lock for access to shared SHIM registers */ u32 shim_mask; u32 shim_base;