diff mbox series

soundwire: intel_auxdevice: Don't disable IRQs before removing children

Message ID 20241212110221.1509163-1-ckeepax@opensource.cirrus.com (mailing list archive)
State New
Headers show
Series soundwire: intel_auxdevice: Don't disable IRQs before removing children | expand

Commit Message

Charles Keepax Dec. 12, 2024, 11:02 a.m. UTC
Currently the auxiliary device for the link disables IRQs before it
calls sdw_bus_master_delete(). This has the side effect that
none of the devices on the link can access their own registers whilst
their remove functions run, because the IRQs are required for bus
transactions to function. Obviously, devices should be able to access
their own registers during disable to park the device suitably.

It would appear the reason for the disabling of the IRQs is that the IRQ
handler iterates through a linked list of all the links, once a link is
removed the memory pointed at by this linked list is freed, but not
removed from the linked_list itself. Add a list_del() for the linked
list item, note whilst the list itself is contained in the intel_init
portion of the code, the list remove needs to be attached to the
auxiliary device for the link, since that owns the memory that the list
points at. Locking is also required to ensure the IRQ handler runs
before or after any additions/removals from the list.

Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
---
 drivers/soundwire/intel.h           |  1 +
 drivers/soundwire/intel_auxdevice.c |  5 ++++-
 drivers/soundwire/intel_init.c      | 16 ++++++++++++++++
 include/linux/soundwire/sdw_intel.h |  1 +
 4 files changed, 22 insertions(+), 1 deletion(-)

Comments

Pierre-Louis Bossart Dec. 16, 2024, 5:35 p.m. UTC | #1
On 12/12/24 5:02 AM, Charles Keepax wrote:
> Currently the auxiliary device for the link disables IRQs before it
> calls sdw_bus_master_delete(). This has the side effect that
> none of the devices on the link can access their own registers whilst
> their remove functions run, because the IRQs are required for bus
> transactions to function. Obviously, devices should be able to access
> their own registers during disable to park the device suitably.

What does 'park the device suitably' mean really? That sounds scary. What happens if the manager abruptly ceases to operate and yanks the power? Does the device catch on fire? On a related note, the manager should *never* expect to find the device in a 'suitable' state but always proceed with complete re-initialization.

It would be good to expand on the issue, introducing new locks is one of those "no good deed goes unpunished" situations.

> It would appear the reason for the disabling of the IRQs is that the IRQ
> handler iterates through a linked list of all the links, once a link is
> removed the memory pointed at by this linked list is freed, but not
> removed from the linked_list itself. Add a list_del() for the linked
> list item, note whilst the list itself is contained in the intel_init
> portion of the code, the list remove needs to be attached to the
> auxiliary device for the link, since that owns the memory that the list
> points at. Locking is also required to ensure the IRQ handler runs
> before or after any additions/removals from the list.

that sounds very detailed but the initial reason for all this is still unclear.
Charles Keepax Dec. 17, 2024, 10:24 a.m. UTC | #2
On Mon, Dec 16, 2024 at 11:35:23AM -0600, Pierre-Louis Bossart wrote:
> On 12/12/24 5:02 AM, Charles Keepax wrote:
> > Currently the auxiliary device for the link disables IRQs before it
> > calls sdw_bus_master_delete(). This has the side effect that
> > none of the devices on the link can access their own registers whilst
> > their remove functions run, because the IRQs are required for bus
> > transactions to function. Obviously, devices should be able to access
> > their own registers during disable to park the device suitably.
> 
> What does 'park the device suitably' mean really? That sounds
> scary. What happens if the manager abruptly ceases to operate and
> yanks the power? Does the device catch on fire? On a related note,
> the manager should *never* expect to find the device in a 'suitable'
> state but always proceed with complete re-initialization.
> 
> It would be good to expand on the issue, introducing new locks
> is one of those "no good deed goes unpunished" situations.
> 

I would agree that one should never make hardware that needs parked
to avoid damage, but people do stupid things. Also, it doesn't
have to be as catastrophic as that, it is usually a case of wanting
to move the device into its lowest power state, turn off regulators
on the device etc.

> > It would appear the reason for the disabling of the IRQs is that the IRQ
> > handler iterates through a linked list of all the links, once a link is
> > removed the memory pointed at by this linked list is freed, but not
> > removed from the linked_list itself. Add a list_del() for the linked
> > list item, note whilst the list itself is contained in the intel_init
> > portion of the code, the list remove needs to be attached to the
> > auxiliary device for the link, since that owns the memory that the list
> > points at. Locking is also required to ensure the IRQ handler runs
> > before or after any additions/removals from the list.
> 
> that sounds very detailed but the initial reason for all this is still
> unclear.

If you want, I can add the exact reason I am adding this change to
the commit message, which is that regmap IRQ sensibly masks IRQs
as they are removed, so when the cs42l43 removes one sees a bunch
of failed transations, as the bus has broken. But to be honest I
feel like it is overly specific, one could construct any number of
similar situations, the real problem here is it is completely normal
and reasonable to want to communicate with a device in remove and
we should support that.

Thanks,
Charles
Richard Fitzgerald Dec. 17, 2024, 10:49 a.m. UTC | #3
On 16/12/2024 5:35 pm, Pierre-Louis Bossart wrote:
> On 12/12/24 5:02 AM, Charles Keepax wrote:
>> Currently the auxiliary device for the link disables IRQs before it
>> calls sdw_bus_master_delete(). This has the side effect that
>> none of the devices on the link can access their own registers whilst
>> their remove functions run, because the IRQs are required for bus
>> transactions to function. Obviously, devices should be able to access
>> their own registers during disable to park the device suitably.
> 
> What does 'park the device suitably' mean really? That sounds scary. What happens if the manager abruptly ceases to operate and yanks the power? Does the device catch on fire? On a related note, the manager should *never* expect to find the device in a 'suitable' state but always proceed with complete re-initialization.
> 
> It would be good to expand on the issue, introducing new locks is one of those "no good deed goes unpunished" situations.
> 
>> It would appear the reason for the disabling of the IRQs is that the IRQ
>> handler iterates through a linked list of all the links, once a link is
>> removed the memory pointed at by this linked list is freed, but not
>> removed from the linked_list itself. Add a list_del() for the linked
>> list item, note whilst the list itself is contained in the intel_init
>> portion of the code, the list remove needs to be attached to the
>> auxiliary device for the link, since that owns the memory that the list
>> points at. Locking is also required to ensure the IRQ handler runs
>> before or after any additions/removals from the list.
> 
> that sounds very detailed but the initial reason for all this is still unclear.
> 
For example, if the bus driver module is unloaded, the kernel will call
remove() on all the child drivers. The bus should remain functional
while the child drivers deal with this unexpected unload. This could
for example be writing some registers to put the peripheral into a
low-power state. On ACPI systems the drivers don't have control of
regulators so can't just pull power from the peripheral.
Pierre-Louis Bossart Dec. 17, 2024, 11:49 p.m. UTC | #4
On 12/17/24 4:49 AM, Richard Fitzgerald wrote:
> On 16/12/2024 5:35 pm, Pierre-Louis Bossart wrote:
>> On 12/12/24 5:02 AM, Charles Keepax wrote:
>>> Currently the auxiliary device for the link disables IRQs before it
>>> calls sdw_bus_master_delete(). This has the side effect that
>>> none of the devices on the link can access their own registers whilst
>>> their remove functions run, because the IRQs are required for bus
>>> transactions to function. Obviously, devices should be able to access
>>> their own registers during disable to park the device suitably.
>>
>> What does 'park the device suitably' mean really? That sounds scary. What happens if the manager abruptly ceases to operate and yanks the power? Does the device catch on fire? On a related note, the manager should *never* expect to find the device in a 'suitable' state but always proceed with complete re-initialization.
>>
>> It would be good to expand on the issue, introducing new locks is one of those "no good deed goes unpunished" situations.
>>
>>> It would appear the reason for the disabling of the IRQs is that the IRQ
>>> handler iterates through a linked list of all the links, once a link is
>>> removed the memory pointed at by this linked list is freed, but not
>>> removed from the linked_list itself. Add a list_del() for the linked
>>> list item, note whilst the list itself is contained in the intel_init
>>> portion of the code, the list remove needs to be attached to the
>>> auxiliary device for the link, since that owns the memory that the list
>>> points at. Locking is also required to ensure the IRQ handler runs
>>> before or after any additions/removals from the list.
>>
>> that sounds very detailed but the initial reason for all this is still unclear.
>>
> For example, if the bus driver module is unloaded, the kernel will call
> remove() on all the child drivers. The bus should remain functional
> while the child drivers deal with this unexpected unload. This could
> for example be writing some registers to put the peripheral into a
> low-power state. On ACPI systems the drivers don't have control of
> regulators so can't just pull power from the peripheral.

Answering to the two replies at once:

If the bus driver is unloaded, then the SoundWire clock will stop toggling. That's a rather large piece of information for the device to change states - I am pretty sure the SDCA spec even mandates that the state changes to at least PS_2.
But to some extent one could argue that a remove() should be more aggressive than a suspend() and the driver could use PS_4 as the lower power state - there is no real requirement to restart interaction with the device with a simple procedure.

The other problem I have with the notion of 'link_lock' is that we already have a notion of 'bus_lock'. And in everything we did so far the terms manager, link and bus are interoperable. So adding a new concept that looks very similar to the existing one shouldn't be done with an explanation of what lock is used for what.
Charles Keepax Dec. 18, 2024, 9:51 a.m. UTC | #5
On Tue, Dec 17, 2024 at 05:49:17PM -0600, Pierre-Louis Bossart wrote:
> On 12/17/24 4:49 AM, Richard Fitzgerald wrote:
> > On 16/12/2024 5:35 pm, Pierre-Louis Bossart wrote:
> >> On 12/12/24 5:02 AM, Charles Keepax wrote:
> > For example, if the bus driver module is unloaded, the kernel will call
> > remove() on all the child drivers. The bus should remain functional
> > while the child drivers deal with this unexpected unload. This could
> > for example be writing some registers to put the peripheral into a
> > low-power state. On ACPI systems the drivers don't have control of
> > regulators so can't just pull power from the peripheral.
> 
> Answering to the two replies at once:
> 
> If the bus driver is unloaded, then the SoundWire clock will stop
> toggling. That's a rather large piece of information for the device
> to change states -

The clock should only stop toggling after the drivers have been
removed, anything else is a bug.

> I am pretty sure the SDCA spec even mandates that
> the state changes to at least PS_2.

This code applies to more than just SDCA devices.

> But to some extent one could argue that a remove() should be more
> aggressive than a suspend() and the driver could use PS_4 as the
> lower power state - there is no real requirement to restart
> interaction with the device with a simple procedure.
> 

Not really sure I follow this bit, none of this has anything to do
with when one restarts interacting with the device. It is about
leaving the device in a nice state when you stop interacting with
it.

> The other problem I have with the notion of 'link_lock' is that
> we already have a notion of 'bus_lock'. And in everything we did so
> far the terms manager, link and bus are interoperable. So adding a
> new concept that looks very similar to the existing one shouldn't
> be done with an explanation of what lock is used for what.

I don't see much confusion here, the two locks are at different
levels in the stack. If is fairly normal for a framework to have
a lock and drivers to have individual locks under that. And the
comment with the lock states it is protecting the list.

That said I am not attached to this way of solving the problem
either, all I am attached to is allowing devices to communicate in
their remove functions. I think perhaps the important questions
here are do you object to my assertion that a device should be
able to communicate in its remove function? Or do you object to
the way I have solved that problem? I am certainly open to other
solutions, if you have any suggestions?

Thanks,
Charles
diff mbox series

Patch

diff --git a/drivers/soundwire/intel.h b/drivers/soundwire/intel.h
index dddd293814418..4df582ceaed1a 100644
--- a/drivers/soundwire/intel.h
+++ b/drivers/soundwire/intel.h
@@ -45,6 +45,7 @@  struct sdw_intel_link_res {
 	u32 link_mask;
 	struct sdw_cdns *cdns;
 	struct list_head list;
+	struct mutex *link_lock; /* lock protecting list */
 	struct hdac_bus *hbus;
 };
 
diff --git a/drivers/soundwire/intel_auxdevice.c b/drivers/soundwire/intel_auxdevice.c
index 599954d927529..5b2bbafef1ac0 100644
--- a/drivers/soundwire/intel_auxdevice.c
+++ b/drivers/soundwire/intel_auxdevice.c
@@ -498,9 +498,12 @@  static void intel_link_remove(struct auxiliary_device *auxdev)
 	if (!bus->prop.hw_disabled) {
 		sdw_intel_debugfs_exit(sdw);
 		cancel_delayed_work_sync(&cdns->attach_dwork);
-		sdw_cdns_enable_interrupt(cdns, false);
 	}
+
 	sdw_bus_master_delete(bus);
+
+	if (!bus->prop.hw_disabled)
+		sdw_cdns_enable_interrupt(cdns, false);
 }
 
 int intel_link_process_wakeen_event(struct auxiliary_device *auxdev)
diff --git a/drivers/soundwire/intel_init.c b/drivers/soundwire/intel_init.c
index 12e7a98f319f8..db49ee3808151 100644
--- a/drivers/soundwire/intel_init.c
+++ b/drivers/soundwire/intel_init.c
@@ -28,6 +28,15 @@  static void intel_link_dev_release(struct device *dev)
 	kfree(ldev);
 }
 
+static void intel_link_list_del(void *data)
+{
+	struct sdw_intel_link_res *link = data;
+
+	mutex_lock(link->link_lock);
+	list_del(&link->list);
+	mutex_unlock(link->link_lock);
+}
+
 /* alloc, init and add link devices */
 static struct sdw_intel_link_dev *intel_link_dev_register(struct sdw_intel_res *res,
 							  struct sdw_intel_ctx *ctx,
@@ -78,6 +87,7 @@  static struct sdw_intel_link_dev *intel_link_dev_register(struct sdw_intel_res *
 		link->shim_vs = res->mmio_base + SDW_SHIM2_VS_BASE(link_id);
 		link->shim_lock = res->eml_lock;
 	}
+	link->link_lock = &ctx->link_lock;
 
 	link->ops = res->ops;
 	link->dev = res->dev;
@@ -144,8 +154,10 @@  irqreturn_t sdw_intel_thread(int irq, void *dev_id)
 	struct sdw_intel_ctx *ctx = dev_id;
 	struct sdw_intel_link_res *link;
 
+	mutex_lock(&ctx->link_lock);
 	list_for_each_entry(link, &ctx->link_list, list)
 		sdw_cdns_irq(irq, link->cdns);
+	mutex_unlock(&ctx->link_lock);
 
 	return IRQ_HANDLED;
 }
@@ -209,6 +221,7 @@  static struct sdw_intel_ctx
 	ctx->link_mask = res->link_mask;
 	ctx->handle = res->handle;
 	mutex_init(&ctx->shim_lock);
+	mutex_init(&ctx->link_lock);
 
 	link_mask = ctx->link_mask;
 
@@ -245,7 +258,10 @@  static struct sdw_intel_ctx
 			i++;
 			goto err;
 		}
+		mutex_lock(&ctx->link_lock);
 		list_add_tail(&link->list, &ctx->link_list);
+		mutex_unlock(&ctx->link_lock);
+		devm_add_action_or_reset(&ldev->auxdev.dev, intel_link_list_del, link);
 		bus = &link->cdns->bus;
 		/* Calculate number of slaves */
 		list_for_each(node, &bus->slaves)
diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h
index 580086417e4b0..4444c99aead5f 100644
--- a/include/linux/soundwire/sdw_intel.h
+++ b/include/linux/soundwire/sdw_intel.h
@@ -304,6 +304,7 @@  struct sdw_intel_ctx {
 	acpi_handle handle;
 	struct sdw_intel_link_dev **ldev;
 	struct list_head link_list;
+	struct mutex link_lock; /* lock protecting link_list */
 	struct mutex shim_lock; /* lock for access to shared SHIM registers */
 	u32 shim_mask;
 	u32 shim_base;