
[v2] net: dsa: mv88e6xxx: properly shut down PPU re-enable timer on destroy

Message ID 20241203144448.30880-1-david.oberhollenzer@sigma-star.at
State Changes Requested
Delegated to: Netdev Maintainers
Series [v2] net: dsa: mv88e6xxx: properly shut down PPU re-enable timer on destroy

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: buytenh@wantstofly.org; 1 maintainers not CCed: buytenh@wantstofly.org
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 304 this patch: 304
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 41 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2024-12-04--00-00 (tests: 759)

Commit Message

David Oberhollenzer Dec. 3, 2024, 2:43 p.m. UTC
The mv88e6xxx has an internal PPU that polls PHY state. If we want to
access the internal PHYs, we need to disable it. Because enable/disable
of the PPU is a slow operation, a 10ms timer is used to re-enable it.
The timer is canceled on every access, so a bulk operation disables
the PPU only once and re-enables it roughly 10ms after the last access.

If a PHY is accessed and the mv88e6xxx module is then removed before
the 10ms have elapsed, the PPU re-enable ends up accessing a dangling pointer.

This particularly affects probing during boot. The MDIO bus and PHY
registration may succeed, but registration with the DSA framework
may fail later on (e.g. because the CPU port depends on another,
very slow device that has not finished probing yet, returning -EPROBE_DEFER).
In this case, probe() fails, but the MDIO subsystem may already have
accessed the MDIO bus or PHYs, arming the timer.

This is fixed as follows:
 - If probe fails after mv88e6xxx_phy_init(), make sure we also call
   mv88e6xxx_phy_destroy() before returning
 - In mv88e6xxx_phy_destroy(), grab the ppu_mutex so that the work
   function has either already exited or, should it run, cannot do
   anything: it fails to grab the mutex and returns.
 - In addition to destroying the timer, also destroy the work item, in
   case the timer has already fired.
 - Do all of this synchronously, to make sure the timer and work item
   are destroyed and none of their callbacks are still running.

Fixes: 2e5f032095ff ("dsa: add support for the Marvell 88E6131 switch chip")
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
---
FWIW, this is a forward port of a patch I'm using on v6.6.

Thanks,

David
---
 drivers/net/dsa/mv88e6xxx/chip.c | 8 +++++---
 drivers/net/dsa/mv88e6xxx/phy.c  | 3 +++
 2 files changed, 8 insertions(+), 3 deletions(-)

Comments

Andrew Lunn Dec. 4, 2024, 2:31 a.m. UTC | #1
On Tue, Dec 03, 2024 at 03:43:40PM +0100, David Oberhollenzer wrote:
> The mv88e6xxx has an internal PPU that polls PHY state. If we want to
> access the internal PHYs, we need to disable it. Because enable/disable
> of the PPU is a slow operation, a 10ms timer is used to re-enable it,
> canceled with every access, so bulk operations effectively only disable
> it once and re-enable it some 10ms after the last access.
> 
> If a PHY is accessed and then the mv88e6xxx module is removed before
> the 10ms are up, the PPU re-enable ends up accessing a dangling pointer.
> 
> This especially affects probing during bootup. The MDIO bus and PHY
> registration may succeed, but registration with the DSA framework
> may fail later on (e.g. because the CPU port depends on another,
> very slow device that isn't done probing yet, returning -EPROBE_DEFER).
> In this case, probe() fails, but the MDIO subsystem may already have
> accessed the MDIO bus or PHYs, arming the timer.
> 
> This is fixed as follows:
>  - If probe fails after mv88e6xxx_phy_init(), make sure we also call
>    mv88e6xxx_phy_destroy() before returning
>  - In mv88e6xxx_phy_destroy(), grab the ppu_mutex to make sure the work
>    function either has already exited, or (should it run) cannot do
>    anything, fails to grab the mutex and returns.

On first reading this, I did not realise the code is using
mutex_trylock(), which made me think it could deadlock. Maybe change
this to "mutex_trylock() fails to get the mutex and returns".

But I'm not actually sure this is needed. There are plenty of other
examples of destroying a work item without taking a mutex.

>  - In addition to destroying the timer, also destroy the work item, in
>    case the timer has already fired.
>  - Do all of this synchronously, to make sure timer & work item are
>    destroyed and none of the callbacks are running.

This is the important part: doing it synchronously. cancel_work_sync()
should be enough.

>  static void mv88e6xxx_phy_ppu_state_destroy(struct mv88e6xxx_chip *chip)
>  {
> +	mutex_lock(&chip->ppu_mutex);
>  	del_timer_sync(&chip->ppu_timer);
> +	cancel_work_sync(&chip->ppu_work);
> +	mutex_unlock(&chip->ppu_mutex);
>  }

/**
 * del_timer_sync - Delete a pending timer and wait for a running callback
 * @timer:	The timer to be deleted
 *
 * See timer_delete_sync() for detailed explanation.
 *
 * Do not use in new code. Use timer_delete_sync() instead.


    Andrew

---
pw-bot: cr

Patch

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 284270a4ade1..c2af69bed660 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -7264,13 +7264,13 @@  static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	err = mv88e6xxx_switch_reset(chip);
 	mv88e6xxx_reg_unlock(chip);
 	if (err)
-		goto out;
+		goto out_phy;
 
 	if (np) {
 		chip->irq = of_irq_get(np, 0);
 		if (chip->irq == -EPROBE_DEFER) {
 			err = chip->irq;
-			goto out;
+			goto out_phy;
 		}
 	}
 
@@ -7289,7 +7289,7 @@  static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	mv88e6xxx_reg_unlock(chip);
 
 	if (err)
-		goto out;
+		goto out_phy;
 
 	if (chip->info->g2_irqs > 0) {
 		err = mv88e6xxx_g2_irq_setup(chip);
@@ -7323,6 +7323,8 @@  static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 		mv88e6xxx_g1_irq_free(chip);
 	else
 		mv88e6xxx_irq_poll_free(chip);
+out_phy:
+	mv88e6xxx_phy_destroy(chip);
 out:
 	if (pdata)
 		dev_put(pdata->netdev);
diff --git a/drivers/net/dsa/mv88e6xxx/phy.c b/drivers/net/dsa/mv88e6xxx/phy.c
index 8bb88b3d900d..ee9e5d7e5277 100644
--- a/drivers/net/dsa/mv88e6xxx/phy.c
+++ b/drivers/net/dsa/mv88e6xxx/phy.c
@@ -229,7 +229,10 @@  static void mv88e6xxx_phy_ppu_state_init(struct mv88e6xxx_chip *chip)
 
 static void mv88e6xxx_phy_ppu_state_destroy(struct mv88e6xxx_chip *chip)
 {
+	mutex_lock(&chip->ppu_mutex);
 	del_timer_sync(&chip->ppu_timer);
+	cancel_work_sync(&chip->ppu_work);
+	mutex_unlock(&chip->ppu_mutex);
 }
 
 int mv88e6185_phy_ppu_read(struct mv88e6xxx_chip *chip, struct mii_bus *bus,