diff mbox series

PCI: Fix use-after-free of slot->bus on hot remove

Message ID 4bfd4c0e976c1776cd08e76603903b338cf25729.1728579288.git.lukas@wunner.de (mailing list archive)
State Accepted
Delegated to: Bjorn Helgaas
Headers show
Series PCI: Fix use-after-free of slot->bus on hot remove | expand

Commit Message

Lukas Wunner Oct. 10, 2024, 5:10 p.m. UTC
Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.

Since commit 0fc70886569c ("thunderbolt: Reset USB4 v2 host router") and
commit 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot
firmware"), USB4 v2 and v1 Host Routers are reset on probe of the
thunderbolt driver.

The reset clears the Presence Detect State and Data Link Layer Link Active
bits at the USB4 Host Router's Root Port and thus causes hot removal of
the dock.

The crash occurs when pciehp is unbound from one of the dock's Downstream
Ports:  pciehp creates a pci_slot on bind and destroys it on unbind.  The
pci_slot contains a pointer to the pci_bus below the Downstream Port, but
a reference on that pci_bus is never acquired.  The pci_bus is destroyed
before the pci_slot, so a use-after-free ensues when pci_slot_release()
accesses slot->bus.

In principle this should not happen because pci_stop_bus_device() unbinds
pciehp (and therefore destroys the pci_slot) before the pci_bus is
destroyed by pci_remove_bus_device().

However the stacktrace provided by Dennis shows that pciehp is unbound
from pci_remove_bus_device() instead of pci_stop_bus_device().
To understand the significance of this, one needs to know that the PCI
core uses a two step process to remove a portion of the hierarchy:  It
first unbinds all drivers in the sub-hierarchy in pci_stop_bus_device()
and then actually removes the devices in pci_remove_bus_device().
There is no precaution to prevent driver binding in-between
pci_stop_bus_device() and pci_remove_bus_device().

In Dennis' case, it seems removal of the hierarchy by pciehp races with
driver binding by pci_bus_add_devices().  pciehp is bound to the
Downstream Port after pci_stop_bus_device() has run, so it is unbound by
pci_remove_bus_device() instead of pci_stop_bus_device().  Because the
pci_bus has already been destroyed at that point, accesses to it result in
a use-after-free.

One might conclude that driver binding needs to be prevented after
pci_stop_bus_device() has run.  However it seems risky that pci_slot
points to pci_bus without holding a reference.  Solely relying on correct
ordering of driver unbind versus pci_bus destruction is certainly not
defensive programming.

If pci_slot has a need to access data in pci_bus, it ought to acquire a
reference.  Amend pci_create_slot() accordingly.  Dennis reports that the
crash is not reproducible with this change.

Abridged stacktrace:

  pcieport 0000:00:07.0: PME: Signaling with IRQ 156
  pcieport 0000:00:07.0: pciehp: Slot #12 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
  pci_bus 0000:20: dev 00, created physical slot 12
  pcieport 0000:00:07.0: pciehp: Slot(12): Card not present
  ...
  pcieport 0000:21:02.0: pciehp: pcie_disable_notification: SLOTCTRL d8 write cmd 0
  Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 13 UID: 0 PID: 134 Comm: irq/156-pciehp Not tainted 6.11.0-devel+ #1
  RIP: 0010:dev_driver_string+0x12/0x40
  pci_destroy_slot
  pciehp_remove
  pcie_port_remove_service
  device_release_driver_internal
  bus_remove_device
  device_del
  device_unregister
  remove_iter
  device_for_each_child
  pcie_portdrv_remove
  pci_device_remove
  device_release_driver_internal
  bus_remove_device
  device_del
  pci_remove_bus_device (recursive invocation)
  pci_remove_bus_device
  pciehp_unconfigure_device
  pciehp_disable_slot
  pciehp_handle_presence_or_link_change
  pciehp_ist

Reported-by: Dennis Wassenberg <Dennis.Wassenberg@secunet.com>
Tested-by: Dennis Wassenberg <Dennis.Wassenberg@secunet.com>
Closes: https://lore.kernel.org/r/6de4b45ff2b32dd91a805ec02ec8ec73ef411bf6.camel@secunet.com/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
---
I am tempted to remove the call to device_release_driver() from
pci_stop_dev() and just rely on driver unbinding by device_del().
It would simplify and rationalize the code.  The call was introduced by
commit c4a0a5d964e9 (PCI: Move device_del() from pci_stop_dev() to
pci_destroy_dev()) without providing an explicit reason.

Dennis stress-tested driver unbinding via device_del() without witnessing
any problems.  The only downside I see is that it would re-introduce the
cosmetic issue avoided by commit 16b6c8bb687c ("PCI: Detach driver before
procfs & sysfs teardown on device remove").

Preventing driver binding after pci_stop_bus_device() should be achieved
by this one-line patch, though that's still racy as pci_bus_add_devices()
might revert the match_driver flag to true after pci_stop_bus_device() has
set it to false:
https://lore.kernel.org/r/Zv-dIHDXNNYomG2Y@wunner.de/

An alternative would be to serialize removal of the hierarchy with
pci_bus_add_devices() by way of pci_lock_rescan_remove():
https://lore.kernel.org/r/20241003084342.27501-1-brgl@bgdev.pl/

Both approaches are yet to be tested by Dennis.  Personally I would like
to avoid the pci_lock_rescan_remove() approach.  We should try to move
away from this big lock and use finer grained locking instead.  So again,
just dropping the call to device_release_driver() would be the simplest
and most preferred approach from my point of view.  Thoughts?

 drivers/pci/slot.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Mika Westerberg Oct. 11, 2024, 5:41 a.m. UTC | #1
Hi,

On Thu, Oct 10, 2024 at 07:10:34PM +0200, Lukas Wunner wrote:
> Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.
> 
> Since commit 0fc70886569c ("thunderbolt: Reset USB4 v2 host router") and
> commit 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot
> firmware"), USB4 v2 and v1 Host Routers are reset on probe of the
> thunderbolt driver.
> 
> The reset clears the Presence Detect State and Data Link Layer Link Active
> bits at the USB4 Host Router's Root Port and thus causes hot removal of
> the dock.

Can't this happen also simply unplug at some part of the PCIe topology?
I don't think this is specific to TB/USB4.

> The crash occurs when pciehp is unbound from one of the dock's Downstream
> Ports:  pciehp creates a pci_slot on bind and destroys it on unbind.  The
> pci_slot contains a pointer to the pci_bus below the Downstream Port, but
> a reference on that pci_bus is never acquired.  The pci_bus is destroyed
> before the pci_slot, so a use-after-free ensues when pci_slot_release()
> accesses slot->bus.
> 
> In principle this should not happen because pci_stop_bus_device() unbinds
> pciehp (and therefore destroys the pci_slot) before the pci_bus is
> destroyed by pci_remove_bus_device().
> 
> However the stacktrace provided by Dennis shows that pciehp is unbound
> from pci_remove_bus_device() instead of pci_stop_bus_device().
> To understand the significance of this, one needs to know that the PCI
> core uses a two step process to remove a portion of the hierarchy:  It
> first unbinds all drivers in the sub-hierarchy in pci_stop_bus_device()
> and then actually removes the devices in pci_remove_bus_device().
> There is no precaution to prevent driver binding in-between
> pci_stop_bus_device() and pci_remove_bus_device().
> 
> In Dennis' case, it seems removal of the hierarchy by pciehp races with
> driver binding by pci_bus_add_devices().  pciehp is bound to the
> Downstream Port after pci_stop_bus_device() has run, so it is unbound by
> pci_remove_bus_device() instead of pci_stop_bus_device().  Because the
> pci_bus has already been destroyed at that point, accesses to it result in
> a use-after-free.
> 
> One might conclude that driver binding needs to be prevented after
> pci_stop_bus_device() has run.  However it seems risky that pci_slot
> points to pci_bus without holding a reference.  Solely relying on correct
> ordering of driver unbind versus pci_bus destruction is certainly not
> defensive programming.
> 
> If pci_slot has a need to access data in pci_bus, it ought to acquire a
> reference.  Amend pci_create_slot() accordingly.  Dennis reports that the
> crash is not reproducible with this change.
> 
> Abridged stacktrace:
> 
>   pcieport 0000:00:07.0: PME: Signaling with IRQ 156
>   pcieport 0000:00:07.0: pciehp: Slot #12 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
>   pci_bus 0000:20: dev 00, created physical slot 12
>   pcieport 0000:00:07.0: pciehp: Slot(12): Card not present
>   ...
>   pcieport 0000:21:02.0: pciehp: pcie_disable_notification: SLOTCTRL d8 write cmd 0
>   Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
>   CPU: 13 UID: 0 PID: 134 Comm: irq/156-pciehp Not tainted 6.11.0-devel+ #1
>   RIP: 0010:dev_driver_string+0x12/0x40
>   pci_destroy_slot
>   pciehp_remove
>   pcie_port_remove_service
>   device_release_driver_internal
>   bus_remove_device
>   device_del
>   device_unregister
>   remove_iter
>   device_for_each_child
>   pcie_portdrv_remove
>   pci_device_remove
>   device_release_driver_internal
>   bus_remove_device
>   device_del
>   pci_remove_bus_device (recursive invocation)
>   pci_remove_bus_device
>   pciehp_unconfigure_device
>   pciehp_disable_slot
>   pciehp_handle_presence_or_link_change
>   pciehp_ist
> 
> Reported-by: Dennis Wassenberg <Dennis.Wassenberg@secunet.com>
> Tested-by: Dennis Wassenberg <Dennis.Wassenberg@secunet.com>
> Closes: https://lore.kernel.org/r/6de4b45ff2b32dd91a805ec02ec8ec73ef411bf6.camel@secunet.com/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Lukas Wunner Oct. 11, 2024, 10:12 a.m. UTC | #2
On Fri, Oct 11, 2024 at 08:41:15AM +0300, Mika Westerberg wrote:
> On Thu, Oct 10, 2024 at 07:10:34PM +0200, Lukas Wunner wrote:
> > Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.
> > 
> > Since commit 0fc70886569c ("thunderbolt: Reset USB4 v2 host router") and
> > commit 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot
> > firmware"), USB4 v2 and v1 Host Routers are reset on probe of the
> > thunderbolt driver.
> > 
> > The reset clears the Presence Detect State and Data Link Layer Link Active
> > bits at the USB4 Host Router's Root Port and thus causes hot removal of
> > the dock.
> 
> Can't this happen also simply unplug at some part of the PCIe topology?
> I don't think this is specific to TB/USB4.

The crash seems to occur because the boot-time invocation of
pci_bus_add_devices() races with pciehp's pci_stop_and_remove_bus_device().

In principle, yes, on a non-USB4 system you could unplug the dock exactly
when pci_bus_add_devices() is running and cause the same crash, even though
the Host Router is not reset.  But that's very hard to reproduce.
You need to unplug at just the right moment.

In this case however the reset of the Host Router seems to reliably
reproduce the conditions to cause the crash, so I thought it's worth
calling that out explicitly.  USB4 Host Routers are readily available
in the field and becoming more and more commonplace, so chances that
users experience the crash are high -- specifically if they're booting
a USB4 system with attached Thunderbolt devices.

Thanks,

Lukas
Mika Westerberg Oct. 11, 2024, 10:55 a.m. UTC | #3
On Fri, Oct 11, 2024 at 12:12:39PM +0200, Lukas Wunner wrote:
> On Fri, Oct 11, 2024 at 08:41:15AM +0300, Mika Westerberg wrote:
> > On Thu, Oct 10, 2024 at 07:10:34PM +0200, Lukas Wunner wrote:
> > > Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.
> > > 
> > > Since commit 0fc70886569c ("thunderbolt: Reset USB4 v2 host router") and
> > > commit 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot
> > > firmware"), USB4 v2 and v1 Host Routers are reset on probe of the
> > > thunderbolt driver.
> > > 
> > > The reset clears the Presence Detect State and Data Link Layer Link Active
> > > bits at the USB4 Host Router's Root Port and thus causes hot removal of
> > > the dock.
> > 
> > Can't this happen also simply unplug at some part of the PCIe topology?
> > I don't think this is specific to TB/USB4.
> 
> The crash seems to occur because the boot-time invocation of
> pci_bus_add_devices() races with pciehp's pci_stop_and_remove_bus_device().
> 
> In principle, yes, on a non-USB4 system you could unplug the dock exactly
> when pci_bus_add_devices() is running and cause the same crash, even though
> the Host Router is not reset.  But that's very hard to reproduce.
> You need to unplug at just the right moment.
> 
> In this case however the reset of the Host Router seems to reliably
> reproduce the conditions to cause the crash, so I thought it's worth
> calling that out explicitly.  USB4 Host Routers are readily available
> in the field and becoming more and more commonplace, so chances that
> users experience the crash are high -- specifically if they're booting
> a USB4 system with attached Thunderbolt devices.

Yeah agree, makes sense.
Bjorn Helgaas Oct. 30, 2024, 9:35 p.m. UTC | #4
On Thu, Oct 10, 2024 at 07:10:34PM +0200, Lukas Wunner wrote:
> Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.
> 
> Since commit 0fc70886569c ("thunderbolt: Reset USB4 v2 host router") and
> commit 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot
> firmware"), USB4 v2 and v1 Host Routers are reset on probe of the
> thunderbolt driver.
> 
> The reset clears the Presence Detect State and Data Link Layer Link Active
> bits at the USB4 Host Router's Root Port and thus causes hot removal of
> the dock.
> 
> The crash occurs when pciehp is unbound from one of the dock's Downstream
> Ports:  pciehp creates a pci_slot on bind and destroys it on unbind.  The
> pci_slot contains a pointer to the pci_bus below the Downstream Port, but
> a reference on that pci_bus is never acquired.  The pci_bus is destroyed
> before the pci_slot, so a use-after-free ensues when pci_slot_release()
> accesses slot->bus.
> 
> In principle this should not happen because pci_stop_bus_device() unbinds
> pciehp (and therefore destroys the pci_slot) before the pci_bus is
> destroyed by pci_remove_bus_device().
> 
> However the stacktrace provided by Dennis shows that pciehp is unbound
> from pci_remove_bus_device() instead of pci_stop_bus_device().
> To understand the significance of this, one needs to know that the PCI
> core uses a two step process to remove a portion of the hierarchy:  It
> first unbinds all drivers in the sub-hierarchy in pci_stop_bus_device()
> and then actually removes the devices in pci_remove_bus_device().
> There is no precaution to prevent driver binding in-between
> pci_stop_bus_device() and pci_remove_bus_device().
> 
> In Dennis' case, it seems removal of the hierarchy by pciehp races with
> driver binding by pci_bus_add_devices().  pciehp is bound to the
> Downstream Port after pci_stop_bus_device() has run, so it is unbound by
> pci_remove_bus_device() instead of pci_stop_bus_device().  Because the
> pci_bus has already been destroyed at that point, accesses to it result in
> a use-after-free.
> 
> One might conclude that driver binding needs to be prevented after
> pci_stop_bus_device() has run.  However it seems risky that pci_slot
> points to pci_bus without holding a reference.  Solely relying on correct
> ordering of driver unbind versus pci_bus destruction is certainly not
> defensive programming.
> 
> If pci_slot has a need to access data in pci_bus, it ought to acquire a
> reference.  Amend pci_create_slot() accordingly.  Dennis reports that the
> crash is not reproducible with this change.
> 
> Abridged stacktrace:
> 
>   pcieport 0000:00:07.0: PME: Signaling with IRQ 156
>   pcieport 0000:00:07.0: pciehp: Slot #12 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
>   pci_bus 0000:20: dev 00, created physical slot 12
>   pcieport 0000:00:07.0: pciehp: Slot(12): Card not present
>   ...
>   pcieport 0000:21:02.0: pciehp: pcie_disable_notification: SLOTCTRL d8 write cmd 0
>   Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
>   CPU: 13 UID: 0 PID: 134 Comm: irq/156-pciehp Not tainted 6.11.0-devel+ #1
>   RIP: 0010:dev_driver_string+0x12/0x40
>   pci_destroy_slot
>   pciehp_remove
>   pcie_port_remove_service
>   device_release_driver_internal
>   bus_remove_device
>   device_del
>   device_unregister
>   remove_iter
>   device_for_each_child
>   pcie_portdrv_remove
>   pci_device_remove
>   device_release_driver_internal
>   bus_remove_device
>   device_del
>   pci_remove_bus_device (recursive invocation)
>   pci_remove_bus_device
>   pciehp_unconfigure_device
>   pciehp_disable_slot
>   pciehp_handle_presence_or_link_change
>   pciehp_ist
> 
> Reported-by: Dennis Wassenberg <Dennis.Wassenberg@secunet.com>
> Tested-by: Dennis Wassenberg <Dennis.Wassenberg@secunet.com>
> Closes: https://lore.kernel.org/r/6de4b45ff2b32dd91a805ec02ec8ec73ef411bf6.camel@secunet.com/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org

Applied with Mika's reviewed-by to pci/hotplug for v6.13, thanks!

> ---
> I am tempted to remove the call to device_release_driver() from
> pci_stop_dev() and just rely on driver unbinding by device_del().
> It would simplify and rationalize the code.  The call was introduced by
> commit c4a0a5d964e9 (PCI: Move device_del() from pci_stop_dev() to
> pci_destroy_dev()) without providing an explicit reason.
> 
> Dennis stress-tested driver unbinding via device_del() without witnessing
> any problems.  The only downside I see is that it would re-introduce the
> cosmetic issue avoided by commit 16b6c8bb687c ("PCI: Detach driver before
> procfs & sysfs teardown on device remove").
> 
> Preventing driver binding after pci_stop_bus_device() should be achieved
> by this one-line patch, though that's still racy as pci_bus_add_devices()
> might revert the match_driver flag to true after pci_stop_bus_device() has
> set it to false:
> https://lore.kernel.org/r/Zv-dIHDXNNYomG2Y@wunner.de/
> 
> An alternative would be to serialize removal of the hierarchy with
> pci_bus_add_devices() by way of pci_lock_rescan_remove():
> https://lore.kernel.org/r/20241003084342.27501-1-brgl@bgdev.pl/
> 
> Both approaches are yet to be tested by Dennis.  Personally I would like
> to avoid the pci_lock_rescan_remove() approach.  We should try to move
> away from this big lock and use finer grained locking instead.  So again,
> just dropping the call to device_release_driver() would be the simplest
> and most preferred approach from my point of view.  Thoughts?
> 
>  drivers/pci/slot.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> index 0f87cad..ed645c7 100644
> --- a/drivers/pci/slot.c
> +++ b/drivers/pci/slot.c
> @@ -79,6 +79,7 @@ static void pci_slot_release(struct kobject *kobj)
>  	up_read(&pci_bus_sem);
>  
>  	list_del(&slot->list);
> +	pci_bus_put(slot->bus);
>  
>  	kfree(slot);
>  }
> @@ -261,7 +262,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
>  		goto err;
>  	}
>  
> -	slot->bus = parent;
> +	slot->bus = pci_bus_get(parent);
>  	slot->number = slot_nr;
>  
>  	slot->kobj.kset = pci_slots_kset;
> @@ -269,6 +270,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
>  	slot_name = make_slot_name(name);
>  	if (!slot_name) {
>  		err = -ENOMEM;
> +		pci_bus_put(slot->bus);
>  		kfree(slot);
>  		goto err;
>  	}
> -- 
> 2.43.0
>
diff mbox series

Patch

diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 0f87cad..ed645c7 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -79,6 +79,7 @@  static void pci_slot_release(struct kobject *kobj)
 	up_read(&pci_bus_sem);
 
 	list_del(&slot->list);
+	pci_bus_put(slot->bus);
 
 	kfree(slot);
 }
@@ -261,7 +262,7 @@  struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 		goto err;
 	}
 
-	slot->bus = parent;
+	slot->bus = pci_bus_get(parent);
 	slot->number = slot_nr;
 
 	slot->kobj.kset = pci_slots_kset;
@@ -269,6 +270,7 @@  struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 	slot_name = make_slot_name(name);
 	if (!slot_name) {
 		err = -ENOMEM;
+		pci_bus_put(slot->bus);
 		kfree(slot);
 		goto err;
 	}