Message ID | PU1P153MB0169DBCFEE7257F5BB93580ABFD90@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM (mailing list archive) |
---|---|
State | Superseded, archived |
Series | PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier |
Hi Dexuan,

The subject line only describes the mechanical code change, which is obvious from the patch. It would be better if we could say something about *why* we need this.

On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
>
> When a slot is removed, the pci_dev must still exist.
>
> pci_remove_root_bus() removes and frees all the pci_devs, so
> hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> otherwise a general protection fault can happen, if the kernel is built

"general protection fault" is an x86 term that doesn't really say what the issue is. I suspect this would be a "use-after-free" problem.

> with the memory debugging options.
>
> Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
>
> ---
>
> When pci-hyperv is unloaded, this panic can happen:
>
> general protection fault:
> CPU: 2 PID: 1091 Comm: rmmod Not tainted 5.2.0+
> RIP: 0010:pci_slot_release+0x30/0xd0
> Call Trace:
> kobject_release+0x65/0x190
> pci_destroy_slot+0x25/0x60
> hv_pci_remove+0xec/0x110 [pci_hyperv]
> vmbus_remove+0x20/0x30 [hv_vmbus]
> device_release_driver_internal+0xd5/0x1b0
> driver_detach+0x44/0x7c
> bus_remove_driver+0x75/0xc7
> vmbus_driver_unregister+0x50/0xbd [hv_vmbus]
> __x64_sys_delete_module+0x136/0x200
> do_syscall_64+0x5e/0x220
>
> drivers/pci/controller/pci-hyperv.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 6b9cc6e60a..68c611d 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
>  		/* Remove the bus from PCI's point of view. */
>  		pci_lock_rescan_remove();
>  		pci_stop_root_bus(hbus->pci_bus);
> -		pci_remove_root_bus(hbus->pci_bus);
>  		hv_pci_remove_slots(hbus);
> +		pci_remove_root_bus(hbus->pci_bus);

I'm curious about why we need hv_pci_remove_slots() at all. None of the other callers of pci_stop_root_bus() and pci_remove_root_bus() do anything similar to hv_pci_remove_slots().

Surely some of those callers also support slots, so there must be some other path that calls pci_destroy_slot() in those cases. Can we use a similar strategy here?

>  		pci_unlock_rescan_remove();
>  		hbus->state = hv_pcibus_removed;
>  	}
> --
> 1.8.3.1
>
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Friday, August 2, 2019 12:41 PM
> The subject line only describes the mechanical code change, which is
> obvious from the patch. It would be better if we could say something
> about *why* we need this.

Hi Bjorn,
Sorry. I'll try to write a better changelog in v2. :-)

> On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> >
> > When a slot is removed, the pci_dev must still exist.
> >
> > pci_remove_root_bus() removes and frees all the pci_devs, so
> > hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> > otherwise a general protection fault can happen, if the kernel is built
>
> "general protection fault" is an x86 term that doesn't really say what
> the issue is. I suspect this would be a "use-after-free" problem.

Yes, it's use-after-free. I'll fix the wording.

> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
> >  		/* Remove the bus from PCI's point of view. */
> >  		pci_lock_rescan_remove();
> >  		pci_stop_root_bus(hbus->pci_bus);
> > -		pci_remove_root_bus(hbus->pci_bus);
> >  		hv_pci_remove_slots(hbus);
> > +		pci_remove_root_bus(hbus->pci_bus);
>
> I'm curious about why we need hv_pci_remove_slots() at all. None of
> the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
> anything similar to hv_pci_remove_slots().
>
> Surely some of those callers also support slots, so there must be some
> other path that calls pci_destroy_slot() in those cases. Can we use a
> similar strategy here?

Originally Stephen Hemminger added the slot code for pci-hyperv.c:
a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
So he may know this better.

My understanding is: we cannot use the similar
strategy used by the 2 other users of pci_create_slot():

drivers/pci/hotplug/pci_hotplug_core.c calls pci_create_slot().
It looks like drivers/pci/hotplug/ is quite different from pci-hyperv.c because
pci-hyperv uses a simple *private* hot-plug protocol, making it impossible
to use the API pci_hp_register() and pci_hp_destroy() -> pci_destroy_slot().

drivers/acpi/pci_slot.c calls pci_create_slot(), and saves the created slots in
the static "slot_list" list in the same file. Again, since pci-hyperv uses a private
PCI-device-discovery protocol (which is based on VMBus rather than the emulated
ACPI and PCI), acpi_pci_slot_enumerate() cannot find the PCI devices that are
discovered by pci-hyperv, so we cannot use the standard register_slot() ->
pci_create_slot() to create the slots, and hence acpi_pci_slot_remove() ->
pci_destroy_slot() cannot work for pci-hyperv.

I think I can use this as the v2 changelog:

The slot must be removed before the pci_dev is removed, otherwise a panic
can happen due to use-after-free.

Thanks,
Dexuan
On Fri, Aug 02, 2019 at 08:31:26PM +0000, Dexuan Cui wrote:
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Friday, August 2, 2019 12:41 PM
> > The subject line only describes the mechanical code change, which is
> > obvious from the patch. It would be better if we could say something
> > about *why* we need this.
>
> Hi Bjorn,
> Sorry. I'll try to write a better changelog in v2. :-)
>
> > On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> > >
> > > When a slot is removed, the pci_dev must still exist.
> > >
> > > pci_remove_root_bus() removes and frees all the pci_devs, so
> > > hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> > > otherwise a general protection fault can happen, if the kernel is built
> >
> > "general protection fault" is an x86 term that doesn't really say what
> > the issue is. I suspect this would be a "use-after-free" problem.
>
> Yes, it's use-after-free. I'll fix the wording.
>
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
> > >  		/* Remove the bus from PCI's point of view. */
> > >  		pci_lock_rescan_remove();
> > >  		pci_stop_root_bus(hbus->pci_bus);
> > > -		pci_remove_root_bus(hbus->pci_bus);
> > >  		hv_pci_remove_slots(hbus);
> > > +		pci_remove_root_bus(hbus->pci_bus);
> >
> > I'm curious about why we need hv_pci_remove_slots() at all. None of
> > the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
> > anything similar to hv_pci_remove_slots().
> >
> > Surely some of those callers also support slots, so there must be some
> > other path that calls pci_destroy_slot() in those cases. Can we use a
> > similar strategy here?
>
> Originally Stephen Hemminger added the slot code for pci-hyperv.c:
> a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
> So he may know this better.
> My understanding is: we cannot use the similar
> strategy used by the 2 other users of pci_create_slot():
>
> drivers/pci/hotplug/pci_hotplug_core.c calls pci_create_slot().
> It looks like drivers/pci/hotplug/ is quite different from pci-hyperv.c because
> pci-hyperv uses a simple *private* hot-plug protocol, making it impossible
> to use the API pci_hp_register() and pci_hp_destroy() -> pci_destroy_slot().
>
> drivers/acpi/pci_slot.c calls pci_create_slot(), and saves the created slots in
> the static "slot_list" list in the same file. Again, since pci-hyperv uses a private
> PCI-device-discovery protocol (which is based on VMBus rather than the emulated
> ACPI and PCI), acpi_pci_slot_enumerate() cannot find the PCI devices that are
> discovered by pci-hyperv, so we cannot use the standard register_slot() ->
> pci_create_slot() to create the slots, and hence acpi_pci_slot_remove() ->
> pci_destroy_slot() cannot work for pci-hyperv.

Hmm, ok. This still doesn't seem right to me, but I think the bottom line will be that the current slot registration interfaces just don't work quite right for all the cases we want them to. Maybe it would be a good project for somebody to rethink them, but it doesn't seem practical for *this* patch.

Thanks for looking into it this far!

> I think I can use this as the v2 changelog:
>
> The slot must be removed before the pci_dev is removed, otherwise a panic
> can happen due to use-after-free.

Sounds good.

Bjorn
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 6b9cc6e60a..68c611d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
 		/* Remove the bus from PCI's point of view. */
 		pci_lock_rescan_remove();
 		pci_stop_root_bus(hbus->pci_bus);
-		pci_remove_root_bus(hbus->pci_bus);
 		hv_pci_remove_slots(hbus);
+		pci_remove_root_bus(hbus->pci_bus);
 		pci_unlock_rescan_remove();
 		hbus->state = hv_pcibus_removed;
 	}
When a slot is removed, the pci_dev must still exist.

pci_remove_root_bus() removes and frees all the pci_devs, so
hv_pci_remove_slots() must be called before pci_remove_root_bus(),
otherwise a general protection fault can happen, if the kernel is built
with the memory debugging options.

Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org

---

When pci-hyperv is unloaded, this panic can happen:

general protection fault:
CPU: 2 PID: 1091 Comm: rmmod Not tainted 5.2.0+
RIP: 0010:pci_slot_release+0x30/0xd0
Call Trace:
kobject_release+0x65/0x190
pci_destroy_slot+0x25/0x60
hv_pci_remove+0xec/0x110 [pci_hyperv]
vmbus_remove+0x20/0x30 [hv_vmbus]
device_release_driver_internal+0xd5/0x1b0
driver_detach+0x44/0x7c
bus_remove_driver+0x75/0xc7
vmbus_driver_unregister+0x50/0xbd [hv_vmbus]
__x64_sys_delete_module+0x136/0x200
do_syscall_64+0x5e/0x220

drivers/pci/controller/pci-hyperv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
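The ordering rule in this changelog can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: struct model_dev, struct model_slot, model_remove_root_bus(), model_remove_slots() and model_teardown() are invented stand-ins for struct pci_dev, struct pci_slot, pci_remove_root_bus() and hv_pci_remove_slots(). The model assumes, per the changelog, that releasing a slot still dereferences an object that bus removal frees. Rather than calling free(), it marks each device dead and counts any later access, loosely mimicking how memory debugging options turn a stale access into a visible fault.

```c
#include <assert.h>   /* for checking model_teardown() results */
#include <stdbool.h>
#include <stddef.h>

#define NDEVS 2

/* Hypothetical stand-in for a pci_dev. Instead of being freed, it is
 * marked dead so that a later access can be detected safely. */
struct model_dev {
	bool dead;
};

/* Hypothetical stand-in for a pci_slot whose release path still
 * dereferences the device it belongs to (assumption from the changelog). */
struct model_slot {
	struct model_dev *dev;
};

/* Number of accesses to dead devices; each would be a use-after-free. */
static int uaf_hits;

static void model_touch(struct model_dev *dev)
{
	if (dev->dead)
		uaf_hits++;
}

/* Models pci_remove_root_bus(): every device on the bus goes away. */
static void model_remove_root_bus(struct model_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].dead = true;
}

/* Models hv_pci_remove_slots() -> pci_destroy_slot(): releasing each
 * slot touches its device, then drops the slot's pointer. */
static void model_remove_slots(struct model_slot *slots, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (slots[i].dev == NULL)
			continue;
		model_touch(slots[i].dev);
		slots[i].dev = NULL;
	}
}

/* Runs one teardown in the chosen order and returns how many
 * use-after-free accesses the model detected. */
int model_teardown(bool slots_first)
{
	struct model_dev devs[NDEVS] = { { false }, { false } };
	struct model_slot slots[NDEVS] = { { &devs[0] }, { &devs[1] } };

	uaf_hits = 0;
	if (slots_first) {
		/* Order after the patch: slots go first, devices still alive. */
		model_remove_slots(slots, NDEVS);
		model_remove_root_bus(devs, NDEVS);
	} else {
		/* Order before the patch: devices are gone when slots release. */
		model_remove_root_bus(devs, NDEVS);
		model_remove_slots(slots, NDEVS);
	}
	return uaf_hits;
}
```

Under these assumptions, model_teardown(true) (slots first, the order after the patch) returns 0, while model_teardown(false) (the order before the patch) returns NDEVS, one stale access per slot.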