Message ID | 20090326213839.GA1112@ldl.fc.hp.com (mailing list archive) |
---|---|
State | Rejected, archived |
Headers | show |
Alex Chiang wrote: > * Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>: >> Thank you very much for testing. >> >> We still have similar kernel oops (see below) with ACPI pci slot >> detection driver. I guess the same problem would also occur with >> acpiphp though I've not tried yet. I don't look at Trent's bus >> notifier approach yet, but I think we need something like this to >> fix this problem. >> >> Here are steps to reproduce and kernel oops message. >> >> * Steps to reproduce >> >> (1) Load ACPI pci slot detection driver >> (2) Remove the parent bridge of the slot >> (3) Unload ACPI pci slot detection driver > > Thanks for the report. > > I believe this patch will fix the case for bridges. I haven't > tested what happens if we remove an endpoint yet though. > > Can you try this? > > Thanks. > > /ac > > --- > commit 557ce38e78cf06ce16aefcf273051ea0bac3d35c > Author: Alex Chiang <achiang@hp.com> > Date: Thu Mar 26 15:36:34 2009 -0600 > > PCI: pci_create_slot / pci_destroy_slot need to grab reference to parent bus > > If a logical hotunplug (remove) is performed on a PCI bridge claimed by > a hotplug or slot detection driver, and then the hotplug/detection module > is unloaded, we will encounter an oops: > > Call Trace: > [<a000000100039bc0>] die+0x1c0/0x2c0 > sp=e0000005062ff9e0 bsp=e0000005062f13b0 > [<a000000100039d00>] die_if_kernel+0x40/0x60 > sp=e0000005062ff9e0 bsp=e0000005062f1380 > [<a00000010003b590>] ia64_fault+0x1230/0x1280 > sp=e0000005062ff9e0 bsp=e0000005062f1300 > [<a00000010000c700>] ia64_native_leave_kernel+0x0/0x270 > sp=e0000005062ffbf0 bsp=e0000005062f1300 > [<a0000001003988f0>] pci_slot_release+0x70/0x1c0 > sp=e0000005062ffdc0 bsp=e0000005062f12b0 > [<a0000001003694d0>] kobject_release+0x4f0/0x5e0 > sp=e0000005062ffdc0 bsp=e0000005062f1270 > [<a00000010036b490>] kref_put+0xd0/0x100 > sp=e0000005062ffdc0 bsp=e0000005062f1248 > [<a000000100368650>] kobject_put+0x90/0xc0 > sp=e0000005062ffdc0 bsp=e0000005062f1220 > [<a000000100399260>] pci_destroy_slot+0xa0/0xe0 > sp=e0000005062ffdc0 bsp=e0000005062f11f0 > [<a0000002044d2c70>] pci_hp_deregister+0x510/0x560 [pci_hotplug] > sp=e0000005062ffdc0 bsp=e0000005062f11a8 > [<a000000205d71aa0>] acpiphp_unregister_hotplug_slot+0x80/0x100 [acpiphp] > sp=e0000005062ffdc0 bsp=e0000005062f1180 > [<a000000205d73d40>] cleanup_bridge+0x3a0/0x4c0 [acpiphp] > sp=e0000005062ffdc0 bsp=e0000005062f1128 > [<a000000205d73ee0>] cleanup_p2p_bridge+0x80/0xc0 [acpiphp] > sp=e0000005062ffdc0 bsp=e0000005062f1108 > [<a0000001003ed200>] acpi_ns_walk_namespace+0x160/0x2e0 > sp=e0000005062ffdc0 bsp=e0000005062f1098 > [<a0000001003e8850>] acpi_walk_namespace+0x90/0xe0 > sp=e0000005062ffdc0 bsp=e0000005062f1048 > [<a000000205d73f70>] remove_bridge+0x50/0xe0 [acpiphp] > sp=e0000005062ffdc0 bsp=e0000005062f1028 > [<a000000100411590>] acpi_pci_unregister_driver+0x1f0/0x2a0 > sp=e0000005062ffdc0 bsp=e0000005062f0fe8 > [<a000000205d759d0>] acpiphp_glue_exit+0x30/0x60 [acpiphp] > sp=e0000005062ffdc0 bsp=e0000005062f0fd0 > [<a000000205d77380>] acpiphp_exit+0x20/0x40 [acpiphp] > sp=e0000005062ffdc0 bsp=e0000005062f0fb8 > [<a0000001000e9310>] sys_delete_module+0x410/0x520 > sp=e0000005062ffdc0 bsp=e0000005062f0f38 > > This is because pci_slot_release will access the parent PCI bus, > which has already been released by the user's prior hot unplug. > > The solution is for pci_create_slot to grab a reference on the > parent PCI bus (and pci_destroy_slot to put the reference). This > will prevent the parent from release while the hotplug or slot > detection driver is loaded. > > Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> > Signed-off-by: Alex Chiang <achiang@hp.com> > > diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c > index 2118944..459d6a2 100644 > --- a/drivers/pci/slot.c > +++ b/drivers/pci/slot.c > @@ -248,6 +248,7 @@ placeholder: > if (PCI_SLOT(dev->devfn) == slot_nr) > dev->slot = slot; > > + get_device(&parent->dev); > dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n", > slot_nr, pci_slot_name(slot)); > > @@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot) > slot->number, atomic_read(&slot->kobj.kref.refcount) - 1); > > down_write(&pci_bus_sem); > + put_device(&slot->bus->dev); > kobject_put(&slot->kobj); > up_write(&pci_bus_sem); > } > I've not tried your patch yet, but I don't think it works because pci_create_slot() can be executed by some hotplug drivers (pciehp, shpchp, ...) before parent->dev is initialized. Anyway, I'll try it and report the result as soon as possible. Thanks, Kenji Kaneshige -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Kenji Kaneshige wrote: > Alex Chiang wrote: >> * Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>: >>> Thank you very much for testing. >>> >>> We still have similar kernel oops (see below) with ACPI pci slot >>> detection driver. I guess the same problem would also occur with >>> acpiphp though I've not tried yet. I don't look at Trent's bus >>> notifier approach yet, but I think we need something like this to >>> fix this problem. >>> >>> Here are steps to reproduce and kernel oops message. >>> >>> * Steps to reproduce >>> >>> (1) Load ACPI pci slot detection driver >>> (2) Remove the parent bridge of the slot >>> (3) Unload ACPI pci slot detection driver >> >> Thanks for the report. >> >> I believe this patch will fix the case for bridges. I haven't >> tested what happens if we remove an endpoint yet though. >> >> Can you try this? >> >> Thanks. >> >> /ac >> >> --- >> commit 557ce38e78cf06ce16aefcf273051ea0bac3d35c >> Author: Alex Chiang <achiang@hp.com> >> Date: Thu Mar 26 15:36:34 2009 -0600 >> >> PCI: pci_create_slot / pci_destroy_slot need to grab reference to >> parent bus >> If a logical hotunplug (remove) is performed on a PCI bridge >> claimed by >> a hotplug or slot detection driver, and then the hotplug/detection >> module >> is unloaded, we will encounter an oops: >> Call Trace: >> [<a000000100039bc0>] die+0x1c0/0x2c0 >> sp=e0000005062ff9e0 >> bsp=e0000005062f13b0 >> [<a000000100039d00>] die_if_kernel+0x40/0x60 >> sp=e0000005062ff9e0 >> bsp=e0000005062f1380 >> [<a00000010003b590>] ia64_fault+0x1230/0x1280 >> sp=e0000005062ff9e0 >> bsp=e0000005062f1300 >> [<a00000010000c700>] ia64_native_leave_kernel+0x0/0x270 >> sp=e0000005062ffbf0 >> bsp=e0000005062f1300 >> [<a0000001003988f0>] pci_slot_release+0x70/0x1c0 >> sp=e0000005062ffdc0 >> bsp=e0000005062f12b0 >> [<a0000001003694d0>] kobject_release+0x4f0/0x5e0 >> sp=e0000005062ffdc0 >> bsp=e0000005062f1270 >> [<a00000010036b490>] kref_put+0xd0/0x100 >> sp=e0000005062ffdc0 >> bsp=e0000005062f1248 >> [<a000000100368650>] kobject_put+0x90/0xc0 >> sp=e0000005062ffdc0 >> bsp=e0000005062f1220 >> [<a000000100399260>] pci_destroy_slot+0xa0/0xe0 >> sp=e0000005062ffdc0 >> bsp=e0000005062f11f0 >> [<a0000002044d2c70>] pci_hp_deregister+0x510/0x560 [pci_hotplug] >> sp=e0000005062ffdc0 >> bsp=e0000005062f11a8 >> [<a000000205d71aa0>] acpiphp_unregister_hotplug_slot+0x80/0x100 >> [acpiphp] >> sp=e0000005062ffdc0 >> bsp=e0000005062f1180 >> [<a000000205d73d40>] cleanup_bridge+0x3a0/0x4c0 [acpiphp] >> sp=e0000005062ffdc0 >> bsp=e0000005062f1128 >> [<a000000205d73ee0>] cleanup_p2p_bridge+0x80/0xc0 [acpiphp] >> sp=e0000005062ffdc0 >> bsp=e0000005062f1108 >> [<a0000001003ed200>] acpi_ns_walk_namespace+0x160/0x2e0 >> sp=e0000005062ffdc0 >> bsp=e0000005062f1098 >> [<a0000001003e8850>] acpi_walk_namespace+0x90/0xe0 >> sp=e0000005062ffdc0 >> bsp=e0000005062f1048 >> [<a000000205d73f70>] remove_bridge+0x50/0xe0 [acpiphp] >> sp=e0000005062ffdc0 >> bsp=e0000005062f1028 >> [<a000000100411590>] acpi_pci_unregister_driver+0x1f0/0x2a0 >> sp=e0000005062ffdc0 >> bsp=e0000005062f0fe8 >> [<a000000205d759d0>] acpiphp_glue_exit+0x30/0x60 [acpiphp] >> sp=e0000005062ffdc0 >> bsp=e0000005062f0fd0 >> [<a000000205d77380>] acpiphp_exit+0x20/0x40 [acpiphp] >> sp=e0000005062ffdc0 >> bsp=e0000005062f0fb8 >> [<a0000001000e9310>] sys_delete_module+0x410/0x520 >> sp=e0000005062ffdc0 >> bsp=e0000005062f0f38 >> This is because pci_slot_release will access the parent PCI bus, >> which has already been released by the user's prior hot unplug. >> The solution is for pci_create_slot to grab a reference on the >> parent PCI bus (and pci_destroy_slot to put the reference). This >> will prevent the parent from release while the hotplug or slot >> detection driver is loaded. >> Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> >> Signed-off-by: Alex Chiang <achiang@hp.com> >> >> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c >> index 2118944..459d6a2 100644 >> --- a/drivers/pci/slot.c >> +++ b/drivers/pci/slot.c >> @@ -248,6 +248,7 @@ placeholder: >> if (PCI_SLOT(dev->devfn) == slot_nr) >> dev->slot = slot; >> >> + get_device(&parent->dev); >> dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n", >> slot_nr, pci_slot_name(slot)); >> >> @@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot) >> slot->number, atomic_read(&slot->kobj.kref.refcount) - 1); >> >> down_write(&pci_bus_sem); >> + put_device(&slot->bus->dev); >> kobject_put(&slot->kobj); >> up_write(&pci_bus_sem); >> } >> > > I've not tried your patch yet, but I don't think it works because > pci_create_slot() can be executed by some hotplug drivers (pciehp, > shpchp, ...) before parent->dev is initialized. > > Anyway, I'll try it and report the result as soon as possible. > > Thanks, > Kenji Kaneshige > > I tried your patch. The result is that the patch doesn't fix anything and introduces new kernel oops (see below). I can reproduce the new kernel oops by the following steps. (1) Load pciehp (2) Remove parent bridge of the slot (3) Add parent bridge of the slot I think the reason why your patch doesn't fix the problem is the place of put_device() is wrong. It needs to be placed after kobject_put(), doesn't it? And the reason why your patch introduce the new kernel oops is get_device(&parent->dev) is executed before parent->dev is initialized, as I mentioned in the previous e-mail. [ 144.362098] ------------[ cut here ]------------ [ 144.362239] WARNING: at lib/kref.c:43 kref_get+0x2f/0x40() [ 144.362382] Hardware name: PRIMERGY [ 144.362524] Modules linked in: pciehp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod sbs sbshc pci_slot battery ac parport_pc lp parport mptspi mptscsih mptbase scsi_transport_spi e1000e sg sr_mod cdrom button serio_raw i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] [ 144.365747] Pid: 68, comm: work_on_cpu/0 Not tainted 2.6.29-rc8-kk #3 [ 144.365892] Call Trace: [ 144.366039] [<ffffffff8024507c>] warn_slowpath+0xdc/0x110 [ 144.366185] [<ffffffff8033ef9b>] ? sysfs_add_file_mode+0x6b/0xa0 [ 144.366331] [<ffffffff8033efdc>] ? sysfs_add_file+0xc/0x10 [ 144.366475] [<ffffffff8033f00b>] ? sysfs_create_file+0x2b/0x40 [ 144.366622] [<ffffffff803a7bf1>] ? kobject_add_internal+0x141/0x250 [ 144.366769] [<ffffffff803a7e0d>] ? kobject_add_varg+0x5d/0x70 [ 144.366917] [<ffffffff80567466>] ? _spin_unlock+0x26/0x30 [ 144.367070] [<ffffffff803a8353>] ? kset_find_obj+0x23/0x90 [ 144.367213] [<ffffffff803a8395>] ? kset_find_obj+0x65/0x90 [ 144.367355] [<ffffffff803a8d5f>] kref_get+0x2f/0x40 [ 144.367497] [<ffffffff803a7a9a>] kobject_get+0x1a/0x30 [ 144.367641] [<ffffffff8043df27>] get_device+0x17/0x20 [ 144.367784] [<ffffffff803bef5f>] pci_create_slot+0x1af/0x280 [ 144.367929] [<ffffffff803c75cf>] pci_hp_register+0x8f/0x320 [ 144.368087] [<ffffffffa031aa70>] pciehp_probe+0xb0/0x4d0 [pciehp] [ 144.368234] [<ffffffff803c5354>] pcie_port_probe_service+0x54/0x90 [ 144.368381] [<ffffffff804421de>] driver_probe_device+0xbe/0x2e0 [ 144.368526] [<ffffffff804424a0>] ? __device_attach+0x0/0x10 [ 144.368670] [<ffffffff804424a9>] __device_attach+0x9/0x10 [ 144.368813] [<ffffffff80440ee8>] bus_for_each_drv+0x68/0x90 [ 144.368957] [<ffffffff80442578>] device_attach+0x88/0x90 [ 144.369110] [<ffffffff80440e5b>] bus_attach_device+0x5b/0x80 [ 144.369255] [<ffffffff8043ea0a>] device_add+0x40a/0x690 [ 144.369399] [<ffffffff8043eca9>] device_register+0x19/0x20 [ 144.369544] [<ffffffff803c51ec>] pcie_port_device_register+0x3fc/0x510 [ 144.369694] [<ffffffff80257410>] ? do_work_for_cpu+0x0/0x20 [ 144.369839] [<ffffffff80257410>] ? do_work_for_cpu+0x0/0x20 [ 144.369985] [<ffffffff80554fc5>] pcie_portdrv_probe+0x35/0xa0 [ 144.370140] [<ffffffff803bf232>] local_pci_probe+0x12/0x20 [ 144.370283] [<ffffffff80257423>] do_work_for_cpu+0x13/0x20 [ 144.371678] [<ffffffff80257559>] ? run_workqueue+0x19/0x230 [ 144.371822] [<ffffffff8025769a>] run_workqueue+0x15a/0x230 [ 144.371965] [<ffffffff80257648>] ? run_workqueue+0x108/0x230 [ 144.372114] [<ffffffff8025846f>] worker_thread+0x9f/0x100 [ 144.372258] [<ffffffff8025bce0>] ? autoremove_wake_function+0x0/0x40 [ 144.372404] [<ffffffff802583d0>] ? worker_thread+0x0/0x100 [ 144.372547] [<ffffffff8025b89d>] kthread+0x4d/0x80 [ 144.372689] [<ffffffff8020d4ba>] child_rip+0xa/0x20 [ 144.372831] [<ffffffff8020cebc>] ? restore_args+0x0/0x30 [ 144.372975] [<ffffffff8025b850>] ? kthread+0x0/0x80 [ 144.373126] [<ffffffff8020d4b0>] ? child_rip+0x0/0x20 [ 144.373269] ---[ end trace 43bc571356db47af ]--- Thanks, Kenji Kaneshige -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>: > Kenji Kaneshige wrote: >> Alex Chiang wrote: >>> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c >>> index 2118944..459d6a2 100644 >>> --- a/drivers/pci/slot.c >>> +++ b/drivers/pci/slot.c >>> @@ -248,6 +248,7 @@ placeholder: >>> if (PCI_SLOT(dev->devfn) == slot_nr) >>> dev->slot = slot; >>> + get_device(&parent->dev); >>> dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n", >>> slot_nr, pci_slot_name(slot)); >>> @@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot) >>> slot->number, atomic_read(&slot->kobj.kref.refcount) - 1); >>> down_write(&pci_bus_sem); >>> + put_device(&slot->bus->dev); >>> kobject_put(&slot->kobj); >>> up_write(&pci_bus_sem); >>> } >>> >> >> I've not tried your patch yet, but I don't think it works because >> pci_create_slot() can be executed by some hotplug drivers (pciehp, >> shpchp, ...) before parent->dev is initialized. >> >> Anyway, I'll try it and report the result as soon as possible. >> > > I tried your patch. The result is that the patch doesn't fix anything > and introduces new kernel oops (see below). I can reproduce the new > kernel oops by the following steps. > > (1) Load pciehp > (2) Remove parent bridge of the slot > (3) Add parent bridge of the slot > > I think the reason why your patch doesn't fix the problem is the place > of put_device() is wrong. It needs to be placed after kobject_put(), > doesn't it? You are right. > And the reason why your patch introduce the new kernel oops is > get_device(&parent->dev) is executed before parent->dev is initialized, > as I mentioned in the previous e-mail. Sigh, you are right again. I tested with acpiphp which does not have that issue. Sorry for the noise. I'll try and think of something else. :-/ Thanks. /ac -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Alex Chiang wrote: > * Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>: >> Kenji Kaneshige wrote: >>> Alex Chiang wrote: >>>> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c >>>> index 2118944..459d6a2 100644 >>>> --- a/drivers/pci/slot.c >>>> +++ b/drivers/pci/slot.c >>>> @@ -248,6 +248,7 @@ placeholder: >>>> if (PCI_SLOT(dev->devfn) == slot_nr) >>>> dev->slot = slot; >>>> + get_device(&parent->dev); >>>> dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n", >>>> slot_nr, pci_slot_name(slot)); >>>> @@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot) >>>> slot->number, atomic_read(&slot->kobj.kref.refcount) - 1); >>>> down_write(&pci_bus_sem); >>>> + put_device(&slot->bus->dev); >>>> kobject_put(&slot->kobj); >>>> up_write(&pci_bus_sem); >>>> } >>>> >>> I've not tried your patch yet, but I don't think it works because >>> pci_create_slot() can be executed by some hotplug drivers (pciehp, >>> shpchp, ...) before parent->dev is initialized. >>> >>> Anyway, I'll try it and report the result as soon as possible. >>> >> I tried your patch. The result is that the patch doesn't fix anything >> and introduces new kernel oops (see below). I can reproduce the new >> kernel oops by the following steps. >> >> (1) Load pciehp >> (2) Remove parent bridge of the slot >> (3) Add parent bridge of the slot >> >> I think the reason why your patch doesn't fix the problem is the place >> of put_device() is wrong. It needs to be placed after kobject_put(), >> doesn't it? > > You are right. > >> And the reason why your patch introduce the new kernel oops is >> get_device(&parent->dev) is executed before parent->dev is initialized, >> as I mentioned in the previous e-mail. > > Sigh, you are right again. I tested with acpiphp which does not > have that issue. Sorry for the noise. > No problem. Please feel free to ask me for testing. Thanks, Kenji Kaneshige -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c index 2118944..459d6a2 100644 --- a/drivers/pci/slot.c +++ b/drivers/pci/slot.c @@ -248,6 +248,7 @@ placeholder: if (PCI_SLOT(dev->devfn) == slot_nr) dev->slot = slot; + get_device(&parent->dev); dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n", slot_nr, pci_slot_name(slot)); @@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot) slot->number, atomic_read(&slot->kobj.kref.refcount) - 1); down_write(&pci_bus_sem); + put_device(&slot->bus->dev); kobject_put(&slot->kobj); up_write(&pci_bus_sem); }