Message ID | 1606876422-117457-1-git-send-email-zhongjubin@huawei.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [v2] PCI: Fix Oops caused by uninitialized slot->list in pci_slot_release() | expand |
On Wed, Dec 02, 2020 at 10:33:42AM +0800, Jubin Zhong wrote: > Once kobject_init_and_add() failed, pci_slot_release() is called to > delete slot->list from parent->slots. But slot->list is intialized > afterwards, so we ran into the following crash: > > Unable to handle kernel NULL pointer dereference at virtual address > 00000000 > ... > CPU: 10 PID: 1 Comm: swapper/0 Not tainted 4.4.240 #197 > task: ffffeb398a45ef10 task.stack: ffffeb398a470000 > PC is at __list_del_entry_valid+0x5c/0xb0 > LR is at pci_slot_release+0x84/0xe4 > ... > __list_del_entry_valid+0x5c/0xb0 > pci_slot_release+0x84/0xe4 > kobject_put+0x184/0x1c4 > pci_create_slot+0x17c/0x1b4 > __pci_hp_initialize+0x68/0xa4 > pciehp_probe+0x1a4/0x2fc > pcie_port_probe_service+0x58/0x84 > driver_probe_device+0x320/0x470 > __driver_attach+0x54/0xb8 > bus_for_each_dev+0xc8/0xf0 > driver_attach+0x30/0x3c > bus_add_driver+0x1b0/0x24c > driver_register+0x9c/0xe0 > pcie_port_service_register+0x64/0x7c > pcied_init+0x44/0xa4 > do_one_initcall+0x1d0/0x1f0 > kernel_init_freeable+0x24c/0x254 > kernel_init+0x18/0xe8 > ret_from_fork+0x10/0x20 > > Fixes: 8a94644b440e ("PCI: Fix pci_create_slot() reference count leak") > Signed-off-by: Jubin Zhong <zhongjubin@huawei.com> > Cc: stable@vger.kernel.org #v4.4.235 I'm curious how you noticed this. Did kobject_init_and_add() fail for some reason? Is there a bug there that we need to fix also? And why did you mention v4.4.235 in the stable tag? 8a94644b440e wasn't merged until v5.9. If 8a94644b440e was backported to older kernels, we can't know that in general, so I think whoever backported it is responsible for watching for Fixes tags that mention it. I updated the stable tag to "v5.9+" and applied this to pci/hotplug for v5.11, thanks! > ---- > v2: > Since both slot memory and slot->list would be handled by > pci_slot_release(), we need to make sure slot->list is properly > initialized beforehand. > > v1: https://lore.kernel.org/lkml/1606288971-47927-1-git-send-email-zhongjubin@huawei.com/ > Two things need to be cleaned up on pci_create_slot's error path: > 1. free slot memory > 2. remove slot->list from its parent->slots > This patch mistakenly took slot memory as unfreed (which is not), and > would introduce double free problem. > --- > drivers/pci/slot.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c > index 3861505..ed2077e 100644 > --- a/drivers/pci/slot.c > +++ b/drivers/pci/slot.c > @@ -272,6 +272,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr, > goto err; > } > > + INIT_LIST_HEAD(&slot->list); > + list_add(&slot->list, &parent->slots); > + > err = kobject_init_and_add(&slot->kobj, &pci_slot_ktype, NULL, > "%s", slot_name); > if (err) { > @@ -279,9 +282,6 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr, > goto err; > } > > - INIT_LIST_HEAD(&slot->list); > - list_add(&slot->list, &parent->slots); > - > down_read(&pci_bus_sem); > list_for_each_entry(dev, &parent->devices, bus_list) > if (PCI_SLOT(dev->devfn) == slot_nr) > -- > 1.8.5.6 >
>On Wed, Dec 02, 2020 at 10:33:42AM +0800, Jubin Zhong wrote: >> Once kobject_init_and_add() failed, pci_slot_release() is called to >> delete slot->list from parent->slots. But slot->list is intialized >> afterwards, so we ran into the following crash: >> >> Unable to handle kernel NULL pointer dereference at virtual address >> 00000000 >> ... >> CPU: 10 PID: 1 Comm: swapper/0 Not tainted 4.4.240 #197 >> task: ffffeb398a45ef10 task.stack: ffffeb398a470000 >> PC is at __list_del_entry_valid+0x5c/0xb0 >> LR is at pci_slot_release+0x84/0xe4 >> ... >> __list_del_entry_valid+0x5c/0xb0 >> pci_slot_release+0x84/0xe4 >> kobject_put+0x184/0x1c4 >> pci_create_slot+0x17c/0x1b4 >> __pci_hp_initialize+0x68/0xa4 >> pciehp_probe+0x1a4/0x2fc >> pcie_port_probe_service+0x58/0x84 >> driver_probe_device+0x320/0x470 >> __driver_attach+0x54/0xb8 >> bus_for_each_dev+0xc8/0xf0 >> driver_attach+0x30/0x3c >> bus_add_driver+0x1b0/0x24c >> driver_register+0x9c/0xe0 >> pcie_port_service_register+0x64/0x7c >> pcied_init+0x44/0xa4 >> do_one_initcall+0x1d0/0x1f0 >> kernel_init_freeable+0x24c/0x254 >> kernel_init+0x18/0xe8 >> ret_from_fork+0x10/0x20 >> >> Fixes: 8a94644b440e ("PCI: Fix pci_create_slot() reference count >> leak") >> Signed-off-by: Jubin Zhong <zhongjubin@huawei.com> >> Cc: stable@vger.kernel.org #v4.4.235 >I'm curious how you noticed this. Did kobject_init_and_add() fail for some reason? Is there a bug there that we need to fix also? Actually not. In the very beginning, Coverity reported a RESOURCE_LEAK warning of slot memory. It should be a false alarm because pci_slot_release() is already called to free slot memory as a function hook (which Coverity could not recognize). After code review I prepared to cancel the warning, that's when I noticed the unintialized slot list is also deleted by pci_slot_release(). >And why did you mention v4.4.235 in the stable tag? 8a94644b440e wasn't merged until v5.9. If 8a94644b440e was backported to older kernels, we can't know that in general, so I think whoever backported it is responsible for watching for Fixes tags that mention it. The Coverity warning was reported on v4.4.240. I also did a little test on v4.4.240 by forcing kobject_init_and_add() to fail, that's how I got the above call stack. Since 8a94644b440e was backported to v4.4.235, I thought it should be necessary to fix it on v4.4 too. >>I updated the stable tag to "v5.9+" and applied this to pci/hotplug for v5.11, thanks! >> ---- >> v2: >> Since both slot memory and slot->list would be handled by >> pci_slot_release(), we need to make sure slot->list is properly >> initialized beforehand. >> >> v1: https://lore.kernel.org/lkml/1606288971-47927-1-git-send-email-zhongjubin@huawei.com/ >> Two things need to be cleaned up on pci_create_slot's error path: >> 1. free slot memory >> 2. remove slot->list from its parent->slots >> This patch mistakenly took slot memory as unfreed (which is not), >> and would introduce double free problem. >> --- >> drivers/pci/slot.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c index >> 3861505..ed2077e 100644 >> --- a/drivers/pci/slot.c >> +++ b/drivers/pci/slot.c >> @@ -272,6 +272,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr, >> goto err; >> } >> >> + INIT_LIST_HEAD(&slot->list); >> + list_add(&slot->list, &parent->slots); >> + >> err = kobject_init_and_add(&slot->kobj, &pci_slot_ktype, NULL, >> "%s", slot_name); >> if (err) { >> @@ -279,9 +282,6 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr, >> goto err; >> } >> >> - INIT_LIST_HEAD(&slot->list); >> - list_add(&slot->list, &parent->slots); >> - >> down_read(&pci_bus_sem); >> list_for_each_entry(dev, &parent->devices, bus_list) >> if (PCI_SLOT(dev->devfn) == slot_nr) >> -- >> 1.8.5.6 >>
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c index 3861505..ed2077e 100644 --- a/drivers/pci/slot.c +++ b/drivers/pci/slot.c @@ -272,6 +272,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr, goto err; } + INIT_LIST_HEAD(&slot->list); + list_add(&slot->list, &parent->slots); + err = kobject_init_and_add(&slot->kobj, &pci_slot_ktype, NULL, "%s", slot_name); if (err) { @@ -279,9 +282,6 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr, goto err; } - INIT_LIST_HEAD(&slot->list); - list_add(&slot->list, &parent->slots); - down_read(&pci_bus_sem); list_for_each_entry(dev, &parent->devices, bus_list) if (PCI_SLOT(dev->devfn) == slot_nr)
Once kobject_init_and_add() failed, pci_slot_release() is called to delete slot->list from parent->slots. But slot->list is intialized afterwards, so we ran into the following crash: Unable to handle kernel NULL pointer dereference at virtual address 00000000 ... CPU: 10 PID: 1 Comm: swapper/0 Not tainted 4.4.240 #197 task: ffffeb398a45ef10 task.stack: ffffeb398a470000 PC is at __list_del_entry_valid+0x5c/0xb0 LR is at pci_slot_release+0x84/0xe4 ... __list_del_entry_valid+0x5c/0xb0 pci_slot_release+0x84/0xe4 kobject_put+0x184/0x1c4 pci_create_slot+0x17c/0x1b4 __pci_hp_initialize+0x68/0xa4 pciehp_probe+0x1a4/0x2fc pcie_port_probe_service+0x58/0x84 driver_probe_device+0x320/0x470 __driver_attach+0x54/0xb8 bus_for_each_dev+0xc8/0xf0 driver_attach+0x30/0x3c bus_add_driver+0x1b0/0x24c driver_register+0x9c/0xe0 pcie_port_service_register+0x64/0x7c pcied_init+0x44/0xa4 do_one_initcall+0x1d0/0x1f0 kernel_init_freeable+0x24c/0x254 kernel_init+0x18/0xe8 ret_from_fork+0x10/0x20 Fixes: 8a94644b440e ("PCI: Fix pci_create_slot() reference count leak") Signed-off-by: Jubin Zhong <zhongjubin@huawei.com> Cc: stable@vger.kernel.org #v4.4.235 ---- v2: Since both slot memory and slot->list would be handled by pci_slot_release(), we need to make sure slot->list is properly initialized beforehand. v1: https://lore.kernel.org/lkml/1606288971-47927-1-git-send-email-zhongjubin@huawei.com/ Two things need to be cleaned up on pci_create_slot's error path: 1. free slot memory 2. remove slot->list from its parent->slots This patch mistakenly took slot memory as unfreed (which is not), and would introduce double free problem. --- drivers/pci/slot.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)