Message ID | 20241003084342.27501-1-brgl@bgdev.pl |
---|---|
State | Accepted |
Commit | 1d59d474e1cb7d4fdf87dfaf96f44647f13ea590 |
Series | [v2] PCI: take the rescan lock when adding devices during host probe |
On Thu, Oct 03, 2024 at 10:43:41AM +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
>
> Since adding the PCI power control code, we may end up with a race
> between the pwrctl platform device rescanning the bus and the host
> controller probe function. The latter needs to take the rescan lock when
> adding devices or we may end up in an undefined state having two
> incompletely added devices and hit the following crash when trying to
> remove the device over sysfs:
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> Internal error: Oops: 0000000096000004 [#1] SMP
> Call trace:
>   __pi_strlen+0x14/0x150
>   kernfs_find_ns+0x80/0x13c
>   kernfs_remove_by_name_ns+0x54/0xf0
>   sysfs_remove_bin_file+0x24/0x34
>   pci_remove_resource_files+0x3c/0x84
>   pci_remove_sysfs_dev_files+0x28/0x38
>   pci_stop_bus_device+0x8c/0xd8
>   pci_stop_bus_device+0x40/0xd8
>   pci_stop_and_remove_bus_device_locked+0x28/0x48
>   remove_store+0x70/0xb0
>   dev_attr_store+0x20/0x38
>   sysfs_kf_write+0x58/0x78
>   kernfs_fop_write_iter+0xe8/0x184
>   vfs_write+0x2dc/0x308
>   ksys_write+0x7c/0xec
>

Thanks for adding the crash log. It always helps to have the log in the
patch description to find *this* patch.

> Reported-by: Konrad Dybcio <konradybcio@kernel.org>
> Tested-by: Konrad Dybcio <konradybcio@kernel.org>
> Fixes: 4565d2652a37 ("PCI/pwrctl: Add PCI power control core code")
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

- Mani

> ---
> v1 -> v2:
> - improve the commit message, add example stack trace
>
>  drivers/pci/probe.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 4f68414c3086..f1615805f5b0 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3105,7 +3105,9 @@ int pci_host_probe(struct pci_host_bridge *bridge)
>  	list_for_each_entry(child, &bus->children, node)
>  		pcie_bus_configure_settings(child);
>
> +	pci_lock_rescan_remove();
>  	pci_bus_add_devices(bus);
> +	pci_unlock_rescan_remove();
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_host_probe);
> --
> 2.30.2
On Thu, Oct 3, 2024 at 10:43 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
>
> Since adding the PCI power control code, we may end up with a race
> between the pwrctl platform device rescanning the bus and the host
> controller probe function.
> [...]
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> ---

It's been a week, so gentle ping - can this be picked up into v6.12?

Thanks,
Bartosz
On Thu, Oct 10, 2024 at 11:17:47AM +0200, Bartosz Golaszewski wrote:
> On Thu, Oct 3, 2024 at 10:43 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > [...]
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > ---
>
> It's been a week, so gentle ping - can this be picked up into v6.12?

I hoped we could fix the similar latent issues in other drivers, but
yes, we can get this in v6.12. Thanks for the hint that it should go
there. I'll pick it up when I return from vacation on Wednesday.

Bjorn
On Sat, Oct 12, 2024 at 4:31 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Oct 10, 2024 at 11:17:47AM +0200, Bartosz Golaszewski wrote:
> > On Thu, Oct 3, 2024 at 10:43 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> > > [...]
> >
> > It's been a week, so gentle ping - can this be picked up into v6.12?
>
> I hoped we could fix the similar latent issues in other drivers, but
> yes, we can get this in v6.12. Thanks for the hint that it should go
> there. I'll pick it up when I return from vacation on Wednesday.
>

Sure, this can still be done, but this patch fixes an urgent issue and
I think it warrants fast-tracking it to mainline.

Bart
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4f68414c3086..f1615805f5b0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3105,7 +3105,9 @@ int pci_host_probe(struct pci_host_bridge *bridge)
 	list_for_each_entry(child, &bus->children, node)
 		pcie_bus_configure_settings(child);
 
+	pci_lock_rescan_remove();
 	pci_bus_add_devices(bus);
+	pci_unlock_rescan_remove();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(pci_host_probe);
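The patch wraps `pci_bus_add_devices()` in `pci_host_probe()` with `pci_lock_rescan_remove()` / `pci_unlock_rescan_remove()`, so host probe and a concurrent rescan no longer add devices at the same time. The sketch below mimics that locking pattern in userspace with a pthread mutex. It is a hypothetical model, not kernel code: `model_rescan_remove_lock`, `model_bus_add_devices()`, `model_adder_thread()` and `model_run_probe_and_rescan()` are illustrative names, and the "skip already-added devices" check is an assumed simplification of how device addition behaves.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_NDEVS 64

/* Hypothetical device model: an "added" flag plus a creation counter. */
struct model_dev {
	bool added;
	int sysfs_files;
};

static struct model_dev model_bus[MODEL_NDEVS];

/* Userspace stand-in for the PCI rescan/remove mutex. */
static pthread_mutex_t model_rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

static void model_lock_rescan_remove(void)
{
	if (pthread_mutex_lock(&model_rescan_remove_lock) != 0)
		abort();
}

static void model_unlock_rescan_remove(void)
{
	if (pthread_mutex_unlock(&model_rescan_remove_lock) != 0)
		abort();
}

/* Check-then-act add loop; only safe while the caller holds the lock. */
static void model_bus_add_devices(void)
{
	for (int i = 0; i < MODEL_NDEVS; i++) {
		struct model_dev *d = &model_bus[i];

		if (d->added)
			continue;  /* another path already added this device */
		/* Under the lock, a not-yet-added device has no files yet. */
		assert(d->sysfs_files == 0);
		d->sysfs_files++;  /* "create sysfs files" */
		d->added = true;
	}
}

/* Both the host probe path and the pwrctl rescan path run this body. */
static void *model_adder_thread(void *arg)
{
	(void)arg;
	model_lock_rescan_remove();
	model_bus_add_devices();
	model_unlock_rescan_remove();
	return NULL;
}

/*
 * Run a "host probe" thread and a "pwrctl rescan" thread concurrently.
 * Returns true if every device had its files created exactly once.
 */
static bool model_run_probe_and_rescan(void)
{
	pthread_t probe, rescan;

	for (int i = 0; i < MODEL_NDEVS; i++)
		model_bus[i] = (struct model_dev){ .added = false, .sysfs_files = 0 };

	if (pthread_create(&probe, NULL, model_adder_thread, NULL) != 0 ||
	    pthread_create(&rescan, NULL, model_adder_thread, NULL) != 0)
		abort();
	pthread_join(probe, NULL);
	pthread_join(rescan, NULL);

	for (int i = 0; i < MODEL_NDEVS; i++)
		if (model_bus[i].sysfs_files != 1)
			return false;
	return true;
}
```

With the lock in place, `model_run_probe_and_rescan()` always returns true: whichever thread runs second finds every device already marked added and skips it. Dropping the lock/unlock calls from `model_adder_thread()` reopens the check-then-act window, although whether a duplicate add shows up in a particular run then depends on thread scheduling.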