Message ID | 20240926130924.36409-1-brgl@bgdev.pl (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | PCI: take the rescan lock when adding devices during host probe | expand |
On 26.09.2024 3:09 PM, Bartosz Golaszewski wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > Since adding the PCI power control code, we may end up with a race > between the pwrctl platform device rescanning the bus and the host > controller probe function. The latter needs to take the rescan lock when > adding devices or may crash. > > Reported-by: Konrad Dybcio <konradybcio@kernel.org> > Fixes: 4565d2652a37 ("PCI/pwrctl: Add PCI power control core code") > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > --- Tested-by: Konrad Dybcio <konradybcio@kernel.org> Konrad
On Thu, Sep 26, 2024 at 03:09:23PM +0200, Bartosz Golaszewski wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > Since adding the PCI power control code, we may end up with a race > between the pwrctl platform device rescanning the bus and the host > controller probe function. The latter needs to take the rescan lock when > adding devices or may crash. > > Reported-by: Konrad Dybcio <konradybcio@kernel.org> > Fixes: 4565d2652a37 ("PCI/pwrctl: Add PCI power control core code") > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > --- > drivers/pci/probe.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 4f68414c3086..f1615805f5b0 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -3105,7 +3105,9 @@ int pci_host_probe(struct pci_host_bridge *bridge) > list_for_each_entry(child, &bus->children, node) > pcie_bus_configure_settings(child); > > + pci_lock_rescan_remove(); > pci_bus_add_devices(bus); > + pci_unlock_rescan_remove(); Seems like we do need locking here, but don't we need a more comprehensive change? There are many other callers of pci_bus_add_devices(), and most of them look similarly unprotected. > return 0; > } > EXPORT_SYMBOL_GPL(pci_host_probe); > -- > 2.30.2 >
On Tue, Oct 1, 2024 at 11:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Thu, Sep 26, 2024 at 03:09:23PM +0200, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > Since adding the PCI power control code, we may end up with a race > > between the pwrctl platform device rescanning the bus and the host > > controller probe function. The latter needs to take the rescan lock when > > adding devices or may crash. > > > > Reported-by: Konrad Dybcio <konradybcio@kernel.org> > > Fixes: 4565d2652a37 ("PCI/pwrctl: Add PCI power control core code") > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > --- > > drivers/pci/probe.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > > index 4f68414c3086..f1615805f5b0 100644 > > --- a/drivers/pci/probe.c > > +++ b/drivers/pci/probe.c > > @@ -3105,7 +3105,9 @@ int pci_host_probe(struct pci_host_bridge *bridge) > > list_for_each_entry(child, &bus->children, node) > > pcie_bus_configure_settings(child); > > > > + pci_lock_rescan_remove(); > > pci_bus_add_devices(bus); > > + pci_unlock_rescan_remove(); > > Seems like we do need locking here, but don't we need a more > comprehensive change? There are many other callers of > pci_bus_add_devices(), and most of them look similarly unprotected. > From a quick glance it looks like the majority of users are specific drivers (controller, hotplug, etc.). The calls inside pci_rescan_bus() and pci_rescan_bus_bridge_resize() are already protected from what I can tell. I'm not saying that the driver calls shouldn't be fixed but there's no immediate danger. This however fixes an issue we hit with PCI core so I'd send it upstream now and then we can think about the other use-cases. Bart
On 2.10.2024 10:36 AM, Bartosz Golaszewski wrote: > On Tue, Oct 1, 2024 at 11:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote: >> >> On Thu, Sep 26, 2024 at 03:09:23PM +0200, Bartosz Golaszewski wrote: >>> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> >>> >>> Since adding the PCI power control code, we may end up with a race >>> between the pwrctl platform device rescanning the bus and the host >>> controller probe function. The latter needs to take the rescan lock when >>> adding devices or may crash. >>> >>> Reported-by: Konrad Dybcio <konradybcio@kernel.org> >>> Fixes: 4565d2652a37 ("PCI/pwrctl: Add PCI power control core code") >>> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> >>> --- >>> drivers/pci/probe.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >>> index 4f68414c3086..f1615805f5b0 100644 >>> --- a/drivers/pci/probe.c >>> +++ b/drivers/pci/probe.c >>> @@ -3105,7 +3105,9 @@ int pci_host_probe(struct pci_host_bridge *bridge) >>> list_for_each_entry(child, &bus->children, node) >>> pcie_bus_configure_settings(child); >>> >>> + pci_lock_rescan_remove(); >>> pci_bus_add_devices(bus); >>> + pci_unlock_rescan_remove(); >> >> Seems like we do need locking here, but don't we need a more >> comprehensive change? There are many other callers of >> pci_bus_add_devices(), and most of them look similarly unprotected. >> > > From a quick glance it looks like the majority of users are specific > drivers (controller, hotplug, etc.). The calls inside pci_rescan_bus() > and pci_rescan_bus_bridge_resize() are already protected from what I > can tell. I'm not saying that the driver calls shouldn't be fixed but > there's no immediate danger. This however fixes an issue we hit with > PCI core so I'd send it upstream now and then we can think about the > other use-cases. Probably worth showing an example of how this can manifest: removed a device through sysfs and called bus rescan: [ 63.851901] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 63.861048] Mem abort info: [ 63.863940] ESR = 0x0000000096000004 [ 63.867822] EC = 0x25: DABT (current EL), IL = 32 bits [ 63.873291] SET = 0, FnV = 0 [ 63.876440] EA = 0, S1PTW = 0 [ 63.879688] FSC = 0x04: level 0 translation fault [ 63.884711] Data abort info: [ 63.887697] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 63.893344] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 63.898549] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 63.904009] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000116c36000 [ 63.910644] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 [ 63.917683] Internal error: Oops: 0000000096000004 [#1] SMP [ 63.923413] Modules linked in: [ 63.926561] CPU: 1 UID: 0 PID: 530 Comm: bash Tainted: G W <snip> #10176 [ 63.938971] Tainted: [W]=WARN [ 63.942037] Hardware name: <snip> [ 63.951239] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 63.958398] pc : __pi_strlen+0x14/0x150 [ 63.962350] lr : kernfs_name_hash+0x24/0x88 [ 63.966668] sp : ffff800083c43af0 [ 63.970089] x29: ffff800083c43af0 x28: ffff519888b83500 x27: 0000000000000000 [ 63.977420] x26: 0000000000000000 x25: 0000000000000000 x24: ffff519888b83500 [ 63.984751] x23: 0000000000000011 x22: ffff519886bd6040 x21: ffff519886bd5b00 [ 63.992081] x20: 0000000000000000 x19: 0000000000000000 x18: ffff80008a0bd0a8 [ 63.999410] x17: 0000000000000001 x16: ffff519888b83de8 x15: ffffa08dea3f5bf0 [ 64.006741] x14: ffff519888b83de8 x13: ffffffffffffffff x12: 0000000000000028 [ 64.014076] x11: ffffa08dea3f5bf0 x10: 0000000000000012 x9 : 0000000000000001 [ 64.021403] x8 : 0101010101010101 x7 : ffffa08de7a5e844 x6 : 0000000000000000 [ 64.028730] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 64.036062] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 [ 64.043390] Call trace: [ 64.045918] __pi_strlen+0x14/0x150 [ 64.049508] kernfs_find_ns+0x80/0x13c [ 64.053375] kernfs_remove_by_name_ns+0x54/0xf0 [ 64.058036] sysfs_remove_bin_file+0x24/0x34 [ 64.062436] pci_remove_resource_files+0x3c/0x84 [ 64.067190] pci_remove_sysfs_dev_files+0x28/0x38 [ 64.072025] pci_stop_bus_device+0x8c/0xd8 [ 64.076241] pci_stop_bus_device+0x40/0xd8 [ 64.080463] pci_stop_and_remove_bus_device_locked+0x28/0x48 [ 64.086277] remove_store+0x70/0xb0 [ 64.089878] dev_attr_store+0x20/0x38 [ 64.093649] sysfs_kf_write+0x58/0x78 [ 64.097426] kernfs_fop_write_iter+0xe8/0x184 [ 64.101905] vfs_write+0x2dc/0x308 [ 64.105413] ksys_write+0x7c/0xec [ 64.108834] __arm64_sys_write+0x24/0x34 [ 64.112880] invoke_syscall+0x48/0x100 [ 64.116744] el0_svc_common+0xb4/0xe8 [ 64.120522] do_el0_svc+0x24/0x34 [ 64.123938] el0_svc+0x58/0xb4 [ 64.127085] el0t_64_sync_handler+0x50/0x12c [ 64.131474] el0t_64_sync+0x198/0x19c [ 64.135244] Code: 92402c04 b200c3e8 f13fc09f 5400088c (a9400c02) [ 64.141508] ---[ end trace 0000000000000000 ]--- Konrad
On Wed, Oct 02, 2024 at 10:31:46PM +0200, Konrad Dybcio wrote: > On 2.10.2024 10:36 AM, Bartosz Golaszewski wrote: > > On Tue, Oct 1, 2024 at 11:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > >> > >> On Thu, Sep 26, 2024 at 03:09:23PM +0200, Bartosz Golaszewski wrote: > >>> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > >>> > >>> Since adding the PCI power control code, we may end up with a race > >>> between the pwrctl platform device rescanning the bus and the host > >>> controller probe function. The latter needs to take the rescan lock when > >>> adding devices or may crash. > >>> > >>> Reported-by: Konrad Dybcio <konradybcio@kernel.org> > >>> Fixes: 4565d2652a37 ("PCI/pwrctl: Add PCI power control core code") > >>> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > >>> --- > >>> drivers/pci/probe.c | 2 ++ > >>> 1 file changed, 2 insertions(+) > >>> > >>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > >>> index 4f68414c3086..f1615805f5b0 100644 > >>> --- a/drivers/pci/probe.c > >>> +++ b/drivers/pci/probe.c > >>> @@ -3105,7 +3105,9 @@ int pci_host_probe(struct pci_host_bridge *bridge) > >>> list_for_each_entry(child, &bus->children, node) > >>> pcie_bus_configure_settings(child); > >>> > >>> + pci_lock_rescan_remove(); > >>> pci_bus_add_devices(bus); > >>> + pci_unlock_rescan_remove(); > >> > >> Seems like we do need locking here, but don't we need a more > >> comprehensive change? There are many other callers of > >> pci_bus_add_devices(), and most of them look similarly unprotected. > > > > From a quick glance it looks like the majority of users are specific > > drivers (controller, hotplug, etc.). The calls inside pci_rescan_bus() > > and pci_rescan_bus_bridge_resize() are already protected from what I > > can tell. I'm not saying that the driver calls shouldn't be fixed but > > there's no immediate danger. This however fixes an issue we hit with > > PCI core so I'd send it upstream now and then we can think about the > > other use-cases. Agreed that all current callers of pci_rescan_bus() and pci_rescan_bus_bridge_resize() already do their own locking. Most of the hotplug drivers that use pci_bus_add_devices() do their own locking as well. pci_host_probe() is used by many controller drivers, but my guess is that a dozen or so controller drivers that use pci_bus_add_devices() directly without locking are still at risk. It's sort of an unfinished project to convert drivers like this over to using pci_host_probe(). In the meantime, I wish we had a safer interface that could enforce the locking internally. > Probably worth showing an example of how this can manifest: > > removed a device through sysfs and called bus rescan: Thanks for this; I was about to ask for it! I don't think we need *all* the details, but something like the following might help people recognize if we trip over another instance: > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > Internal error: Oops: 0000000096000004 [#1] SMP > Call trace: > __pi_strlen+0x14/0x150 > kernfs_find_ns+0x80/0x13c > kernfs_remove_by_name_ns+0x54/0xf0 > sysfs_remove_bin_file+0x24/0x34 > pci_remove_resource_files+0x3c/0x84 > pci_remove_sysfs_dev_files+0x28/0x38 > pci_stop_bus_device+0x8c/0xd8 > pci_stop_bus_device+0x40/0xd8 > pci_stop_and_remove_bus_device_locked+0x28/0x48 > remove_store+0x70/0xb0 > dev_attr_store+0x20/0x38 > sysfs_kf_write+0x58/0x78 > kernfs_fop_write_iter+0xe8/0x184 > vfs_write+0x2dc/0x308 Bjorn
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 4f68414c3086..f1615805f5b0 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -3105,7 +3105,9 @@ int pci_host_probe(struct pci_host_bridge *bridge) list_for_each_entry(child, &bus->children, node) pcie_bus_configure_settings(child); + pci_lock_rescan_remove(); pci_bus_add_devices(bus); + pci_unlock_rescan_remove(); return 0; } EXPORT_SYMBOL_GPL(pci_host_probe);