Message ID | 20230316091540.494366-1-alexander.stein@ew.tq-group.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [v2,1/1] Guard pci_create_sysfs_dev_files with atomic value | expand |
Hi, for some reason my additional info was missing. See below. Am Donnerstag, 16. März 2023, 10:15:40 CET schrieb Alexander Stein: > From: Korneliusz Osmenda <korneliuszo@gmail.com> > > On Gateworks Ventana there is a number of PCI devices and: > - imx6_pcie_probe takes longer than start of late init > - pci_sysfs_init sets up flag sysfs_initialized > - pci_sysfs_init initializes already found devices > - imx6_pcie_probe tries to reinitialize device > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > Signed-off-by: Korneliusz Osmenda <korneliuszo@gmail.com> > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> > --- Changes in v2: * Rebased to next-20230323 * checkpath.pl fixes I'm hitting the same issue on TQMa6x+MBa6x. There is a race in pci_sysfs_init late initcall and pci_bus_add_device due to regular PCIe bus probing. Having some debug output 'sysfs_initialized' is set to 1 while pci_bus_add_devices is still adding PCIe devices. Having this patch applied my PCIe device using several bridges is detected fine. My PCIe bus looks like thi (using this patch): root@tqma6-common:~# lspci 00:00.0 PCI bridge: Synopsys, Inc. DWC_usb3 / PCIe bridge (rev 01) 01:00.0 PCI bridge: Pericom Semiconductor Device a303 (rev 03) 02:01.0 PCI bridge: Pericom Semiconductor Device a303 (rev 03) 02:02.0 PCI bridge: Pericom Semiconductor Device a303 (rev 03) 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c) 04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c) root@tqma6-common:~# lspci -t -[0000:00]---00.0-[01-ff]----00.0-[02-04]--+-01.0-[03]----00.0 \-02.0-[04]----00.0 Best regards, Alexander > drivers/pci/pci-sysfs.c | 6 ++++++ > include/linux/pci.h | 2 ++ > 2 files changed, 8 insertions(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index dd0d9d9bc509..998e44716b6f 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -1497,6 +1497,9 @@ int __must_check pci_create_sysfs_dev_files(struct > pci_dev *pdev) if (!sysfs_initialized) > return -EACCES; > > + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1) > + return 0; /* already added */ > + > return pci_create_resource_files(pdev); > } > > @@ -1511,6 +1514,9 @@ void pci_remove_sysfs_dev_files(struct pci_dev *pdev) > if (!sysfs_initialized) > return; > > + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 1, 0) == 0) > + return; /* already removed */ > + > pci_remove_resource_files(pdev); > } > > diff --git a/include/linux/pci.h b/include/linux/pci.h > index b50e5c79f7e3..024313a7a90a 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -467,6 +467,8 @@ struct pci_dev { > pci_dev_flags_t dev_flags; > atomic_t enable_cnt; /* pci_enable_device has been called */ > > + atomic_t sysfs_init_cnt; /* pci_create_sysfs_dev_files has been called */ > + > u32 saved_config_space[16]; /* Config space saved at suspend time */ > struct hlist_head saved_cap_space; > int rom_attr_enabled; /* Display of ROM attribute enabled? */
On 16.03.23 10:15, Alexander Stein wrote: > From: Korneliusz Osmenda <korneliuszo@gmail.com> > > On Gateworks Ventana there is a number of PCI devices and: > - imx6_pcie_probe takes longer than start of late init > - pci_sysfs_init sets up flag sysfs_initialized > - pci_sysfs_init initializes already found devices > - imx6_pcie_probe tries to reinitialize device > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > Signed-off-by: Korneliusz Osmenda <korneliuszo@gmail.com> > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> > --- > drivers/pci/pci-sysfs.c | 6 ++++++ > include/linux/pci.h | 2 ++ > 2 files changed, 8 insertions(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index dd0d9d9bc509..998e44716b6f 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -1497,6 +1497,9 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev) > if (!sysfs_initialized) > return -EACCES; > > + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1) > + return 0; /* already added */ > + > return pci_create_resource_files(pdev); This is very likely a bug. You are returning an error in the error case. Yet the flag stays. And simply resetting it in the error case would be a race. There is something fishy in that design. Regards Oliver
Hi Oliver, Am Donnerstag, 16. März 2023, 10:23:54 CET schrieb Oliver Neukum: > On 16.03.23 10:15, Alexander Stein wrote: > > From: Korneliusz Osmenda <korneliuszo@gmail.com> > > > > On Gateworks Ventana there is a number of PCI devices and: > > - imx6_pcie_probe takes longer than start of late init > > - pci_sysfs_init sets up flag sysfs_initialized > > - pci_sysfs_init initializes already found devices > > - imx6_pcie_probe tries to reinitialize device > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > > > Signed-off-by: Korneliusz Osmenda <korneliuszo@gmail.com> > > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> > > --- > > > > drivers/pci/pci-sysfs.c | 6 ++++++ > > include/linux/pci.h | 2 ++ > > 2 files changed, 8 insertions(+) > > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > > index dd0d9d9bc509..998e44716b6f 100644 > > --- a/drivers/pci/pci-sysfs.c > > +++ b/drivers/pci/pci-sysfs.c > > @@ -1497,6 +1497,9 @@ int __must_check pci_create_sysfs_dev_files(struct > > pci_dev *pdev)> > > if (!sysfs_initialized) > > > > return -EACCES; > > > > + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1) > > + return 0; /* already added */ > > + > > > > return pci_create_resource_files(pdev); > > This is very likely a bug. You are returning an error in the error > case. Yet the flag stays. Ah, you are right. This is something needed to address. > And simply resetting it in the error case > would be a race. There is something fishy in that design. Admittedly I would like to get rid of these two pathes for creating sysfs files in the first place, but I do not know the pci subsystem very well. IMHO for_each_pci_dev(pdev) in pci_sysfs_init is part of the problem as it unconditionally iterates over the bus, without any locks, thus creating sysfs files for each device added to the bus. Any ideas? Best regards, Alexander
On 16.03.23 10:33, Alexander Stein wrote: > Hi Oliver, Hi, > > Admittedly > I would like to get rid of these two pathes for creating sysfs files in the > first place, but I do not know the pci subsystem very well. > IMHO for_each_pci_dev(pdev) in pci_sysfs_init is part of the problem as it > unconditionally iterates over the bus, without any locks, thus creating sysfs > files for each device added to the bus. > Any ideas? First of all, this existing code is a mess. If I understand you have the issue that your driver adds a bridge in dw_pcie_host_init() and the generic code in pci_create_sysfs_dev_files() populates the directory before or while your driver does so and the devices are effectively discovered twice. It seems to me that you must not add a bridge before pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue and a flag and wait for it to finish. But that is not very elegant. From which initcall is your driver probed? Regards Oliver
Hi Oliver, Am Donnerstag, 16. März 2023, 12:17:32 CET schrieb Oliver Neukum: > On 16.03.23 10:33, Alexander Stein wrote: > > Hi Oliver, > > Hi, > > > Admittedly > > I would like to get rid of these two pathes for creating sysfs files in > > the > > first place, but I do not know the pci subsystem very well. > > IMHO for_each_pci_dev(pdev) in pci_sysfs_init is part of the problem as it > > unconditionally iterates over the bus, without any locks, thus creating > > sysfs files for each device added to the bus. > > Any ideas? > > First of all, this existing code is a mess. > > If I understand you have the issue that your driver adds a bridge > in dw_pcie_host_init() and the generic code in pci_create_sysfs_dev_files() > populates the directory before or while your driver does so and > the devices are effectively discovered twice. Yep, that's my observation as well. > It seems to me that you must not add a bridge before > pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue > and a flag and wait for it to finish. But that is not very elegant. Do we need the pci_sysfs_init initcall at all? Or to put it in other words, what does this initcall solve? See my different approach eliminating this race at all. > From which initcall is your driver probed? The callstack looks like this: > imx6_pcie_probe from platform_probe+0x5c/0xb8 > platform_probe from call_driver_probe+0x24/0x118 > call_driver_probe from really_probe+0xc4/0x31c > really_probe from __driver_probe_device+0x8c/0x120 > __driver_probe_device from driver_probe_device+0x30/0xc0 > driver_probe_device from __driver_attach_async_helper+0x50/0xd8 > __driver_attach_async_helper from async_run_entry_fn+0x30/0x144 > async_run_entry_fn from process_one_work+0x1c4/0x3d0 > process_one_work from worker_thread+0x50/0x41c > worker_thread from kthread+0xec/0x104 > kthread from ret_from_fork+0x14/0x2c So technically the device is not probed from within a initcall but a kthread. It is set to be probed asynchronous in imx6_pcie_driver. This async call is scheduled in __driver_attach, from this callstack: > __driver_attach from bus_for_each_dev+0x74/0xc8 > bus_for_each_dev from bus_add_driver+0xf0/0x1f4 > bus_add_driver from driver_register+0x7c/0x118 > driver_register from do_one_initcall+0x4c/0x180 > do_one_initcall from do_initcalls+0xe0/0x114 > do_initcalls from kernel_init_freeable+0xd8/0x100 > kernel_init_freeable from kernel_init+0x18/0x12c > kernel_init from ret_from_fork+0x14/0x2c Best regards, Alexander
On 16.03.23 12:58, Alexander Stein wrote: > Hi Oliver, > > Am Donnerstag, 16. März 2023, 12:17:32 CET schrieb Oliver Neukum: >> It seems to me that you must not add a bridge before >> pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue >> and a flag and wait for it to finish. But that is not very elegant. > > Do we need the pci_sysfs_init initcall at all? Or to put it in other words, > what does this initcall solve? Fundamentally something has to discover the root bridge. Secondly your system has to boot. The device right behind the root bridge will already be up and running when the kernel takes control. IMHO treating such devices differently from other devices makes sense. > See my different approach eliminating this race at all. Please elaborate >> From which initcall is your driver probed? > > The callstack looks like this: >> imx6_pcie_probe from platform_probe+0x5c/0xb8 >> platform_probe from call_driver_probe+0x24/0x118 >> call_driver_probe from really_probe+0xc4/0x31c >> really_probe from __driver_probe_device+0x8c/0x120 >> __driver_probe_device from driver_probe_device+0x30/0xc0 >> driver_probe_device from __driver_attach_async_helper+0x50/0xd8 >> __driver_attach_async_helper from async_run_entry_fn+0x30/0x144 >> async_run_entry_fn from process_one_work+0x1c4/0x3d0 >> process_one_work from worker_thread+0x50/0x41c >> worker_thread from kthread+0xec/0x104 >> kthread from ret_from_fork+0x14/0x2c > > So technically the device is not probed from within a initcall but a kthread. > It is set to be probed asynchronous in imx6_pcie_driver. That may be the problem, respectively that system is incomplete You are registering a PCI bridge. The PCI subsystem should be done setting up when you run. That is just a simple dependency. Regards Oliver
Hi Oliver, Am Donnerstag, 16. März 2023, 13:23:25 CET schrieb Oliver Neukum: > On 16.03.23 12:58, Alexander Stein wrote: > > Hi Oliver, > > > > Am Donnerstag, 16. März 2023, 12:17:32 CET schrieb Oliver Neukum: > >> It seems to me that you must not add a bridge before > >> pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue > >> and a flag and wait for it to finish. But that is not very elegant. > > > > Do we need the pci_sysfs_init initcall at all? Or to put it in other > > words, > > what does this initcall solve? > > Fundamentally something has to discover the root bridge. > Secondly your system has to boot. The device right behind > the root bridge will already be up and running when the kernel > takes control. IMHO treating such devices differently from > other devices makes sense. But isn't the root bridge discovered by the driver (pci-imx6 in this case) for that? And the driver probe path eventually calls into the sysfs file creation. I compared the file creation to usb, as this is a discoverable bus as well. There is no special initialization regarding sysfs. > > See my different approach eliminating this race at all. > > Please elaborate Currently the initcall pci_sysfs_init and the PCIe root bridge driver probe paths are competing for file creation. If, for some reason, the device enumeration for PCI bus during imx6_pcie_probe is delayed after pci_sysfs_init initcall, this initcall essentially does nothing, no devices or busses to iterate. Which means the complete pcie sysfs creation is done from bridge probe path. There is no reason to iterate over discovered PCIe devices/busses separately. I assume this issue is not that prominent, if at all, as other platforms vary in speed a lot. I was not able to reproduce on i.MX8MP which uses the same PCIe bridge driver. Due to improved speed performance, I guess on this platform pci_sysfs_init finishes, without doing anything, before PCIe bridge is probed. I might be missing something (ACPI systems, etc.), I do not know the details within pci subsystem, but from my point of view this initcall is superfluous. For the record the patch is at [1] [1] https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u > >> From which initcall is your driver probed? > > > > The callstack looks like this: > >> imx6_pcie_probe from platform_probe+0x5c/0xb8 > >> platform_probe from call_driver_probe+0x24/0x118 > >> call_driver_probe from really_probe+0xc4/0x31c > >> really_probe from __driver_probe_device+0x8c/0x120 > >> __driver_probe_device from driver_probe_device+0x30/0xc0 > >> driver_probe_device from __driver_attach_async_helper+0x50/0xd8 > >> __driver_attach_async_helper from async_run_entry_fn+0x30/0x144 > >> async_run_entry_fn from process_one_work+0x1c4/0x3d0 > >> process_one_work from worker_thread+0x50/0x41c > >> worker_thread from kthread+0xec/0x104 > >> kthread from ret_from_fork+0x14/0x2c > > > > So technically the device is not probed from within a initcall but a > > kthread. It is set to be probed asynchronous in imx6_pcie_driver. > > That may be the problem, respectively that system is incomplete > You are registering a PCI bridge. The PCI subsystem should be > done setting up when you run. That is just a simple dependency. Is there such an dependency in the first place? I can't see anything, even the late_initcall to pci_resource_alignment_sysfs_init is a different matter. Best regards, Alexander
On 16.03.23 14:16, Alexander Stein wrote: > But isn't the root bridge discovered by the driver (pci-imx6 in this case) for > that? And the driver probe path eventually calls into the sysfs file creation. > I compared the file creation to usb, as this is a discoverable bus as well. > There is no special initialization regarding sysfs. If you discover a bus system you always have the option of creating of virtual hotplug event for the root hub or host controller. But for PCI that is a bad design choice. USB is different. > If, for some reason, the device enumeration for PCI bus during imx6_pcie_probe > is delayed after pci_sysfs_init initcall, this initcall essentially does > nothing, no devices or busses to iterate. Which means the complete pcie sysfs On your specific system. You cannot use that as a model for all systems. > creation is done from bridge probe path. There is no reason to iterate over > discovered PCIe devices/busses separately. If there is no other PCI device, the loop is a nop. But otherwise it is necessary. >>> So technically the device is not probed from within a initcall but a >>> kthread. It is set to be probed asynchronous in imx6_pcie_driver. >> >> That may be the problem, respectively that system is incomplete >> You are registering a PCI bridge. The PCI subsystem should be >> done setting up when you run. That is just a simple dependency. > > Is there such an dependency in the first place? I can't see anything, even the > late_initcall to pci_resource_alignment_sysfs_init is a different matter. On your hardware, yes. In the kernel, no. That is the very point. The kernel is missing a way to represent a dependency. Regards Oliver
Hi Oliver, Am Donnerstag, 16. März 2023, 15:01:25 CET schrieb Oliver Neukum: > On 16.03.23 14:16, Alexander Stein wrote: > > But isn't the root bridge discovered by the driver (pci-imx6 in this case) > > for that? And the driver probe path eventually calls into the sysfs file > > creation. I compared the file creation to usb, as this is a discoverable > > bus as well. There is no special initialization regarding sysfs. > > If you discover a bus system you always have the option of creating of > virtual hotplug event for the root hub or host controller. > But for PCI that is a bad design choice. USB is different. I'm not sure if I can follow you here. Can you elaborate? > > If, for some reason, the device enumeration for PCI bus during > > imx6_pcie_probe is delayed after pci_sysfs_init initcall, this initcall > > essentially does nothing, no devices or busses to iterate. Which means > > the complete pcie sysfs > On your specific system. You cannot use that as a model for all systems. I am aware that my platform is not a role model for the others. But I've yet to get information what is actually different on other platforms. > > creation is done from bridge probe path. There is no reason to iterate > > over > > discovered PCIe devices/busses separately. > > If there is no other PCI device, the loop is a nop. But otherwise it is > necessary. How is it necessary? How do these PCI devices get attaches to the pci_bus_type bus without calling pci_bus_add_device? > >>> So technically the device is not probed from within a initcall but a > >>> kthread. It is set to be probed asynchronous in imx6_pcie_driver. > >> > >> That may be the problem, respectively that system is incomplete > >> You are registering a PCI bridge. The PCI subsystem should be > >> done setting up when you run. That is just a simple dependency. > > > > Is there such an dependency in the first place? I can't see anything, even > > the late_initcall to pci_resource_alignment_sysfs_init is a different > > matter. > On your hardware, yes. In the kernel, no. > That is the very point. The kernel is missing a way to represent a > dependency. Okay, so which dependency is provided by pci_sysfs_init, which are required by drivers then? Best regards, Alexander
Am Donnerstag, 16. März 2023, 15:01:25 CET schrieb Oliver Neukum: > I'm not sure if I can follow you here. Can you elaborate? There are far better reasons to leave the setup of a PCI bus as the firmware has done it than to leave a USB as is. > How is it necessary? How do these PCI devices get attaches to the pci_bus_type > bus without calling pci_bus_add_device? AFAICT they don't. But somebody has to call it. > Okay, so which dependency is provided by pci_sysfs_init, which are required by > drivers then? It isn't. It is missing from the code. But it exists in reality. That is the point. You have a race condition between two probes. We cannot have that. Hence IMHO the dependency would be best expressed by waiting for pci_sysfs_init() to finish in the init sequence before you add any PCI bridges or devices to the system. Regards Oliver
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index dd0d9d9bc509..998e44716b6f 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -1497,6 +1497,9 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev) if (!sysfs_initialized) return -EACCES; + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1) + return 0; /* already added */ + return pci_create_resource_files(pdev); } @@ -1511,6 +1514,9 @@ void pci_remove_sysfs_dev_files(struct pci_dev *pdev) if (!sysfs_initialized) return; + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 1, 0) == 0) + return; /* already removed */ + pci_remove_resource_files(pdev); } diff --git a/include/linux/pci.h b/include/linux/pci.h index b50e5c79f7e3..024313a7a90a 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -467,6 +467,8 @@ struct pci_dev { pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */ + atomic_t sysfs_init_cnt; /* pci_create_sysfs_dev_files has been called */ + u32 saved_config_space[16]; /* Config space saved at suspend time */ struct hlist_head saved_cap_space; int rom_attr_enabled; /* Display of ROM attribute enabled? */