Message ID | 67a9117f463ecdb38a2dbca6a20391ce2f1e7a06.1678543498.git.lukas@wunner.de |
---|---|
State | Accepted |
Commit | 92dc899c3b4927f3cfa23f55bf759171234b5802 |
Series | Collection of DOE material |
On 12/3/23 01:40, Lukas Wunner wrote:
> Gregory Price reports a WARN splat with CONFIG_DEBUG_OBJECTS=y upon CXL
> probing because pci_doe_submit_task() invokes INIT_WORK() instead of
> INIT_WORK_ONSTACK() for a work_struct that was allocated on the stack.
>
> All callers of pci_doe_submit_task() allocate the work_struct on the
> stack, so replace INIT_WORK() with INIT_WORK_ONSTACK() as a backportable
> short-term fix.
>
> The long-term fix implemented by a subsequent commit is to move to a
> synchronous API which allocates the work_struct internally in the DOE
> library.
>
> Stacktrace for posterity:
>
> WARNING: CPU: 0 PID: 23 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183
> CPU: 0 PID: 23 Comm: kworker/u2:1 Not tainted 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  pci_doe_submit_task+0x5d/0xd0
>  pci_doe_discovery+0xb4/0x100
>  pcim_doe_create_mb+0x219/0x290
>  cxl_pci_probe+0x192/0x430
>  local_pci_probe+0x41/0x80
>  pci_device_probe+0xb3/0x220
>  really_probe+0xde/0x380
>  __driver_probe_device+0x78/0x170
>  driver_probe_device+0x1f/0x90
>  __driver_attach_async_helper+0x5c/0xe0
>  async_run_entry_fn+0x30/0x130
>  process_one_work+0x294/0x5b0
>
> Fixes: 9d24322e887b ("PCI/DOE: Add DOE mailbox support functions")
> Link: https://lore.kernel.org/linux-cxl/Y1bOniJliOFszvIK@memverge.com/
> Reported-by: Gregory Price <gregory.price@memverge.com>
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Tested-by: Gregory Price <gregory.price@memverge.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Gregory Price <gregory.price@memverge.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com>
                                                  ^^^^^

huwei? :)
On Tue, 21 Mar 2023 14:42:01 +1100 Alexey Kardashevskiy <aik@amd.com> wrote:
> On 12/3/23 01:40, Lukas Wunner wrote:
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com>
>                                                   ^^^^^
>
> huwei? :)

Doh. I normally type my own name wrong ;)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks,

Jonathan
On Tue, Mar 21, 2023 at 02:42:01PM +1100, Alexey Kardashevskiy wrote:
> On 12/3/23 01:40, Lukas Wunner wrote:
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com>
>                                                   ^^^^^
>
> huwei? :)

Thanks for spotting this Alexey. Dan fixed it up when he applied
the patch to cxl/fixes yesterday:

https://git.kernel.org/cxl/cxl/c/92dc899c3b49
diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 6f097932ccbf..c14ffdf23f87 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -523,6 +523,8 @@ EXPORT_SYMBOL_GPL(pci_doe_supports_prot);
  * task->complete will be called when the state machine is done processing this
  * task.
  *
+ * @task must be allocated on the stack.
+ *
  * Excess data will be discarded.
  *
  * RETURNS: 0 when task has been successfully queued, -ERRNO on error
@@ -544,7 +546,7 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 		return -EIO;
 
 	task->doe_mb = doe_mb;
-	INIT_WORK(&task->work, doe_statemachine_work);
+	INIT_WORK_ONSTACK(&task->work, doe_statemachine_work);
 	queue_work(doe_mb->work_queue, &task->work);
 	return 0;
 }
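
For readers unfamiliar with why the initializer matters: with
CONFIG_DEBUG_OBJECTS=y, the debugobjects code compares where an object
actually lives against how it was announced at init time, and warns on a
mismatch. The following is a rough userspace model of that check, not
kernel code. Every name in it (model_work, model_stack, model_init_work()
and so on) is hypothetical and chosen for illustration only, and a static
array stands in for the task stack so the location test stays portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct work_struct. */
struct model_work {
	bool initialized;
};

/* Stand-in for the current task's stack: anything in here is "on stack". */
static struct model_work model_stack[4];

/* Stand-in for a heap or static allocation. */
static struct model_work model_heap_obj;

/* Number of warnings emitted by the model so far. */
static int model_warnings;

/* Does the object live inside the modelled stack region? */
static bool model_is_on_stack(const struct model_work *w)
{
	uintptr_t p = (uintptr_t)w;
	uintptr_t lo = (uintptr_t)&model_stack[0];
	uintptr_t hi = (uintptr_t)&model_stack[4];	/* one past the end */

	return p >= lo && p < hi;
}

/* Warn when the real location and the caller's annotation disagree. */
static void model_debug_init(struct model_work *w, bool annotated_onstack)
{
	bool on_stack;

	assert(w != NULL);
	on_stack = model_is_on_stack(w);
	if (on_stack != annotated_onstack) {
		model_warnings++;
		fprintf(stderr, "WARN: object %p is %son stack, but %sannotated\n",
			(void *)w, on_stack ? "" : "NOT ",
			annotated_onstack ? "" : "NOT ");
	}
	w->initialized = true;
}

/* Models INIT_WORK(): announces an object that is not on the stack. */
static void model_init_work(struct model_work *w)
{
	model_debug_init(w, false);
}

/* Models INIT_WORK_ONSTACK(): announces an object that is on the stack. */
static void model_init_work_onstack(struct model_work *w)
{
	model_debug_init(w, true);
}
```

In this model, calling model_init_work() on an element of model_stack
bumps model_warnings, which corresponds to the situation reported above.
Calling model_init_work_onstack() on the same element does not, which
corresponds to the one-line change in the patch.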