diff mbox series

[v9,1/6] platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()

Message ID 20241203-ep-msi-v9-1-a60dbc3f15dd@nxp.com (mailing list archive)
State Superseded
Headers show
Series PCI: EP: Add RC-to-EP doorbell with platform MSI controller | expand

Commit Message

Frank Li Dec. 3, 2024, 8:36 p.m. UTC
The follow steps trigger kernel dump warning and
platform_device_msi_init_and_alloc_irqs() return false.

1: platform_device_msi_init_and_alloc_irqs();
2: platform_device_msi_free_irqs_all();
3: platform_device_msi_init_and_alloc_irqs();

[   76.713677] WARNING: CPU: 3 PID: 134 at kernel/irq/msi.c:1028 msi_create_device_irq_domain+0x1bc/0x22c
[   76.723010] Modules linked in:
[   76.726082] CPU: 3 UID: 0 PID: 134 Comm: kworker/3:1H Not tainted 6.13.0-rc1-00015-gd60b98003b43-dirty #57
[   76.735741] Hardware name: NXP i.MX95 19X19 board (DT)
[   76.740883] Workqueue: kpcitest pci_epf_test_cmd_handler
[   76.746212] pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   76.753172] pc : msi_create_device_irq_domain+0x1bc/0x22c
[   76.758586] lr : msi_create_device_irq_domain+0x104/0x22c
[   76.763988] sp : ffff800083f43be0
[   76.767313] x29: ffff800083f43be0 x28: 0000000000000000 x27: ffff8000827a7000
[   76.774466] x26: ffff00008085f400 x25: ffff00008000b180 x24: ffff000080fc6410
[   76.781624] x23: ffff000085704cc0 x22: ffff8000811c8828 x21: ffff000085704cc0
[   76.788774] x20: ffff000082814000 x19: 0000000000000000 x18: ffffffffffffffff
[   76.795933] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   76.803083] x14: 0000000000000000 x13: 0000000f00000000 x12: 0000000000000000
[   76.810233] x11: 0000000000000000 x10: 000000000000002d x9 : ffff800083f43ba0
[   76.817383] x8 : 00000000ffffffff x7 : 0000000000000019 x6 : ffff0000857e443a
[   76.824533] x5 : 0000000000000000 x4 : ffffffffffffffff x3 : ffff000085704ce8
[   76.831683] x2 : ffff000080835640 x1 : 0000000000000213 x0 : ffff0000877189c0
[   76.838840] Call trace:
[   76.841287]  msi_create_device_irq_domain+0x1bc/0x22c (P)
[   76.846701]  msi_create_device_irq_domain+0x104/0x22c (L)
[   76.852118]  platform_device_msi_init_and_alloc_irqs+0x6c/0xb8

Do below two things in platform_device_msi_init_and_alloc_irqs().
- msi_create_device_irq_domain()
- msi_domain_alloc_irqs_range()

But only call msi_domain_free_irqs_all() in
platform_device_msi_free_irqs_all(), which missed call
msi_remove_device_irq_domain(). This cause above kernel dump when call
platform_device_msi_init_and_alloc_irqs() again.

Fixes: c88f9110bfbc ("platform-msi: Prepare for real per device domains")
Cc: stable@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
---
 drivers/base/platform-msi.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Thomas Gleixner Dec. 3, 2024, 9:12 p.m. UTC | #1
On Tue, Dec 03 2024 at 15:36, Frank Li wrote:
> The follow steps trigger kernel dump warning and
> platform_device_msi_init_and_alloc_irqs() return false.
>
> 1: platform_device_msi_init_and_alloc_irqs();
> 2: platform_device_msi_free_irqs_all();
> 3: platform_device_msi_init_and_alloc_irqs();
>
> Do below two things in platform_device_msi_init_and_alloc_irqs().
> - msi_create_device_irq_domain()
> - msi_domain_alloc_irqs_range()
>
> But only call msi_domain_free_irqs_all() in
> platform_device_msi_free_irqs_all(), which missed call
> msi_remove_device_irq_domain().

It's not a missed call. It's intentional as all existing users remove
the device afterwards.

> This cause above kernel dump when call
> platform_device_msi_init_and_alloc_irqs() again.

Sure, but that's not a fix and not required for stable because no
existing driver is affected by this unless I'm missing something.

What's the actual use case for this? You describe in great length what
fails, which is nice, but I'm missing the larger picture here.

Thanks,

        tglx
Frank Li Dec. 3, 2024, 9:40 p.m. UTC | #2
On Tue, Dec 03, 2024 at 10:12:36PM +0100, Thomas Gleixner wrote:
> On Tue, Dec 03 2024 at 15:36, Frank Li wrote:
> > The follow steps trigger kernel dump warning and
> > platform_device_msi_init_and_alloc_irqs() return false.
> >
> > 1: platform_device_msi_init_and_alloc_irqs();
> > 2: platform_device_msi_free_irqs_all();
> > 3: platform_device_msi_init_and_alloc_irqs();
> >
> > Do below two things in platform_device_msi_init_and_alloc_irqs().
> > - msi_create_device_irq_domain()
> > - msi_domain_alloc_irqs_range()
> >
> > But only call msi_domain_free_irqs_all() in
> > platform_device_msi_free_irqs_all(), which missed call
> > msi_remove_device_irq_domain().
>
> It's not a missed call. It's intentional as all existing users remove
> the device afterwards.
>
> > This cause above kernel dump when call
> > platform_device_msi_init_and_alloc_irqs() again.
>
> Sure, but that's not a fix and not required for stable because no
> existing driver is affected by this unless I'm missing something.
>
> What's the actual use case for this? You describe in great length what
> fails, which is nice, but I'm missing the larger picture here.

PCI host send a door bell to PCI endpoint, which use platform msi to
trigger a IRQ.

PCI Host side				PCI endpoint side

Send "enable"  command      ->         call platform_device_msi_init_and_alloc_irqs()
Get doorbell address        <-         send back MSI address by shared memory
Write data to doorbell      -> 	       MSI irq handler triggered.
Send "Disable"  command     ->	       call platform_device_msi_free_irqs_all()


At endpoint side, need dymatic response "enable/disable" commands. Of
course, I can call msi_remove_device_irq_domain() in my disable function.
But I think it should be symetic in alloc/free pair functions.

Previous version, alloc/free in bind/unbind function. I missed to do
unbind/bind test. the principle should be the same.

Frank

>
> Thanks,
>
>         tglx
Thomas Gleixner Dec. 3, 2024, 11:14 p.m. UTC | #3
On Tue, Dec 03 2024 at 16:40, Frank Li wrote:
> On Tue, Dec 03, 2024 at 10:12:36PM +0100, Thomas Gleixner wrote:
>> Sure, but that's not a fix and not required for stable because no
>> existing driver is affected by this unless I'm missing something.
>>
>> What's the actual use case for this? You describe in great length what
>> fails, which is nice, but I'm missing the larger picture here.
>
> PCI host send a door bell to PCI endpoint, which use platform msi to
> trigger a IRQ.
>
> PCI Host side				PCI endpoint side
>
> Send "enable"  command      ->         call platform_device_msi_init_and_alloc_irqs()
> Get doorbell address        <-         send back MSI address by shared memory
> Write data to doorbell      -> 	       MSI irq handler triggered.
> Send "Disable"  command     ->	       call platform_device_msi_free_irqs_all()
>
>
> At endpoint side, need dymatic response "enable/disable" commands. Of
> course, I can call msi_remove_device_irq_domain() in my disable function.
> But I think it should be symetic in alloc/free pair functions.

No objections, but that's not a justification for a stable backport as
nothing in tree has this problem right now. You add a new use case which
requires it, so only that new use case has this dependency, no?

Thanks,

        tglx
Frank Li Dec. 4, 2024, 4:10 p.m. UTC | #4
On Wed, Dec 04, 2024 at 12:14:25AM +0100, Thomas Gleixner wrote:
> On Tue, Dec 03 2024 at 16:40, Frank Li wrote:
> > On Tue, Dec 03, 2024 at 10:12:36PM +0100, Thomas Gleixner wrote:
> >> Sure, but that's not a fix and not required for stable because no
> >> existing driver is affected by this unless I'm missing something.
> >>
> >> What's the actual use case for this? You describe in great length what
> >> fails, which is nice, but I'm missing the larger picture here.
> >
> > PCI host send a door bell to PCI endpoint, which use platform msi to
> > trigger a IRQ.
> >
> > PCI Host side				PCI endpoint side
> >
> > Send "enable"  command      ->         call platform_device_msi_init_and_alloc_irqs()
> > Get doorbell address        <-         send back MSI address by shared memory
> > Write data to doorbell      -> 	       MSI irq handler triggered.
> > Send "Disable"  command     ->	       call platform_device_msi_free_irqs_all()
> >
> >
> > At endpoint side, need dymatic response "enable/disable" commands. Of
> > course, I can call msi_remove_device_irq_domain() in my disable function.
> > But I think it should be symetic in alloc/free pair functions.
>
> No objections, but that's not a justification for a stable backport as
> nothing in tree has this problem right now. You add a new use case which
> requires it, so only that new use case has this dependency, no?

Okay, Let me drop fixes and stable tag in next version.

Frank

>
> Thanks,
>
>         tglx
diff mbox series

Patch

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index 4401cc31e4736..e7615daf5f539 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -98,5 +98,6 @@  EXPORT_SYMBOL_GPL(platform_device_msi_init_and_alloc_irqs);
 void platform_device_msi_free_irqs_all(struct device *dev)
 {
 	msi_domain_free_irqs_all(dev, MSI_DEFAULT_DOMAIN);
+	msi_remove_device_irq_domain(dev, MSI_DEFAULT_DOMAIN);
 }
 EXPORT_SYMBOL_GPL(platform_device_msi_free_irqs_all);