
irqchip/riscv-aplic: Fix crash when MSI domain is missing

Message ID 20241114200133.3069460-1-samuel.holland@sifive.com
State New
Series irqchip/riscv-aplic: Fix crash when MSI domain is missing

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-1-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh took 147.83s
conchuod/patch-1-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh took 1403.35s
conchuod/patch-1-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh took 1587.93s
conchuod/patch-1-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh took 20.44s
conchuod/patch-1-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh took 22.55s
conchuod/patch-1-test-6 success .github/scripts/patches/tests/checkpatch.sh took 0.65s
conchuod/patch-1-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh took 44.27s
conchuod/patch-1-test-8 success .github/scripts/patches/tests/header_inline.sh took 0.00s
conchuod/patch-1-test-9 success .github/scripts/patches/tests/kdoc.sh took 0.50s
conchuod/patch-1-test-10 success .github/scripts/patches/tests/module_param.sh took 0.01s
conchuod/patch-1-test-11 success .github/scripts/patches/tests/verify_fixes.sh took 0.02s
conchuod/patch-1-test-12 success .github/scripts/patches/tests/verify_signedoff.sh took 0.03s

Commit Message

Samuel Holland Nov. 14, 2024, 8:01 p.m. UTC
If the APLIC driver is probed before the IMSIC driver, the parent MSI
domain will be missing, which causes a NULL pointer dereference in
msi_create_device_irq_domain(). Avoid this by deferring probe until the
parent MSI domain is available. Use dev_err_probe() to avoid printing an
error message when returning -EPROBE_DEFER.

Fixes: ca8df97fe679 ("irqchip/riscv-aplic: Add support for MSI-mode")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 drivers/irqchip/irq-riscv-aplic-main.c | 3 ++-
 drivers/irqchip/irq-riscv-aplic-msi.c  | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)
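The two hunks work together. aplic_msi_setup() now returns -EPROBE_DEFER instead of reaching msi_create_device_irq_domain() without a parent domain, and aplic_probe() reports failures through dev_err_probe(), which stays quiet for deferrals. The userspace C sketch below models that flow under stated assumptions: every model_* name is hypothetical, the assert() in model_create_child_domain() stands in for the kernel's NULL pointer dereference, and EPROBE_DEFER is defined locally as 517 because it is a kernel-internal errno value. The real dev_err_probe() also records the deferral reason (visible in the devices_deferred debugfs file) and logs it at debug level. This model omits both.

```c
#include <assert.h>
#include <stdio.h>

/* Kernel-internal errno value, not exported to userspace headers. */
#define EPROBE_DEFER 517

/* Hypothetical stand-ins for an IRQ domain and the APLIC device. */
struct model_domain {
	const char *name;
};

struct model_device {
	const char *name;
	const struct model_domain *msi_domain; /* parent; NULL until the IMSIC probes */
	int has_child_domain;                  /* set once the APLIC's MSI domain exists */
};

/* Stand-in for msi_create_device_irq_domain(): the assert plays the role
 * of the NULL pointer dereference hit when the parent domain is missing. */
static int model_create_child_domain(struct model_device *dev)
{
	assert(dev->msi_domain != NULL);
	dev->has_child_domain = 1;
	return 0;
}

/* Mirrors the patched aplic_msi_setup(): defer before touching the parent. */
static int model_msi_setup(struct model_device *dev)
{
	if (!dev->msi_domain)
		return -EPROBE_DEFER;
	return model_create_child_domain(dev);
}

/* Simplified dev_err_probe(): print real errors only, return err unchanged. */
static int model_err_probe(const struct model_device *dev, int err, const char *msg)
{
	if (err != -EPROBE_DEFER)
		fprintf(stderr, "%s: %s (error %d)\n", dev->name, msg, err);
	return err;
}

/* Mirrors aplic_probe(): report a setup failure, then propagate it. */
static int model_probe(struct model_device *dev)
{
	int rc = model_msi_setup(dev);

	if (rc)
		model_err_probe(dev, rc, "failed to setup APLIC in MSI mode");
	return rc;
}
```

Calling model_probe() while msi_domain is still NULL returns -EPROBE_DEFER without printing anything. Once the parent domain is filled in, a second call succeeds and creates the child domain.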

Comments

Anup Patel Nov. 15, 2024, 3:42 p.m. UTC | #1
On Fri, Nov 15, 2024 at 1:31 AM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> If the APLIC driver is probed before the IMSIC driver, the parent MSI
> domain will be missing, which causes a NULL pointer dereference in
> msi_create_device_irq_domain(). Avoid this by deferring probe until the
> parent MSI domain is available. Use dev_err_probe() to avoid printing an
> error message when returning -EPROBE_DEFER.

The -EPROBE_DEFER is not needed because we expect platforms to use
the "msi-parent" DT property in the APLIC DT node, which in turn allows the
Linux device driver framework to re-order probing based on fw_devlink
dependencies. The APLIC DT bindings mandate that either the
"interrupts-extended" or the "msi-parent" DT property MUST be present.

Can you elaborate a bit more on how you are hitting this issue?

Regards,
Anup
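The ordering Anup describes can be pictured as a topological sort: a link from the APLIC to the IMSIC named by its "msi-parent" property means the IMSIC must probe first. The C sketch below runs Kahn's algorithm over a hypothetical device table. model_node and model_probe_order() are illustrative names, not kernel APIs, and the "iommu" entry is a hypothetical second consumer of the IMSIC. The real fw_devlink does not precompute a global order; as far as this sketch is concerned, the effect is the same, since it holds back a consumer's probe until its suppliers have bound.

```c
#include <assert.h>

#define MAX_DEVS 8

/* Hypothetical device record: suppliers[] holds indices of the devices this
 * one depends on, e.g. the IMSIC named by an APLIC's "msi-parent" property. */
struct model_node {
	const char *name;
	int nr_suppliers;
	int suppliers[MAX_DEVS];
};

/* Kahn's algorithm: a device is emitted only after all of its suppliers.
 * Writes indices into order[] and returns how many devices were ordered;
 * a result below n means the remaining devices sit on a dependency cycle. */
static int model_probe_order(const struct model_node *devs, int n, int *order)
{
	int pending[MAX_DEVS]; /* suppliers not yet emitted, per device */
	int queue[MAX_DEVS];
	int head = 0, tail = 0, count = 0;

	assert(n <= MAX_DEVS);
	for (int i = 0; i < n; i++) {
		pending[i] = devs[i].nr_suppliers;
		if (pending[i] == 0)
			queue[tail++] = i;
	}

	while (head < tail) {
		int s = queue[head++];

		order[count++] = s;
		/* Each consumer of s now waits on one fewer supplier. */
		for (int c = 0; c < n; c++)
			for (int k = 0; k < devs[c].nr_suppliers; k++)
				if (devs[c].suppliers[k] == s && --pending[c] == 0)
					queue[tail++] = c;
	}
	return count;
}
```

Even with the table listing the APLIC first, the sort places the IMSIC (index 1) ahead of both consumers. A dependency cycle leaves every device on it with a non-zero pending count, so the function returns fewer than n entries. That is the case in which a supplier-first order can no longer be derived from the links alone.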

>
> Fixes: ca8df97fe679 ("irqchip/riscv-aplic: Add support for MSI-mode")
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
>  drivers/irqchip/irq-riscv-aplic-main.c | 3 ++-
>  drivers/irqchip/irq-riscv-aplic-msi.c  | 3 +++
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
> index 900e72541db9..93e7c51f944a 100644
> --- a/drivers/irqchip/irq-riscv-aplic-main.c
> +++ b/drivers/irqchip/irq-riscv-aplic-main.c
> @@ -207,7 +207,8 @@ static int aplic_probe(struct platform_device *pdev)
>         else
>                 rc = aplic_direct_setup(dev, regs);
>         if (rc)
> -               dev_err(dev, "failed to setup APLIC in %s mode\n", msi_mode ? "MSI" : "direct");
> +               dev_err_probe(dev, rc, "failed to setup APLIC in %s mode\n",
> +                             msi_mode ? "MSI" : "direct");
>
>  #ifdef CONFIG_ACPI
>         if (!acpi_disabled)
> diff --git a/drivers/irqchip/irq-riscv-aplic-msi.c b/drivers/irqchip/irq-riscv-aplic-msi.c
> index 945bff28265c..fb8d1838609f 100644
> --- a/drivers/irqchip/irq-riscv-aplic-msi.c
> +++ b/drivers/irqchip/irq-riscv-aplic-msi.c
> @@ -266,6 +266,9 @@ int aplic_msi_setup(struct device *dev, void __iomem *regs)
>                         if (msi_domain)
>                                 dev_set_msi_domain(dev, msi_domain);
>                 }
> +
> +               if (!dev_get_msi_domain(dev))
> +                       return -EPROBE_DEFER;
>         }
>
>         if (!msi_create_device_irq_domain(dev, MSI_DEFAULT_DOMAIN, &aplic_msi_template,
> --
> 2.45.1
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Samuel Holland Nov. 15, 2024, 3:57 p.m. UTC | #2
Hi Anup,

On 2024-11-15 9:42 AM, Anup Patel wrote:
> On Fri, Nov 15, 2024 at 1:31 AM Samuel Holland
> <samuel.holland@sifive.com> wrote:
>>
>> If the APLIC driver is probed before the IMSIC driver, the parent MSI
>> domain will be missing, which causes a NULL pointer dereference in
>> msi_create_device_irq_domain(). Avoid this by deferring probe until the
>> parent MSI domain is available. Use dev_err_probe() to avoid printing an
>> error message when returning -EPROBE_DEFER.
> 
> The -EPROBE_DEFER is not needed because we expect platforms to use
> the "msi-parent" DT property in the APLIC DT node, which in turn allows the
> Linux device driver framework to re-order probing based on fw_devlink
> dependencies. The APLIC DT bindings mandate that either the
> "interrupts-extended" or the "msi-parent" DT property MUST be present.
> 
> Can you elaborate a bit more on how you are hitting this issue?

I agree that fw_devlink should help avoid the situation where we need to return
-EPROBE_DEFER, but the kernel must still not crash even if fw_devlink is
disabled (which is a perfectly valid thing to do: "fw_devlink=off" on the kernel
command line) or if fw_devlink fails to come up with the ideal probe order.
fw_devlink is an optimization. It should not be relied on for correctness. In my
specific case, fw_devlink got the order wrong due to some false dependency
cycles, which I sent a patch for separately[1].

Regards,
Samuel

[1]:
https://lore.kernel.org/lkml/20241114195652.3068725-1-samuel.holland@sifive.com/
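Samuel's point is that probe deferral, not fw_devlink, is what keeps the driver correct when no ordering information is available. The C sketch below models that case under simplifying assumptions. Devices are probed in list order with no dependency data (roughly the situation with fw_devlink=off). A device whose supplier has not probed returns -EPROBE_DEFER, like the patched APLIC. Deferred devices are retried for as long as some pass makes progress, loosely mirroring how the driver core re-triggers deferred probes after a driver binds. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-internal errno value, not exported to userspace headers. */
#define EPROBE_DEFER 517

/* Hypothetical device: supplier is the index of the device it needs
 * (-1 for none), e.g. the IMSIC that provides the APLIC's parent domain. */
struct model_dev {
	const char *name;
	int supplier;
	bool probed;
};

/* Defer, rather than crash, while the supplier is missing. */
static int model_try_probe(struct model_dev *devs, int n, int i)
{
	int s = devs[i].supplier;

	assert(s < n); /* supplier must be one of the listed devices */
	if (s >= 0 && !devs[s].probed)
		return -EPROBE_DEFER;
	devs[i].probed = true;
	return 0;
}

/* Probe in list order with no dependency information, then re-run the
 * deferred devices for as long as a pass binds something new.
 * Returns the number of devices that never probed. */
static int model_probe_all(struct model_dev *devs, int n)
{
	int left = n;
	bool progress = true;

	while (progress && left > 0) {
		progress = false;
		for (int i = 0; i < n; i++) {
			if (devs[i].probed)
				continue;
			if (model_try_probe(devs, n, i) == 0) {
				left--;
				progress = true;
			}
		}
	}
	return left;
}
```

With the APLIC deliberately listed before the IMSIC, the first pass defers the APLIC and binds the IMSIC. The second pass then binds the APLIC, and nothing crashes along the way.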

Anup Patel Nov. 18, 2024, 8:19 a.m. UTC | #3
On Fri, Nov 15, 2024 at 9:27 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Hi Anup,
>
> On 2024-11-15 9:42 AM, Anup Patel wrote:
> > On Fri, Nov 15, 2024 at 1:31 AM Samuel Holland
> > <samuel.holland@sifive.com> wrote:
> >>
> >> If the APLIC driver is probed before the IMSIC driver, the parent MSI
> >> domain will be missing, which causes a NULL pointer dereference in
> >> msi_create_device_irq_domain(). Avoid this by deferring probe until the
> >> parent MSI domain is available. Use dev_err_probe() to avoid printing an
> >> error message when returning -EPROBE_DEFER.
> >
> > The -EPROBE_DEFER is not needed because we expect platforms to use
> > the "msi-parent" DT property in the APLIC DT node, which in turn allows the
> > Linux device driver framework to re-order probing based on fw_devlink
> > dependencies. The APLIC DT bindings mandate that either the
> > "interrupts-extended" or the "msi-parent" DT property MUST be present.
> >
> > Can you elaborate a bit more on how you are hitting this issue?
>
> I agree that fw_devlink should help avoid the situation where we need to return
> -EPROBE_DEFER, but the kernel must still not crash even if fw_devlink is
> disabled (which is a perfectly valid thing to do: "fw_devlink=off" on the kernel
> command line) or if fw_devlink fails to come up with the ideal probe order.
> fw_devlink is an optimization. It should not be relied on for correctness. In my
> specific case, fw_devlink got the order wrong due to some false dependency
> cycles, which I sent a patch for separately[1].

The RISC-V kernel is heavily dependent on fw_devlink-based probe ordering,
and upcoming drivers are going to increase this dependency.
For example, we also have the RISC-V IOMMU driver, which needs to be probed
after the IMSIC since it can use MSIs.

I think we should ensure that fw_devlink can't be disabled/turned off for the
RISC-V kernel. If this is not possible, then we should have a very verbose
boot-time warning when fw_devlink is disabled/turned off.

Your other "interrupt-parent"-related fix [1] looks fine to me.

Regards,
Anup

>
> Regards,
> Samuel
>
> [1]:
> https://lore.kernel.org/lkml/20241114195652.3068725-1-samuel.holland@sifive.com/

Patch

diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
index 900e72541db9..93e7c51f944a 100644
--- a/drivers/irqchip/irq-riscv-aplic-main.c
+++ b/drivers/irqchip/irq-riscv-aplic-main.c
@@ -207,7 +207,8 @@  static int aplic_probe(struct platform_device *pdev)
 	else
 		rc = aplic_direct_setup(dev, regs);
 	if (rc)
-		dev_err(dev, "failed to setup APLIC in %s mode\n", msi_mode ? "MSI" : "direct");
+		dev_err_probe(dev, rc, "failed to setup APLIC in %s mode\n",
+			      msi_mode ? "MSI" : "direct");
 
 #ifdef CONFIG_ACPI
 	if (!acpi_disabled)
diff --git a/drivers/irqchip/irq-riscv-aplic-msi.c b/drivers/irqchip/irq-riscv-aplic-msi.c
index 945bff28265c..fb8d1838609f 100644
--- a/drivers/irqchip/irq-riscv-aplic-msi.c
+++ b/drivers/irqchip/irq-riscv-aplic-msi.c
@@ -266,6 +266,9 @@  int aplic_msi_setup(struct device *dev, void __iomem *regs)
 			if (msi_domain)
 				dev_set_msi_domain(dev, msi_domain);
 		}
+
+		if (!dev_get_msi_domain(dev))
+			return -EPROBE_DEFER;
 	}
 
 	if (!msi_create_device_irq_domain(dev, MSI_DEFAULT_DOMAIN, &aplic_msi_template,