Message ID | 20240706132758.53298-1-marek.vasut+renesas@mailbox.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Geert Uytterhoeven |
Series | [RFC] genirq/cpuhotplug, PCI/rcar-host: Silence set affinity failed warning |
Marek!

On Sat, Jul 06 2024 at 15:27, Marek Vasut wrote:
> This is an RFC patch, I am looking for input on the approach taken here.
> If the approach is sound, this patch would be split into proper patchset.
>
> Various PCIe controllers that mux MSIs onto single IRQ line produce these
> "IRQ%d: set affinity failed" warnings when entering suspend. This has been
> discussed before [1] [2] and an example test case is included at the end
> of this commit message.
>
> Attempt to silence the warning by returning specific error code -EOPNOTSUPP
> from the irqchip .irq_set_affinity callback, which skips printing the warning
> in cpuhotplug.c . The -EOPNOTSUPP was chosen because it indicates exactly what
> the problem is, it is not possible to set affinity of each MSI IRQ line to a
> specific CPU due to hardware limitation.

Why does the irq_chip in question have an irq_set_affinity() callback in
the first place?

Thanks,

        tglx
On 7/7/24 8:47 PM, Thomas Gleixner wrote:
> Marek!

Hello Thomas,

> On Sat, Jul 06 2024 at 15:27, Marek Vasut wrote:
>
>> This is an RFC patch, I am looking for input on the approach taken here.
>> If the approach is sound, this patch would be split into proper patchset.
>>
>> Various PCIe controllers that mux MSIs onto single IRQ line produce these
>> "IRQ%d: set affinity failed" warnings when entering suspend. This has been
>> discussed before [1] [2] and an example test case is included at the end
>> of this commit message.
>>
>> Attempt to silence the warning by returning specific error code -EOPNOTSUPP
>> from the irqchip .irq_set_affinity callback, which skips printing the warning
>> in cpuhotplug.c . The -EOPNOTSUPP was chosen because it indicates exactly what
>> the problem is, it is not possible to set affinity of each MSI IRQ line to a
>> specific CPU due to hardware limitation.
>
> Why does the irq_chip in question have an irq_set_affinity() callback in
> the first place?

I believe originally (at least that's what's being discussed in the
linked threads) it was because the irqchip code didn't check whether
.irq_set_affinity was not NULL at all, so if it was missing, there would
be NULL pointer dereference.

Now this is checked and irq_do_set_affinity() returns -EINVAL, which
triggers the warning that is being silenced by this patch.

If you think this is better, I can:
- Tweak the cpuhotplug.c code to do some
   if (chip && !chip->irq_set_affinity) return false;
- Remove all the .irq_set_affinity implementations from PCI drivers
   which only return -EINVAL

Would that be better ?
On Mon, Jul 08 2024 at 13:55, Marek Vasut wrote:
> On 7/7/24 8:47 PM, Thomas Gleixner wrote:
>> Why does the irq_chip in question have an irq_set_affinity() callback in
>> the first place?
> I believe originally (at least that's what's being discussed in the
> linked threads) it was because the irqchip code didn't check whether
> .irq_set_affinity was not NULL at all, so if it was missing, there would
> be NULL pointer dereference.
>
> Now this is checked and irq_do_set_affinity() returns -EINVAL, which
> triggers the warning that is being silenced by this patch.
>
> If you think this is better, I can:
> - Tweak the cpuhotplug.c code to do some
>    if (chip && !chip->irq_set_affinity) return false;

It does already:

migrate_one_irq()
  if (chip && !chip->irq_set_affinity)
     return false;

Right at the top.

> - Remove all the .irq_set_affinity implementations from PCI drivers
>    which only return -EINVAL
>
> Would that be better ?

I think so.

Thanks,

        tglx
On Tue, Jul 09 2024 at 19:18, Thomas Gleixner wrote:
> On Mon, Jul 08 2024 at 13:55, Marek Vasut wrote:
>> On 7/7/24 8:47 PM, Thomas Gleixner wrote:
>>> Why does the irq_chip in question have an irq_set_affinity() callback in
>>> the first place?
>> I believe originally (at least that's what's being discussed in the
>> linked threads) it was because the irqchip code didn't check whether
>> .irq_set_affinity was not NULL at all, so if it was missing, there would
>> be NULL pointer dereference.
>>
>> Now this is checked and irq_do_set_affinity() returns -EINVAL, which
>> triggers the warning that is being silenced by this patch.
>>
>> If you think this is better, I can:
>> - Tweak the cpuhotplug.c code to do some
>>    if (chip && !chip->irq_set_affinity) return false;
>
> It does already:
>
> migrate_one_irq()
>   if (chip && !chip->irq_set_affinity)
>      return false;
>
> Right at the top.

  if (!chip || !chip->irq_set_affinity) {

Obviously :)
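The two forms of the check quoted above differ only in how they treat a missing chip. The following is a minimal user-space sketch of that difference, not kernel code: `struct mock_irq_chip`, `skip_as_first_quoted()`, `skip_as_corrected()` and the two mock chips are hypothetical names introduced here for illustration, and the real `migrate_one_irq()` does considerably more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct irq_chip; only the one
 * callback discussed in this thread is modelled. */
struct mock_irq_chip {
	int (*irq_set_affinity)(void *data, const void *mask, bool force);
};

/* Condition as first quoted in the thread: true only when a chip exists
 * but has no callback. A NULL chip does not match. */
static bool skip_as_first_quoted(const struct mock_irq_chip *chip)
{
	return chip && !chip->irq_set_affinity;
}

/* Condition as corrected in the follow-up: true when there is no chip
 * at all, or when the chip has no callback. */
static bool skip_as_corrected(const struct mock_irq_chip *chip)
{
	return !chip || !chip->irq_set_affinity;
}

/* Dummy callback so that one mock chip carries a non-NULL pointer. */
static int mock_set_affinity(void *data, const void *mask, bool force)
{
	(void)data;
	(void)mask;
	(void)force;
	return 0;
}

/* Models a driver that dropped its stub callback, as proposed above. */
static const struct mock_irq_chip chip_without_cb = {
	.irq_set_affinity = NULL,
};

/* Models a driver that still provides a callback. */
static const struct mock_irq_chip chip_with_cb = {
	.irq_set_affinity = mock_set_affinity,
};
```

Under this model, a chip whose callback has been removed matches either condition, which is why dropping the stub `.irq_set_affinity` implementations would let the hotplug path skip such interrupts without reaching the warning.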
diff --git a/drivers/pci/controller/pcie-rcar-host.c b/drivers/pci/controller/pcie-rcar-host.c
index c01efc6ea64f6..2314b2b30df8a 100644
--- a/drivers/pci/controller/pcie-rcar-host.c
+++ b/drivers/pci/controller/pcie-rcar-host.c
@@ -660,7 +660,7 @@ static void rcar_msi_irq_unmask(struct irq_data *d)
 static int rcar_msi_set_affinity(struct irq_data *d, const struct cpumask *mask, bool force)
 {
-	return -EINVAL;
+	return -EOPNOTSUPP;
 }
 
 static void rcar_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index eb86283901565..822bd6ca40bf9 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -147,8 +147,10 @@ static bool migrate_one_irq(struct irq_desc *desc)
 	}
 
 	if (err) {
-		pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
-				    d->irq, err);
+		if (err != -EOPNOTSUPP) {
+			pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
+					    d->irq, err);
+		}
 		brokeaff = false;
 	}
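The filtering that the cpuhotplug.c hunk introduces can be modelled in user space roughly as follows. This is only a sketch of the idea in the RFC: `should_warn()` and `report_affinity_error()` are hypothetical helpers, `fprintf()` stands in for `pr_warn_ratelimited()`, and rate limiting is not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* True when the given error code would still produce a warning under
 * the RFC's approach: any failure other than -EOPNOTSUPP. */
static bool should_warn(int err)
{
	return err != 0 && err != -EOPNOTSUPP;
}

/* Prints the warning when appropriate and reports whether it did so.
 * fprintf() is a stand-in for the kernel's pr_warn_ratelimited(). */
static bool report_affinity_error(unsigned int irq, int err)
{
	if (!should_warn(err))
		return false;

	fprintf(stderr, "IRQ%u: set affinity failed(%d).\n", irq, err);
	return true;
}
```

With this model, `report_affinity_error(25, -EINVAL)` still prints the message, while `report_affinity_error(25, -EOPNOTSUPP)` stays silent, which mirrors the intent of changing the rcar callback's return value.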
This is an RFC patch, I am looking for input on the approach taken here.
If the approach is sound, this patch would be split into proper patchset.

Various PCIe controllers that mux MSIs onto single IRQ line produce these
"IRQ%d: set affinity failed" warnings when entering suspend. This has been
discussed before [1] [2] and an example test case is included at the end
of this commit message.

Attempt to silence the warning by returning specific error code -EOPNOTSUPP
from the irqchip .irq_set_affinity callback, which skips printing the warning
in cpuhotplug.c . The -EOPNOTSUPP was chosen because it indicates exactly what
the problem is, it is not possible to set affinity of each MSI IRQ line to a
specific CPU due to hardware limitation.

```
$ grep 25 /proc/interrupts
 25:   0 0 0 0 0 0 0 0   PCIe MSI   0 Edge   PCIe PME
$ echo core > /sys/power/pm_test ; echo mem > /sys/power/state
...
Disabling non-boot CPUs ...
IRQ25: set affinity failed(-22). <---------- This is being silenced here
psci: CPU7 killed (polled 4 ms)
...
```

[1] https://lore.kernel.org/all/d4a6eea3c5e33a3a4056885419df95a7@kernel.org/
[2] https://lore.kernel.org/all/5f4947b18bf381615a37aa81c2242477@kernel.org/

Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
---
Cc: "Krzysztof Wilczyński" <kw@linux.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: linux-pci@vger.kernel.org
Cc: linux-renesas-soc@vger.kernel.org
---
 drivers/pci/controller/pcie-rcar-host.c | 2 +-
 kernel/irq/cpuhotplug.c                 | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)