Message ID | 20220111163800.22362-1-tyhicks@linux.microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Series | EDAC/dmc520: Don't print an error for each unconfigured interrupt line |
> -----Original Message-----
> From: Tyler Hicks <tyhicks@linux.microsoft.com>
> Sent: Tuesday, January 11, 2022 8:38 AM
> To: Lei Wang (DPLAT) <Wang.Lei@microsoft.com>; Borislav Petkov
> <bp@alien8.de>; Tony Luck <tony.luck@intel.com>; Mauro Carvalho Chehab
> <mchehab@kernel.org>
> Cc: Sinan Kaya <okaya@kernel.org>; Shiping Ji <shiping.linux@gmail.com>;
> James Morse <james.morse@arm.com>; Robert Richter <rric@kernel.org>;
> linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] EDAC/dmc520: Don't print an error for each unconfigured
> interrupt line
>
> The dmc520 driver requires that at least one interrupt line, out of the ten
> possible, is configured. The driver prints an error and returns -EINVAL from
> its .probe function if there are no interrupt lines configured.
>
> Don't print a KERN_ERR level message for each interrupt line that's
> unconfigured as that can confuse users into thinking that there is an error
> condition.
>
> Before this change, the following KERN_ERR level messages would be reported
> if only dram_ecc_errc and dram_ecc_errd were configured in the device tree:
>
>  dmc520 68000000.dmc: IRQ ram_ecc_errc not found
>  dmc520 68000000.dmc: IRQ ram_ecc_errd not found
>  dmc520 68000000.dmc: IRQ failed_access not found
>  dmc520 68000000.dmc: IRQ failed_prog not found
>  dmc520 68000000.dmc: IRQ link_err not found
>  dmc520 68000000.dmc: IRQ temperature_event not found
>  dmc520 68000000.dmc: IRQ arch_fsm not found
>  dmc520 68000000.dmc: IRQ phy_request not found
>
> Fixes: 1088750d7839 ("EDAC: Add EDAC driver for DMC520")
> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>

Looks good. EDAC-CORE maintainers, please take the patch through your tree. Thanks!

Acked-by: Lei Wang <lewan@microsoft.com>

> Cc: <stable@vger.kernel.org>
> Reported-by: Sinan Kaya <okaya@kernel.org>
> ---
>  drivers/edac/dmc520_edac.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/edac/dmc520_edac.c b/drivers/edac/dmc520_edac.c
> index b8a7d9594afd..1fa5ca57e9ec 100644
> --- a/drivers/edac/dmc520_edac.c
> +++ b/drivers/edac/dmc520_edac.c
> @@ -489,7 +489,7 @@ static int dmc520_edac_probe(struct platform_device *pdev)
>  	dev = &pdev->dev;
>
>  	for (idx = 0; idx < NUMBER_OF_IRQS; idx++) {
> -		irq = platform_get_irq_byname(pdev, dmc520_irq_configs[idx].name);
> +		irq = platform_get_irq_byname_optional(pdev, dmc520_irq_configs[idx].name);
>  		irqs[idx] = irq;
>  		masks[idx] = dmc520_irq_configs[idx].mask;
>  		if (irq >= 0) {
> --
> 2.25.1
On Tue, Jan 11, 2022 at 10:38:00AM -0600, Tyler Hicks wrote:
> The dmc520 driver requires that at least one interrupt line, out of the ten
> possible, is configured. The driver prints an error and returns -EINVAL
> from its .probe function if there are no interrupt lines configured.
>
> Don't print a KERN_ERR level message for each interrupt line that's
> unconfigured as that can confuse users into thinking that there is an
> error condition.
>
> Before this change, the following KERN_ERR level messages would be
> reported if only dram_ecc_errc and dram_ecc_errd were configured in the
> device tree:
>
>  dmc520 68000000.dmc: IRQ ram_ecc_errc not found
>  dmc520 68000000.dmc: IRQ ram_ecc_errd not found
>  dmc520 68000000.dmc: IRQ failed_access not found
>  dmc520 68000000.dmc: IRQ failed_prog not found
>  dmc520 68000000.dmc: IRQ link_err not found
>  dmc520 68000000.dmc: IRQ temperature_event not found
>  dmc520 68000000.dmc: IRQ arch_fsm not found
>  dmc520 68000000.dmc: IRQ phy_request not found
>
> Fixes: 1088750d7839 ("EDAC: Add EDAC driver for DMC520")
> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> Cc: <stable@vger.kernel.org>

Why stable?

AFAICT, this is fixing only the spew of some error messages but the
driver is still functional.
On 2022-01-16 19:29:46, Borislav Petkov wrote:
> On Tue, Jan 11, 2022 at 10:38:00AM -0600, Tyler Hicks wrote:
> > The dmc520 driver requires that at least one interrupt line, out of the ten
> > possible, is configured. The driver prints an error and returns -EINVAL
> > from its .probe function if there are no interrupt lines configured.
> >
> > Don't print a KERN_ERR level message for each interrupt line that's
> > unconfigured as that can confuse users into thinking that there is an
> > error condition.
> >
> > Before this change, the following KERN_ERR level messages would be
> > reported if only dram_ecc_errc and dram_ecc_errd were configured in the
> > device tree:
> >
> >  dmc520 68000000.dmc: IRQ ram_ecc_errc not found
> >  dmc520 68000000.dmc: IRQ ram_ecc_errd not found
> >  dmc520 68000000.dmc: IRQ failed_access not found
> >  dmc520 68000000.dmc: IRQ failed_prog not found
> >  dmc520 68000000.dmc: IRQ link_err not found
> >  dmc520 68000000.dmc: IRQ temperature_event not found
> >  dmc520 68000000.dmc: IRQ arch_fsm not found
> >  dmc520 68000000.dmc: IRQ phy_request not found
> >
> > Fixes: 1088750d7839 ("EDAC: Add EDAC driver for DMC520")
> > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > Cc: <stable@vger.kernel.org>
>
> Why stable?
>
> AFAICT, this is fixing only the spew of some error messages
> but the driver is still functional.

KERN_ERR messages trip log scanners and cause concern that the
kernel/hardware is not configured or working correctly. They also add a
little bit of ongoing stress into kernel maintainers' lives, as we
prepare and test kernel updates, since they show up as red text in
journalctl output that we have to think about regularly. Multiple
KERN_ERR messages, 8 in this case, can also be considered a little worse
than a single error message.

I feel like this trivial fix is worth taking into stable rather than
suppressing these errors (mentally and in log scanners) for years.

Tyler

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Jan 18, 2022 at 09:28:16AM -0600, Tyler Hicks wrote:
> KERN_ERR messages trip log scanners and cause concern that the
> kernel/hardware is not configured or working correctly. They also add a
> little bit of ongoing stress into kernel maintainers' lives, as we
> prepare and test kernel updates, since they show up as red text in
> journalctl output that we have to think about regularly. Multiple
> KERN_ERR messages, 8 in this case, can also be considered a little worse
> than a single error message.

It sounds to me like you wanna read

Documentation/process/stable-kernel-rules.rst

first.

> I feel like this trivial fix is worth taking into stable rather than
> suppressing these errors (mentally and in log scanners) for years.

Years?

In any case, sorry, no, I don't consider this stable material.

Thx.
On 2022-01-18 18:28:16, Borislav Petkov wrote:
> On Tue, Jan 18, 2022 at 09:28:16AM -0600, Tyler Hicks wrote:
> > KERN_ERR messages trip log scanners and cause concern that the
> > kernel/hardware is not configured or working correctly. They also add a
> > little bit of ongoing stress into kernel maintainers' lives, as we
> > prepare and test kernel updates, since they show up as red text in
> > journalctl output that we have to think about regularly. Multiple
> > KERN_ERR messages, 8 in this case, can also be considered a little worse
> > than a single error message.
>
> It sounds to me like you wanna read
>
> Documentation/process/stable-kernel-rules.rst
>
> first.

I'm familiar with it and the sort of commits that flow into stable.

> > I feel like this trivial fix is worth taking into stable rather than
> > suppressing these errors (mentally and in log scanners) for years.
>
> Years?

Yes, years. v5.10 is supported through 2026.

> In any case, sorry, no, I don't consider this stable material.

The bar varies by subsystem maintainer but this wouldn't be the first
logging fix that made it into a stable branch. From the linux-5.10.y
branch of linux-stable:

 ddb13ddacc60 scsi: pm80xx: Fix misleading log statement in pm8001_mpi_get_nvmd_resp()
 526261c1b706 amd/display: downgrade validation failure log level
 9a3f52f73c04 bnxt_en: Improve logging of error recovery settings information.
 5f7bda9ba8d7 leds: lm3697: Don't spam logs when probe is deferred
 8b195380cd07 staging: fbtft: Don't spam logs when probe is deferred
 ...

But you do the hard work of maintaining the subsystem tree so you get to
call the shots about where fixes are routed. :)

Thanks for applying the change!

Tyler

>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Jan 18, 2022 at 01:54:01PM -0600, Tyler Hicks wrote:
> On 2022-01-18 18:28:16, Borislav Petkov wrote:
> > On Tue, Jan 18, 2022 at 09:28:16AM -0600, Tyler Hicks wrote:
> > > KERN_ERR messages trip log scanners and cause concern that the
> > > kernel/hardware is not configured or working correctly. They also add a
> > > little bit of ongoing stress into kernel maintainers' lives, as we
> > > prepare and test kernel updates, since they show up as red text in
> > > journalctl output that we have to think about regularly. Multiple
> > > KERN_ERR messages, 8 in this case, can also be considered a little worse
> > > than a single error message.
> >
> > It sounds to me like you wanna read
> >
> > Documentation/process/stable-kernel-rules.rst
> >
> > first.
>
> I'm familiar with it and the sort of commits that flow into stable.
>
> > > I feel like this trivial fix is worth taking into stable rather than
> > > suppressing these errors (mentally and in log scanners) for years.
> >
> > Years?
>
> Yes, years. v5.10 is supported through 2026.
>
> > In any case, sorry, no, I don't consider this stable material.
>
> The bar varies by subsystem maintainer but this wouldn't be the first
> logging fix that made it into a stable branch. From the linux-5.10.y
> branch of linux-stable:
>
>  ddb13ddacc60 scsi: pm80xx: Fix misleading log statement in pm8001_mpi_get_nvmd_resp()
>  526261c1b706 amd/display: downgrade validation failure log level
>  9a3f52f73c04 bnxt_en: Improve logging of error recovery settings information.
>  5f7bda9ba8d7 leds: lm3697: Don't spam logs when probe is deferred
>  8b195380cd07 staging: fbtft: Don't spam logs when probe is deferred
>  ...

Well, lemme add the stable folks for comment then - they might have had
their reasons.

( Or Sasha's AI went nuts. Which I've witnessed a bunch of times
already.)

If I look at the stable-kernel-rules.rst file, the only rule that
*maybe*, *probably* applies here is

"- It must fix a real bug that bothers people"

But this one is formulated so broadly so that it makes me wanna ignore
it. Because *anything* can bother people - even spelling mistakes but
then a later rule says no spelling fixes.

Don't get me wrong - I don't mind having the stable tag where really
needed. But here it is questionable. And we have those stable rules for
a reason - if we start bending them and ignoring them then we might
just as well backport everything that applies and have parallel kernel
streams where the version means nothing. Basically a distro kernel. :-P

So let's see what the stable folks say first.

Thx.
On Tue, Jan 18, 2022 at 10:04:30PM +0100, Borislav Petkov wrote:
> On Tue, Jan 18, 2022 at 01:54:01PM -0600, Tyler Hicks wrote:
> > On 2022-01-18 18:28:16, Borislav Petkov wrote:
> > > On Tue, Jan 18, 2022 at 09:28:16AM -0600, Tyler Hicks wrote:
> > > > KERN_ERR messages trip log scanners and cause concern that the
> > > > kernel/hardware is not configured or working correctly. They also add a
> > > > little bit of ongoing stress into kernel maintainers' lives, as we
> > > > prepare and test kernel updates, since they show up as red text in
> > > > journalctl output that we have to think about regularly. Multiple
> > > > KERN_ERR messages, 8 in this case, can also be considered a little worse
> > > > than a single error message.
> > >
> > > It sounds to me like you wanna read
> > >
> > > Documentation/process/stable-kernel-rules.rst
> > >
> > > first.
> >
> > I'm familiar with it and the sort of commits that flow into stable.
> >
> > > > I feel like this trivial fix is worth taking into stable rather than
> > > > suppressing these errors (mentally and in log scanners) for years.
> > >
> > > Years?
> >
> > Yes, years. v5.10 is supported through 2026.
> >
> > > In any case, sorry, no, I don't consider this stable material.
> >
> > The bar varies by subsystem maintainer but this wouldn't be the first
> > logging fix that made it into a stable branch. From the linux-5.10.y
> > branch of linux-stable:
> >
> >  ddb13ddacc60 scsi: pm80xx: Fix misleading log statement in pm8001_mpi_get_nvmd_resp()
> >  526261c1b706 amd/display: downgrade validation failure log level
> >  9a3f52f73c04 bnxt_en: Improve logging of error recovery settings information.
> >  5f7bda9ba8d7 leds: lm3697: Don't spam logs when probe is deferred
> >  8b195380cd07 staging: fbtft: Don't spam logs when probe is deferred
> >  ...
>
> Well, lemme add the stable folks for comment then - they might have had
> their reasons.
>
> ( Or Sasha's AI went nuts. Which I've witnessed a bunch of times
> already.)
>
> If I look at the stable-kernel-rules.rst file, the only rule that
> *maybe*, *probably* applies here is
>
> "- It must fix a real bug that bothers people"
>
> But this one is formulated so broadly so that it makes me wanna ignore
> it. Because *anything* can bother people - even spelling mistakes but
> then a later rule says no spelling fixes.
>
> Don't get me wrong - I don't mind having the stable tag where really
> needed. But here it is questionable. And we have those stable rules for
> a reason - if we start bending them and ignoring them then we might
> just as well backport everything that applies and have parallel kernel
> streams where the version means nothing. Basically a distro kernel. :-P
>
> So let's see what the stable folks say first.

I will be glad to take these types of patches if the subsystem
maintainer thinks it will help things out, or if they are tired of
getting emails about the misleading messages. In this case, I don't
think either of those things is relevant, so I don't see why the patch
should be backported.

For this specific change, I do NOT think it should be backported at all,
mostly for the reason that people are still arguing over the whole
platform_get_*_optional() mess that we currently have. Let's not go and
backport anything right now to stable trees until we have all of that
sorted out, as it looks like it all might be changing again. See:
	https://lore.kernel.org/r/20220110195449.12448-1-s.shtylyov@omp.ru
for all of the gory details and the 300+ emails written on the topic so
far.

Tyler, feel free to jump in to that thread if you want, it's a mess...

thanks,

greg k-h
On Wed, Jan 19, 2022 at 10:17:52AM +0100, Greg Kroah-Hartman wrote:
> For this specific change, I do NOT think it should be backported at all,
> mostly for the reason that people are still arguing over the whole
> platform_get_*_optional() mess that we currently have. Let's not go and
> backport anything right now to stable trees until we have all of that
> sorted out, as it looks like it all might be changing again. See:
> 	https://lore.kernel.org/r/20220110195449.12448-1-s.shtylyov@omp.ru
> for all of the gory details and the 300+ emails written on the topic so
> far.

It sounds to me I should not even take this patch upstream yet,
considering that's still ongoing...
On Wed, Jan 19, 2022 at 10:37:51AM +0100, Borislav Petkov wrote:
> On Wed, Jan 19, 2022 at 10:17:52AM +0100, Greg Kroah-Hartman wrote:
> > For this specific change, I do NOT think it should be backported at all,
> > mostly for the reason that people are still arguing over the whole
> > platform_get_*_optional() mess that we currently have. Let's not go and
> > backport anything right now to stable trees until we have all of that
> > sorted out, as it looks like it all might be changing again. See:
> > 	https://lore.kernel.org/r/20220110195449.12448-1-s.shtylyov@omp.ru
> > for all of the gory details and the 300+ emails written on the topic so
> > far.
>
> It sounds to me I should not even take this patch upstream yet,
> considering that's still ongoing...

Yes, I would not take that just yet at all. Let's let the api argument
settle down a bit first.

thanks,

greg k-h
On 2022-01-19 11:28:08, Greg Kroah-Hartman wrote:
> On Wed, Jan 19, 2022 at 10:37:51AM +0100, Borislav Petkov wrote:
> > On Wed, Jan 19, 2022 at 10:17:52AM +0100, Greg Kroah-Hartman wrote:
> > > For this specific change, I do NOT think it should be backported at all,
> > > mostly for the reason that people are still arguing over the whole
> > > platform_get_*_optional() mess that we currently have. Let's not go and
> > > backport anything right now to stable trees until we have all of that
> > > sorted out, as it looks like it all might be changing again. See:
> > > 	https://lore.kernel.org/r/20220110195449.12448-1-s.shtylyov@omp.ru
> > > for all of the gory details and the 300+ emails written on the topic so
> > > far.
> >
> > It sounds to me I should not even take this patch upstream yet,
> > considering that's still ongoing...
>
> Yes, I would not take that just yet at all. Let's let the api argument
> settle down a bit first.

The API argument seems to have fizzled out in v2:

 https://lore.kernel.org/lkml/20220212201631.12648-1-s.shtylyov@omp.ru/

Can this fix be merged since there seem to be no API changes coming
soon? Boris, feel free to strip off the cc stable tag.

Tyler

>
> thanks,
>
> greg k-h
On 2022-04-04 16:56:58, Tyler Hicks wrote:
> On 2022-01-19 11:28:08, Greg Kroah-Hartman wrote:
> > On Wed, Jan 19, 2022 at 10:37:51AM +0100, Borislav Petkov wrote:
> > > On Wed, Jan 19, 2022 at 10:17:52AM +0100, Greg Kroah-Hartman wrote:
> > > > For this specific change, I do NOT think it should be backported at all,
> > > > mostly for the reason that people are still arguing over the whole
> > > > platform_get_*_optional() mess that we currently have. Let's not go and
> > > > backport anything right now to stable trees until we have all of that
> > > > sorted out, as it looks like it all might be changing again. See:
> > > > 	https://lore.kernel.org/r/20220110195449.12448-1-s.shtylyov@omp.ru
> > > > for all of the gory details and the 300+ emails written on the topic so
> > > > far.
> > >
> > > It sounds to me I should not even take this patch upstream yet,
> > > considering that's still ongoing...
> >
> > Yes, I would not take that just yet at all. Let's let the api argument
> > settle down a bit first.
>
> The API argument seems to have fizzled out in v2:
>
>  https://lore.kernel.org/lkml/20220212201631.12648-1-s.shtylyov@omp.ru/
>
> Can this fix be merged since there seem to be no API changes coming
> soon? Boris, feel free to strip off the cc stable tag.

Hi Boris - I just double checked that this still looks correct and
applies cleanly to linux-next. Anything I can do on my end to help get
this little fix merged into the ras.git tree? Thanks!

Tyler

>
> Tyler
>
> >
> > thanks,
> >
> > greg k-h
On Mon, Apr 18, 2022 at 03:40:29PM -0500, Tyler Hicks wrote:
> > The API argument seems to have fizzled out in v2:
> >
> >  https://lore.kernel.org/lkml/20220212201631.12648-1-s.shtylyov@omp.ru/

I don't see those two upstream yet, on a quick glance. Perhaps in Greg's
tree?

Greg, what's the latest with that platform_get_*_optional() fun?

Also, the second of those two patches above has:

+ * Return: non-zero IRQ number on success, 0 if IRQ wasn't found, negative error
+ *	number on failure.
  */
 int platform_get_irq_byname_optional(struct platform_device *dev,

and your patch does:

+		irq = platform_get_irq_byname_optional(pdev, dmc520_irq_configs[idx].name);
 		irqs[idx] = irq;

so on failure, it would still write the negative error value in
irqs[idx].

How can that be right?
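The return-value question raised above can be made concrete with a small userspace sketch. It is not driver code: `usable_current`, `usable_proposed` and `count_usable` are hypothetical helpers, and the sketch assumes only what the thread itself shows, namely that the driver tests `irq >= 0` and that the proposed kernel-doc would return 0 for an IRQ that was not found. Under those assumptions, a check written for the current convention would miscount missing lines if the proposed convention were adopted unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Convention the patch relies on (per the quoted hunk): a missing IRQ
 * comes back as a negative errno, so "irq >= 0" means usable.
 */
static int usable_current(int irq)
{
	return irq >= 0;
}

/*
 * Convention in the proposed kernel-doc quoted above: 0 means "not
 * found", negative means failure, so only "irq > 0" is usable.
 */
static int usable_proposed(int irq)
{
	return irq > 0;
}

/* Count the entries of irqs[] that the given predicate accepts. */
static size_t count_usable(const int *irqs, size_t n, int (*usable)(int))
{
	size_t count = 0;

	assert(irqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (usable(irqs[i]))
			count++;
	return count;
}
```

Storing a negative value in `irqs[idx]` is harmless only as long as every later consumer applies the same predicate that the lookup convention implies, which is the point the question is probing.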
On 2022-04-18 23:13:36, Borislav Petkov wrote:
> On Mon, Apr 18, 2022 at 03:40:29PM -0500, Tyler Hicks wrote:
> > > The API argument seems to have fizzled out in v2:
> > >
> > >  https://lore.kernel.org/lkml/20220212201631.12648-1-s.shtylyov@omp.ru/
>
> I don't see those two upstream yet, on a quick glance. Perhaps in Greg's tree?
>
> Greg, what's the latest with that platform_get_*_optional() fun?
>
> Also, the second of those two patches above has:
>
> + * Return: non-zero IRQ number on success, 0 if IRQ wasn't found, negative error
> + *	number on failure.
>   */
>  int platform_get_irq_byname_optional(struct platform_device *dev,
>
> and your patch does:
>
> +		irq = platform_get_irq_byname_optional(pdev, dmc520_irq_configs[idx].name);
>  		irqs[idx] = irq;
>
> so on failure, it would still write the negative error value in
> irqs[idx].
>
> How can that be right?

The patches to modify the API have become stale. There have been no
new comments or revisions since Feb. What I'm proposing is to proceed
with merging this simple fix and let the folks discussing the API
changes adjust the use in the dmc520 driver if/when they decide to
revive the API changes.

Tyler

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Apr 18, 2022 at 04:34:53PM -0500, Tyler Hicks wrote:
> The patches to modify the API have become stale. There have been no
> new comments or revisions since Feb. What I'm proposing is to proceed
> with merging this simple fix and let the folks discussing the API
> changes adjust the use in the dmc520 driver if/when they decide to
> revive the API changes.

Ok, fair enough.

Queued, thanks.
diff --git a/drivers/edac/dmc520_edac.c b/drivers/edac/dmc520_edac.c
index b8a7d9594afd..1fa5ca57e9ec 100644
--- a/drivers/edac/dmc520_edac.c
+++ b/drivers/edac/dmc520_edac.c
@@ -489,7 +489,7 @@ static int dmc520_edac_probe(struct platform_device *pdev)
 	dev = &pdev->dev;
 
 	for (idx = 0; idx < NUMBER_OF_IRQS; idx++) {
-		irq = platform_get_irq_byname(pdev, dmc520_irq_configs[idx].name);
+		irq = platform_get_irq_byname_optional(pdev, dmc520_irq_configs[idx].name);
 		irqs[idx] = irq;
 		masks[idx] = dmc520_irq_configs[idx].mask;
 		if (irq >= 0) {
The dmc520 driver requires that at least one interrupt line, out of the ten
possible, is configured. The driver prints an error and returns -EINVAL from
its .probe function if there are no interrupt lines configured.

Don't print a KERN_ERR level message for each interrupt line that's
unconfigured as that can confuse users into thinking that there is an error
condition.

Before this change, the following KERN_ERR level messages would be reported
if only dram_ecc_errc and dram_ecc_errd were configured in the device tree:

 dmc520 68000000.dmc: IRQ ram_ecc_errc not found
 dmc520 68000000.dmc: IRQ ram_ecc_errd not found
 dmc520 68000000.dmc: IRQ failed_access not found
 dmc520 68000000.dmc: IRQ failed_prog not found
 dmc520 68000000.dmc: IRQ link_err not found
 dmc520 68000000.dmc: IRQ temperature_event not found
 dmc520 68000000.dmc: IRQ arch_fsm not found
 dmc520 68000000.dmc: IRQ phy_request not found

Fixes: 1088750d7839 ("EDAC: Add EDAC driver for DMC520")
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: <stable@vger.kernel.org>
Reported-by: Sinan Kaya <okaya@kernel.org>
---
 drivers/edac/dmc520_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
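
The behaviour described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver: `struct mock_irq`, `mock_get_irq_byname()`, `mock_get_irq_byname_optional()` and `mock_probe()` are hypothetical stand-ins for the platform API and the probe loop, the order of `irq_names[]` is assumed, and -ENXIO as the "not found" value is an assumption about the helper rather than something stated in the thread. The final error text is likewise made up for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NUMBER_OF_IRQS 10

/* The ten line names mentioned in the thread; the order is assumed. */
static const char *const irq_names[NUMBER_OF_IRQS] = {
	"ram_ecc_errc", "ram_ecc_errd", "dram_ecc_errc", "dram_ecc_errd",
	"failed_access", "failed_prog", "link_err", "temperature_event",
	"arch_fsm", "phy_request",
};

/* Hypothetical model of the interrupts configured in a device tree. */
struct mock_irq {
	const char *name;
	int irq;
};

/* Stand-in for the _optional lookup: silent when the name is missing. */
static int mock_get_irq_byname_optional(const struct mock_irq *cfg, size_t n,
					const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(cfg[i].name, name) == 0)
			return cfg[i].irq;
	return -ENXIO;
}

/* Stand-in for the non-optional lookup: same result, but logs each miss. */
static int mock_get_irq_byname(const struct mock_irq *cfg, size_t n,
			       const char *name, int *errors_logged)
{
	int irq = mock_get_irq_byname_optional(cfg, n, name);

	if (irq < 0) {
		fprintf(stderr, "IRQ %s not found\n", name);
		(*errors_logged)++;
	}
	return irq;
}

/*
 * Probe-like loop: look up every line, remember the result, and fail
 * with -EINVAL only when no line at all is configured. *errors_logged
 * counts the per-line "not found" messages that were printed.
 */
static int mock_probe(const struct mock_irq *cfg, size_t n, int optional,
		      int irqs[NUMBER_OF_IRQS], int *errors_logged)
{
	int found = 0;

	assert(irqs != NULL && errors_logged != NULL);
	*errors_logged = 0;

	for (int idx = 0; idx < NUMBER_OF_IRQS; idx++) {
		int irq = optional
			? mock_get_irq_byname_optional(cfg, n, irq_names[idx])
			: mock_get_irq_byname(cfg, n, irq_names[idx],
					      errors_logged);

		irqs[idx] = irq;
		if (irq >= 0)
			found++;
	}

	if (!found) {
		/* Illustrative wording, not the driver's actual message. */
		fprintf(stderr, "no interrupt lines configured\n");
		return -EINVAL;
	}
	return 0;
}
```

With only dram_ecc_errc and dram_ecc_errd configured, the non-optional variant of this sketch logs eight per-line messages while the optional variant logs none, and both still succeed; with nothing configured, both return -EINVAL.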