Message ID | cover.1736341506.git.karolina.stolarek@oracle.com (mailing list archive) |
---|---|
Series | Rate limit reporting of Correctable Errors |
On Wed, 8 Jan 2025 13:55:30 +0000
Karolina Stolarek <karolina.stolarek@oracle.com> wrote:
> TL;DR
> ====
>
> We are getting multiple reports about excessive logging of Correctable
> Errors with no clear common root cause. As these errors are already
> corrected by hardware, it makes sense to limit them. Introduce
> a ratelimit state definition to pci_dev to control the number of
> messages reported by a Root Port within a specified time interval.
> The series adds other improvements in the area, as outlined in the
> Proposal section.

Hi Karolina,

This is a common impediment for many folks that want to enable AER. The
excessive logging stalls execution, making machines unusable. I've been
working on a similar solution[1] to yours (i.e. ratelimiting) with a few
differences:

- ratelimit uncorrectable errors
- ratelimit IRQs
- configure ratelimits from userspace (sysfs knobs)

Hoping we can collaborate on a solution (i.e. take best parts of both patch
series).

Thanks,
Jon

[1] https://lore.kernel.org/linux-pci/20250115074301.3514927-1-pandoh@google.com/

Hi Jon,

Many thanks for reaching out.

On 15/01/2025 08:55, Jon Pan-Doh wrote:
> On Wed, 8 Jan 2025 13:55:30 +0000
> Karolina Stolarek <karolina.stolarek@oracle.com> wrote:
>> TL;DR
>> ====
>>
>> We are getting multiple reports about excessive logging of Correctable
>> Errors with no clear common root cause. As these errors are already
>> corrected by hardware, it makes sense to limit them. Introduce
>> a ratelimit state definition to pci_dev to control the number of
>> messages reported by a Root Port within a specified time interval.
>> The series adds other improvements in the area, as outlined in the
>> Proposal section.
>
> Hi Karolina,
>
> This is a common impediment for many folks that want to enable AER. The
> excessive logging stalls execution, making machines unusable. I've been
> working on a similar solution[1] to yours (i.e. ratelimiting) with a few
> differences:
>
> - ratelimit uncorrectable errors
> - ratelimit IRQs
> - configure ratelimits from userspace (sysfs knobs)
>
> Hoping we can collaborate on a solution (i.e. take best parts of both patch
> series).

That indeed looks like a more robust solution, I'm more than happy to
join forces and work on this together.

Feel free to incorporate the 1/4 patch into your series. I plan to do a
proper review tomorrow.

Out of curiosity, do your patches apply cleanly to pci/err and/or
pci-next branches? From what I can see, "PCI: Consolidate TLP Log
reading and printing" series[1] had been just merged, so there could be
conflicts.

All the best,
Karolina

--------------------------------------------------------------
[1] - https://lore.kernel.org/linux-pci/20250114170840.1633-1-ilpo.jarvinen@linux.intel.com/

>
> Thanks,
> Jon
>
> [1] https://lore.kernel.org/linux-pci/20250115074301.3514927-1-pandoh@google.com/

On Wed, Jan 15, 2025 at 6:18 AM Karolina Stolarek
<karolina.stolarek@oracle.com> wrote:
> Feel free to incorporate the 1/4 patch into your series. I plan to do a
> proper review tomorrow.

Ack.

> Out of curiosity, do your patches apply cleanly to pci/err and/or
> pci-next branches? From what I can see, "PCI: Consolidate TLP Log
> reading and printing" series[1] had been just merged, so there could be
> conflicts.

Good catch. I rebased off of Linus' master branch so it's unlikely to be
clean. I'll rebase off of pci-next in the next version.

Thanks,
Jon
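
For readers following the thread, the idea both series share (allow at most
a fixed number of messages per device within a time interval, and suppress
the rest) can be sketched as a small userspace model. This is only an
illustration under stated assumptions: struct aer_ratelimit,
aer_ratelimit_init() and aer_ratelimit_allow() are hypothetical names that
do not come from either patch series, and the caller passes in the time
instead of reading a kernel clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-device ratelimit state (illustrative only; this is
 * not the kernel's struct ratelimit_state and not code from the patches).
 */
struct aer_ratelimit {
	uint64_t interval_ms;   /* length of one window */
	unsigned int burst;     /* messages allowed per window */
	uint64_t window_start;  /* time at which the current window began */
	unsigned int printed;   /* messages emitted in the current window */
	unsigned int missed;    /* messages suppressed in the current window */
};

void aer_ratelimit_init(struct aer_ratelimit *rs, uint64_t interval_ms,
			unsigned int burst)
{
	assert(rs && interval_ms > 0 && burst > 0);
	rs->interval_ms = interval_ms;
	rs->burst = burst;
	rs->window_start = 0;
	rs->printed = 0;
	rs->missed = 0;
}

/* Return true if a message may be logged at time now_ms. */
bool aer_ratelimit_allow(struct aer_ratelimit *rs, uint64_t now_ms)
{
	/* Open a new window once the interval has elapsed. */
	if (now_ms - rs->window_start >= rs->interval_ms) {
		rs->window_start = now_ms;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Over budget: count the suppressed message and drop it. */
	rs->missed++;
	return false;
}
```

With an interval of 5000 ms and a burst of 2, the first two errors in a
window are logged, further ones are counted in missed and dropped, and
logging resumes once the window rolls over. Keeping one such state per
device means a single noisy device does not use up the budget of the others.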