Message ID | 20180921221705.6478-1-james.morse@arm.com
---|---
Series | APEI in_nmi() rework
On Fri, Sep 21, 2018 at 11:16:47PM +0100, James Morse wrote:
> Hello,
>
> The GHES driver has collected quite a few bugs:
>
> ghes_proc() at ghes_probe() time can be interrupted by an NMI that
> will clobber the ghes->estatus fields, flags, and the buffer_paddr.
>
> ghes_copy_tofrom_phys() uses in_nmi() to decide which path to take. arm64's
> SEA can take both paths, depending on what it interrupted.
>
> There is no guarantee that queued memory_failure() errors will be processed
> before this CPU returns to user-space.
>
> x86 can't TLBI from interrupt-masked code, which this driver does all the
> time.
>
>
> This series aims to fix the first three, with an eye to fixing the
> last one with a follow-up series.
>
> Previous postings included the SDEI notification calls, which I haven't
> finished re-testing. This series is big enough as it is.

Yeah, and everywhere I look, this thing looks overengineered. Like,
for example, what's the purpose of this ghes_esource_prealloc_size()
computing a size each time the pool changes size?

AFAICT, this size can be computed exactly *once* at driver init and be
done with it. Right?

Or am I missing something subtle?
Hi Boris,

On 25/09/18 13:45, Borislav Petkov wrote:
> On Fri, Sep 21, 2018 at 11:16:47PM +0100, James Morse wrote:
>> Hello,
>>
>> The GHES driver has collected quite a few bugs:
...
> Yeah, and everywhere I look, this thing looks overengineered. Like,
> for example, what's the purpose of this ghes_esource_prealloc_size()
> computing a size each time the pool changes size?

It's the size to grow the pool by, because each error source described by a
GHES entry has its own worst-case size. Today ghes_nmi_add() does this each
time it's called.

You could have multiple GHES entries in the HEST that describe NMI as the
notification. The worst-case size for the records is described in the GHES
entry, and could be different for each one. (See error_block_length and
records_to_preallocate, or table 18-379 of ACPI v6.2.)

These different error sources could be delivered on different CPUs at the same
time, so they need their own pre-allocated reserved memory. ghes_notify_nmi()'s
atomic_add_unless() suggests this can happen on x86, but I don't know the
arch-specifics. It definitely can happen on arm64.

> AFAICT, this size can be computed exactly *once* at driver init and be
> done with it. Right?
We could do two passes of the HEST: pre-compute the total size of this
estatus-queue memory, allocate it, then do the notification registration stuff.
But this doesn't really work with the way this driver acts as the platform
driver for a GHES device...

The non-ghes HEST entries have a "number of records to pre-allocate" too, we
could make this memory pool something hest.c looks after, but I can't see if the
other error sources use those values.

Hmmm, the size is capped to 64K, we could ignore the firmware description of the
memory requirements, and allocate SZ_64K each time. Doing it per-GHES is still
the only way to avoid allocating nmi-safe memory for irqs.

Thanks,

James
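The two options James weighs can be compared with a small sketch. Pass one walks a hypothetical array of GHES descriptions and sums each source's worst-case size; pass two makes a single allocation and hands each source its own slice, so sources delivered on different CPUs never share memory. total_fixed() is the "ignore the firmware description and allocate SZ_64K each time" alternative. This is a user-space illustration with invented names and an assumed capping rule; it deliberately does not model the platform-driver probe ordering that James says makes the two-pass approach awkward.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one GHES entry's sizing fields. */
struct ghes_desc {
	uint32_t error_block_length;
	uint32_t records_to_preallocate;
};

#define SZ_64K (64UL * 1024)

/* Per-source worst case under an assumed rule, capped at 64K. */
static unsigned long source_size(const struct ghes_desc *g)
{
	unsigned long long block = g->error_block_length < SZ_64K ?
				   g->error_block_length : SZ_64K;
	unsigned long long records = g->records_to_preallocate ?
				     g->records_to_preallocate : 1;
	unsigned long long total = block * records;

	return total < SZ_64K ? (unsigned long)total : SZ_64K;
}

/* Option 1, pass one: sum the firmware-described worst cases. */
static unsigned long total_from_firmware(const struct ghes_desc *src, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += source_size(&src[i]);
	return sum;
}

/* Option 2: ignore the firmware description, reserve the cap per source. */
static unsigned long total_fixed(size_t n)
{
	return n * SZ_64K;
}

/*
 * Option 1, pass two: one allocation, then give each source its own
 * slice. offsets[i] receives the start of source i's slice.
 * Returns the buffer (the caller frees it) or NULL on failure.
 */
static unsigned char *carve_pool(const struct ghes_desc *src, size_t n,
				 unsigned long *offsets)
{
	unsigned long total = total_from_firmware(src, n);
	unsigned long off = 0;
	unsigned char *base = malloc(total ? total : 1);

	if (!base)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		offsets[i] = off;
		off += source_size(&src[i]);
	}
	return base;
}
```

For one 16K source and one capped 64K source, pass one reserves 80K in total while the fixed option reserves 128K. The fixed option costs more memory but needs no parsing of the firmware's sizing hints.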
On Wed, Oct 03, 2018 at 06:50:38PM +0100, James Morse wrote:
...
> The non-ghes HEST entries have a "number of records to pre-allocate" too, we
> could make this memory pool something hest.c looks after, but I can't see if the
> other error sources use those values.

Thanks for the detailed analysis!

> Hmmm, the size is capped to 64K, we could ignore the firmware description of the
> memory requirements, and allocate SZ_64K each time. Doing it per-GHES is still
> the only way to avoid allocating nmi-safe memory for irqs.

Right, so I'm thinking of something a lot simpler: allocate a pool which should
be large enough to handle all situations and drop all that logic which
recomputes and reallocates the pool size. Just a static thing which
JustWorks(tm).

For a couple of reasons:

- you state it above: all those synchronization issues are gone with a
  preallocated pool

- a 64K per-GHES pool is nothing if you consider the machines this thing runs
  on: fat servers with lotsa memory. And RAS there *is* important. And TBH 64K
  is nothing even on a small client sporting gigabytes of memory.

- the code is a lot simpler and cleaner: you don't need all that pool expanding
  and shrinking.

I mean, I'm all for smarter solutions if they have any clear advantages
warranting the complication, but this is a lot of machinery just so that we can
save a couple of KBs. Which, as a whole, sounds like just too much to me.

But this is just me.
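Boris's alternative, a pool sized once and never grown or shrunk, could look something like the sketch below. It is a user-space illustration under stated assumptions: a fixed number of 64K slots (POOL_SLOTS = 4 is an arbitrary choice), reserved statically, with a C11 atomic compare-and-swap to claim a slot so that allocation never takes a lock. It is not the GHES driver's pool, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SLOTS 4              /* assumption: arbitrary slot count */
#define SLOT_SIZE  (64UL * 1024)  /* one worst-case 64K area per slot */

/* Reserved once at build time; never expanded or shrunk. */
static unsigned char pool_mem[POOL_SLOTS * SLOT_SIZE];
static atomic_bool slot_used[POOL_SLOTS]; /* zero-initialized: all free */

/* Claim a free slot without taking a lock; NULL if too big or exhausted. */
static void *pool_alloc(size_t len)
{
	if (len == 0 || len > SLOT_SIZE)
		return NULL;
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		bool expected = false;

		/* Only one caller can flip a slot from free to used. */
		if (atomic_compare_exchange_strong(&slot_used[i], &expected, true))
			return &pool_mem[i * SLOT_SIZE];
	}
	return NULL; /* pool exhausted: the caller has to drop the record */
}

/* Return a slot obtained from pool_alloc(). */
static void pool_free(void *p)
{
	size_t i;

	if (!p)
		return;
	i = (size_t)((unsigned char *)p - pool_mem) / SLOT_SIZE;
	atomic_store(&slot_used[i], false);
}
```

Giving every record a whole slot wastes memory on small records, but it keeps the allocator trivial and removes any need to resize the pool at runtime, which is the trade-off argued for above.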