Message ID | 20240513085925.59324-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [for-4.19] x86/mtrr: avoid system wide rendezvous when setting AP MTRRs | expand |
On Mon, 2024-05-13 at 10:59 +0200, Roger Pau Monne wrote: > There's no point in forcing a system wide update of the MTRRs on all > processors > when there are no changes to be propagated. On AP startup it's only > the AP > that needs to write the system wide MTRR values in order to match the > rest of > the already online CPUs. > > We have occasionally seen the watchdog trigger during `xen-hptool > cpu-online` > in one Intel Cascade Lake box with 448 CPUs due to the re-setting of > the MTRRs > on all the CPUs in the system. > > While there adjust the comment to clarify why the system-wide > resetting of the > MTRR registers is not needed for the purposes of mtrr_ap_init(). > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > For consideration for 4.19: it's a bugfix of a rare instance of the > watchdog > triggering, but it's also a good performance improvement when > performing > cpu-online. > > Hopefully runtime changes to MTRR will affect a single MSR at a time, > lowering > the chance of the watchdog triggering due to the system-wide > resetting of the > range. Considering it as a bugfix it will be good to have it in 4.19 release: Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> ~ Oleksii > --- > xen/arch/x86/cpu/mtrr/main.c | 15 ++++++++------- > 1 file changed, 8 insertions(+), 7 deletions(-) > > diff --git a/xen/arch/x86/cpu/mtrr/main.c > b/xen/arch/x86/cpu/mtrr/main.c > index 90b235f57e68..0a44ebbcb04f 100644 > --- a/xen/arch/x86/cpu/mtrr/main.c > +++ b/xen/arch/x86/cpu/mtrr/main.c > @@ -573,14 +573,15 @@ void mtrr_ap_init(void) > if (!mtrr_if || hold_mtrr_updates_on_aps) > return; > /* > - * Ideally we should hold mtrr_mutex here to avoid mtrr > entries changed, > - * but this routine will be called in cpu boot time, holding > the lock > - * breaks it. This routine is called in two cases: 1.very > earily time > - * of software resume, when there absolutely isn't mtrr > entry changes; > - * 2.cpu hotadd time. We let mtrr_add/del_page hold > cpuhotplug lock to > - * prevent mtrr entry changes > + * hold_mtrr_updates_on_aps takes care of preventing > unnecessary MTRR > + * updates when batch starting the CPUs (see > + * mtrr_aps_sync_{begin,end}()). > + * > + * Otherwise just apply the current system wide MTRR values > to this AP. > + * Note this doesn't require synchronization with the other > CPUs, as > + * there are strictly no modifications of the current MTRR > values. > */ > - set_mtrr(~0U, 0, 0, 0); > + mtrr_set_all(); > } > > /**
On 13/05/2024 9:59 am, Roger Pau Monne wrote: > There's no point in forcing a system wide update of the MTRRs on all processors > when there are no changes to be propagated. On AP startup it's only the AP > that needs to write the system wide MTRR values in order to match the rest of > the already online CPUs. > > We have occasionally seen the watchdog trigger during `xen-hptool cpu-online` > in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs > on all the CPUs in the system. > > While there adjust the comment to clarify why the system-wide resetting of the > MTRR registers is not needed for the purposes of mtrr_ap_init(). > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > For consideration for 4.19: it's a bugfix of a rare instance of the watchdog > triggering, but it's also a good performance improvement when performing > cpu-online. > > Hopefully runtime changes to MTRR will affect a single MSR at a time, lowering > the chance of the watchdog triggering due to the system-wide resetting of the > range. "Runtime" changes will only be during dom0 boot, if at all, but yes - it is restricted to a single MTRR at a time. It's XENPF_{add,del,read}_memtype, but it's only used by Classic Linux. PVOps only issues read_memtype. Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
On 14/05/2024 12:09 pm, Andrew Cooper wrote: > On 13/05/2024 9:59 am, Roger Pau Monne wrote: >> There's no point in forcing a system wide update of the MTRRs on all processors >> when there are no changes to be propagated. On AP startup it's only the AP >> that needs to write the system wide MTRR values in order to match the rest of >> the already online CPUs. >> >> We have occasionally seen the watchdog trigger during `xen-hptool cpu-online` >> in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs >> on all the CPUs in the system. >> >> While there adjust the comment to clarify why the system-wide resetting of the >> MTRR registers is not needed for the purposes of mtrr_ap_init(). >> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >> --- >> For consideration for 4.19: it's a bugfix of a rare instance of the watchdog >> triggering, but it's also a good performance improvement when performing >> cpu-online. >> >> Hopefully runtime changes to MTRR will affect a single MSR at a time, lowering >> the chance of the watchdog triggering due to the system-wide resetting of the >> range. > "Runtime" changes will only be during dom0 boot, if at all, but yes - it > is restricted to a single MTRR at a time. > > It's XENPF_{add,del,read}_memtype, but it's only used by Classic Linux. > PVOps only issues read_memtype. > > Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Having stared at the manuals, I expect the reason this is intermittent even with dedicated testing is because on SMM entry, CR0.CD/NW are specifically unmodified. Therefore, an SMI hitting the critical region will proceed at a glacial pace. But it does occur to me that the rendezvous is a plain rendezvous, which means it will also be taking NMIs because of the watchdog at 2Hz, and those will be glacial too. A further optimisation would be to not disable caches if there are no updates to make. This will be the overwhelming common case in general, and 100% case on CPU hot{un,}plug, but as it is, getting rid of the unnecessary rendezvous is still a massive improvement. ~Andrew
On 13.05.2024 10:59, Roger Pau Monne wrote: > --- a/xen/arch/x86/cpu/mtrr/main.c > +++ b/xen/arch/x86/cpu/mtrr/main.c > @@ -573,14 +573,15 @@ void mtrr_ap_init(void) > if (!mtrr_if || hold_mtrr_updates_on_aps) > return; > /* > - * Ideally we should hold mtrr_mutex here to avoid mtrr entries changed, > - * but this routine will be called in cpu boot time, holding the lock > - * breaks it. This routine is called in two cases: 1.very earily time > - * of software resume, when there absolutely isn't mtrr entry changes; > - * 2.cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug lock to > - * prevent mtrr entry changes > + * hold_mtrr_updates_on_aps takes care of preventing unnecessary MTRR > + * updates when batch starting the CPUs (see > + * mtrr_aps_sync_{begin,end}()). > + * > + * Otherwise just apply the current system wide MTRR values to this AP. > + * Note this doesn't require synchronization with the other CPUs, as > + * there are strictly no modifications of the current MTRR values. > */ > - set_mtrr(~0U, 0, 0, 0); > + mtrr_set_all(); > } While I agree with the change here, it doesn't go quite far enough. Originally I meant to ask that, with this (supposedly) sole use of ~0U gone, you please also drop the handling of that special case from set_mtrr(). But another similar call exist in mtrr_aps_sync_end(). Yet while that's "fine" for the boot case (watchdog being started only slightly later), it doesn't look to be for the S3 resume one: The watchdog is re-enabled quite a bit earlier there. I actually wonder whether mtrr_aps_sync_{begin,end}() wouldn't better themselves invoke watchdog_{dis,en}able(), thus also making the boot case explicitly safe, not just safe because of ordering. Jan
On Tue, May 14, 2024 at 01:57:13PM +0200, Jan Beulich wrote: > On 13.05.2024 10:59, Roger Pau Monne wrote: > > --- a/xen/arch/x86/cpu/mtrr/main.c > > +++ b/xen/arch/x86/cpu/mtrr/main.c > > @@ -573,14 +573,15 @@ void mtrr_ap_init(void) > > if (!mtrr_if || hold_mtrr_updates_on_aps) > > return; > > /* > > - * Ideally we should hold mtrr_mutex here to avoid mtrr entries changed, > > - * but this routine will be called in cpu boot time, holding the lock > > - * breaks it. This routine is called in two cases: 1.very earily time > > - * of software resume, when there absolutely isn't mtrr entry changes; > > - * 2.cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug lock to > > - * prevent mtrr entry changes > > + * hold_mtrr_updates_on_aps takes care of preventing unnecessary MTRR > > + * updates when batch starting the CPUs (see > > + * mtrr_aps_sync_{begin,end}()). > > + * > > + * Otherwise just apply the current system wide MTRR values to this AP. > > + * Note this doesn't require synchronization with the other CPUs, as > > + * there are strictly no modifications of the current MTRR values. > > */ > > - set_mtrr(~0U, 0, 0, 0); > > + mtrr_set_all(); > > } > > While I agree with the change here, it doesn't go quite far enough. Originally > I meant to ask that, with this (supposedly) sole use of ~0U gone, you please > also drop the handling of that special case from set_mtrr(). But another > similar call exist in mtrr_aps_sync_end(). Yet while that's "fine" for the > boot case (watchdog being started only slightly later), it doesn't look to be > for the S3 resume one: The watchdog is re-enabled quite a bit earlier there. > I actually wonder whether mtrr_aps_sync_{begin,end}() wouldn't better > themselves invoke watchdog_{dis,en}able(), thus also making the boot case > explicitly safe, not just safe because of ordering. Hm, I don't like disabling the watchdog, I guess it could be acceptable here because both usages of mtrr_aps_sync_end() are limited to specific scenarios (boot or resume from suspension). I can prepare a separate patch, but I don't think the watchdog disabling should be part of this patch. Thanks, Roger.
On 14.05.2024 15:10, Roger Pau Monné wrote: > On Tue, May 14, 2024 at 01:57:13PM +0200, Jan Beulich wrote: >> On 13.05.2024 10:59, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/cpu/mtrr/main.c >>> +++ b/xen/arch/x86/cpu/mtrr/main.c >>> @@ -573,14 +573,15 @@ void mtrr_ap_init(void) >>> if (!mtrr_if || hold_mtrr_updates_on_aps) >>> return; >>> /* >>> - * Ideally we should hold mtrr_mutex here to avoid mtrr entries changed, >>> - * but this routine will be called in cpu boot time, holding the lock >>> - * breaks it. This routine is called in two cases: 1.very earily time >>> - * of software resume, when there absolutely isn't mtrr entry changes; >>> - * 2.cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug lock to >>> - * prevent mtrr entry changes >>> + * hold_mtrr_updates_on_aps takes care of preventing unnecessary MTRR >>> + * updates when batch starting the CPUs (see >>> + * mtrr_aps_sync_{begin,end}()). >>> + * >>> + * Otherwise just apply the current system wide MTRR values to this AP. >>> + * Note this doesn't require synchronization with the other CPUs, as >>> + * there are strictly no modifications of the current MTRR values. >>> */ >>> - set_mtrr(~0U, 0, 0, 0); >>> + mtrr_set_all(); >>> } >> >> While I agree with the change here, it doesn't go quite far enough. Originally >> I meant to ask that, with this (supposedly) sole use of ~0U gone, you please >> also drop the handling of that special case from set_mtrr(). But another >> similar call exist in mtrr_aps_sync_end(). Yet while that's "fine" for the >> boot case (watchdog being started only slightly later), it doesn't look to be >> for the S3 resume one: The watchdog is re-enabled quite a bit earlier there. >> I actually wonder whether mtrr_aps_sync_{begin,end}() wouldn't better >> themselves invoke watchdog_{dis,en}able(), thus also making the boot case >> explicitly safe, not just safe because of ordering. > > Hm, I don't like disabling the watchdog, I guess it could be > acceptable here because both usages of mtrr_aps_sync_end() are limited > to specific scenarios (boot or resume from suspension). I can prepare > a separate patch, but I don't think the watchdog disabling should be > part of this patch. Not sure (as to being part of this patch). Of course it would be okay to address the S3 side separately, whichever approach we use. Yet imo it would also be okay to address both in one go, again whichever approach we use. If you prefer a separate one, so be it. Jan
On 14/05/2024 12:09 pm, Andrew Cooper wrote: > On 13/05/2024 9:59 am, Roger Pau Monne wrote: >> There's no point in forcing a system wide update of the MTRRs on all processors >> when there are no changes to be propagated. On AP startup it's only the AP >> that needs to write the system wide MTRR values in order to match the rest of >> the already online CPUs. >> >> We have occasionally seen the watchdog trigger during `xen-hptool cpu-online` >> in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs >> on all the CPUs in the system. >> >> While there adjust the comment to clarify why the system-wide resetting of the >> MTRR registers is not needed for the purposes of mtrr_ap_init(). >> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >> --- >> For consideration for 4.19: it's a bugfix of a rare instance of the watchdog >> triggering, but it's also a good performance improvement when performing >> cpu-online. >> >> Hopefully runtime changes to MTRR will affect a single MSR at a time, lowering >> the chance of the watchdog triggering due to the system-wide resetting of the >> range. > "Runtime" changes will only be during dom0 boot, if at all, but yes - it > is restricted to a single MTRR at a time. > > It's XENPF_{add,del,read}_memtype, but it's only used by Classic Linux. > PVOps only issues read_memtype. > > Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Actually no - this isn't safe in all cases. There are BIOSes which get MTRRs wrong, and with the APs having UC covering a wider region than the BSP. In this case, creating consistency will alter the MTRRs on all CPUs currently up, and we do need to perform the rendezvous in that case. There are 3 cases: 1) Nothing to do. This is the overwhemlingly common case. 2) Local changes only. No broadcast, but we do need to enter CD mode. 3) Remote changes needed. Needs full broadcast. ~Andrew
On Tue, May 14, 2024 at 02:50:18PM +0100, Andrew Cooper wrote: > On 14/05/2024 12:09 pm, Andrew Cooper wrote: > > On 13/05/2024 9:59 am, Roger Pau Monne wrote: > >> There's no point in forcing a system wide update of the MTRRs on all processors > >> when there are no changes to be propagated. On AP startup it's only the AP > >> that needs to write the system wide MTRR values in order to match the rest of > >> the already online CPUs. > >> > >> We have occasionally seen the watchdog trigger during `xen-hptool cpu-online` > >> in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs > >> on all the CPUs in the system. > >> > >> While there adjust the comment to clarify why the system-wide resetting of the > >> MTRR registers is not needed for the purposes of mtrr_ap_init(). > >> > >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > >> --- > >> For consideration for 4.19: it's a bugfix of a rare instance of the watchdog > >> triggering, but it's also a good performance improvement when performing > >> cpu-online. > >> > >> Hopefully runtime changes to MTRR will affect a single MSR at a time, lowering > >> the chance of the watchdog triggering due to the system-wide resetting of the > >> range. > > "Runtime" changes will only be during dom0 boot, if at all, but yes - it > > is restricted to a single MTRR at a time. > > > > It's XENPF_{add,del,read}_memtype, but it's only used by Classic Linux. > > PVOps only issues read_memtype. > > > > Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> > > Actually no - this isn't safe in all cases. > > There are BIOSes which get MTRRs wrong, and with the APs having UC > covering a wider region than the BSP. > > In this case, creating consistency will alter the MTRRs on all CPUs > currently up, and we do need to perform the rendezvous in that case. I'm confused, the state that gets applied in mtrr_set_all() is not modified to match what's in the started AP registers. An AP starting with a different set of MTRR registers than the saved state will result in the MTRR state on the AP being changed, but not the Xen state stored in mtrr_state, and hence there will be no changes to synchronize. > There are 3 cases: > > 1) Nothing to do. This is the overwhemlingly common case. > 2) Local changes only. No broadcast, but we do need to enter CD mode. > 3) Remote changes needed. Needs full broadcast. Please bear with me, but I don't think 3) is possible during AP bringup. It's possible I'm missing a path where the differences in the started AP MTRR state are somehow reconciled with the cached MTRR state? Thanks, Roger.
On 14/05/2024 3:43 pm, Roger Pau Monné wrote: > On Tue, May 14, 2024 at 02:50:18PM +0100, Andrew Cooper wrote: >> On 14/05/2024 12:09 pm, Andrew Cooper wrote: >>> On 13/05/2024 9:59 am, Roger Pau Monne wrote: >>>> There's no point in forcing a system wide update of the MTRRs on all processors >>>> when there are no changes to be propagated. On AP startup it's only the AP >>>> that needs to write the system wide MTRR values in order to match the rest of >>>> the already online CPUs. >>>> >>>> We have occasionally seen the watchdog trigger during `xen-hptool cpu-online` >>>> in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs >>>> on all the CPUs in the system. >>>> >>>> While there adjust the comment to clarify why the system-wide resetting of the >>>> MTRR registers is not needed for the purposes of mtrr_ap_init(). >>>> >>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>>> --- >>>> For consideration for 4.19: it's a bugfix of a rare instance of the watchdog >>>> triggering, but it's also a good performance improvement when performing >>>> cpu-online. >>>> >>>> Hopefully runtime changes to MTRR will affect a single MSR at a time, lowering >>>> the chance of the watchdog triggering due to the system-wide resetting of the >>>> range. >>> "Runtime" changes will only be during dom0 boot, if at all, but yes - it >>> is restricted to a single MTRR at a time. >>> >>> It's XENPF_{add,del,read}_memtype, but it's only used by Classic Linux. >>> PVOps only issues read_memtype. >>> >>> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> >> Actually no - this isn't safe in all cases. >> >> There are BIOSes which get MTRRs wrong, and with the APs having UC >> covering a wider region than the BSP. >> >> In this case, creating consistency will alter the MTRRs on all CPUs >> currently up, and we do need to perform the rendezvous in that case. > I'm confused, the state that gets applied in mtrr_set_all() is not > modified to match what's in the started AP registers. > > An AP starting with a different set of MTRR registers than the saved > state will result in the MTRR state on the AP being changed, but not > the Xen state stored in mtrr_state, and hence there will be no changes > to synchronize. > >> There are 3 cases: >> >> 1) Nothing to do. This is the overwhemlingly common case. >> 2) Local changes only. No broadcast, but we do need to enter CD mode. >> 3) Remote changes needed. Needs full broadcast. > Please bear with me, but I don't think 3) is possible during AP > bringup. It's possible I'm missing a path where the differences in > the started AP MTRR state are somehow reconciled with the cached MTRR > state? [Summarising a conversation on Matrix] The problem case is when the BSP has an MTRR covering [$X, $X+2) and an AP has has an MTRR covering [$X, $X+3). This is a firmware bug (asymmetric settings), but it has been observed in practice. The resolution in this case ought to be to update all CPUs to use [$X, $X+3), if that is the more UC direction. However, it appears that Xen always resolves asymmetry like this by choosing the BSP setting. Therefore, (whether we should or not), we don't have a case where observing an AP state results in a change of state on other CPUs. Therefore while case 3 exists in reality, we're not losing it as a side effect of this patch. So we'll take the improvement here and defer the other bugs to a future juncture. ~Andrew
diff --git a/xen/arch/x86/cpu/mtrr/main.c b/xen/arch/x86/cpu/mtrr/main.c index 90b235f57e68..0a44ebbcb04f 100644 --- a/xen/arch/x86/cpu/mtrr/main.c +++ b/xen/arch/x86/cpu/mtrr/main.c @@ -573,14 +573,15 @@ void mtrr_ap_init(void) if (!mtrr_if || hold_mtrr_updates_on_aps) return; /* - * Ideally we should hold mtrr_mutex here to avoid mtrr entries changed, - * but this routine will be called in cpu boot time, holding the lock - * breaks it. This routine is called in two cases: 1.very earily time - * of software resume, when there absolutely isn't mtrr entry changes; - * 2.cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug lock to - * prevent mtrr entry changes + * hold_mtrr_updates_on_aps takes care of preventing unnecessary MTRR + * updates when batch starting the CPUs (see + * mtrr_aps_sync_{begin,end}()). + * + * Otherwise just apply the current system wide MTRR values to this AP. + * Note this doesn't require synchronization with the other CPUs, as + * there are strictly no modifications of the current MTRR values. */ - set_mtrr(~0U, 0, 0, 0); + mtrr_set_all(); } /**
There's no point in forcing a system wide update of the MTRRs on all processors when there are no changes to be propagated. On AP startup it's only the AP that needs to write the system wide MTRR values in order to match the rest of the already online CPUs. We have occasionally seen the watchdog trigger during `xen-hptool cpu-online` in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs on all the CPUs in the system. While there adjust the comment to clarify why the system-wide resetting of the MTRR registers is not needed for the purposes of mtrr_ap_init(). Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- For consideration for 4.19: it's a bugfix of a rare instance of the watchdog triggering, but it's also a good performance improvement when performing cpu-online. Hopefully runtime changes to MTRR will affect a single MSR at a time, lowering the chance of the watchdog triggering due to the system-wide resetting of the range. --- xen/arch/x86/cpu/mtrr/main.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)