Message ID | 20230728075903.7838-6-alejandro.vallejo@cloud.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Make PDX compression optional | expand |
Hi, On 28/07/2023 08:59, Alejandro Vallejo wrote: > diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h > index 5a82b6bde2..dfb475c8dc 100644 > --- a/xen/include/xen/pdx.h > +++ b/xen/include/xen/pdx.h > @@ -67,8 +67,6 @@ > * region involved. > */ > > -#ifdef CONFIG_HAS_PDX > - > extern unsigned long max_pdx; > > #define PDX_GROUP_COUNT ((1 << PDX_GROUP_SHIFT) / \ > @@ -100,6 +98,8 @@ bool __mfn_valid(unsigned long mfn); > #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn)) > #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx)) > > +#ifdef CONFIG_PDX_COMPRESSION > + > extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask; > extern unsigned int pfn_pdx_hole_shift; > extern unsigned long pfn_hole_mask; > @@ -205,8 +205,39 @@ static inline uint64_t directmapoff_to_maddr(unsigned long offset) > * position marks a potentially compressible bit. > */ > void pfn_pdx_hole_setup(unsigned long mask); > +#else /* CONFIG_PDX_COMPRESSION */ Looking at other places, we tend to put the reason the #else be executed. In this case, it is !CONFIG_PDX_COMPRESSION. Other than that, the changes looks good to me: Reviewed-by: Julien Grall <jgrall@amazon.com> Cheers,
On 28/07/2023 8:59 am, Alejandro Vallejo wrote: > Adds a new compile-time flag to allow disabling pdx compression and > compiles out compression-related code/data. It also shorts the pdx<->pfn > conversion macros and creates stubs for masking fucntions. > > While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was > not removable in practice. > > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> > --- > v2: > * Merged v1/patch2: Removal of CONFIG_HAS_PDX here (Jan) This series is now looking fine, except for the Kconfig aspect. This is not something any user or developer should ever be queried about. The feedback on the documentation patches alone show that it's not understood well by the maintainers, even if the principle is accepted. There is never any reason to have this active on x86. Indeed, Julien's quick metric shows how much performance we waste by having it enabled. This patch ought to just rename HAS_PDX to PDX, and keep it selected by the subset of architectures that still want it. ~Andrew
On 28/07/2023 5:36 pm, Andrew Cooper wrote: > On 28/07/2023 8:59 am, Alejandro Vallejo wrote: >> Adds a new compile-time flag to allow disabling pdx compression and >> compiles out compression-related code/data. It also shorts the pdx<->pfn >> conversion macros and creates stubs for masking fucntions. >> >> While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was >> not removable in practice. >> >> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> >> --- >> v2: >> * Merged v1/patch2: Removal of CONFIG_HAS_PDX here (Jan) > This series is now looking fine, except for the Kconfig aspect. > > This is not something any user or developer should ever be queried > about. The feedback on the documentation patches alone show that it's > not understood well by the maintainers, even if the principle is accepted. > > There is never any reason to have this active on x86. Indeed, Julien's > quick metric shows how much performance we waste by having it enabled. Further to this, bloat-o-meter says net -30k of code and there are plenty of fastpaths getting a several cacheline reduction from this. ~Andrew
On 28.07.2023 18:58, Andrew Cooper wrote: > On 28/07/2023 5:36 pm, Andrew Cooper wrote: >> On 28/07/2023 8:59 am, Alejandro Vallejo wrote: >>> Adds a new compile-time flag to allow disabling pdx compression and >>> compiles out compression-related code/data. It also shorts the pdx<->pfn >>> conversion macros and creates stubs for masking fucntions. >>> >>> While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was >>> not removable in practice. >>> >>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> >>> --- >>> v2: >>> * Merged v1/patch2: Removal of CONFIG_HAS_PDX here (Jan) >> This series is now looking fine, except for the Kconfig aspect. >> >> This is not something any user or developer should ever be queried >> about. The feedback on the documentation patches alone show that it's >> not understood well by the maintainers, even if the principle is accepted. >> >> There is never any reason to have this active on x86. We can of course continue to disagree here. At least with EXPERT=y selecting this option ought to remain possible for x86. Whether or not the original systems this scheme was developed for ever went public, such systems did (do) exist, and hence running Xen sensibly on them (without losing all memory except that on node 0) ought to be possible. > Indeed, Julien's >> quick metric shows how much performance we waste by having it enabled. > > Further to this, bloat-o-meter says net -30k of code and there are > plenty of fastpaths getting a several cacheline reduction from this. A similar reduction was achieved by the BMI2-alt-patching series I had put together, yet you weren't willing to come to consensus on it. Jan
On 28/07/2023 17:36, Andrew Cooper wrote: > On 28/07/2023 8:59 am, Alejandro Vallejo wrote: >> Adds a new compile-time flag to allow disabling pdx compression and >> compiles out compression-related code/data. It also shorts the pdx<->pfn >> conversion macros and creates stubs for masking fucntions. >> >> While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was >> not removable in practice. >> >> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> >> --- >> v2: >> * Merged v1/patch2: Removal of CONFIG_HAS_PDX here (Jan) > > This series is now looking fine, except for the Kconfig aspect. > > This is not something any user or developer should ever be queried > about. The feedback on the documentation patches alone show that it's > not understood well by the maintainers, even if the principle is accepted. > > There is never any reason to have this active on x86. I can't really speak for x86. However, for Arm, I think it could be useful in the case there is a system where the frametable size doesn't change with CONFIG_PDX=n and/or the integrator is happy to waste a bit more memory to gain performance. Also, to reply to Jan's, I don't think this is a sort of option that should be gated by expert (againt at least on Arm). Xen ought to work fine with and without PDX enabled. From my understanding, the only differences are the amount of memory wasted vs performance gain. Cheers,
On 28.07.2023 09:59, Alejandro Vallejo wrote: > --- a/xen/common/pdx.c > +++ b/xen/common/pdx.c > @@ -31,11 +31,15 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS( > > bool __mfn_valid(unsigned long mfn) > { > - if ( unlikely(evaluate_nospec(mfn >= max_page)) ) > + bool invalid = mfn >= max_page; > +#ifdef CONFIG_PDX_COMPRESSION > + invalid |= mfn & pfn_hole_mask; > +#endif Nit: Declaration(s) and statement(s) separated by a blank line please. > @@ -49,6 +53,8 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn) > __set_bit(idx, pdx_group_valid); > } > > +#ifdef CONFIG_PDX_COMPRESSION > + > /* > * Diagram to make sense of the following variables. The masks and shifts > * are done on mfn values in order to convert to/from pdx: Nit: With a blank line after #ifdef, > @@ -175,6 +181,7 @@ void __init pfn_pdx_hole_setup(unsigned long mask) > pfn_top_mask = ~(pfn_pdx_bottom_mask | pfn_hole_mask); > ma_top_mask = pfn_top_mask << PAGE_SHIFT; > } > +#endif /* CONFIG_PDX_COMPRESSION */ > > > /* ... we would typically also have one before #endif. In the case here you could even leverage that there are already (wrongly) two consecutive blank lines. > @@ -100,6 +98,8 @@ bool __mfn_valid(unsigned long mfn); > #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn)) > #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx)) > > +#ifdef CONFIG_PDX_COMPRESSION > + > extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask; > extern unsigned int pfn_pdx_hole_shift; > extern unsigned long pfn_hole_mask; > @@ -205,8 +205,39 @@ static inline uint64_t directmapoff_to_maddr(unsigned long offset) > * position marks a potentially compressible bit. > */ > void pfn_pdx_hole_setup(unsigned long mask); > +#else /* CONFIG_PDX_COMPRESSION */ > + > +/* Without PDX compression we can skip some computations */ Same here for the #else then. Jan
On 31/07/2023 9:00 am, Jan Beulich wrote: > On 28.07.2023 18:58, Andrew Cooper wrote: >> On 28/07/2023 5:36 pm, Andrew Cooper wrote: >>> On 28/07/2023 8:59 am, Alejandro Vallejo wrote: >>>> Adds a new compile-time flag to allow disabling pdx compression and >>>> compiles out compression-related code/data. It also shorts the pdx<->pfn >>>> conversion macros and creates stubs for masking fucntions. >>>> >>>> While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was >>>> not removable in practice. >>>> >>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> >>>> --- >>>> v2: >>>> * Merged v1/patch2: Removal of CONFIG_HAS_PDX here (Jan) >>> This series is now looking fine, except for the Kconfig aspect. >>> >>> This is not something any user or developer should ever be queried >>> about. The feedback on the documentation patches alone show that it's >>> not understood well by the maintainers, even if the principle is accepted. >>> >>> There is never any reason to have this active on x86. > We can of course continue to disagree here. At least with EXPERT=y > selecting this option ought to remain possible for x86. Whether or > not the original systems this scheme was developed for ever went > public, such systems did (do) exist, and hence running Xen sensibly > on them (without losing all memory except that on node 0) ought to > be possible. There's one system which never made its way into production, support-for-which in the no-op case is causing a 10% perf hit in at least one metric, 17% in another. ... and your seriously arguing that we should continue to take this perf hit? It is very likely that said machine(s) aren't even powered on these days, and even if it is on, the vendor can take the overhead of turning PDX compression on until such time as they make a production system. Furthermore, it is unrealistic to think that such a machine will ever make its way into production. Linux has never PDX compression, and by-and-large if it doesn't run Linux, you can't sell it in the first place. It is utterly unreasonable to be carrying this overhead in the first place. PDX compression *should not* have been committed on-by-default in the first place. (Yes, I know there was no Kconfig back then, and the review process was non-existent, but someone really should have said no). It is equally unreasonable to offer people (under Expert or not) an ability to shoot themselves in the foot like this. If in the very unlikely case that such a system does come into existence, we can consider re-enabling PDX compression (and by that, I mean doing it in a less invasive way in the common case), but there's very little chance this will ever be a path we need to take. >> Indeed, Julien's >>> quick metric shows how much performance we waste by having it enabled. >> Further to this, bloat-o-meter says net -30k of code and there are >> plenty of fastpaths getting a several cacheline reduction from this. > A similar reduction was achieved Really? You think that replacing the innermost shift and masking with an alternative that has a shorter instruction sequence gets you the same net reduction in code? I do not find that claim credible. > by the BMI2-alt-patching series I > had put together, yet you weren't willing to come to consensus on > it. You have AMD machines, and your patch was alleged to be a performance improvement. So the fact you didn't spot the problems with PEXT/PDEP on all AMD hardware prior to Fam19h suggests there was insufficient testing for an alleged performance improvement. The patch you posted: 1) Added extra complexity via alternatives, and 2) Reduced performance on AMD systems prior to Fam19h. in an area of code which useless on all shipping x86 systems. You literally micro-anti-optimised a no-op path to a more expensive (on one vendor at least) no-op path, claiming it to be a performance improvement. There is no possible way any form of your patch can ever beat Alejandro's work of just compiling-out the useless logic wholesale. ~Andrew
On 31.07.2023 19:38, Andrew Cooper wrote: > On 31/07/2023 9:00 am, Jan Beulich wrote: >> On 28.07.2023 18:58, Andrew Cooper wrote: >>> On 28/07/2023 5:36 pm, Andrew Cooper wrote: >>>> On 28/07/2023 8:59 am, Alejandro Vallejo wrote: >>>>> Adds a new compile-time flag to allow disabling pdx compression and >>>>> compiles out compression-related code/data. It also shorts the pdx<->pfn >>>>> conversion macros and creates stubs for masking fucntions. >>>>> >>>>> While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was >>>>> not removable in practice. >>>>> >>>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> >>>>> --- >>>>> v2: >>>>> * Merged v1/patch2: Removal of CONFIG_HAS_PDX here (Jan) >>>> This series is now looking fine, except for the Kconfig aspect. >>>> >>>> This is not something any user or developer should ever be queried >>>> about. The feedback on the documentation patches alone show that it's >>>> not understood well by the maintainers, even if the principle is accepted. >>>> >>>> There is never any reason to have this active on x86. >> We can of course continue to disagree here. At least with EXPERT=y >> selecting this option ought to remain possible for x86. Whether or >> not the original systems this scheme was developed for ever went >> public, such systems did (do) exist, and hence running Xen sensibly >> on them (without losing all memory except that on node 0) ought to >> be possible. > > There's one system which never made its way into production, > support-for-which in the no-op case is causing a 10% perf hit in at > least one metric, 17% in another. > > ... and your seriously arguing that we should continue to take this perf > hit? Have I asked anywhere to keep this enabled by default? Have I been the one to first propose a way to reduce this overhead? > It is very likely that said machine(s) aren't even powered on these > days, and even if it is on, the vendor can take the overhead of turning > PDX compression on until such time as they make a production system. > > Furthermore, it is unrealistic to think that such a machine will ever > make its way into production. Linux has never PDX compression, and > by-and-large if it doesn't run Linux, you can't sell it in the first place. I'm sure you recall that Linux has much more VA space for its directmap, so aiui there simply was no immediate need there. > It is utterly unreasonable to be carrying this overhead in the first > place. PDX compression *should not* have been committed on-by-default > in the first place. (Yes, I know there was no Kconfig back then, and > the review process was non-existent, but someone really should have said > no). Are you suggesting no code should be committed supporting new hardware, ahead of that hardware going public? > It is equally unreasonable to offer people (under Expert or not) an > ability to shoot themselves in the foot like this. Shoot themselves in the foot? If you enable EXPERT, you better know what you're doing. >>> Indeed, Julien's >>>> quick metric shows how much performance we waste by having it enabled. >>> Further to this, bloat-o-meter says net -30k of code and there are >>> plenty of fastpaths getting a several cacheline reduction from this. >> A similar reduction was achieved > > Really? You think that replacing the innermost shift and masking with > an alternative that has a shorter instruction sequence gets you the same > net reduction in code? > > I do not find that claim credible. Did you ever seriously look at those patches? >> by the BMI2-alt-patching series I >> had put together, yet you weren't willing to come to consensus on >> it. > > You have AMD machines, and your patch was alleged to be a performance > improvement. So the fact you didn't spot the problems with PEXT/PDEP on > all AMD hardware prior to Fam19h suggests there was insufficient testing > for an alleged performance improvement. > > The patch you posted: > > 1) Added extra complexity via alternatives, and > 2) Reduced performance on AMD systems prior to Fam19h. > > in an area of code which useless on all shipping x86 systems. > > You literally micro-anti-optimised a no-op path to a more expensive (on > one vendor at least) no-op path, claiming it to be a performance > improvement. You appear to forget the patch patching to MOV instructions ("x86: use MOV for PFN/PDX conversion when possible"). Are you saying MOV has performance problems on any CPU? Jan
Hi, A few days have passed. May I suggest taking a step back? On Tue, Aug 01, 2023 at 09:57:57AM +0200, Jan Beulich wrote: > On 31.07.2023 19:38, Andrew Cooper wrote: > > There's one system which never made its way into production, > > support-for-which in the no-op case is causing a 10% perf hit in at > > least one metric, 17% in another. > > > > ... and your seriously arguing that we should continue to take this perf > > hit? > > Have I asked anywhere to keep this enabled by default? Have I been the > one to first propose a way to reduce this overhead? > > > It is very likely that said machine(s) aren't even powered on these > > days, and even if it is on, the vendor can take the overhead of turning > > PDX compression on until such time as they make a production system. > > > > Furthermore, it is unrealistic to think that such a machine will ever > > make its way into production. Linux has never PDX compression, and > > by-and-large if it doesn't run Linux, you can't sell it in the first place. > > I'm sure you recall that Linux has much more VA space for its directmap, > so aiui there simply was no immediate need there. > > > It is utterly unreasonable to be carrying this overhead in the first > > place. PDX compression *should not* have been committed on-by-default > > in the first place. (Yes, I know there was no Kconfig back then, and > > the review process was non-existent, but someone really should have said > > no). > > Are you suggesting no code should be committed supporting new hardware, > ahead of that hardware going public? > > > It is equally unreasonable to offer people (under Expert or not) an > > ability to shoot themselves in the foot like this. > > Shoot themselves in the foot? If you enable EXPERT, you better know > what you're doing. > > >>> Indeed, Julien's > >>>> quick metric shows how much performance we waste by having it enabled. > >>> Further to this, bloat-o-meter says net -30k of code and there are > >>> plenty of fastpaths getting a several cacheline reduction from this. > >> A similar reduction was achieved > > > > Really? You think that replacing the innermost shift and masking with > > an alternative that has a shorter instruction sequence gets you the same > > net reduction in code? > > > > I do not find that claim credible. > > Did you ever seriously look at those patches? > > >> by the BMI2-alt-patching series I > >> had put together, yet you weren't willing to come to consensus on > >> it. > > > > You have AMD machines, and your patch was alleged to be a performance > > improvement. So the fact you didn't spot the problems with PEXT/PDEP on > > all AMD hardware prior to Fam19h suggests there was insufficient testing > > for an alleged performance improvement. > > > > The patch you posted: > > > > 1) Added extra complexity via alternatives, and > > 2) Reduced performance on AMD systems prior to Fam19h. > > > > in an area of code which useless on all shipping x86 systems. > > > > You literally micro-anti-optimised a no-op path to a more expensive (on > > one vendor at least) no-op path, claiming it to be a performance > > improvement. > > You appear to forget the patch patching to MOV instructions ("x86: use > MOV for PFN/PDX conversion when possible"). Are you saying MOV has > performance problems on any CPU? > > Jan I think we can all agree that (a) the _current_ pdx code implies taking a perf hit and (b) that we can't just remove PDX compression because it's in current use in certain systems. The contentious points seem to be strictly in the status it should hold by default and the means to reduce the perf hit when it's on. For the status: I think it's clear it must remain at least on ARM. I believe Jan argued about the choice of default in terms of "when" to turn it off, rather than "if". Seeing that... > On 18.07.2023 9:33, Jan Beulich wrote: > I disagree with this choice of default for x86. To avoid surprising > downstreams, this should at best be a two-step process: Keep the > default as Y right now, and switch to N a couple of releases later. ... so I think a more productive discussion is following up on that, looking at the "why", "when", "how" and "risks-involved" rather than going in circles (more on this at the end of the email). Now, for the perf hit reduction: An alt-patching series can probably make it very close to the perf win of this patch as long as it transforms the conversion hunks into no-ops when there's no hole. I looked into the 2018 patch, and I don't think it tried to go that far (afaics it's purpose is to inline the compression parameters into the code stream). I highly suspect it would still noticiably underperform compared to this one, but I admit it's guesswork and I'd be happy to be proven wrong through benchmarks. Regardless that's a lot more complexity than I was willing to tackle here seeing that it's use is limited on x86. Compiling-out and alt-patching compression are not necessarily incompatible with each other though. It would, in fact, do wonders for ARM, where the exact same binary might have to run on separate processors with different memory map configurations. I do think someone ought to do it in the long run as a performance improvement for that port (if only because alt-patching is arch-specific), but I really struggle to justify the dev effort of writing, benchmarking, testing and maintining all that infrastructure when there's no (known) machine I can buy to test it on. Summary: * While alt-patching is an attractive alternative this series doesn't do that and in the spirit of keeping things simple I'd really rather keep it that way. Does this sound reasonable? * For the topic of when to disable compression by default on x86, I genuinely think now's as good a time as any. If we were to do it in 2 steps any project downstream may very well not notice until 2 releases down the line, at which point they simply must turn compression back on, which is what they would have to do now anyway. * For the topic of allowing or not the option to be selectable, I think it would be a mistake to preclude it because while we don't know of physical memory maps with big holes on (publicly available) x86, Xen may be itself virtualized with arbitrary memory maps. Unlikely and far fetched as it is, it's IMO worth being at least cautious about. Gating the feature on EXPERT=y and adding a big warning should be good enough to avoid foot-shootouts. Thanks, Alejandro
Hi Alex, One more remark in the title. s/HAS// as you renamed the Kconfig. On 28/07/2023 08:59, Alejandro Vallejo wrote: > Adds a new compile-time flag to allow disabling pdx compression and > compiles out compression-related code/data. It also shorts the pdx<->pfn > conversion macros and creates stubs for masking fucntions. > > While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was > not removable in practice. > > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> > --- > v2: > * Merged v1/patch2: Removal of CONFIG_HAS_PDX here (Jan) > --- > xen/arch/arm/Kconfig | 1 - > xen/arch/x86/Kconfig | 1 - > xen/arch/x86/domain.c | 19 +++++++++++++------ > xen/common/Kconfig | 13 ++++++++++--- > xen/common/Makefile | 2 +- > xen/common/pdx.c | 15 +++++++++++---- > xen/include/xen/pdx.h | 37 ++++++++++++++++++++++++++++++++++--- > 7 files changed, 69 insertions(+), 19 deletions(-) > > diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig > index 439cc94f33..ea1949fbaa 100644 > --- a/xen/arch/arm/Kconfig > +++ b/xen/arch/arm/Kconfig > @@ -14,7 +14,6 @@ config ARM > select HAS_ALTERNATIVE > select HAS_DEVICE_TREE > select HAS_PASSTHROUGH > - select HAS_PDX > select HAS_PMAP > select HAS_UBSAN > select IOMMU_FORCE_PT_SHARE > diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig > index 92f3a627da..30df085d96 100644 > --- a/xen/arch/x86/Kconfig > +++ b/xen/arch/x86/Kconfig > @@ -24,7 +24,6 @@ config X86 > select HAS_PASSTHROUGH > select HAS_PCI > select HAS_PCI_MSI > - select HAS_PDX > select HAS_SCHED_GRANULARITY > select HAS_UBSAN > select HAS_VPCI if HVM > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c > index 5f66c2ae33..ee2830aad7 100644 > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -458,7 +458,7 @@ void domain_cpu_policy_changed(struct domain *d) > } > } > > -#ifndef CONFIG_BIGMEM > +#if !defined(CONFIG_BIGMEM) && defined(CONFIG_PDX_COMPRESSION) > /* > * The hole may be at or above the 44-bit boundary, so we need to determine > * the total bit count until reaching 32 significant (not squashed out) bits > @@ -485,13 +485,20 @@ static unsigned int __init noinline _domain_struct_bits(void) > struct domain *alloc_domain_struct(void) > { > struct domain *d; > -#ifdef CONFIG_BIGMEM > - const unsigned int bits = 0; > -#else > + > /* > - * We pack the PDX of the domain structure into a 32-bit field within > - * the page_info structure. Hence the MEMF_bits() restriction. > + * Without CONFIG_BIGMEM, we pack the PDX of the domain structure into > + * a 32-bit field within the page_info structure. Hence the MEMF_bits() > + * restriction. With PDX compression in place the number of bits must > + * be calculated at runtime, but it's fixed otherwise. > + * > + * On systems with CONFIG_BIGMEM there's no packing, and so there's no > + * such restriction. > */ > +#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION) > + const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 : > + 32 + PAGE_SHIFT; > +#else > static unsigned int __read_mostly bits; > > if ( unlikely(!bits) ) > diff --git a/xen/common/Kconfig b/xen/common/Kconfig > index dd8d7c3f1c..3a0afd8e83 100644 > --- a/xen/common/Kconfig > +++ b/xen/common/Kconfig > @@ -23,6 +23,16 @@ config GRANT_TABLE > > If unsure, say Y. > > +config PDX_COMPRESSION > + bool "PDX (Page inDeX) compression support" > + default ARM > + help > + PDX compression is a technique that allows the hypervisor to > + represent physical addresses in a very space-efficient manner. > + This is very helpful reducing memory wastage in systems with > + memory banks with base addresses far from each other, but carrier > + a performance cost. > + > config ALTERNATIVE_CALL > bool > > @@ -53,9 +63,6 @@ config HAS_IOPORTS > config HAS_KEXEC > bool > > -config HAS_PDX > - bool > - > config HAS_PMAP > bool > > diff --git a/xen/common/Makefile b/xen/common/Makefile > index 46049eac35..0020cafb8a 100644 > --- a/xen/common/Makefile > +++ b/xen/common/Makefile > @@ -29,7 +29,7 @@ obj-y += multicall.o > obj-y += notifier.o > obj-$(CONFIG_NUMA) += numa.o > obj-y += page_alloc.o > -obj-$(CONFIG_HAS_PDX) += pdx.o > +obj-y += pdx.o > obj-$(CONFIG_PERF_COUNTERS) += perfc.o > obj-bin-$(CONFIG_HAS_PMAP) += pmap.init.o > obj-y += preempt.o > diff --git a/xen/common/pdx.c b/xen/common/pdx.c > index d3d38965bd..a3b1ba9fbb 100644 > --- a/xen/common/pdx.c > +++ b/xen/common/pdx.c > @@ -31,11 +31,15 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS( > > bool __mfn_valid(unsigned long mfn) > { > - if ( unlikely(evaluate_nospec(mfn >= max_page)) ) > + bool invalid = mfn >= max_page; > +#ifdef CONFIG_PDX_COMPRESSION > + invalid |= mfn & pfn_hole_mask; > +#endif > + > + if ( unlikely(evaluate_nospec(invalid)) ) > return false; > - return likely(!(mfn & pfn_hole_mask)) && > - likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, > - pdx_group_valid)); > + > + return test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, pdx_group_valid); > } > > void set_pdx_range(unsigned long smfn, unsigned long emfn) > @@ -49,6 +53,8 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn) > __set_bit(idx, pdx_group_valid); > } > > +#ifdef CONFIG_PDX_COMPRESSION > + > /* > * Diagram to make sense of the following variables. The masks and shifts > * are done on mfn values in order to convert to/from pdx: > @@ -175,6 +181,7 @@ void __init pfn_pdx_hole_setup(unsigned long mask) > pfn_top_mask = ~(pfn_pdx_bottom_mask | pfn_hole_mask); > ma_top_mask = pfn_top_mask << PAGE_SHIFT; > } > +#endif /* CONFIG_PDX_COMPRESSION */ > > > /* > diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h > index 5a82b6bde2..dfb475c8dc 100644 > --- a/xen/include/xen/pdx.h > +++ b/xen/include/xen/pdx.h > @@ -67,8 +67,6 @@ > * region involved. > */ > > -#ifdef CONFIG_HAS_PDX > - > extern unsigned long max_pdx; > > #define PDX_GROUP_COUNT ((1 << PDX_GROUP_SHIFT) / \ > @@ -100,6 +98,8 @@ bool __mfn_valid(unsigned long mfn); > #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn)) > #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx)) > > +#ifdef CONFIG_PDX_COMPRESSION > + > extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask; > extern unsigned int pfn_pdx_hole_shift; > extern unsigned long pfn_hole_mask; > @@ -205,8 +205,39 @@ static inline uint64_t directmapoff_to_maddr(unsigned long offset) > * position marks a potentially compressible bit. > */ > void pfn_pdx_hole_setup(unsigned long mask); > +#else /* CONFIG_PDX_COMPRESSION */ > + > +/* Without PDX compression we can skip some computations */ > + > +/* pdx<->pfn == identity */ > +#define pdx_to_pfn(x) (x) > +#define pfn_to_pdx(x) (x) > + > +/* directmap is indexed by by maddr */ > +#define maddr_to_directmapoff(x) (x) > +#define directmapoff_to_maddr(x) (x) > + > +static inline bool pdx_is_region_compressible(unsigned long smfn, > + unsigned long emfn) > +{ > + return true; > +} > + > +static inline uint64_t pdx_init_mask(uint64_t base_addr) > +{ > + return 0; > +} > + > +static inline uint64_t pdx_region_mask(uint64_t base, uint64_t len) > +{ > + return 0; > +} > + > +static inline void pfn_pdx_hole_setup(unsigned long mask) > +{ > +} > > -#endif /* HAS_PDX */ > +#endif /* CONFIG_PDX_COMPRESSION */ > #endif /* __XEN_PDX_H__ */ > > /*
On 07.08.2023 18:06, Alejandro Vallejo wrote: > An alt-patching series can probably make it very close to the perf win of > this patch as long as it transforms the conversion hunks into no-ops when > there's no hole. I looked into the 2018 patch, and I don't think it tried > to go that far (afaics it's purpose is to inline the compression parameters > into the code stream). I highly suspect it would still noticiably > underperform compared to this one, but I admit it's guesswork and I'd be > happy to be proven wrong through benchmarks. The alt-patching certainly will have some residual overhead; if for nothing else then for the long clobber lists in some of the alternatives. > Summary: > * While alt-patching is an attractive alternative this series doesn't do > that and in the spirit of keeping things simple I'd really rather keep > it that way. Does this sound reasonable? > * For the topic of when to disable compression by default on x86, I > genuinely think now's as good a time as any. If we were to do it in 2 > steps any project downstream may very well not notice until 2 releases > down the line, at which point they simply must turn compression back > on, which is what they would have to do now anyway. > * For the topic of allowing or not the option to be selectable, I think > it would be a mistake to preclude it because while we don't know of > physical memory maps with big holes on (publicly available) x86, Xen > may be itself virtualized with arbitrary memory maps. Unlikely and far > fetched as it is, it's IMO worth being at least cautious about. Gating > the feature on EXPERT=y and adding a big warning should be good enough > to avoid foot-shootouts. I could live with this as a compromise; really my main objection is to not allowing the functionality at all on x86. Beyond that I'm obviously not overly happy with how things are moving here, but so be it. Jan
On Mon, Aug 07, 2023 at 06:48:13PM +0100, Julien Grall wrote: > Hi Alex, > > One more remark in the title. s/HAS// as you renamed the Kconfig. > > [snip] > > -- > Julien Grall Oops. Good catch, cheers.
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig index 439cc94f33..ea1949fbaa 100644 --- a/xen/arch/arm/Kconfig +++ b/xen/arch/arm/Kconfig @@ -14,7 +14,6 @@ config ARM select HAS_ALTERNATIVE select HAS_DEVICE_TREE select HAS_PASSTHROUGH - select HAS_PDX select HAS_PMAP select HAS_UBSAN select IOMMU_FORCE_PT_SHARE diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig index 92f3a627da..30df085d96 100644 --- a/xen/arch/x86/Kconfig +++ b/xen/arch/x86/Kconfig @@ -24,7 +24,6 @@ config X86 select HAS_PASSTHROUGH select HAS_PCI select HAS_PCI_MSI - select HAS_PDX select HAS_SCHED_GRANULARITY select HAS_UBSAN select HAS_VPCI if HVM diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 5f66c2ae33..ee2830aad7 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -458,7 +458,7 @@ void domain_cpu_policy_changed(struct domain *d) } } -#ifndef CONFIG_BIGMEM +#if !defined(CONFIG_BIGMEM) && defined(CONFIG_PDX_COMPRESSION) /* * The hole may be at or above the 44-bit boundary, so we need to determine * the total bit count until reaching 32 significant (not squashed out) bits @@ -485,13 +485,20 @@ static unsigned int __init noinline _domain_struct_bits(void) struct domain *alloc_domain_struct(void) { struct domain *d; -#ifdef CONFIG_BIGMEM - const unsigned int bits = 0; -#else + /* - * We pack the PDX of the domain structure into a 32-bit field within - * the page_info structure. Hence the MEMF_bits() restriction. + * Without CONFIG_BIGMEM, we pack the PDX of the domain structure into + * a 32-bit field within the page_info structure. Hence the MEMF_bits() + * restriction. With PDX compression in place the number of bits must + * be calculated at runtime, but it's fixed otherwise. + * + * On systems with CONFIG_BIGMEM there's no packing, and so there's no + * such restriction. */ +#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION) + const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 : + 32 + PAGE_SHIFT; +#else static unsigned int __read_mostly bits; if ( unlikely(!bits) ) diff --git a/xen/common/Kconfig b/xen/common/Kconfig index dd8d7c3f1c..3a0afd8e83 100644 --- a/xen/common/Kconfig +++ b/xen/common/Kconfig @@ -23,6 +23,16 @@ config GRANT_TABLE If unsure, say Y. +config PDX_COMPRESSION + bool "PDX (Page inDeX) compression support" + default ARM + help + PDX compression is a technique that allows the hypervisor to + represent physical addresses in a very space-efficient manner. + This is very helpful reducing memory wastage in systems with + memory banks with base addresses far from each other, but carrier + a performance cost. + config ALTERNATIVE_CALL bool @@ -53,9 +63,6 @@ config HAS_IOPORTS config HAS_KEXEC bool -config HAS_PDX - bool - config HAS_PMAP bool diff --git a/xen/common/Makefile b/xen/common/Makefile index 46049eac35..0020cafb8a 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -29,7 +29,7 @@ obj-y += multicall.o obj-y += notifier.o obj-$(CONFIG_NUMA) += numa.o obj-y += page_alloc.o -obj-$(CONFIG_HAS_PDX) += pdx.o +obj-y += pdx.o obj-$(CONFIG_PERF_COUNTERS) += perfc.o obj-bin-$(CONFIG_HAS_PMAP) += pmap.init.o obj-y += preempt.o diff --git a/xen/common/pdx.c b/xen/common/pdx.c index d3d38965bd..a3b1ba9fbb 100644 --- a/xen/common/pdx.c +++ b/xen/common/pdx.c @@ -31,11 +31,15 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS( bool __mfn_valid(unsigned long mfn) { - if ( unlikely(evaluate_nospec(mfn >= max_page)) ) + bool invalid = mfn >= max_page; +#ifdef CONFIG_PDX_COMPRESSION + invalid |= mfn & pfn_hole_mask; +#endif + + if ( unlikely(evaluate_nospec(invalid)) ) return false; - return likely(!(mfn & pfn_hole_mask)) && - likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, - pdx_group_valid)); + + return test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, pdx_group_valid); } void set_pdx_range(unsigned long smfn, unsigned long emfn) @@ -49,6 +53,8 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn) __set_bit(idx, pdx_group_valid); } +#ifdef CONFIG_PDX_COMPRESSION + /* * Diagram to make sense of the following variables. The masks and shifts * are done on mfn values in order to convert to/from pdx: @@ -175,6 +181,7 @@ void __init pfn_pdx_hole_setup(unsigned long mask) pfn_top_mask = ~(pfn_pdx_bottom_mask | pfn_hole_mask); ma_top_mask = pfn_top_mask << PAGE_SHIFT; } +#endif /* CONFIG_PDX_COMPRESSION */ /* diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h index 5a82b6bde2..dfb475c8dc 100644 --- a/xen/include/xen/pdx.h +++ b/xen/include/xen/pdx.h @@ -67,8 +67,6 @@ * region involved. */ -#ifdef CONFIG_HAS_PDX - extern unsigned long max_pdx; #define PDX_GROUP_COUNT ((1 << PDX_GROUP_SHIFT) / \ @@ -100,6 +98,8 @@ bool __mfn_valid(unsigned long mfn); #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn)) #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx)) +#ifdef CONFIG_PDX_COMPRESSION + extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask; extern unsigned int pfn_pdx_hole_shift; extern unsigned long pfn_hole_mask; @@ -205,8 +205,39 @@ static inline uint64_t directmapoff_to_maddr(unsigned long offset) * position marks a potentially compressible bit. */ void pfn_pdx_hole_setup(unsigned long mask); +#else /* CONFIG_PDX_COMPRESSION */ + +/* Without PDX compression we can skip some computations */ + +/* pdx<->pfn == identity */ +#define pdx_to_pfn(x) (x) +#define pfn_to_pdx(x) (x) + +/* directmap is indexed by by maddr */ +#define maddr_to_directmapoff(x) (x) +#define directmapoff_to_maddr(x) (x) + +static inline bool pdx_is_region_compressible(unsigned long smfn, + unsigned long emfn) +{ + return true; +} + +static inline uint64_t pdx_init_mask(uint64_t base_addr) +{ + return 0; +} + +static inline uint64_t pdx_region_mask(uint64_t base, uint64_t len) +{ + return 0; +} + +static inline void pfn_pdx_hole_setup(unsigned long mask) +{ +} -#endif /* HAS_PDX */ +#endif /* CONFIG_PDX_COMPRESSION */ #endif /* __XEN_PDX_H__ */ /*
Adds a new compile-time flag to allow disabling pdx compression and compiles out compression-related code/data. It also shorts the pdx<->pfn conversion macros and creates stubs for masking fucntions. While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was not removable in practice. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- v2: * Merged v1/patch2: Removal of CONFIG_HAS_PDX here (Jan) --- xen/arch/arm/Kconfig | 1 - xen/arch/x86/Kconfig | 1 - xen/arch/x86/domain.c | 19 +++++++++++++------ xen/common/Kconfig | 13 ++++++++++--- xen/common/Makefile | 2 +- xen/common/pdx.c | 15 +++++++++++---- xen/include/xen/pdx.h | 37 ++++++++++++++++++++++++++++++++++--- 7 files changed, 69 insertions(+), 19 deletions(-)