Message ID | 20230808130220.27891-5-alejandro.vallejo@cloud.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Make PDX compression optional | expand |
Several things. First, Shawn: PPC has gained a HAS_PDX, the deletion of which needs merging into this patch. It was added as part of 4a2f68f909304 which was "minimal to build". This series address the issue you presumably encountered where pdx.c appears to be optional but wasn't. Do PPC platforms have (or potentially have) sparse RAM banks? If like x86 the answer is definitely no, then you want to have PDX_COMPRESSION=n If the answer is definitely yes always, then you want PDX_COMPRESSION=y If the answer is system specific, then you want to offer users a choice. On 08/08/2023 2:02 pm, Alejandro Vallejo wrote: > diff --git a/xen/common/Kconfig b/xen/common/Kconfig > index 0d248ab941..2c1d1fc3a2 100644 > --- a/xen/common/Kconfig > +++ b/xen/common/Kconfig > @@ -23,6 +23,16 @@ config GRANT_TABLE > > If unsure, say Y. > > +config PDX_COMPRESSION > + bool "PDX (Page inDeX) compression support" if EXPERT This still needs hiding on x86. The minimal hack for 4.18 is "if EXPERT && !X86". Other adjustments needed depending on the PPC answer above. > + default ARM > + help > + PDX compression is a technique that allows the hypervisor to > + represent physical addresses in a very space-efficient manner. > + This is very helpful reducing memory wastage in systems with > + memory banks with base addresses far from each other, but carries > + a performance cost. This is still intractable for a non-Xen-maintainer, and inaccurate. Whether you get any benefit at all depends on how the sparseness happens to line up. --- PDX compression is a technique designed to reduce the memory overhead of physical memory management on platforms with sparse RAM banks. If your platform does have sparse RAM banks, enabling PDX compression may reduce the memory overhead of Xen, but does carry a runtime performance cost. If your platform does not have sparse RAM banks, do not enable PDX compression. --- ~Andrew
On 22.09.2023 22:03, Andrew Cooper wrote: > On 08/08/2023 2:02 pm, Alejandro Vallejo wrote: >> --- a/xen/common/Kconfig >> +++ b/xen/common/Kconfig >> @@ -23,6 +23,16 @@ config GRANT_TABLE >> >> If unsure, say Y. >> >> +config PDX_COMPRESSION >> + bool "PDX (Page inDeX) compression support" if EXPERT > > This still needs hiding on x86. The minimal hack for 4.18 is "if EXPERT > && !X86". If you insist on complete hiding and I insist on allowing it to be used by people who want to play with exotic hardware, then this won't go anywhere. I think I've come far enough with accepting a compromise, and so I think it's your turn now to also take a step from your original position. Jan
On Mon, Sep 25, 2023 at 08:36:03AM +0200, Jan Beulich wrote: > On 22.09.2023 22:03, Andrew Cooper wrote: > > On 08/08/2023 2:02 pm, Alejandro Vallejo wrote: > >> --- a/xen/common/Kconfig > >> +++ b/xen/common/Kconfig > >> @@ -23,6 +23,16 @@ config GRANT_TABLE > >> > >> If unsure, say Y. > >> > >> +config PDX_COMPRESSION > >> + bool "PDX (Page inDeX) compression support" if EXPERT > > > > This still needs hiding on x86. The minimal hack for 4.18 is "if EXPERT > > && !X86". > > If you insist on complete hiding and I insist on allowing it to be used by > people who want to play with exotic hardware, then this won't go anywhere. > I think I've come far enough with accepting a compromise, and so I think > it's your turn now to also take a step from your original position. Just because I'm not familiar with this, is there any x86 hardware that has such sparse memory map that would benefit from PDX? Wouldn't anyone doing bring up on such exotic hardware also likely need to perform other adjustments to Xen, and hence commenting out the !X86 in Kconfig be acceptable? (we would likely make it selectable at that point if such platforms exist). Thanks, Roger.
On 25.09.2023 11:46, Roger Pau Monné wrote: > On Mon, Sep 25, 2023 at 08:36:03AM +0200, Jan Beulich wrote: >> On 22.09.2023 22:03, Andrew Cooper wrote: >>> On 08/08/2023 2:02 pm, Alejandro Vallejo wrote: >>>> --- a/xen/common/Kconfig >>>> +++ b/xen/common/Kconfig >>>> @@ -23,6 +23,16 @@ config GRANT_TABLE >>>> >>>> If unsure, say Y. >>>> >>>> +config PDX_COMPRESSION >>>> + bool "PDX (Page inDeX) compression support" if EXPERT >>> >>> This still needs hiding on x86. The minimal hack for 4.18 is "if EXPERT >>> && !X86". >> >> If you insist on complete hiding and I insist on allowing it to be used by >> people who want to play with exotic hardware, then this won't go anywhere. >> I think I've come far enough with accepting a compromise, and so I think >> it's your turn now to also take a step from your original position. > > Just because I'm not familiar with this, is there any x86 hardware > that has such sparse memory map that would benefit from PDX? > > Wouldn't anyone doing bring up on such exotic hardware also likely need to > perform other adjustments to Xen, and hence commenting out the !X86 in > Kconfig be acceptable? (we would likely make it selectable at that > point if such platforms exist). As mentioned before, the reason PDX was introduced was to actually make Xen work on such exotic x86 hardware. While I can't tell for sure, that hardware probably has never made it into production. Yet still things were known to work there after the original adjustments, so no, I would not expect other adjustments to be necessary (provided there was no bitrot). Jan
On 25/09/2023 10:46 am, Roger Pau Monné wrote: > On Mon, Sep 25, 2023 at 08:36:03AM +0200, Jan Beulich wrote: >> On 22.09.2023 22:03, Andrew Cooper wrote: >>> On 08/08/2023 2:02 pm, Alejandro Vallejo wrote: >>>> --- a/xen/common/Kconfig >>>> +++ b/xen/common/Kconfig >>>> @@ -23,6 +23,16 @@ config GRANT_TABLE >>>> >>>> If unsure, say Y. >>>> >>>> +config PDX_COMPRESSION >>>> + bool "PDX (Page inDeX) compression support" if EXPERT >>> This still needs hiding on x86. The minimal hack for 4.18 is "if EXPERT >>> && !X86". >> If you insist on complete hiding and I insist on allowing it to be used by >> people who want to play with exotic hardware, then this won't go anywhere. >> I think I've come far enough with accepting a compromise, and so I think >> it's your turn now to also take a step from your original position. > Just because I'm not familiar with this, is there any x86 hardware > that has such sparse memory map that would benefit from PDX? There is one known system which never shipped. Xen's implementation was from the anticipation of that system shipping. Nothing else known. None of the other major kernels have facilities such as this, which is very likely a contributory factor to the system not shipping. > Wouldn't anyone doing bring up on such exotic hardware also likely need to > perform other adjustments to Xen, and hence commenting out the !X86 in > Kconfig be acceptable? (we would likely make it selectable at that > point if such platforms exist). People with bizarre hardware can cover the cost of bringing it up. And that includes tweaking Kconfig so they can select this option. ~Andrew
On 25.09.2023 12:01, Andrew Cooper wrote: > On 25/09/2023 10:46 am, Roger Pau Monné wrote: >> On Mon, Sep 25, 2023 at 08:36:03AM +0200, Jan Beulich wrote: >>> On 22.09.2023 22:03, Andrew Cooper wrote: >>>> On 08/08/2023 2:02 pm, Alejandro Vallejo wrote: >>>>> --- a/xen/common/Kconfig >>>>> +++ b/xen/common/Kconfig >>>>> @@ -23,6 +23,16 @@ config GRANT_TABLE >>>>> >>>>> If unsure, say Y. >>>>> >>>>> +config PDX_COMPRESSION >>>>> + bool "PDX (Page inDeX) compression support" if EXPERT >>>> This still needs hiding on x86. The minimal hack for 4.18 is "if EXPERT >>>> && !X86". >>> If you insist on complete hiding and I insist on allowing it to be used by >>> people who want to play with exotic hardware, then this won't go anywhere. >>> I think I've come far enough with accepting a compromise, and so I think >>> it's your turn now to also take a step from your original position. >> Just because I'm not familiar with this, is there any x86 hardware >> that has such sparse memory map that would benefit from PDX? > > There is one known system which never shipped. Xen's implementation was > from the anticipation of that system shipping. Nothing else known. > > None of the other major kernels have facilities such as this, which is > very likely a contributory factor to the system not shipping. Possibly, yet there's one important difference (mentioned before): VA space that Xen can use (for directmap and frame_table[]) is far more constrained than for baremetal systems (Linux, Windows). Hence the guys who got in touch back at the time were not in need of Linux side changes at all (as far as I was told back then). Jan
On 9/22/23 3:03 PM, Andrew Cooper wrote: > Several things. > > First, Shawn: PPC has gained a HAS_PDX, the deletion of which needs > merging into this patch. > > It was added as part of 4a2f68f909304 which was "minimal to build". > This series address the issue you presumably encountered where pdx.c > appears to be optional but wasn't. > That's correct, build was broken without HAS_PDX. > > Do PPC platforms have (or potentially have) sparse RAM banks? > Yes, they do. Especially on POWER9, it is very common for there to be large gaps in the physical address space between RAM banks. > If like x86 the answer is definitely no, then you want to have > PDX_COMPRESSION=n > > If the answer is definitely yes always, then you want PDX_COMPRESSION=y > > If the answer is system specific, then you want to offer users a choice. It looks like PDX_COMPRESSION=y would probably make the most sense for ppc, but I'm not against making it a choice. Thanks, Shawn
On 25/09/2023 6:37 pm, Shawn Anastasio wrote: > On 9/22/23 3:03 PM, Andrew Cooper wrote: >> Several things. >> >> First, Shawn: PPC has gained a HAS_PDX, the deletion of which needs >> merging into this patch. >> >> It was added as part of 4a2f68f909304 which was "minimal to build". >> This series address the issue you presumably encountered where pdx.c >> appears to be optional but wasn't. >> > That's correct, build was broken without HAS_PDX. > >> Do PPC platforms have (or potentially have) sparse RAM banks? >> > Yes, they do. Especially on POWER9, it is very common for there to be > large gaps in the physical address space between RAM banks. > >> If like x86 the answer is definitely no, then you want to have >> PDX_COMPRESSION=n >> >> If the answer is definitely yes always, then you want PDX_COMPRESSION=y >> >> If the answer is system specific, then you want to offer users a choice. > It looks like PDX_COMPRESSION=y would probably make the most sense for > ppc, but I'm not against making it a choice. Thanks. I'll make suitable adjustments. ~Andrew
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig index 57bd1d01d7..224db89c05 100644 --- a/xen/arch/arm/Kconfig +++ b/xen/arch/arm/Kconfig @@ -14,7 +14,6 @@ config ARM select HAS_ALTERNATIVE select HAS_DEVICE_TREE select HAS_PASSTHROUGH - select HAS_PDX select HAS_PMAP select HAS_UBSAN select IOMMU_FORCE_PT_SHARE diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig index 92f3a627da..30df085d96 100644 --- a/xen/arch/x86/Kconfig +++ b/xen/arch/x86/Kconfig @@ -24,7 +24,6 @@ config X86 select HAS_PASSTHROUGH select HAS_PCI select HAS_PCI_MSI - select HAS_PDX select HAS_SCHED_GRANULARITY select HAS_UBSAN select HAS_VPCI if HVM diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index fe86a7f853..f476df17e4 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -458,7 +458,7 @@ void domain_cpu_policy_changed(struct domain *d) } } -#ifndef CONFIG_BIGMEM +#if !defined(CONFIG_BIGMEM) && defined(CONFIG_PDX_COMPRESSION) /* * The hole may be at or above the 44-bit boundary, so we need to determine * the total bit count until reaching 32 significant (not squashed out) bits @@ -485,13 +485,20 @@ static unsigned int __init noinline _domain_struct_bits(void) struct domain *alloc_domain_struct(void) { struct domain *d; -#ifdef CONFIG_BIGMEM - const unsigned int bits = 0; -#else + /* - * We pack the PDX of the domain structure into a 32-bit field within - * the page_info structure. Hence the MEMF_bits() restriction. + * Without CONFIG_BIGMEM, we pack the PDX of the domain structure into + * a 32-bit field within the page_info structure. Hence the MEMF_bits() + * restriction. With PDX compression in place the number of bits must + * be calculated at runtime, but it's fixed otherwise. + * + * On systems with CONFIG_BIGMEM there's no packing, and so there's no + * such restriction. */ +#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION) + const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 : + 32 + PAGE_SHIFT; +#else static unsigned int __read_mostly bits; if ( unlikely(!bits) ) diff --git a/xen/common/Kconfig b/xen/common/Kconfig index 0d248ab941..2c1d1fc3a2 100644 --- a/xen/common/Kconfig +++ b/xen/common/Kconfig @@ -23,6 +23,16 @@ config GRANT_TABLE If unsure, say Y. +config PDX_COMPRESSION + bool "PDX (Page inDeX) compression support" if EXPERT + default ARM + help + PDX compression is a technique that allows the hypervisor to + represent physical addresses in a very space-efficient manner. + This is very helpful reducing memory wastage in systems with + memory banks with base addresses far from each other, but carries + a performance cost. + config ALTERNATIVE_CALL bool @@ -53,9 +63,6 @@ config HAS_IOPORTS config HAS_KEXEC bool -config HAS_PDX - bool - config HAS_PMAP bool diff --git a/xen/common/Makefile b/xen/common/Makefile index 46049eac35..0020cafb8a 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -29,7 +29,7 @@ obj-y += multicall.o obj-y += notifier.o obj-$(CONFIG_NUMA) += numa.o obj-y += page_alloc.o -obj-$(CONFIG_HAS_PDX) += pdx.o +obj-y += pdx.o obj-$(CONFIG_PERF_COUNTERS) += perfc.o obj-bin-$(CONFIG_HAS_PMAP) += pmap.init.o obj-y += preempt.o diff --git a/xen/common/pdx.c b/xen/common/pdx.c index d3d38965bd..d3d63b0750 100644 --- a/xen/common/pdx.c +++ b/xen/common/pdx.c @@ -31,11 +31,16 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS( bool __mfn_valid(unsigned long mfn) { - if ( unlikely(evaluate_nospec(mfn >= max_page)) ) + bool invalid = mfn >= max_page; + +#ifdef CONFIG_PDX_COMPRESSION + invalid |= mfn & pfn_hole_mask; +#endif + + if ( unlikely(evaluate_nospec(invalid)) ) return false; - return likely(!(mfn & pfn_hole_mask)) && - likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, - pdx_group_valid)); + + return test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, pdx_group_valid); } void set_pdx_range(unsigned long smfn, unsigned long emfn) @@ -49,6 +54,8 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn) __set_bit(idx, pdx_group_valid); } +#ifdef CONFIG_PDX_COMPRESSION + /* * Diagram to make sense of the following variables. The masks and shifts * are done on mfn values in order to convert to/from pdx: @@ -176,6 +183,7 @@ void __init pfn_pdx_hole_setup(unsigned long mask) ma_top_mask = pfn_top_mask << PAGE_SHIFT; } +#endif /* CONFIG_PDX_COMPRESSION */ /* * Local variables: diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h index b5367a33ca..6511aba6c7 100644 --- a/xen/include/xen/pdx.h +++ b/xen/include/xen/pdx.h @@ -67,8 +67,6 @@ * region involved. */ -#ifdef CONFIG_HAS_PDX - extern unsigned long max_pdx; #define PDX_GROUP_COUNT ((1 << PDX_GROUP_SHIFT) / \ @@ -100,6 +98,8 @@ bool __mfn_valid(unsigned long mfn); #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn)) #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx)) +#ifdef CONFIG_PDX_COMPRESSION + extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask; extern unsigned int pfn_pdx_hole_shift; extern unsigned long pfn_hole_mask; @@ -206,7 +206,39 @@ static inline paddr_t directmapoff_to_maddr(unsigned long offset) */ void pfn_pdx_hole_setup(unsigned long mask); -#endif /* HAS_PDX */ +#else /* !CONFIG_PDX_COMPRESSION */ + +/* Without PDX compression we can skip some computations */ + +/* pdx<->pfn == identity */ +#define pdx_to_pfn(x) (x) +#define pfn_to_pdx(x) (x) + +/* directmap is indexed by by maddr */ +#define maddr_to_directmapoff(x) (x) +#define directmapoff_to_maddr(x) (x) + +static inline bool pdx_is_region_compressible(unsigned long smfn, + unsigned long emfn) +{ + return true; +} + +static inline uint64_t pdx_init_mask(uint64_t base_addr) +{ + return 0; +} + +static inline uint64_t pdx_region_mask(uint64_t base, uint64_t len) +{ + return 0; +} + +static inline void pfn_pdx_hole_setup(unsigned long mask) +{ +} + +#endif /* CONFIG_PDX_COMPRESSION */ #endif /* __XEN_PDX_H__ */ /*