diff mbox series

[v3,4/4] pdx: Add CONFIG_PDX_COMPRESSION as a common Kconfig option

Message ID 20230808130220.27891-5-alejandro.vallejo@cloud.com (mailing list archive)
State New, archived
Headers show
Series Make PDX compression optional | expand

Commit Message

Alejandro Vallejo Aug. 8, 2023, 1:02 p.m. UTC
Adds a new compile-time flag to allow disabling pdx compression and
compiles out compression-related code/data. It also shorts the pdx<->pfn
conversion macros and creates stubs for masking fucntions.

While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was
not removable in practice.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
---
v3:
  * Fixed a typo in Kconfig's help
  * Gated CONFIG_PDX_COMPRESSION under EXPERT=y and left default=ARM
  * Fixed commit message after renaming CONFIG_PDX_COMPRESSION (Julien)
  * Changed #else comment to the reason it executes (Julien)
  * Various blank-line related style changes (Jan)
---
 xen/arch/arm/Kconfig  |  1 -
 xen/arch/x86/Kconfig  |  1 -
 xen/arch/x86/domain.c | 19 +++++++++++++------
 xen/common/Kconfig    | 13 ++++++++++---
 xen/common/Makefile   |  2 +-
 xen/common/pdx.c      | 16 ++++++++++++----
 xen/include/xen/pdx.h | 38 +++++++++++++++++++++++++++++++++++---
 7 files changed, 71 insertions(+), 19 deletions(-)

Comments

Andrew Cooper Sept. 22, 2023, 8:03 p.m. UTC | #1
Several things.

First, Shawn: PPC has gained a HAS_PDX, the deletion of which needs
merging into this patch.

It was added as part of 4a2f68f909304 which was "minimal to build". 
This series address the issue you presumably encountered where pdx.c
appears to be optional but wasn't.


Do PPC platforms have (or potentially have) sparse RAM banks?

If like x86 the answer is definitely no, then you want to have
PDX_COMPRESSION=n

If the answer is definitely yes always, then you want PDX_COMPRESSION=y

If the answer is system specific, then you want to offer users a choice.


On 08/08/2023 2:02 pm, Alejandro Vallejo wrote:
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index 0d248ab941..2c1d1fc3a2 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -23,6 +23,16 @@ config GRANT_TABLE
>  
>  	  If unsure, say Y.
>  
> +config PDX_COMPRESSION
> +	bool "PDX (Page inDeX) compression support" if EXPERT

This still needs hiding on x86.  The minimal hack for 4.18 is "if EXPERT
&& !X86".

Other adjustments needed depending on the PPC answer above.

> +	default ARM
> +	help
> +	  PDX compression is a technique that allows the hypervisor to
> +	  represent physical addresses in a very space-efficient manner.
> +	  This is very helpful reducing memory wastage in systems with
> +	  memory banks with base addresses far from each other, but carries
> +	  a performance cost.

This is still intractable for a non-Xen-maintainer, and inaccurate. 
Whether you get any benefit at all depends on how the sparseness happens
to line up.

---
PDX compression is a technique designed to reduce the memory overhead of
physical memory management on platforms with sparse RAM banks.

If your platform does have sparse RAM banks, enabling PDX compression
may reduce the memory overhead of Xen, but does carry a runtime
performance cost.

If your platform does not have sparse RAM banks, do not enable PDX
compression.
---

~Andrew
Jan Beulich Sept. 25, 2023, 6:36 a.m. UTC | #2
On 22.09.2023 22:03, Andrew Cooper wrote:
> On 08/08/2023 2:02 pm, Alejandro Vallejo wrote:
>> --- a/xen/common/Kconfig
>> +++ b/xen/common/Kconfig
>> @@ -23,6 +23,16 @@ config GRANT_TABLE
>>  
>>  	  If unsure, say Y.
>>  
>> +config PDX_COMPRESSION
>> +	bool "PDX (Page inDeX) compression support" if EXPERT
> 
> This still needs hiding on x86.  The minimal hack for 4.18 is "if EXPERT
> && !X86".

If you insist on complete hiding and I insist on allowing it to be used by
people who want to play with exotic hardware, then this won't go anywhere.
I think I've come far enough with accepting a compromise, and so I think
it's your turn now to also take a step from your original position.

Jan
Roger Pau Monné Sept. 25, 2023, 9:46 a.m. UTC | #3
On Mon, Sep 25, 2023 at 08:36:03AM +0200, Jan Beulich wrote:
> On 22.09.2023 22:03, Andrew Cooper wrote:
> > On 08/08/2023 2:02 pm, Alejandro Vallejo wrote:
> >> --- a/xen/common/Kconfig
> >> +++ b/xen/common/Kconfig
> >> @@ -23,6 +23,16 @@ config GRANT_TABLE
> >>  
> >>  	  If unsure, say Y.
> >>  
> >> +config PDX_COMPRESSION
> >> +	bool "PDX (Page inDeX) compression support" if EXPERT
> > 
> > This still needs hiding on x86.  The minimal hack for 4.18 is "if EXPERT
> > && !X86".
> 
> If you insist on complete hiding and I insist on allowing it to be used by
> people who want to play with exotic hardware, then this won't go anywhere.
> I think I've come far enough with accepting a compromise, and so I think
> it's your turn now to also take a step from your original position.

Just because I'm not familiar with this, is there any x86 hardware
that has such sparse memory map that would benefit from PDX?

Wouldn't anyone doing bring up on such exotic hardware also likely need to
perform other adjustments to Xen, and hence commenting out the !X86 in
Kconfig be acceptable? (we would likely make it selectable at that
point if such platforms exist).

Thanks, Roger.
Jan Beulich Sept. 25, 2023, 9:59 a.m. UTC | #4
On 25.09.2023 11:46, Roger Pau Monné wrote:
> On Mon, Sep 25, 2023 at 08:36:03AM +0200, Jan Beulich wrote:
>> On 22.09.2023 22:03, Andrew Cooper wrote:
>>> On 08/08/2023 2:02 pm, Alejandro Vallejo wrote:
>>>> --- a/xen/common/Kconfig
>>>> +++ b/xen/common/Kconfig
>>>> @@ -23,6 +23,16 @@ config GRANT_TABLE
>>>>  
>>>>  	  If unsure, say Y.
>>>>  
>>>> +config PDX_COMPRESSION
>>>> +	bool "PDX (Page inDeX) compression support" if EXPERT
>>>
>>> This still needs hiding on x86.  The minimal hack for 4.18 is "if EXPERT
>>> && !X86".
>>
>> If you insist on complete hiding and I insist on allowing it to be used by
>> people who want to play with exotic hardware, then this won't go anywhere.
>> I think I've come far enough with accepting a compromise, and so I think
>> it's your turn now to also take a step from your original position.
> 
> Just because I'm not familiar with this, is there any x86 hardware
> that has such sparse memory map that would benefit from PDX?
> 
> Wouldn't anyone doing bring up on such exotic hardware also likely need to
> perform other adjustments to Xen, and hence commenting out the !X86 in
> Kconfig be acceptable? (we would likely make it selectable at that
> point if such platforms exist).

As mentioned before, the reason PDX was introduced was to actually make Xen
work on such exotic x86 hardware. While I can't tell for sure, that hardware
probably has never made it into production. Yet still things were known to
work there after the original adjustments, so no, I would not expect other
adjustments to be necessary (provided there was no bitrot).

Jan
Andrew Cooper Sept. 25, 2023, 10:01 a.m. UTC | #5
On 25/09/2023 10:46 am, Roger Pau Monné wrote:
> On Mon, Sep 25, 2023 at 08:36:03AM +0200, Jan Beulich wrote:
>> On 22.09.2023 22:03, Andrew Cooper wrote:
>>> On 08/08/2023 2:02 pm, Alejandro Vallejo wrote:
>>>> --- a/xen/common/Kconfig
>>>> +++ b/xen/common/Kconfig
>>>> @@ -23,6 +23,16 @@ config GRANT_TABLE
>>>>  
>>>>  	  If unsure, say Y.
>>>>  
>>>> +config PDX_COMPRESSION
>>>> +	bool "PDX (Page inDeX) compression support" if EXPERT
>>> This still needs hiding on x86.  The minimal hack for 4.18 is "if EXPERT
>>> && !X86".
>> If you insist on complete hiding and I insist on allowing it to be used by
>> people who want to play with exotic hardware, then this won't go anywhere.
>> I think I've come far enough with accepting a compromise, and so I think
>> it's your turn now to also take a step from your original position.
> Just because I'm not familiar with this, is there any x86 hardware
> that has such sparse memory map that would benefit from PDX?

There is one known system which never shipped.  Xen's implementation was
from the anticipation of that system shipping.  Nothing else known.

None of the other major kernels have facilities such as this, which is
very likely a contributory factor to the system not shipping.

> Wouldn't anyone doing bring up on such exotic hardware also likely need to
> perform other adjustments to Xen, and hence commenting out the !X86 in
> Kconfig be acceptable? (we would likely make it selectable at that
> point if such platforms exist).

People with bizarre hardware can cover the cost of bringing it up.  And
that includes tweaking Kconfig so they can select this option.

~Andrew
Jan Beulich Sept. 25, 2023, 10:15 a.m. UTC | #6
On 25.09.2023 12:01, Andrew Cooper wrote:
> On 25/09/2023 10:46 am, Roger Pau Monné wrote:
>> On Mon, Sep 25, 2023 at 08:36:03AM +0200, Jan Beulich wrote:
>>> On 22.09.2023 22:03, Andrew Cooper wrote:
>>>> On 08/08/2023 2:02 pm, Alejandro Vallejo wrote:
>>>>> --- a/xen/common/Kconfig
>>>>> +++ b/xen/common/Kconfig
>>>>> @@ -23,6 +23,16 @@ config GRANT_TABLE
>>>>>  
>>>>>  	  If unsure, say Y.
>>>>>  
>>>>> +config PDX_COMPRESSION
>>>>> +	bool "PDX (Page inDeX) compression support" if EXPERT
>>>> This still needs hiding on x86.  The minimal hack for 4.18 is "if EXPERT
>>>> && !X86".
>>> If you insist on complete hiding and I insist on allowing it to be used by
>>> people who want to play with exotic hardware, then this won't go anywhere.
>>> I think I've come far enough with accepting a compromise, and so I think
>>> it's your turn now to also take a step from your original position.
>> Just because I'm not familiar with this, is there any x86 hardware
>> that has such sparse memory map that would benefit from PDX?
> 
> There is one known system which never shipped.  Xen's implementation was
> from the anticipation of that system shipping.  Nothing else known.
> 
> None of the other major kernels have facilities such as this, which is
> very likely a contributory factor to the system not shipping.

Possibly, yet there's one important difference (mentioned before): VA space
that Xen can use (for directmap and frame_table[]) is far more constrained
than for baremetal systems (Linux, Windows). Hence the guys who got in touch
back at the time were not in need of Linux side changes at all (as far as I
was told back then).

Jan
Shawn Anastasio Sept. 25, 2023, 5:37 p.m. UTC | #7
On 9/22/23 3:03 PM, Andrew Cooper wrote:
> Several things.
> 
> First, Shawn: PPC has gained a HAS_PDX, the deletion of which needs
> merging into this patch.
> 
> It was added as part of 4a2f68f909304 which was "minimal to build". 
> This series address the issue you presumably encountered where pdx.c
> appears to be optional but wasn't.
>

That's correct, build was broken without HAS_PDX.

> 
> Do PPC platforms have (or potentially have) sparse RAM banks?
>

Yes, they do. Especially on POWER9, it is very common for there to be
large gaps in the physical address space between RAM banks.

> If like x86 the answer is definitely no, then you want to have
> PDX_COMPRESSION=n
> 
> If the answer is definitely yes always, then you want PDX_COMPRESSION=y
>
> If the answer is system specific, then you want to offer users a choice.

It looks like PDX_COMPRESSION=y would probably make the most sense for
ppc, but I'm not against making it a choice.

Thanks,
Shawn
Andrew Cooper Oct. 6, 2023, 1:20 p.m. UTC | #8
On 25/09/2023 6:37 pm, Shawn Anastasio wrote:
> On 9/22/23 3:03 PM, Andrew Cooper wrote:
>> Several things.
>>
>> First, Shawn: PPC has gained a HAS_PDX, the deletion of which needs
>> merging into this patch.
>>
>> It was added as part of 4a2f68f909304 which was "minimal to build". 
>> This series address the issue you presumably encountered where pdx.c
>> appears to be optional but wasn't.
>>
> That's correct, build was broken without HAS_PDX.
>
>> Do PPC platforms have (or potentially have) sparse RAM banks?
>>
> Yes, they do. Especially on POWER9, it is very common for there to be
> large gaps in the physical address space between RAM banks.
>
>> If like x86 the answer is definitely no, then you want to have
>> PDX_COMPRESSION=n
>>
>> If the answer is definitely yes always, then you want PDX_COMPRESSION=y
>>
>> If the answer is system specific, then you want to offer users a choice.
> It looks like PDX_COMPRESSION=y would probably make the most sense for
> ppc, but I'm not against making it a choice.

Thanks.  I'll make suitable adjustments.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 57bd1d01d7..224db89c05 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -14,7 +14,6 @@  config ARM
 	select HAS_ALTERNATIVE
 	select HAS_DEVICE_TREE
 	select HAS_PASSTHROUGH
-	select HAS_PDX
 	select HAS_PMAP
 	select HAS_UBSAN
 	select IOMMU_FORCE_PT_SHARE
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 92f3a627da..30df085d96 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -24,7 +24,6 @@  config X86
 	select HAS_PASSTHROUGH
 	select HAS_PCI
 	select HAS_PCI_MSI
-	select HAS_PDX
 	select HAS_SCHED_GRANULARITY
 	select HAS_UBSAN
 	select HAS_VPCI if HVM
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe86a7f853..f476df17e4 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -458,7 +458,7 @@  void domain_cpu_policy_changed(struct domain *d)
     }
 }
 
-#ifndef CONFIG_BIGMEM
+#if !defined(CONFIG_BIGMEM) && defined(CONFIG_PDX_COMPRESSION)
 /*
  * The hole may be at or above the 44-bit boundary, so we need to determine
  * the total bit count until reaching 32 significant (not squashed out) bits
@@ -485,13 +485,20 @@  static unsigned int __init noinline _domain_struct_bits(void)
 struct domain *alloc_domain_struct(void)
 {
     struct domain *d;
-#ifdef CONFIG_BIGMEM
-    const unsigned int bits = 0;
-#else
+
     /*
-     * We pack the PDX of the domain structure into a 32-bit field within
-     * the page_info structure. Hence the MEMF_bits() restriction.
+     * Without CONFIG_BIGMEM, we pack the PDX of the domain structure into
+     * a 32-bit field within the page_info structure. Hence the MEMF_bits()
+     * restriction. With PDX compression in place the number of bits must
+     * be calculated at runtime, but it's fixed otherwise.
+     *
+     * On systems with CONFIG_BIGMEM there's no packing, and so there's no
+     * such restriction.
      */
+#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION)
+    const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 :
+                                                          32 + PAGE_SHIFT;
+#else
     static unsigned int __read_mostly bits;
 
     if ( unlikely(!bits) )
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 0d248ab941..2c1d1fc3a2 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -23,6 +23,16 @@  config GRANT_TABLE
 
 	  If unsure, say Y.
 
+config PDX_COMPRESSION
+	bool "PDX (Page inDeX) compression support" if EXPERT
+	default ARM
+	help
+	  PDX compression is a technique that allows the hypervisor to
+	  represent physical addresses in a very space-efficient manner.
+	  This is very helpful reducing memory wastage in systems with
+	  memory banks with base addresses far from each other, but carries
+	  a performance cost.
+
 config ALTERNATIVE_CALL
 	bool
 
@@ -53,9 +63,6 @@  config HAS_IOPORTS
 config HAS_KEXEC
 	bool
 
-config HAS_PDX
-	bool
-
 config HAS_PMAP
 	bool
 
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 46049eac35..0020cafb8a 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -29,7 +29,7 @@  obj-y += multicall.o
 obj-y += notifier.o
 obj-$(CONFIG_NUMA) += numa.o
 obj-y += page_alloc.o
-obj-$(CONFIG_HAS_PDX) += pdx.o
+obj-y += pdx.o
 obj-$(CONFIG_PERF_COUNTERS) += perfc.o
 obj-bin-$(CONFIG_HAS_PMAP) += pmap.init.o
 obj-y += preempt.o
diff --git a/xen/common/pdx.c b/xen/common/pdx.c
index d3d38965bd..d3d63b0750 100644
--- a/xen/common/pdx.c
+++ b/xen/common/pdx.c
@@ -31,11 +31,16 @@  unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS(
 
 bool __mfn_valid(unsigned long mfn)
 {
-    if ( unlikely(evaluate_nospec(mfn >= max_page)) )
+    bool invalid = mfn >= max_page;
+
+#ifdef CONFIG_PDX_COMPRESSION
+    invalid |= mfn & pfn_hole_mask;
+#endif
+
+    if ( unlikely(evaluate_nospec(invalid)) )
         return false;
-    return likely(!(mfn & pfn_hole_mask)) &&
-           likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT,
-                           pdx_group_valid));
+
+    return test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, pdx_group_valid);
 }
 
 void set_pdx_range(unsigned long smfn, unsigned long emfn)
@@ -49,6 +54,8 @@  void set_pdx_range(unsigned long smfn, unsigned long emfn)
         __set_bit(idx, pdx_group_valid);
 }
 
+#ifdef CONFIG_PDX_COMPRESSION
+
 /*
  * Diagram to make sense of the following variables. The masks and shifts
  * are done on mfn values in order to convert to/from pdx:
@@ -176,6 +183,7 @@  void __init pfn_pdx_hole_setup(unsigned long mask)
     ma_top_mask         = pfn_top_mask << PAGE_SHIFT;
 }
 
+#endif /* CONFIG_PDX_COMPRESSION */
 
 /*
  * Local variables:
diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
index b5367a33ca..6511aba6c7 100644
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -67,8 +67,6 @@ 
  * region involved.
  */
 
-#ifdef CONFIG_HAS_PDX
-
 extern unsigned long max_pdx;
 
 #define PDX_GROUP_COUNT ((1 << PDX_GROUP_SHIFT) / \
@@ -100,6 +98,8 @@  bool __mfn_valid(unsigned long mfn);
 #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
 #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
 
+#ifdef CONFIG_PDX_COMPRESSION
+
 extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask;
 extern unsigned int pfn_pdx_hole_shift;
 extern unsigned long pfn_hole_mask;
@@ -206,7 +206,39 @@  static inline paddr_t directmapoff_to_maddr(unsigned long offset)
  */
 void pfn_pdx_hole_setup(unsigned long mask);
 
-#endif /* HAS_PDX */
+#else /* !CONFIG_PDX_COMPRESSION */
+
+/* Without PDX compression we can skip some computations */
+
+/* pdx<->pfn == identity */
+#define pdx_to_pfn(x) (x)
+#define pfn_to_pdx(x) (x)
+
+/* directmap is indexed by by maddr */
+#define maddr_to_directmapoff(x) (x)
+#define directmapoff_to_maddr(x) (x)
+
+static inline bool pdx_is_region_compressible(unsigned long smfn,
+                                              unsigned long emfn)
+{
+    return true;
+}
+
+static inline uint64_t pdx_init_mask(uint64_t base_addr)
+{
+    return 0;
+}
+
+static inline uint64_t pdx_region_mask(uint64_t base, uint64_t len)
+{
+    return 0;
+}
+
+static inline void pfn_pdx_hole_setup(unsigned long mask)
+{
+}
+
+#endif /* CONFIG_PDX_COMPRESSION */
 #endif /* __XEN_PDX_H__ */
 
 /*