[v5,2/7] x86/sgx: Add infrastructure to identify SGX EPC pages

Message ID 20210917213836.175138-3-tony.luck@intel.com (mailing list archive)
State New, archived
Series Basic recovery for machine checks inside SGX

Commit Message

Luck, Tony Sept. 17, 2021, 9:38 p.m. UTC
The x86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by the BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page, for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page". If there is some future performance-critical user of
this function, it may need to be implemented in a more efficient way.
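
For illustration only (not part of this patch), a minimal sketch of how a
caller elsewhere in the kernel could consume the new helper; the function
below and its error-handling flow are assumptions, not code from this
series:

/* Illustrative sketch: a hypothetical consumer of arch_is_platform_page(). */
static bool is_sgx_memory_error(u64 paddr)
{
	if (arch_is_platform_page(paddr)) {
		/*
		 * The address lies in a BIOS-reserved EPC range, so there
		 * is no struct page to poison; any recovery would have to
		 * be SGX-specific.
		 */
		pr_err("machine check in SGX EPC page at 0x%llx\n", paddr);
		return true;
	}

	/* Ordinary memory: the usual struct-page based handling applies. */
	return false;
}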

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 10 ++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 11 insertions(+)

Comments

Dave Hansen Sept. 21, 2021, 8:23 p.m. UTC | #1
On 9/17/21 2:38 PM, Tony Luck wrote:
>  /*
>   * These variables are part of the state of the reclaimer, and must be accessed
> @@ -649,6 +650,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  	}
>  
>  	section->phys_addr = phys_addr;
> +	section->end_phys_addr = phys_addr + size - 1;
> +	xa_store_range(&epc_page_ranges, section->phys_addr,
> +		       section->end_phys_addr, section, GFP_KERNEL);

Did we ever figure out how much space storing really big ranges in the
xarray consumes?
Luck, Tony Sept. 21, 2021, 8:50 p.m. UTC | #2
>>  	section->phys_addr = phys_addr;
>> +	section->end_phys_addr = phys_addr + size - 1;
>> +	xa_store_range(&epc_page_ranges, section->phys_addr,
>> +		       section->end_phys_addr, section, GFP_KERNEL);
>
> Did we ever figure out how much space storing really big ranges in the
> xarray consumes?

No. Willy said the existing xarray code would be less than optimal with
this usage, but that things would be much better when he applied some
maple tree updates to the internals of xarray.

If there is some easy way to measure the memory backing an xarray I'm
happy to get the data. Or if someone else can synthesize it ... the two
ranges on my system that are added to the xarray are:

$ dmesg | grep -i sgx
[    8.496844] sgx: EPC section 0x8000c00000-0x807f7fffff
[    8.505118] sgx: EPC section 0x10000c00000-0x1007fffffff

I.e. two ranges of a bit under 2GB each.

But I don't think the overhead can be too hideous:

$ grep MemFree /proc/meminfo
MemFree:        1048682016 kB

I still have ~1TB free, which is much greater than the 640 KB that should
be "enough for anybody" :-).

-Tony
Dave Hansen Sept. 21, 2021, 10:32 p.m. UTC | #3
On 9/21/21 1:50 PM, Luck, Tony wrote:
>> Did we ever figure out how much space storing really big ranges in the
>> xarray consumes?
> No. Willy said the existing xarray code would be less than optimal with
> this usage, but that things would be much better when he applied some
> maple tree updates to the internals of xarray.
> 
> If there is some easy way to measure the memory backing an xarray I'm
> happy to get the data. Or if someone else can synthesize it ... the two
> ranges on my system that are added to the xarray are:
> 
> $ dmesg | grep -i sgx
> [    8.496844] sgx: EPC section 0x8000c00000-0x807f7fffff
> [    8.505118] sgx: EPC section 0x10000c00000-0x1007fffffff
> 
> I.e. two ranges of a bit under 2GB each.
> 
> But I don't think the overhead can be too hideous:
> 
> $ grep MemFree /proc/meminfo
> MemFree:        1048682016 kB
> 
> I still have ~1TB free, which is much greater than the 640 KB that should
> be "enough for anybody" :-).

There is a kmem_cache_create() for the xarray nodes.  So, you should be
able to see the difference in /proc/meminfo's "Slab" field.  Maybe boot
with init=/bin/sh to reduce the noise and look at meminfo both with and
without your patch applied, or just with the xarray bits commented out.

I don't quite know how the data structures are munged, but xas_alloc()
makes it look like 'struct xa_node' is allocated from
radix_tree_node_cachep.  If that's the case, you should also be able to
see this in even more detail in:

# grep radix /proc/slabinfo
radix_tree_node   432305 482412    584   28    4 : tunables    0    0    0 : slabdata  17229  17229      0

again, on a system with and without your new code enabled.
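
(A small stand-alone user-space sketch of that slabinfo comparison, for
illustration; the program below is one assumed way to do it, not something
posted in this thread. It prints num_objs * objsize for the radix_tree_node
cache, so the totals can be compared with and without the xa_store_range()
call.)

#include <stdio.h>
#include <string.h>

/*
 * Print the memory held by the radix_tree_node slab cache, i.e.
 * num_objs * objsize as reported by /proc/slabinfo (needs root).
 */
int main(void)
{
	FILE *f = fopen("/proc/slabinfo", "r");
	char line[512];

	if (!f) {
		perror("/proc/slabinfo");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		char name[64];
		unsigned long active, num, objsize;

		/* Header lines fail the numeric conversions and are skipped. */
		if (sscanf(line, "%63s %lu %lu %lu",
			   name, &active, &num, &objsize) == 4 &&
		    !strcmp(name, "radix_tree_node"))
			printf("radix_tree_node: %lu objs x %lu bytes = %lu bytes\n",
			       num, objsize, num * objsize);
	}

	fclose(f);
	return 0;
}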
Luck, Tony Sept. 21, 2021, 11:48 p.m. UTC | #4
On Tue, Sep 21, 2021 at 03:32:14PM -0700, Dave Hansen wrote:
> On 9/21/21 1:50 PM, Luck, Tony wrote:
> >> Did we ever figure out how much space storing really big ranges in the
> >> xarray consumes?
> > No. Willy said the existing xarray code would be less than optimal with
> > this usage, but that things would be much better when he applied some
> > maple tree updates to the internals of xarray.
> > 
> > If there is some easy way to measure the memory backing an xarray I'm
> > happy to get the data. Or if someone else can synthesize it ... the two
> > ranges on my system that are added to the xarray are:
> > 
> > $ dmesg | grep -i sgx
> > [    8.496844] sgx: EPC section 0x8000c00000-0x807f7fffff
> > [    8.505118] sgx: EPC section 0x10000c00000-0x1007fffffff
> > 
> > I.e. two ranges of a bit under 2GB each.
> > 
> > But I don't think the overhead can be too hideous:
> > 
> > $ grep MemFree /proc/meminfo
> > MemFree:        1048682016 kB
> > 
> > I still have ~1TB free, which is much greater than the 640 KB that should
> > be "enough for anybody" :-).
> 
> There is a kmem_cache_create() for the xarray nodes.  So, you should be
> able to see the difference in /proc/meminfo's "Slab" field.  Maybe boot
> with init=/bin/sh to reduce the noise and look at meminfo both with and
> without your patch applied, or just with the xarray bits commented out.
> 
> I don't quite know how the data structures are munged, but xas_alloc()
> makes it look like 'struct xa_node' is allocated from
> radix_tree_node_cachep.  If that's the case, you should also be able to
> see this in even more detail in:
> 
> # grep radix /proc/slabinfo
> radix_tree_node   432305 482412    584   28    4 : tunables    0    0    0 : slabdata  17229  17229      0
> 
> again, on a system with and without your new code enabled.


Booting with init=/bin/sh and running that grep command right away at
the prompt:

With the xa_store_range() call commented out of my kernel:

radix_tree_node     9800   9968    584   56    8 : tunables    0    0    0 : slabdata    178    178      0


With xa_store_range() enabled:

radix_tree_node     9950  10136    584   56    8 : tunables    0    0    0 : slabdata    181    181      0



The head of the file says these are the field names:

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>

So I think this means that I have (9950 - 9800) * 584 = 87600 more bytes
allocated. Maybe that's a lot? But percentage-wise it seems in the
noise. E.g., we allocate one "struct sgx_epc_page" for each SGX page.
On my system I have 4GB of SGX EPC, so around 32 MB of these structures.

-Tony
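
(A rough cross-check of the arithmetic above, for illustration; the 32-byte
sizeof(struct sgx_epc_page) used below is an assumption, not a value taken
from this thread.)

#include <stdio.h>

int main(void)
{
	/* Numbers quoted above from /proc/slabinfo. */
	unsigned long objsize    = 584;
	unsigned long extra_objs = 9950 - 9800;

	/* ~4GB of EPC in 4KB pages; assume 32 bytes per struct sgx_epc_page. */
	unsigned long long epc_pages = (4ULL << 30) / 4096;
	unsigned long long epc_meta  = epc_pages * 32;

	printf("extra xarray slab memory: %lu bytes\n", extra_objs * objsize); /* 87600  */
	printf("sgx_epc_page metadata:    %llu MB\n", epc_meta >> 20);         /* ~32 MB */
	return 0;
}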
Dave Hansen Sept. 21, 2021, 11:50 p.m. UTC | #5
On 9/21/21 4:48 PM, Luck, Tony wrote:
> 
> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> 
> So I think this means that I have (9950 - 9800) * 584 = 87600 more bytes
> allocated. Maybe that's a lot? But percentage-wise it seems in the
> noise. E.g., we allocate one "struct sgx_epc_page" for each SGX page.
> On my system I have 4GB of SGX EPC, so around 32 MB of these structures.

100k for 4GB of EPC is certainly in the noise as far as I'm concerned.

Thanks for checking this.

Patch

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4a5b51d16133..10892513212d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@  struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(epc_page_ranges);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -649,6 +650,9 @@  static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
+	xa_store_range(&epc_page_ranges, section->phys_addr,
+		       section->end_phys_addr, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -660,6 +664,12 @@  static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&epc_page_ranges, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 8b1be10a46f6..6a55b1971956 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -54,6 +54,7 @@  struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;