Message ID | 20210917213836.175138-3-tony.luck@intel.com (mailing list archive)
---|---
State | New, archived
Series | Basic recovery for machine checks inside SGX
On 9/17/21 2:38 PM, Tony Luck wrote:
>  /*
>   * These variables are part of the state of the reclaimer, and must be accessed
> @@ -649,6 +650,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  	}
>
>  	section->phys_addr = phys_addr;
> +	section->end_phys_addr = phys_addr + size - 1;
> +	xa_store_range(&epc_page_ranges, section->phys_addr,
> +		       section->end_phys_addr, section, GFP_KERNEL);

Did we ever figure out how much space storing really big ranges in the
xarray consumes?
>>  	section->phys_addr = phys_addr;
>> +	section->end_phys_addr = phys_addr + size - 1;
>> +	xa_store_range(&epc_page_ranges, section->phys_addr,
>> +		       section->end_phys_addr, section, GFP_KERNEL);
>
> Did we ever figure out how much space storing really big ranges in the
> xarray consumes?

No. Willy said the existing xarray code would be less than optimal with
this usage, but that things would be much better when he applied some
maple tree updates to the internals of xarray.

If there is some easy way to measure the memory backing an xarray I'm
happy to get the data. Or if someone else can synthesize it ... the two
ranges on my system that are added to the xarray are:

$ dmesg | grep -i sgx
[    8.496844] sgx: EPC section 0x8000c00000-0x807f7fffff
[    8.505118] sgx: EPC section 0x10000c00000-0x1007fffffff

I.e. two ranges of a bit under 2GB each.

But I don't think the overhead can be too hideous:

$ grep MemFree /proc/meminfo
MemFree:        1048682016 kB

I still have ~ 1TB free. Which is much greater than the 640 KB which should
be "enough for anybody" :-).

-Tony
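For readers following the thread, here is a minimal sketch of the xarray pattern under discussion: xa_store_range() records an entire physical range as a single multi-index entry, and xa_load() on any address inside that range returns the stored pointer. The names example_ranges, example_record_range() and example_contains() are illustrative only; the real code is in the patch below.

```c
/*
 * Illustrative sketch (not part of the patch): store a physical range
 * as one multi-index xarray entry and look a single address back up.
 */
#include <linux/xarray.h>

static DEFINE_XARRAY(example_ranges);	/* hypothetical, mirrors epc_page_ranges */

static int example_record_range(unsigned long start, unsigned long end,
				void *owner)
{
	/* One entry covers every index from start to end inclusive. */
	return xa_err(xa_store_range(&example_ranges, start, end,
				     owner, GFP_KERNEL));
}

static bool example_contains(unsigned long paddr)
{
	/* Non-NULL means paddr falls inside some stored range. */
	return xa_load(&example_ranges, paddr) != NULL;
}
```

How much node memory those multi-index entries cost internally is exactly the open question in the rest of the thread.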
On 9/21/21 1:50 PM, Luck, Tony wrote:
>> Did we ever figure out how much space storing really big ranges in the
>> xarray consumes?
> No. Willy said the existing xarray code would be less than optimal with
> this usage, but that things would be much better when he applied some
> maple tree updates to the internals of xarray.
>
> If there is some easy way to measure the memory backing an xarray I'm
> happy to get the data. Or if someone else can synthesize it ... the two
> ranges on my system that are added to the xarray are:
>
> $ dmesg | grep -i sgx
> [    8.496844] sgx: EPC section 0x8000c00000-0x807f7fffff
> [    8.505118] sgx: EPC section 0x10000c00000-0x1007fffffff
>
> I.e. two ranges of a bit under 2GB each.
>
> But I don't think the overhead can be too hideous:
>
> $ grep MemFree /proc/meminfo
> MemFree:        1048682016 kB
>
> I still have ~ 1TB free. Which is much greater than the 640 KB which should
> be "enough for anybody" :-).

There is a kmem_cache_create() for the xarray nodes. So, you should be
able to see the difference in /proc/meminfo's "Slab" field. Maybe boot
with init=/bin/sh to reduce the noise and look at meminfo both with and
without your patch applied, or just with the xarray bits commented out.

I don't quite know how the data structures are munged, but xas_alloc()
makes it look like 'struct xa_node' is allocated from
radix_tree_node_cachep. If that's the case, you should also be able to
see this in even more detail in:

# grep radix /proc/slabinfo
radix_tree_node   432305 482412    584   28    4 : tunables    0    0    0 : slabdata  17229  17229      0

again, on a system with and without your new code enabled.
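To put numbers on that suggestion, one can read the radix_tree_node line of /proc/slabinfo before and after enabling the new code and multiply <active_objs> by <objsize>. The small userspace helper below is an illustrative sketch, not something posted in the thread; it has to run as root because /proc/slabinfo is root-readable.

```c
/*
 * Illustrative sketch: report the approximate memory held by the
 * radix_tree_node slab cache as <active_objs> * <objsize>. Run it
 * with and without the new code and compare the two numbers.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char name[64];
	char line[512];
	unsigned long active, total, objsize;
	FILE *f = fopen("/proc/slabinfo", "r");

	if (!f) {
		perror("/proc/slabinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* First four fields: name, active_objs, num_objs, objsize. */
		if (sscanf(line, "%63s %lu %lu %lu",
			   name, &active, &total, &objsize) == 4 &&
		    strcmp(name, "radix_tree_node") == 0)
			printf("radix_tree_node: %lu objs x %lu bytes = %lu bytes\n",
			       active, objsize, active * objsize);
	}
	fclose(f);
	return 0;
}
```

This is exactly the calculation done by hand in the next message.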
On Tue, Sep 21, 2021 at 03:32:14PM -0700, Dave Hansen wrote:
> On 9/21/21 1:50 PM, Luck, Tony wrote:
> >> Did we ever figure out how much space storing really big ranges in the
> >> xarray consumes?
> > No. Willy said the existing xarray code would be less than optimal with
> > this usage, but that things would be much better when he applied some
> > maple tree updates to the internals of xarray.
> >
> > If there is some easy way to measure the memory backing an xarray I'm
> > happy to get the data. Or if someone else can synthesize it ... the two
> > ranges on my system that are added to the xarray are:
> >
> > $ dmesg | grep -i sgx
> > [    8.496844] sgx: EPC section 0x8000c00000-0x807f7fffff
> > [    8.505118] sgx: EPC section 0x10000c00000-0x1007fffffff
> >
> > I.e. two ranges of a bit under 2GB each.
> >
> > But I don't think the overhead can be too hideous:
> >
> > $ grep MemFree /proc/meminfo
> > MemFree:        1048682016 kB
> >
> > I still have ~ 1TB free. Which is much greater than the 640 KB which should
> > be "enough for anybody" :-).
>
> There is a kmem_cache_create() for the xarray nodes. So, you should be
> able to see the difference in /proc/meminfo's "Slab" field. Maybe boot
> with init=/bin/sh to reduce the noise and look at meminfo both with and
> without your patch applied, or just with the xarray bits commented out.
>
> I don't quite know how the data structures are munged, but xas_alloc()
> makes it look like 'struct xa_node' is allocated from
> radix_tree_node_cachep. If that's the case, you should also be able to
> see this in even more detail in:
>
> # grep radix /proc/slabinfo
> radix_tree_node   432305 482412    584   28    4 : tunables    0    0
> 0 : slabdata  17229  17229      0
>
> again, on a system with and without your new code enabled.

Booting with init=/bin/sh and running that grep command right away at
the prompt:

With the xa_store_range() call commented out of my kernel:

radix_tree_node     9800   9968    584   56    8 : tunables    0    0    0 : slabdata    178    178      0

With xa_store_range() enabled:

radix_tree_node     9950  10136    584   56    8 : tunables    0    0    0 : slabdata    181    181      0

The head of the file says these are the field names:

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>

So I think this means that I have (9950 - 9800) * 584 = 87600 more bytes
allocated. Maybe that's a lot? But percentage-wise it seems in the
noise. E.g. we allocate one "struct sgx_epc_page" for each SGX page.
On my system I have 4GB of SGX EPC, so around 32 MB of these structures.

-Tony
On 9/21/21 4:48 PM, Luck, Tony wrote:
>
> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
>
> So I think this means that I have (9950 - 9800) * 584 = 87600 more bytes
> allocated. Maybe that's a lot? But percentage-wise it seems in the
> noise. E.g. we allocate one "struct sgx_epc_page" for each SGX page.
> On my system I have 4GB of SGX EPC, so around 32 MB of these structures.

100k for 4GB of EPC is certainly in the noise as far as I'm concerned.
Thanks for checking this.
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4a5b51d16133..10892513212d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(epc_page_ranges);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -649,6 +650,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
+	xa_store_range(&epc_page_ranges, section->phys_addr,
+		       section->end_phys_addr, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -660,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&epc_page_ranges, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 8b1be10a46f6..6a55b1971956 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -54,6 +54,7 @@ struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;
X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page". If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 10 ++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 11 insertions(+)
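For context, this is roughly how a consumer of the new helper would call it. The caller below is hypothetical; the actual hookups into the memory error handling and ACPI error injection paths are made by other patches in this series.

```c
/*
 * Hypothetical caller sketch: decide whether a reported physical
 * address falls inside an SGX EPC section. Only the calling
 * convention of arch_is_platform_page() comes from this patch.
 */
#include <linux/types.h>

bool arch_is_platform_page(u64 paddr);	/* exported by this patch */

static bool example_handle_memory_error(u64 error_paddr)
{
	if (arch_is_platform_page(error_paddr)) {
		/*
		 * EPC pages have no struct page, so the normal
		 * struct-page based recovery path cannot be used.
		 */
		return true;	/* take an SGX-specific recovery path */
	}

	return false;		/* fall back to normal page handling */
}
```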