Message ID | 20210811032133.853680-1-jarkko@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC,v3] x86/sgx: Add /proc/sys/kernel/sgx/total_mem |
On 8/10/21 8:21 PM, Jarkko Sakkinen wrote:
> +The following sysctl files can be found in the ``/proc/sys/kernel/sgx/`` directory:
> +
> +``total_mem``
> +	The total amount of SGX protected memory in bytes available in the system
> +	available for use. In other words, it describes the size of the Enclave
> +	Page Cache (EPC).

I've been acting as if /proc is deprecated for new stuff.  Shouldn't
this be going in sysfs?

I figured, at some point, someone is going to ask for NUMA statistics.
That would tend to point in the direction of us needing something in:

	/sys/devices/system/node/nodeN/

Maybe 'sgxinfo' or 'sgxstat' to go along with 'meminfo'.

But, we'll probably also end up needing some stats for other things.
Folks have, for instance, asked for a counter of the number of
instantiated enclaves.

We could also use the drivers' namespaces:

	/sys/class/misc/sgx_enclave
	/sys/class/misc/sgx_provision
	/sys/class/misc/sgx_vepc

although that is a bit awkward for reporting global resources like memory.

We could create a platform device just for these stats, say:

	/sys/bus/platform/devices/sgx

But I think platform devices are rather highly scrutinized these days.
I'm not sure if SGX counts as one.

/sys/kernel also appears to be a bit of a free-for-all.  Perhaps it
could go in:

	/sys/kernel/sgx
or
	/sys/kernel/enclaves

The other crazy thing we could try would be to just hijack core mm
mechanisms:

	/proc/{meminfo,vmstat}
	/sys/devices/system/node/nodeN/{vmstat,meminfo}

Then we can just use the existing counter infrastructure, which I think
gets us into /sys and /proc.  I'm not sure the mm folks would be fond of
this for something arch and vendor specific, though.

In any case, ABIs are hard and SGX is weird.  News at 11.
On Wed, Aug 11, 2021 at 07:30:13AM -0700, Dave Hansen wrote:
> On 8/10/21 8:21 PM, Jarkko Sakkinen wrote:
> > +The following sysctl files can be found in the ``/proc/sys/kernel/sgx/`` directory:
> > +
> > +``total_mem``
> > +	The total amount of SGX protected memory in bytes available in the system
> > +	available for use. In other words, it describes the size of the Enclave
> > +	Page Cache (EPC).
>
> I've been acting as if /proc is deprecated for new stuff.  Shouldn't
> this be going in sysfs?

Are sysctl variables deprecated too?

> I figured, at some point, someone is going to ask for NUMA statistics.
> That would tend to point in the direction of us needing something in:
>
> 	/sys/devices/system/node/nodeN/
>
> Maybe 'sgxinfo' or 'sgxstat' to go along with 'meminfo'.

Are the contents of meminfo frozen, or can a new line be added, e.g.

Node 0 SgxMemTotal:	32825700 kB

If a new file is needed, I would name it "sgxmeminfo".

> But, we'll probably also end up needing some stats for other things.
> Folks have, for instance, asked for a counter of the number of
> instantiated enclaves.
>
> We could also use the drivers' namespaces:
>
> 	/sys/class/misc/sgx_enclave
> 	/sys/class/misc/sgx_provision
> 	/sys/class/misc/sgx_vepc
>
> although that is a bit awkward for reporting global resources like memory.

I think these stats should be available when the driver is not enabled.

It would be best to find a global solution for the long run.

/Jarkko
On 8/12/21 12:53 PM, Jarkko Sakkinen wrote:
> On Wed, Aug 11, 2021 at 07:30:13AM -0700, Dave Hansen wrote:
>> On 8/10/21 8:21 PM, Jarkko Sakkinen wrote:
>>> +The following sysctl files can be found in the ``/proc/sys/kernel/sgx/`` directory:
>>> +
>>> +``total_mem``
>>> +	The total amount of SGX protected memory in bytes available in the system
>>> +	available for use. In other words, it describes the size of the Enclave
>>> +	Page Cache (EPC).
>>
>> I've been acting as if /proc is deprecated for new stuff.  Shouldn't
>> this be going in sysfs?
>
> Are sysctl variables deprecated too?

Adding new ones is.  Adding new, related functionality to existing ones
is OK.  Anything not related to processes shouldn't be added to /proc;
that has been the case for many years now.

>> I figured, at some point, someone is going to ask for NUMA statistics.
>> That would tend to point in the direction of us needing something in:
>>
>> 	/sys/devices/system/node/nodeN/
>>
>> Maybe 'sgxinfo' or 'sgxstat' to go along with 'meminfo'.
>
> Are the contents of meminfo frozen, or can a new line be added, e.g.
>
> Node 0 SgxMemTotal:	32825700 kB

New lines get added occasionally.  Things like AnonHugePages and
KReclaimable are _relatively_ new additions.

> If a new file is needed, I would name it "sgxmeminfo".

Yeah, that would be fine.  The other option would be to have an
"archmeminfo" which other architectures might end up being able to use.
That has the advantage of getting picked up by common tooling more
widely.  We might also be able to use it for things on x86 like TDX
metadata to enumerate how much memory is being consumed.

>> But, we'll probably also end up needing some stats for other things.
>> Folks have, for instance, asked for a counter of the number of
>> instantiated enclaves.
>>
>> We could also use the drivers' namespaces:
>>
>> 	/sys/class/misc/sgx_enclave
>> 	/sys/class/misc/sgx_provision
>> 	/sys/class/misc/sgx_vepc
>>
>> although that is a bit awkward for reporting global resources like memory.
>
> I think these stats should be available when the driver is not enabled.

Do you mean like if it were compiled out?  Or if we booted up and
decided to disallow /dev/sgx_enclave because of Launch Control being
locked?

Either way, the drivers seem to be an odd place to do this.  Probably a
last resort if we don't find a better home.
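For readers following the meminfo option: the per-node line floated in the thread ("Node 0 SgxMemTotal: 32825700 kB") has the same shape as existing per-node meminfo lines, so ordinary tooling could pick it up. The following is a minimal userspace sketch of such a parser. The field name "SgxMemTotal" is only the proposal from this thread, not an existing ABI, and the function name parse_node_meminfo_line() is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of a per-node meminfo file, e.g.
 *   "Node 0 SgxMemTotal:   32825700 kB"
 *
 * "SgxMemTotal" is the hypothetical field name proposed in the thread.
 * Returns 0 and fills *node and *kib when the line carries 'field',
 * -1 when the line is malformed or names a different field.
 */
int parse_node_meminfo_line(const char *line, const char *field,
			    int *node, unsigned long long *kib)
{
	char name[64];
	int n;
	unsigned long long val;

	/* Expected layout: "Node <id> <Field>: <value> kB". */
	if (sscanf(line, "Node %d %63[^:]: %llu kB", &n, name, &val) != 3)
		return -1;
	if (strcmp(name, field) != 0)
		return -1;

	*node = n;
	*kib = val;
	return 0;
}
```

A caller would feed it each line read from /sys/devices/system/node/nodeN/meminfo and keep the first match. Nothing here depends on where the counter finally lands, only on the line format.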
diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index dd0ac96ff9ef..32d579c90d82 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -250,3 +250,13 @@ user wants to deploy SGX applications both on the host and in guests
 on the same machine, the user should reserve enough EPC (by taking out
 total virtual EPC size of all SGX VMs from the physical EPC size) for
 host SGX applications so they can run with acceptable performance.
+
+Sysctls
+=======
+
+The following sysctl files can be found in the ``/proc/sys/kernel/sgx/`` directory:
+
+``total_mem``
+	The total amount of SGX protected memory in bytes available in the system
+	available for use. In other words, it describes the size of the Enclave
+	Page Cache (EPC).
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..c857253a2e5d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -28,7 +28,10 @@ static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
 static LIST_HEAD(sgx_active_page_list);
 static DEFINE_SPINLOCK(sgx_reclaimer_lock);
 
-/* The free page list lock protected variables prepend the lock. */
+/* Total EPC memory available in bytes. */
+static unsigned long sgx_total_mem;
+
+/* The number of free EPC pages in all nodes. */
 static unsigned long sgx_nr_free_pages;
 
 /* Nodes with one or more EPC sections. */
@@ -656,6 +659,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
+	sgx_total_mem += nr_pages * PAGE_SIZE;
+
 	return true;
 }
 
@@ -790,8 +795,30 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+static struct ctl_path sgx_sysctl_path[] = {
+	{ .procname = "kernel", },
+	{ .procname = "sgx", },
+	{ }
+};
+
+static unsigned long sgx_total_mem_max = ~0UL;
+
+static struct ctl_table sgx_sysctl_table[] = {
+	{
+		.procname       = "total_mem",
+		.data           = &sgx_total_mem,
+		.maxlen         = sizeof(unsigned long),
+		.mode           = 0444,
+		.proc_handler   = proc_doulongvec_minmax,
+		.extra1         = SYSCTL_ZERO, /* min */
+		.extra2         = &sgx_total_mem_max, /* max */
+	},
+	{ }
+};
+
 static int __init sgx_init(void)
 {
+	struct ctl_table_header *sysctl_table_header;
 	int ret;
 	int i;
 
@@ -810,6 +837,12 @@ static int __init sgx_init(void)
 	if (ret)
 		goto err_kthread;
 
+	sysctl_table_header = register_sysctl_paths(sgx_sysctl_path, sgx_sysctl_table);
+	if (!sysctl_table_header) {
+		pr_err("sysctl registration failed.\n");
+		goto err_provision;
+	}
+
 	/*
 	 * Always try to initialize the native *and* KVM drivers.
 	 * The KVM driver is less picky than the native one and
@@ -821,10 +854,13 @@ static int __init sgx_init(void)
 	ret = sgx_drv_init();
 
 	if (sgx_vepc_init() && ret)
-		goto err_provision;
+		goto err_sysctl;
 
 	return 0;
 
+err_sysctl:
+	unregister_sysctl_table(sysctl_table_header);
+
 err_provision:
 	misc_deregister(&sgx_dev_provision);
 
The amount of EPC on the system is determined by the BIOS and it varies
wildly between systems.  It can be dozens of MB on desktops, or many GB
on servers.

Just like normal memory, SGX memory can be overcommitted.  SGX has its
own reclaim mechanism which kicks in when physical SGX memory (Enclave
Page Cache / EPC) is exhausted.

SGX memory can also be sliced into a number of pieces when utilizing
/dev/sgx_vepc to allocate SGX memory regions for VMs.

For all these needs and purposes, it is often useful to know the
physical limit for SGX memory.  Add a read-only sysctl attribute,
/proc/sys/kernel/sgx/total_mem, to report the total available SGX
memory to help with configuring SGX applications based on the system
capabilities.

Since having this makes sense both for KVM and the driver, it makes most
sense to have it as a global sysctl attribute.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v3:
* ``sgx_total_mem`` -> ``total_mem``
* Rewrote the commit message.

v2:
* Removed Dave's ack.
* Rewrote Documentation/x86/sgx.rst. It was not properly updated in the
  previous iteration.

 Documentation/x86/sgx.rst      | 10 +++++++++
 arch/x86/kernel/cpu/sgx/main.c | 40 ++++++++++++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 2 deletions(-)
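
To illustrate how an application might consume the attribute described above, here is a minimal userspace sketch. It assumes the file exists at the path this RFC proposes and holds a single unsigned decimal byte count followed by a newline, which is the usual output of a proc_doulongvec_minmax handler. The helper names sgx_parse_total_mem() and sgx_read_total_mem() are hypothetical, and the path may be absent on any given kernel.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Path proposed by this RFC; it may not exist on a given kernel. */
#define SGX_TOTAL_MEM_PATH "/proc/sys/kernel/sgx/total_mem"

/*
 * Parse the text the sysctl is assumed to emit: one unsigned decimal
 * number, optionally followed by a newline.
 * Returns 0 on success, -1 on malformed input.
 */
int sgx_parse_total_mem(const char *text, unsigned long long *bytes)
{
	char *end;

	errno = 0;
	*bytes = strtoull(text, &end, 10);
	if (errno || end == text)
		return -1;	/* overflow, or no digits at all */
	if (*end != '\n' && *end != '\0')
		return -1;	/* trailing garbage */
	return 0;
}

/* Read and parse the sysctl file at 'path'. Returns 0 on success. */
int sgx_read_total_mem(const char *path, unsigned long long *bytes)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;	/* e.g. no SGX, or kernel lacks this sysctl */
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return sgx_parse_total_mem(buf, bytes);
}
```

A caller would pass SGX_TOTAL_MEM_PATH and, if it needs a page count rather than bytes, divide the result by the system page size. Taking the path as a parameter keeps the sketch usable if the counter ends up somewhere else, as the review discussion suggests it might.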