Message ID | 20230712230202.47929-1-haitao.huang@linux.intel.com (mailing list archive) |
---|---|
Headers | show |
Series | Add Cgroup support for SGX EPC memory | expand |
On Wed Jul 12, 2023 at 11:01 PM UTC, Haitao Huang wrote: > SGX EPC memory allocations are separate from normal RAM allocations, and is > managed solely by the SGX subsystem. The existing cgroup memory controller > cannot be used to limit or account for SGX EPC memory, which is a desirable > feature in some environments, e.g., support for pod level control in a > Kubernates cluster on a VM or baremetal host [1,2] in those environments. > > This patchset implements the support for sgx_epc memory within the misc > cgroup controller. The user can use the misc cgroup controller to set and > enforce a max limit on total EPC usage per cgroup. The implementation > reports current usage and events of reaching the limit per cgroup as well > as the total system capacity. > > This work was originally authored by Sean Christopherson a few years ago, > and previously modified by Kristen C. Accardi to work with more recent > kernels, and to utilize the misc cgroup controller rather than a custom > controller. Now I updated the patches based on review comments on the V2 > series[3], simplified a few aspects of the implementation/design and fixed > some stability issues found from testing, while keeping the same user space > facing interfaces. > > The patchset adds support for multiple LRUs to track both reclaimable EPC > pages (i.e. pages the reclaimer knows about), as well as unreclaimable EPC > pages (i.e. pages which the reclaimer isn't aware of, such as VA pages). > These pages are assigned to an LRU, as well as an enclave, so that an > enclave's full EPC usage can be tracked, and limited to a max value. During > OOM events, an enclave can be have its memory zapped, and all the EPC pages > not tracked by the reclaimer can be freed. > > I appreciate your comments and feedback. > > Summary of changes from v2: (more details in commit logs) > > * Added EPC states to replace flags in sgx_epc_page struct. (Jarkko) > * Unrolled wrappers for cond_resched, list (Dave) > * Separate patches for adding reclaimable and unreclaimable lists. (Dave) > * Other improvments on patch flow, commit messages, styles. (Dave, Jarkko) > * Simplified the cgroup tree walking with plain > css_for_each_descendant_pre. > * Fixed race conditions and crashes. > * OOM killer to wait for the victim enclave pages being reclaimed. > * Unblock the user by handling misc_max_write callback asynchronously. > * Rebased onto 6.4 and no longer base this series on the MCA patchset. > * Fix an overflow in misc_try_charge. > * Fix a NULL pointer in SGX PF handler. > * Updated and included the SGX selftest patches previously reviewed. Those > patches fix issues triggered in high EPC pressure required for cgroup > testing. > * Added test scripts to help setup and test SGX EPC cgroups. > > [1]https://lore.kernel.org/all/DM6PR21MB11772A6ED915825854B419D6C4989@DM6PR21MB1177.namprd21.prod.outlook.com/ > [2]https://lore.kernel.org/all/ZD7Iutppjj+muH4p@himmelriiki/ > [3]https://lore.kernel.org/all/20221202183655.3767674-1-kristen@linux.intel.com/ > [4]Documentation/arch/x86/sgx.rst, Section "Virtual EPC" > > Haitao Huang (6): > x86/sgx: Store struct sgx_encl when allocating new VA pages > x86/sgx: Introduce EPC page states > x86/sgx: fix a NULL pointer > cgroup/misc: Fix an overflow > selftests/sgx: Retry the ioctl()'s returned with EAGAIN > selftests/sgx: Add scripts for epc cgroup testing > > Jarkko Sakkinen (3): > selftests/sgx: Move ENCL_HEAP_SIZE_DEFAULT to main.c > selftests/sgx: Use encl->encl_size in sigstruct.c > selftests/sgx: Include the dynamic heap size to the ELRANGE > calculation > > Kristen Carlson Accardi (9): > x86/sgx: Add 'struct sgx_epc_lru_lists' to encapsulate lru list(s) > x86/sgx: Use sgx_epc_lru_lists for existing active page list > x86/sgx: Store reclaimable epc pages in sgx_epc_lru_lists > x86/sgx: store unreclaimable EPC pages in sgx_epc_lru_lists > x86/sgx: Use a list to track to-be-reclaimed pages > cgroup/misc: Add per resource callbacks for CSS events > cgroup/misc: Add SGX EPC resource type and export APIs for SGX driver > x86/sgx: Limit process EPC usage with misc cgroup controller > Docs/x86/sgx: Add description for cgroup support > > Sean Christopherson (9): > x86/sgx: Add EPC page flags to identify owner type > x86/sgx: Introduce RECLAIM_IN_PROGRESS state > x86/sgx: Allow reclaiming up to 32 pages, but scan 16 by default > x85/sgx: Return the number of EPC pages that were successfully > reclaimed > x86/sgx: Add option to ignore age of page during EPC reclaim > x86/sgx: Prepare for multiple LRUs > x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup > x86/sgx: Add helper to grab pages from an arbitrary EPC LRU > x86/sgx: Add EPC OOM path to forcefully reclaim EPC > > Vijay Dhanraj (1): > selftests/sgx: Add SGX selftest augment_via_eaccept_long > > Documentation/arch/x86/sgx.rst | 77 ++++ > arch/x86/Kconfig | 13 + > arch/x86/kernel/cpu/sgx/Makefile | 1 + > arch/x86/kernel/cpu/sgx/driver.c | 27 +- > arch/x86/kernel/cpu/sgx/encl.c | 95 +++- > arch/x86/kernel/cpu/sgx/encl.h | 4 +- > arch/x86/kernel/cpu/sgx/epc_cgroup.c | 406 ++++++++++++++++++ > arch/x86/kernel/cpu/sgx/epc_cgroup.h | 60 +++ > arch/x86/kernel/cpu/sgx/ioctl.c | 25 +- > arch/x86/kernel/cpu/sgx/main.c | 406 ++++++++++++++---- > arch/x86/kernel/cpu/sgx/sgx.h | 113 ++++- > include/linux/misc_cgroup.h | 34 ++ > kernel/cgroup/misc.c | 63 ++- > tools/testing/selftests/sgx/load.c | 8 +- > tools/testing/selftests/sgx/main.c | 177 +++++++- > tools/testing/selftests/sgx/main.h | 6 +- > .../selftests/sgx/run_tests_in_misc_cg.sh | 68 +++ > tools/testing/selftests/sgx/setup_epc_cg.sh | 29 ++ > tools/testing/selftests/sgx/sigstruct.c | 8 +- > .../selftests/sgx/watch_misc_for_tests.sh | 13 + > 20 files changed, 1446 insertions(+), 187 deletions(-) > create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c > create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h > create mode 100755 tools/testing/selftests/sgx/run_tests_in_misc_cg.sh > create mode 100755 tools/testing/selftests/sgx/setup_epc_cg.sh > create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh > > -- > 2.25.1 Thanks for taking the effort, must have been tedious! BR, Jarkko
Hi Haitao, On 7/12/2023 4:01 PM, Haitao Huang wrote: > I appreciate your comments and feedback. > Nit: You missed emailing the cover letter to x86@kernel.org. I think a few other people included in the individual patches are also missing in the cover letter. In general, it might be useful to keep the email list consistent across the cover letter and the individual patches. -Sohil
On Mon, 24 Jul 2023 14:09:21 -0500, Sohil Mehta <sohil.mehta@intel.com> wrote: > Hi Haitao, > > On 7/12/2023 4:01 PM, Haitao Huang wrote: > >> I appreciate your comments and feedback. >> > > Nit: You missed emailing the cover letter to x86@kernel.org. I think a > few other people included in the individual patches are also missing in > the cover letter. > > In general, it might be useful to keep the email list consistent across > the cover letter and the individual patches. > > -Sohil Thanks I'll change to use the same set of lists and addresses for all patches and cover letters in next version. BR Haitao
On Wed, Jul 12, 2023 at 04:01:34PM -0700, Haitao Huang wrote: > SGX EPC memory allocations are separate from normal RAM allocations, and is > managed solely by the SGX subsystem. The existing cgroup memory controller > cannot be used to limit or account for SGX EPC memory, which is a desirable > feature in some environments, e.g., support for pod level control in a > Kubernates cluster on a VM or baremetal host [1,2] in those environments. > > This patchset implements the support for sgx_epc memory within the misc > cgroup controller. The user can use the misc cgroup controller to set and > enforce a max limit on total EPC usage per cgroup. The implementation > reports current usage and events of reaching the limit per cgroup as well > as the total system capacity. > > This work was originally authored by Sean Christopherson a few years ago, > and previously modified by Kristen C. Accardi to work with more recent > kernels, and to utilize the misc cgroup controller rather than a custom > controller. Now I updated the patches based on review comments on the V2 > series[3], simplified a few aspects of the implementation/design and fixed > some stability issues found from testing, while keeping the same user space > facing interfaces. > > The patchset adds support for multiple LRUs to track both reclaimable EPC > pages (i.e. pages the reclaimer knows about), as well as unreclaimable EPC > pages (i.e. pages which the reclaimer isn't aware of, such as VA pages). > These pages are assigned to an LRU, as well as an enclave, so that an > enclave's full EPC usage can be tracked, and limited to a max value. During > OOM events, an enclave can be have its memory zapped, and all the EPC pages > not tracked by the reclaimer can be freed. > > I appreciate your comments and feedback. I've been stressing this patch set in my Kubernetes cluster with a few simultaneous replicas with two Gramine containers (one with EDMM and another one without) in each and per container misc.max limits set. I've not observed any issues and everything seems to be as expected: per container EPC usage is capped to misc.max and its memory.limit triggers OOM for the reclaimed EPC. Tested-by: Mikko Ylinen <mikko.ylinen@linux.intel.com> > > Summary of changes from v2: (more details in commit logs) > > * Added EPC states to replace flags in sgx_epc_page struct. (Jarkko) > * Unrolled wrappers for cond_resched, list (Dave) > * Separate patches for adding reclaimable and unreclaimable lists. (Dave) > * Other improvments on patch flow, commit messages, styles. (Dave, Jarkko) > * Simplified the cgroup tree walking with plain > css_for_each_descendant_pre. > * Fixed race conditions and crashes. > * OOM killer to wait for the victim enclave pages being reclaimed. > * Unblock the user by handling misc_max_write callback asynchronously. > * Rebased onto 6.4 and no longer base this series on the MCA patchset. > * Fix an overflow in misc_try_charge. > * Fix a NULL pointer in SGX PF handler. > * Updated and included the SGX selftest patches previously reviewed. Those > patches fix issues triggered in high EPC pressure required for cgroup > testing. > * Added test scripts to help setup and test SGX EPC cgroups. > > [1]https://lore.kernel.org/all/DM6PR21MB11772A6ED915825854B419D6C4989@DM6PR21MB1177.namprd21.prod.outlook.com/ > [2]https://lore.kernel.org/all/ZD7Iutppjj+muH4p@himmelriiki/ > [3]https://lore.kernel.org/all/20221202183655.3767674-1-kristen@linux.intel.com/ > [4]Documentation/arch/x86/sgx.rst, Section "Virtual EPC" > > Haitao Huang (6): > x86/sgx: Store struct sgx_encl when allocating new VA pages > x86/sgx: Introduce EPC page states > x86/sgx: fix a NULL pointer > cgroup/misc: Fix an overflow > selftests/sgx: Retry the ioctl()'s returned with EAGAIN > selftests/sgx: Add scripts for epc cgroup testing > > Jarkko Sakkinen (3): > selftests/sgx: Move ENCL_HEAP_SIZE_DEFAULT to main.c > selftests/sgx: Use encl->encl_size in sigstruct.c > selftests/sgx: Include the dynamic heap size to the ELRANGE > calculation > > Kristen Carlson Accardi (9): > x86/sgx: Add 'struct sgx_epc_lru_lists' to encapsulate lru list(s) > x86/sgx: Use sgx_epc_lru_lists for existing active page list > x86/sgx: Store reclaimable epc pages in sgx_epc_lru_lists > x86/sgx: store unreclaimable EPC pages in sgx_epc_lru_lists > x86/sgx: Use a list to track to-be-reclaimed pages > cgroup/misc: Add per resource callbacks for CSS events > cgroup/misc: Add SGX EPC resource type and export APIs for SGX driver > x86/sgx: Limit process EPC usage with misc cgroup controller > Docs/x86/sgx: Add description for cgroup support > > Sean Christopherson (9): > x86/sgx: Add EPC page flags to identify owner type > x86/sgx: Introduce RECLAIM_IN_PROGRESS state > x86/sgx: Allow reclaiming up to 32 pages, but scan 16 by default > x85/sgx: Return the number of EPC pages that were successfully > reclaimed > x86/sgx: Add option to ignore age of page during EPC reclaim > x86/sgx: Prepare for multiple LRUs > x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup > x86/sgx: Add helper to grab pages from an arbitrary EPC LRU > x86/sgx: Add EPC OOM path to forcefully reclaim EPC > > Vijay Dhanraj (1): > selftests/sgx: Add SGX selftest augment_via_eaccept_long > > Documentation/arch/x86/sgx.rst | 77 ++++ > arch/x86/Kconfig | 13 + > arch/x86/kernel/cpu/sgx/Makefile | 1 + > arch/x86/kernel/cpu/sgx/driver.c | 27 +- > arch/x86/kernel/cpu/sgx/encl.c | 95 +++- > arch/x86/kernel/cpu/sgx/encl.h | 4 +- > arch/x86/kernel/cpu/sgx/epc_cgroup.c | 406 ++++++++++++++++++ > arch/x86/kernel/cpu/sgx/epc_cgroup.h | 60 +++ > arch/x86/kernel/cpu/sgx/ioctl.c | 25 +- > arch/x86/kernel/cpu/sgx/main.c | 406 ++++++++++++++---- > arch/x86/kernel/cpu/sgx/sgx.h | 113 ++++- > include/linux/misc_cgroup.h | 34 ++ > kernel/cgroup/misc.c | 63 ++- > tools/testing/selftests/sgx/load.c | 8 +- > tools/testing/selftests/sgx/main.c | 177 +++++++- > tools/testing/selftests/sgx/main.h | 6 +- > .../selftests/sgx/run_tests_in_misc_cg.sh | 68 +++ > tools/testing/selftests/sgx/setup_epc_cg.sh | 29 ++ > tools/testing/selftests/sgx/sigstruct.c | 8 +- > .../selftests/sgx/watch_misc_for_tests.sh | 13 + > 20 files changed, 1446 insertions(+), 187 deletions(-) > create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c > create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h > create mode 100755 tools/testing/selftests/sgx/run_tests_in_misc_cg.sh > create mode 100755 tools/testing/selftests/sgx/setup_epc_cg.sh > create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh > > -- > 2.25.1 > -- Regards, Mikko