Message ID | 20230712230202.47929-22-haitao.huang@linux.intel.com
---|---
State | New, archived
Series | Add Cgroup support for SGX EPC memory
On 7/12/23 16:01, Haitao Huang wrote:
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 53bab123a8ee..8a7378159e9e 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1952,6 +1952,19 @@ config X86_SGX
>
>  	  If unsure, say N.
>
> +config CGROUP_SGX_EPC
> +        bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for Intel SGX"
> +        depends on X86_SGX && CGROUP_MISC
> +        help
> +          Provides control over the EPC footprint of tasks in a cgroup via
> +          the Miscellaneous cgroup controller.
> +
> +          EPC is a subset of regular memory that is usable only by SGX
> +          enclaves and is very limited in quantity, e.g. less than 1%
> +          of total DRAM.
> +
> +          Say N if unsure.

Use tab + 2 spaces above for indentation, please.
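[For reference, the kernel's Kconfig convention (Documentation/process/coding-style.rst, section 10) is one tab for attribute lines under a config entry and one tab plus two spaces for help text — the style being asked for above. The hunk's original whitespace was lost in this archive, so the re-indented version below is illustrative, not a quote of a later revision:]

config CGROUP_SGX_EPC
	bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for Intel SGX"
	depends on X86_SGX && CGROUP_MISC
	help
	  Provides control over the EPC footprint of tasks in a cgroup via
	  the Miscellaneous cgroup controller.

	  EPC is a subset of regular memory that is usable only by SGX
	  enclaves and is very limited in quantity, e.g. less than 1%
	  of total DRAM.

	  Say N if unsure.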
On Wed, Jul 12, 2023 at 04:01:55PM -0700, Haitao Huang wrote:
> From: Kristen Carlson Accardi <kristen@linux.intel.com>
>
> Implement support for cgroup control of SGX Enclave Page Cache (EPC)
> memory using the misc cgroup controller. EPC memory is independent
> from normal system memory, e.g. must be reserved at boot from RAM and
> cannot be converted between EPC and normal memory while the system is
> running. EPC is managed by the SGX subsystem and is not accounted by
> the memory controller.
>
> Much like normal system memory, EPC memory can be overcommitted via
> virtual memory techniques and pages can be swapped out of the EPC to
> their backing store (normal system memory, e.g. shmem). The SGX EPC
> subsystem is analogous to the memory subsystem and the SGX EPC controller
> is in turn analogous to the memory controller; it implements limit and
> protection models for EPC memory.
>
> The misc controller provides a mechanism to set a hard limit on EPC
> usage via the "sgx_epc" resource in "misc.max". The total EPC memory
> available on the system is reported via the "sgx_epc" resource in
> "misc.capacity".
>
> This patch was modified from its original version to use the misc cgroup
> controller instead of a custom controller.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> Cc: Sean Christopherson <seanjc@google.com>
>
> V3:
>
> 1) Use the same maximum number of reclaiming candidate pages to be
> processed, SGX_NR_TO_SCAN_MAX, for each reclaiming iteration in both
> the cgroup worker function and ksgxd. This fixes an overflow in the
> backing store buffer with the same fixed size allocated on stack in
> sgx_reclaim_epc_pages().
>
> 2) Initialize max for the root EPC cgroup. Otherwise, all
> misc_cg_try_charge() calls would fail, as it checks the limits of all
> ancestors all the way to the root node.
>
> 3) Start reclaiming whenever misc_cg_try_charge() fails. Removed all
> re-checks of limits and current usage. For all intents and purposes,
> when misc_cg_try_charge() fails, reclaiming is needed. This also corrects
> an error of not reclaiming when the child limit is larger than that of
> one of its ancestors.
>
> 4) Handle failure on charging to the root EPC cgroup. Failure on charging
> to the root means we are at or above capacity, so start reclaiming or
> return an OOM error.
>
> 5) Removed the custom cgroup tree walking iterator with epoch tracking
> logic. Replaced it with just the plain css_for_each_descendant_pre
> iterator. The custom iterator implemented a rather complex epoch scheme
> that I believe was intended to prevent extra reclaiming from multiple
> worker threads doing the same walk, but it turned out not to matter much
> as each thread would only reclaim when usage is above the limit. Using
> the plain css_for_each_descendant_pre iterator simplified the code a bit.
>
> 6) Do not reclaim synchronously in the misc_max_write callback, which
> would block the user. Instead, queue an async work item to run the
> reclaiming loop.
>
> 7) Other minor refactorings:
> - Remove unused params in epc_cgroup APIs
> - Centralize uncharge into sgx_free_epc_page()
> ---
>  arch/x86/Kconfig                     |  13 +
>  arch/x86/kernel/cpu/sgx/Makefile     |   1 +
>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 406 +++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  60 ++++
>  arch/x86/kernel/cpu/sgx/main.c       |  79 ++++--
>  arch/x86/kernel/cpu/sgx/sgx.h        |  14 +-
>  6 files changed, 552 insertions(+), 21 deletions(-)
>  create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 53bab123a8ee..8a7378159e9e 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1952,6 +1952,19 @@ config X86_SGX
>
>  	  If unsure, say N.
>
> +config CGROUP_SGX_EPC
> +        bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for Intel SGX"
> +        depends on X86_SGX && CGROUP_MISC
> +        help
> +          Provides control over the EPC footprint of tasks in a cgroup via
> +          the Miscellaneous cgroup controller.
> +
> +          EPC is a subset of regular memory that is usable only by SGX
> +          enclaves and is very limited in quantity, e.g. less than 1%
> +          of total DRAM.
> +
> +          Say N if unsure.
> +
>  config EFI
>  	bool "EFI runtime service support"
>  	depends on ACPI
> diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
> index 9c1656779b2a..12901a488da7 100644
> --- a/arch/x86/kernel/cpu/sgx/Makefile
> +++ b/arch/x86/kernel/cpu/sgx/Makefile
> @@ -4,3 +4,4 @@ obj-y += \
>  	ioctl.o \
>  	main.o
>  obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
> +obj-$(CONFIG_CGROUP_SGX_EPC)	+= epc_cgroup.o
> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> new file mode 100644
> index 000000000000..de0833e5606b
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> @@ -0,0 +1,406 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright(c) 2022 Intel Corporation.
> +
> +#include <linux/atomic.h>
> +#include <linux/kernel.h>
> +#include <linux/ratelimit.h>
> +#include <linux/sched/signal.h>
> +#include <linux/slab.h>
> +#include <linux/threads.h>
> +
> +#include "epc_cgroup.h"
> +
> +#define SGX_EPC_RECLAIM_MIN_PAGES		16UL
> +#define SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD	5
> +#define SGX_EPC_RECLAIM_OOM_THRESHOLD		5
> +
> +static struct workqueue_struct *sgx_epc_cg_wq;
> +static bool sgx_epc_cgroup_oom(struct sgx_epc_cgroup *root);
> +
> +struct sgx_epc_reclaim_control {
> +	struct sgx_epc_cgroup *epc_cg;
> +	int nr_fails;
> +	bool ignore_age;
> +};
> +
> +static inline unsigned long sgx_epc_cgroup_page_counter_read(struct sgx_epc_cgroup *epc_cg)
> +{
> +	return atomic_long_read(&epc_cg->cg->res[MISC_CG_RES_SGX_EPC].usage) / PAGE_SIZE;
> +}
> +
> +static inline unsigned long sgx_epc_cgroup_max_pages(struct sgx_epc_cgroup *epc_cg)
> +{
> +	return READ_ONCE(epc_cg->cg->res[MISC_CG_RES_SGX_EPC].max) / PAGE_SIZE;
> +}
> +
> +static inline unsigned long sgx_epc_cgroup_max_pages_to_root(struct sgx_epc_cgroup *epc_cg)
> +{
> +	struct misc_cg *i = epc_cg->cg;
> +	unsigned long m = ULONG_MAX;
> +
> +	while (i) {
> +		m = min(m, READ_ONCE(i->res[MISC_CG_RES_SGX_EPC].max));
> +		i = misc_cg_parent(i);
> +	}
> +	return m / PAGE_SIZE;
> +}
> +
> +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg)
> +{
> +	if (cg)
> +		return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv);
> +
> +	return NULL;
> +}
> +
> +static inline bool sgx_epc_cgroup_disabled(void)
> +{
> +	return !cgroup_subsys_enabled(misc_cgrp_subsys);
> +}
> +
> +/**
> + * sgx_epc_cgroup_lru_empty - check if a cgroup tree has no pages on its lrus
> + * @root: root of the tree to check
> + *
> + * Return: %true if all cgroups under the specified root have empty LRU lists.
> + * Used to avoid livelocks due to a cgroup having a non-zero charge count but
> + * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or
> + * because all pages in the cgroup are unreclaimable.
> + */
> +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root)
> +{
> +	struct cgroup_subsys_state *css_root = NULL;
> +	struct cgroup_subsys_state *pos = NULL;
> +	struct sgx_epc_cgroup *epc_cg = NULL;
> +	bool ret = true;
> +
> +	/*
> +	 * Caller ensures css_root ref acquired
> +	 */
> +	css_root = root ? &root->cg->css : &(misc_cg_root()->css);
> +
> +	rcu_read_lock();
> +	css_for_each_descendant_pre(pos, css_root) {
> +		if (!css_tryget(pos))
> +			break;
> +
> +		rcu_read_unlock();
> +
> +		epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
> +
> +		spin_lock(&epc_cg->lru.lock);
> +		ret = list_empty(&epc_cg->lru.reclaimable);
> +		spin_unlock(&epc_cg->lru.lock);
> +
> +		rcu_read_lock();
> +		css_put(pos);
> +		if (!ret)
> +			break;
> +	}
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +/**
> + * sgx_epc_cgroup_isolate_pages - walk a cgroup tree and separate pages
> + * @root: root of the tree to start walking
> + * @nr_to_scan: The number of pages that need to be isolated
> + * @dst: Destination list to hold the isolated pages
> + *
> + * Walk the cgroup tree and isolate the pages in the hierarchy
> + * for reclaiming.
> + */
> +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
> +				  size_t *nr_to_scan, struct list_head *dst)
> +{
> +	struct cgroup_subsys_state *css_root = NULL;
> +	struct cgroup_subsys_state *pos = NULL;
> +	struct sgx_epc_cgroup *epc_cg = NULL;
> +
> +	if (!*nr_to_scan)
> +		return;
> +
> +	/* Caller ensures css_root ref acquired */
> +	css_root = root ? &root->cg->css : &(misc_cg_root()->css);
> +
> +	rcu_read_lock();
> +	css_for_each_descendant_pre(pos, css_root) {
> +		if (!css_tryget(pos))
> +			break;
> +		rcu_read_unlock();
> +
> +		epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
> +		sgx_isolate_epc_pages(&epc_cg->lru, nr_to_scan, dst);
> +
> +		rcu_read_lock();
> +		css_put(pos);
> +		if (!*nr_to_scan)
> +			break;
> +	}
> +	rcu_read_unlock();
> +}
> +
> +static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages,
> +					struct sgx_epc_reclaim_control *rc)
> +{
> +	/*
> +	 * Ensure sgx_reclaim_pages is called with a minimum and maximum
> +	 * number of pages. Attempting to reclaim only a few pages will
> +	 * often fail and is inefficient, while reclaiming a huge number
> +	 * of pages can result in soft lockups due to holding various
> +	 * locks for an extended duration. This also bounds nr_pages so
> +	 */

Looks like an incomplete sentence here.

> +	nr_pages = max(nr_pages, SGX_EPC_RECLAIM_MIN_PAGES);
> +	nr_pages = min(nr_pages, SGX_NR_TO_SCAN_MAX);
> +
> +	return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age, rc->epc_cg);
> +}
> +
> +static int sgx_epc_cgroup_reclaim_failed(struct sgx_epc_reclaim_control *rc)
> +{
> +	if (sgx_epc_cgroup_lru_empty(rc->epc_cg))
> +		return -ENOMEM;
> +
> +	++rc->nr_fails;
> +	if (rc->nr_fails > SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD)
> +		rc->ignore_age = true;
> +
> +	return 0;
> +}
> +
> +static inline
> +void sgx_epc_reclaim_control_init(struct sgx_epc_reclaim_control *rc,
> +				  struct sgx_epc_cgroup *epc_cg)
> +{
> +	rc->epc_cg = epc_cg;
> +	rc->nr_fails = 0;
> +	rc->ignore_age = false;
> +}
> +
> +/*
> + * Scheduled by sgx_epc_cgroup_try_charge() to reclaim pages from the
> + * cgroup when the cgroup is at/near its maximum capacity
> + */
> +static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
> +{
> +	struct sgx_epc_reclaim_control rc;
> +	struct sgx_epc_cgroup *epc_cg;
> +	unsigned long cur, max;
> +
> +	epc_cg = container_of(work, struct sgx_epc_cgroup, reclaim_work);
> +
> +	sgx_epc_reclaim_control_init(&rc, epc_cg);
> +
> +	for (;;) {
> +		max = sgx_epc_cgroup_max_pages_to_root(epc_cg);
> +
> +		/*
> +		 * Adjust the limit down by one page, the goal is to free up
> +		 * pages for fault allocations, not to simply obey the limit.
> +		 * Conditionally decrementing max also means the cur vs. max
> +		 * check will correctly handle the case where both are zero.
> +		 */
> +		if (max)
> +			max--;
> +
> +		/*
> +		 * Unless the limit is extremely low, in which case forcing
> +		 * reclaim will likely cause thrashing, force the cgroup to
> +		 * reclaim at least once if it's operating *near* its maximum
> +		 * limit by adjusting @max down by half the min reclaim size.
> +		 * This work func is scheduled by sgx_epc_cgroup_try_charge
> +		 * when it cannot directly reclaim due to being in an atomic
> +		 * context, e.g. EPC allocation in a fault handler. Waiting
> +		 * to reclaim until the cgroup is actually at its limit is less
> +		 * performant as it means the faulting task is effectively
> +		 * blocked until a worker makes its way through the global work
> +		 * queue.
> +		 */
> +		if (max > SGX_NR_TO_SCAN_MAX)
> +			max -= (SGX_EPC_RECLAIM_MIN_PAGES / 2);
> +
> +		max = min(max, sgx_epc_total_pages);
> +		cur = sgx_epc_cgroup_page_counter_read(epc_cg);
> +		if (cur <= max)
> +			break;
> +		/* Nothing reclaimable */
> +		if (sgx_epc_cgroup_lru_empty(epc_cg)) {
> +			if (!sgx_epc_cgroup_oom(epc_cg))
> +				break;
> +
> +			continue;
> +		}
> +
> +		if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) {
> +			if (sgx_epc_cgroup_reclaim_failed(&rc))
> +				break;
> +		}
> +	}
> +}
> +
> +static int __sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg,
> +				       bool reclaim)
> +{
> +	struct sgx_epc_reclaim_control rc;
> +	unsigned int nr_empty = 0;
> +
> +	sgx_epc_reclaim_control_init(&rc, epc_cg);
> +
> +	for (;;) {
> +		if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
> +					PAGE_SIZE))
> +			break;
> +
> +		if (sgx_epc_cgroup_lru_empty(epc_cg))
> +			return -ENOMEM;
> +
> +		if (signal_pending(current))
> +			return -ERESTARTSYS;
> +
> +		if (!reclaim) {
> +			queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work);
> +			return -EBUSY;
> +		}
> +
> +		if (!sgx_epc_cgroup_reclaim_pages(1, &rc)) {
> +			if (sgx_epc_cgroup_reclaim_failed(&rc)) {
> +				if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD)
> +					return -ENOMEM;
> +				schedule();
> +			}
> +		}
> +	}
> +	if (epc_cg->cg != misc_cg_root())
> +		css_get(&epc_cg->cg->css);
> +
> +	return 0;
> +}
> +
> +/**
> + * sgx_epc_cgroup_try_charge - hierarchically try to charge a single EPC page
> + * @mm: the mm_struct of the process to charge
> + * @reclaim: whether or not synchronous reclaim is allowed
> + *
> + * Returns EPC cgroup or NULL on success, -errno on failure.
> + */
> +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim)
> +{
> +	struct sgx_epc_cgroup *epc_cg;
> +	int ret;
> +
> +	if (sgx_epc_cgroup_disabled())
> +		return NULL;
> +
> +	epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg());
> +	ret = __sgx_epc_cgroup_try_charge(epc_cg, reclaim);
> +	put_misc_cg(epc_cg->cg);
> +
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	return epc_cg;
> +}
> +
> +/**
> + * sgx_epc_cgroup_uncharge - hierarchically uncharge EPC pages
> + * @epc_cg: the charged epc cgroup
> + */
> +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg)
> +{
> +	if (sgx_epc_cgroup_disabled())
> +		return;
> +
> +	misc_cg_uncharge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
> +
> +	if (epc_cg->cg != misc_cg_root())
> +		put_misc_cg(epc_cg->cg);
> +}
> +
> +static bool sgx_epc_cgroup_oom(struct sgx_epc_cgroup *root)
> +{
> +	struct cgroup_subsys_state *css_root = NULL;
> +	struct cgroup_subsys_state *pos = NULL;
> +	struct sgx_epc_cgroup *epc_cg = NULL;
> +	bool oom = false;
> +
> +	/* Caller ensures css_root ref acquired */
> +	css_root = root ? &root->cg->css : &(misc_cg_root()->css);
> +
> +	rcu_read_lock();
> +	css_for_each_descendant_pre(pos, css_root) {
> +		/* skip dead ones */
> +		if (!css_tryget(pos))
> +			continue;
> +
> +		rcu_read_unlock();
> +
> +		epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
> +		oom = sgx_epc_oom(&epc_cg->lru);
> +
> +		rcu_read_lock();
> +		css_put(pos);
> +		if (oom)
> +			break;
> +	}
> +	rcu_read_unlock();
> +	return oom;
> +}
> +
> +static void sgx_epc_cgroup_free(struct misc_cg *cg)
> +{
> +	struct sgx_epc_cgroup *epc_cg;
> +
> +	epc_cg = sgx_epc_cgroup_from_misc_cg(cg);
> +	cancel_work_sync(&epc_cg->reclaim_work);
> +	kfree(epc_cg);
> +}
> +
> +static void sgx_epc_cgroup_max_write(struct misc_cg *cg)
> +{
> +	struct sgx_epc_reclaim_control rc;
> +	struct sgx_epc_cgroup *epc_cg;
> +
> +	epc_cg = sgx_epc_cgroup_from_misc_cg(cg);
> +
> +	sgx_epc_reclaim_control_init(&rc, epc_cg);
> +	/* Let the reclaimer do the work so the user is not blocked */
> +	queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work);
> +}
> +
> +static int sgx_epc_cgroup_alloc(struct misc_cg *cg)
> +{
> +	struct sgx_epc_cgroup *epc_cg;
> +
> +	epc_cg = kzalloc(sizeof(*epc_cg), GFP_KERNEL);
> +	if (!epc_cg)
> +		return -ENOMEM;
> +
> +	sgx_lru_init(&epc_cg->lru);
> +	INIT_WORK(&epc_cg->reclaim_work, sgx_epc_cgroup_reclaim_work_func);
> +	cg->res[MISC_CG_RES_SGX_EPC].misc_cg_alloc = sgx_epc_cgroup_alloc;
> +	cg->res[MISC_CG_RES_SGX_EPC].misc_cg_free = sgx_epc_cgroup_free;
> +	cg->res[MISC_CG_RES_SGX_EPC].misc_cg_max_write = sgx_epc_cgroup_max_write;
> +	cg->res[MISC_CG_RES_SGX_EPC].priv = epc_cg;
> +	epc_cg->cg = cg;
> +	return 0;
> +}
> +
> +static int __init sgx_epc_cgroup_init(void)
> +{
> +	struct misc_cg *cg;
> +
> +	if (!boot_cpu_has(X86_FEATURE_SGX))
> +		return 0;
> +
> +	sgx_epc_cg_wq = alloc_workqueue("sgx_epc_cg_wq",
> +					WQ_UNBOUND | WQ_FREEZABLE,
> +					WQ_UNBOUND_MAX_ACTIVE);
> +	BUG_ON(!sgx_epc_cg_wq);
> +
> +	cg = misc_cg_root();
> +	BUG_ON(!cg);
> +	WRITE_ONCE(cg->res[MISC_CG_RES_SGX_EPC].max, ULONG_MAX);
> +	atomic_long_set(&cg->res[MISC_CG_RES_SGX_EPC].usage, 0UL);
> +	return sgx_epc_cgroup_alloc(cg);
> +}
> +subsys_initcall(sgx_epc_cgroup_init);
> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
> new file mode 100644
> index 000000000000..03ac4dcea82b
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright(c) 2022 Intel Corporation. */
> +#ifndef _INTEL_SGX_EPC_CGROUP_H_
> +#define _INTEL_SGX_EPC_CGROUP_H_
> +
> +#include <asm/sgx.h>
> +#include <linux/cgroup.h>
> +#include <linux/list.h>
> +#include <linux/misc_cgroup.h>
> +#include <linux/page_counter.h>
> +#include <linux/workqueue.h>
> +
> +#include "sgx.h"
> +
> +#ifndef CONFIG_CGROUP_SGX_EPC
> +#define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES
> +struct sgx_epc_cgroup;
> +
> +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim)
> +{
> +	return NULL;
> +}
> +
> +static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) { }
> +
> +static inline void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
> +						size_t *nr_to_scan,
> +						struct list_head *dst) { }
> +
> +static inline struct sgx_epc_lru_lists *epc_cg_lru(struct sgx_epc_cgroup *epc_cg)
> +{
> +	return NULL;
> +}
> +
> +static bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root)
> +{
> +	return true;
> +}
> +#else
> +struct sgx_epc_cgroup {
> +	struct misc_cg *cg;
> +	struct sgx_epc_lru_lists lru;
> +	struct work_struct reclaim_work;
> +	atomic_long_t epoch;
> +};
> +
> +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim);
> +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg);
> +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root);
> +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
> +				  size_t *nr_to_scan, struct list_head *dst);
> +static inline struct sgx_epc_lru_lists *epc_cg_lru(struct sgx_epc_cgroup *epc_cg)
> +{
> +	if (epc_cg)
> +		return &epc_cg->lru;
> +	return NULL;
> +}
> +#endif
> +
> +#endif /* _INTEL_SGX_EPC_CGROUP_H_ */
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 68c89d575abc..1e5984b881a2 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -6,6 +6,7 @@
>  #include <linux/highmem.h>
>  #include <linux/kthread.h>
>  #include <linux/miscdevice.h>
> +#include <linux/misc_cgroup.h>
>  #include <linux/node.h>
>  #include <linux/pagemap.h>
>  #include <linux/ratelimit.h>
> @@ -17,11 +18,9 @@
>  #include "driver.h"
>  #include "encl.h"
>  #include "encls.h"
> -/**
> - * Maximum number of pages to scan for reclaiming.
> - */
> -#define SGX_NR_TO_SCAN_MAX	32
> +#include "epc_cgroup.h"
>
> +unsigned long sgx_epc_total_pages;
>  struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>  static int sgx_nr_epc_sections;
>  static struct task_struct *ksgxd_tsk;
> @@ -36,9 +35,20 @@ static struct sgx_epc_lru_lists sgx_global_lru;
>
>  static inline struct sgx_epc_lru_lists *sgx_lru_lists(struct sgx_epc_page *epc_page)
>  {
> +	if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
> +		return epc_cg_lru(epc_page->epc_cg);
> +
>  	return &sgx_global_lru;
>  }
>
> +static inline bool sgx_can_reclaim(void)
> +{
> +	if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
> +		return !list_empty(&sgx_global_lru.reclaimable);
> +
> +	return !sgx_epc_cgroup_lru_empty(NULL);
> +}
> +

Keep the IS_ENABLED() logic the same in these two?

>  static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
>
>  /* Nodes with one or more EPC sections. */
> @@ -298,14 +308,14 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>   * @nr_to_scan: Number of pages to scan for reclaim
>   * @dst: Destination list to hold the isolated pages
>   */
> -void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan,
> +void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t *nr_to_scan,
>  			   struct list_head *dst)
>  {
>  	struct sgx_encl_page *encl_page;
>  	struct sgx_epc_page *epc_page;
>
>  	spin_lock(&lru->lock);
> -	for (; nr_to_scan > 0; --nr_to_scan) {
> +	for (; *nr_to_scan > 0; --(*nr_to_scan)) {
>  		epc_page = list_first_entry_or_null(&lru->reclaimable, struct sgx_epc_page, list);
>  		if (!epc_page)
>  			break;
> @@ -330,9 +340,10 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan,
>  }
>
>  /**
> - * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
> + * __sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers

This can be dropped.

>   * @nr_to_scan:		Number of EPC pages to scan for reclaim
>   * @ignore_age:		Reclaim a page even if it is young
> + * @epc_cg:		EPC cgroup from which to reclaim
>   *
>   * Take a fixed number of pages from the head of the active page pool and
>   * reclaim them to the enclave's private shmem files. Skip the pages, which have
> @@ -346,7 +357,8 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan,
>   * problematic as it would increase the lock contention too much, which would
>   * halt forward progress.
>   */
> -size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
> +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age,
> +			     struct sgx_epc_cgroup *epc_cg)
>  {
>  	struct sgx_backing backing[SGX_NR_TO_SCAN_MAX];
>  	struct sgx_epc_page *epc_page, *tmp;
> @@ -357,7 +369,15 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
>  	size_t ret;
>  	size_t i;
>
> -	sgx_isolate_epc_pages(&sgx_global_lru, nr_to_scan, &iso);
> +	/*
> +	 * If a specific cgroup is not being targeted, take from the global
> +	 * list first, even when cgroups are enabled. If there are
> +	 * pages on the global LRU then they should get reclaimed asap.
> +	 */
> +	if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC) || !epc_cg)
> +		sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso);
> +
> +	sgx_epc_cgroup_isolate_pages(epc_cg, &nr_to_scan, &iso);
>
>  	if (list_empty(&iso))
>  		return 0;
> @@ -410,11 +430,6 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
>  	return i;
>  }
>
> -static bool sgx_can_reclaim(void)
> -{
> -	return !list_empty(&sgx_global_lru.reclaimable);
> -}
> -
>  static bool sgx_should_reclaim(unsigned long watermark)
>  {
>  	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
> @@ -429,7 +444,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
>  void sgx_reclaim_direct(void)
>  {
>  	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
> -		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
> +		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
>  }
>
>  static int ksgxd(void *p)
> @@ -452,7 +467,7 @@ static int ksgxd(void *p)
>  					     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
>
>  		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
> -			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
> +			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
>
>  		cond_resched();
>  	}
> @@ -606,6 +621,11 @@ int sgx_drop_epc_page(struct sgx_epc_page *page)
>  struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>  {
>  	struct sgx_epc_page *page;
> +	struct sgx_epc_cgroup *epc_cg;
> +
> +	epc_cg = sgx_epc_cgroup_try_charge(reclaim);
> +	if (IS_ERR(epc_cg))
> +		return ERR_CAST(epc_cg);
>
>  	for ( ; ; ) {
>  		page = __sgx_alloc_epc_page();
> @@ -614,8 +634,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>  			break;
>  		}
>
> -		if (!sgx_can_reclaim())
> -			return ERR_PTR(-ENOMEM);
> +		if (!sgx_can_reclaim()) {
> +			page = ERR_PTR(-ENOMEM);
> +			break;
> +		}
>
>  		if (!reclaim) {
>  			page = ERR_PTR(-EBUSY);
> @@ -627,10 +649,17 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>  			break;
>  		}
>
> -		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
> +		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
>  		cond_resched();
>  	}
>
> +	if (!IS_ERR(page)) {
> +		WARN_ON_ONCE(page->epc_cg);
> +		page->epc_cg = epc_cg;
> +	} else {
> +		sgx_epc_cgroup_uncharge(epc_cg);
> +	}
> +
>  	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
>  		wake_up(&ksgxd_waitq);
>
> @@ -653,6 +682,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
>
>  	WARN_ON_ONCE(page->flags & (SGX_EPC_PAGE_STATE_MASK));
>
> +	if (page->epc_cg) {
> +		sgx_epc_cgroup_uncharge(page->epc_cg);
> +		page->epc_cg = NULL;
> +	}
> +
>  	spin_lock(&node->lock);
>
>  	page->encl_page = NULL;
> @@ -663,6 +697,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
>  	page->flags = SGX_EPC_PAGE_FREE;
>
>  	spin_unlock(&node->lock);
> +
>  	atomic_long_inc(&sgx_nr_free_pages);
>  }
>
> @@ -832,6 +867,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  		section->pages[i].flags = 0;
>  		section->pages[i].encl_page = NULL;
>  		section->pages[i].poison = 0;
> +		section->pages[i].epc_cg = NULL;
>  		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
>  	}
>
> @@ -976,6 +1012,7 @@ static void __init arch_update_sysfs_visibility(int nid) {}
>  static bool __init sgx_page_cache_init(void)
>  {
>  	u32 eax, ebx, ecx, edx, type;
> +	u64 capacity = 0;
>  	u64 pa, size;
>  	int nid;
>  	int i;
> @@ -1026,6 +1063,7 @@ static bool __init sgx_page_cache_init(void)
>
>  		sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
>  		sgx_numa_nodes[nid].size += size;
> +		capacity += size;
>
>  		sgx_nr_epc_sections++;
>  	}
> @@ -1035,6 +1073,9 @@ static bool __init sgx_page_cache_init(void)
>  		return false;
>  	}
>
> +	misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, capacity);
> +	sgx_epc_total_pages = capacity >> PAGE_SHIFT;
> +
>  	return true;
>  }
>
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index c6b3c90db0fa..36217032433b 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -19,6 +19,11 @@
>
>  #define SGX_MAX_EPC_SECTIONS		8
>  #define SGX_EEXTEND_BLOCK_SIZE		256
> +
> +/*
> + * Maximum number of pages to scan for reclaiming.
> + */
> +#define SGX_NR_TO_SCAN_MAX		32UL
>  #define SGX_NR_TO_SCAN			16
>  #define SGX_NR_LOW_PAGES		32
>  #define SGX_NR_HIGH_PAGES		64
> @@ -70,6 +75,8 @@ enum sgx_epc_page_state {
>  /* flag for pages owned by a sgx_encl struct */
>  #define SGX_EPC_OWNER_ENCL		BIT(4)
>
> +struct sgx_epc_cgroup;
> +
>  struct sgx_epc_page {
>  	unsigned int section;
>  	u16 flags;
> @@ -79,6 +86,7 @@ struct sgx_epc_page {
>  		struct sgx_encl *encl;
>  	};
>  	struct list_head list;
> +	struct sgx_epc_cgroup *epc_cg;
>  };
>
>  static inline void sgx_epc_page_reset_state(struct sgx_epc_page *page)
> @@ -127,6 +135,7 @@ struct sgx_epc_section {
>  	struct sgx_numa_node *node;
>  };
>
> +extern unsigned long sgx_epc_total_pages;
>  extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>
>  static inline unsigned long sgx_get_epc_phys_addr(struct sgx_epc_page *page)
> @@ -175,8 +184,9 @@ void sgx_reclaim_direct(void);
>  void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
>  int sgx_drop_epc_page(struct sgx_epc_page *page);
>  struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
> -size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age);
> -void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lrus, size_t nr_to_scan,
> +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age,
> +			     struct sgx_epc_cgroup *epc_cg);
> +void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lrus, size_t *nr_to_scan,
>  			   struct list_head *dst);
>  bool sgx_epc_oom(struct sgx_epc_lru_lists *lrus);
>
> --
> 2.25.1
>
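[The V3 note about initializing the root cgroup's max (item 2) is easier to follow with the hierarchical charging rule spelled out: misc_cg_try_charge() charges the target cgroup and every ancestor up to the root, and fails if any level would exceed its max. The sketch below is a simplified userspace model of that walk-and-unwind behavior, not the kernel/cgroup/misc.c code; names and the two-level setup in main() are illustrative.]

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified model of one misc resource in a cgroup node. */
struct cg_model {
	unsigned long max;
	unsigned long usage;
	struct cg_model *parent;
};

/* Charge @amount against @cg and every ancestor; unwind the partial
 * charges if any level would exceed its max. */
static bool try_charge(struct cg_model *cg, unsigned long amount)
{
	struct cg_model *i, *j;

	for (i = cg; i; i = i->parent) {
		if (i->usage + amount > i->max) {
			for (j = cg; j != i; j = j->parent)
				j->usage -= amount;
			return false;
		}
		i->usage += amount;
	}
	return true;
}

int main(void)
{
	struct cg_model root = { .max = 0 };	/* root max never initialized */
	struct cg_model child = { .max = 4096, .parent = &root };

	/* Fails at the root even though the child has headroom... */
	printf("root max 0: %s\n", try_charge(&child, 4096) ? "ok" : "fail");

	/* ...which is why V3 initializes the root's max to ULONG_MAX. */
	root.max = ULONG_MAX;
	printf("root max ULONG_MAX: %s\n", try_charge(&child, 4096) ? "ok" : "fail");
	return 0;
}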
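[As a closing illustration, here is a hedged sketch of how the interface described in the commit message could be exercised from userspace once the series is applied. The cgroup mount point, the "sgx_test" group name, and the 512 MiB limit are assumptions; the "<resource> <value>" line format follows the cgroup v2 misc controller documentation.]

#include <stdio.h>

int main(void)
{
	char line[128];
	FILE *f;

	/* The root-only misc.capacity file reports total EPC in bytes,
	 * e.g. a line of the form "sgx_epc 96468992". */
	f = fopen("/sys/fs/cgroup/misc.capacity", "r");
	if (f) {
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
	}

	/* Set a hard EPC limit for a child group (assumed to exist);
	 * writes to misc.max take "<resource> <bytes>" or "<resource> max". */
	f = fopen("/sys/fs/cgroup/sgx_test/misc.max", "w");
	if (f) {
		fprintf(f, "sgx_epc %lu\n", 512UL << 20);
		fclose(f);
	}

	return 0;
}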