Message ID | 20230913040635.28815-17-haitao.huang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add Cgroup support for SGX EPC memory |
On Wed Sep 13, 2023 at 7:06 AM EEST, Haitao Huang wrote:
> From: Kristen Carlson Accardi <kristen@linux.intel.com>
>
> Implement support for cgroup control of SGX Enclave Page Cache (EPC)
> memory using the misc cgroup controller. EPC memory is independent
> from normal system memory, e.g. must be reserved at boot from RAM and
> cannot be converted between EPC and normal memory while the system is
> running. EPC is managed by the SGX subsystem and is not accounted by
> the memory controller.
>
> Much like normal system memory, EPC memory can be overcommitted via
> virtual memory techniques and pages can be swapped out of the EPC to
> their backing store (normal system memory, e.g. shmem). The SGX EPC
> subsystem is analogous to the memory subsystem and the SGX EPC controller
> is in turn analogous to the memory controller; it implements limit and
> protection models for EPC memory.
>
> The misc controller provides a mechanism to set a hard limit of EPC
> usage via the "sgx_epc" resource in "misc.max". The total EPC memory
> available on the system is reported via the "sgx_epc" resource in
> "misc.capacity".
>
> This patch was modified from its original version to use the misc cgroup
> controller instead of a custom controller.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> Tested-by: Mikko Ylinen <mikko.ylinen@linux.intel.com>
>
> Cc: Sean Christopherson <seanjc@google.com>
> ---
> V4:
> - Fix a white space issue in Kconfig (Randy).
> - Update comments for LRU list as it can be owned by a cgroup.
> - Fix comments for sgx_reclaim_epc_pages() and use IS_ENABLED consistently (Mikko)
>
> V3:
>
> 1) Use the same maximum number of reclaiming candidate pages to be
> processed, SGX_NR_TO_SCAN_MAX, for each reclaiming iteration in both
> cgroup worker function and ksgxd. This fixes an overflow in the
> backing store buffer with the same fixed size allocated on stack in
> sgx_reclaim_epc_pages().
>
> 2) Initialize max for root EPC cgroup. Otherwise, all
> misc_cg_try_charge() calls would fail as it checks for all limits of
> ancestors all the way to the root node.
>
> 3) Start reclaiming whenever misc_cg_try_charge fails. Removed all
> re-checks for limits and current usage. For all purposes and intent,
> when misc_try_charge() fails, reclaiming is needed. This also corrects
> an error of not reclaiming when the child limit is larger than one of
> its ancestors.
>
> 4) Handle failure on charging to the root EPC cgroup. Failure on charging
> to root means we are at or above capacity, so start reclaiming or return
> OOM error.
>
> 5) Removed the custom cgroup tree walking iterator with epoch tracking
> logic. Replaced it with just the plain css_for_each_descendant_pre
> iterator. The custom iterator implemented a rather complex epoch scheme
> I believe was intended to prevent extra reclaiming from multiple worker
> threads doing the same walk but it turned out not matter much as each
> thread would only reclaim when usage is above limit. Using the plain
> css_for_each_descendant_pre iterator simplified code a bit.
>
> 6) Do not reclaim synchronously in misc_max_write callback which would
> block the user. Instead queue an async work item to run the reclaiming
> loop.
> > 7) Other minor refactoring: > - Remove unused params in epc_cgroup APIs > - centralize uncharge into sgx_free_epc_page() > --- > arch/x86/Kconfig | 13 + > arch/x86/kernel/cpu/sgx/Makefile | 1 + > arch/x86/kernel/cpu/sgx/epc_cgroup.c | 406 +++++++++++++++++++++++++++ > arch/x86/kernel/cpu/sgx/epc_cgroup.h | 59 ++++ > arch/x86/kernel/cpu/sgx/main.c | 67 ++++- > arch/x86/kernel/cpu/sgx/sgx.h | 17 +- > 6 files changed, 547 insertions(+), 16 deletions(-) > create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c > create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 982b777eadc7..55fcf182d4a3 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1921,6 +1921,19 @@ config X86_SGX > > If unsure, say N. > > +config CGROUP_SGX_EPC > + bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for Intel SGX" > + depends on X86_SGX && CGROUP_MISC > + help > + Provides control over the EPC footprint of tasks in a cgroup via > + the Miscellaneous cgroup controller. > + > + EPC is a subset of regular memory that is usable only by SGX > + enclaves and is very limited in quantity, e.g. less than 1% > + of total DRAM. > + > + Say N if unsure. > + > config X86_USER_SHADOW_STACK > bool "X86 userspace shadow stack" > depends on AS_WRUSS > diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile > index 9c1656779b2a..12901a488da7 100644 > --- a/arch/x86/kernel/cpu/sgx/Makefile > +++ b/arch/x86/kernel/cpu/sgx/Makefile > @@ -4,3 +4,4 @@ obj-y += \ > ioctl.o \ > main.o > obj-$(CONFIG_X86_SGX_KVM) += virt.o > +obj-$(CONFIG_CGROUP_SGX_EPC) += epc_cgroup.o > diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c > new file mode 100644 > index 000000000000..7b86eb074abe > --- /dev/null > +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c > @@ -0,0 +1,406 @@ > +// SPDX-License-Identifier: GPL-2.0 > +// Copyright(c) 2022 Intel Corporation. > + > +#include <linux/atomic.h> > +#include <linux/kernel.h> > +#include <linux/ratelimit.h> > +#include <linux/sched/signal.h> > +#include <linux/slab.h> > +#include <linux/threads.h> > + > +#include "epc_cgroup.h" > + > +#define SGX_EPC_RECLAIM_MIN_PAGES 16UL > +#define SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD 5 > +#define SGX_EPC_RECLAIM_OOM_THRESHOLD 5 > + > +static struct workqueue_struct *sgx_epc_cg_wq; > +static bool sgx_epc_cgroup_oom(struct sgx_epc_cgroup *root); > + > +struct sgx_epc_reclaim_control { > + struct sgx_epc_cgroup *epc_cg; > + int nr_fails; > + bool ignore_age; > +}; > + > +static inline u64 sgx_epc_cgroup_page_counter_read(struct sgx_epc_cgroup *epc_cg) > +{ > + return atomic64_read(&epc_cg->cg->res[MISC_CG_RES_SGX_EPC].usage) / PAGE_SIZE; > +} > + > +static inline u64 sgx_epc_cgroup_max_pages(struct sgx_epc_cgroup *epc_cg) > +{ > + return READ_ONCE(epc_cg->cg->res[MISC_CG_RES_SGX_EPC].max) / PAGE_SIZE; > +} > + /* * A brief explanation of the calculation below. */ > +static inline u64 sgx_epc_cgroup_max_pages_to_root(struct sgx_epc_cgroup *epc_cg) > +{ > + struct misc_cg *i = epc_cg->cg; > + u64 m = U64_MAX; > + > + while (i) { > + m = min(m, READ_ONCE(i->res[MISC_CG_RES_SGX_EPC].max)); > + i = misc_cg_parent(i); > + } I'd add an empty line here. 
> + return m / PAGE_SIZE; > +} > + > +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg) > +{ > + if (cg) > + return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv); > + > + return NULL; > +} > + > +static inline bool sgx_epc_cgroup_disabled(void) > +{ > + return !cgroup_subsys_enabled(misc_cgrp_subsys); > +} > + > +/** > + * sgx_epc_cgroup_lru_empty - check if a cgroup tree has no pages on its lrus Does not have "()": https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt > + * @root: root of the tree to check > + * > + * Return: %true if all cgroups under the specified root have empty LRU lists. > + * Used to avoid livelocks due to a cgroup having a non-zero charge count but > + * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or > + * because all pages in the cgroup are unreclaimable. > + */ > +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root) > +{ > + struct cgroup_subsys_state *css_root = NULL; > + struct cgroup_subsys_state *pos = NULL; > + struct sgx_epc_cgroup *epc_cg = NULL; > + bool ret = true; > + > + /* > + * Caller ensure css_root ref acquired > + */ > + css_root = root ? &root->cg->css : &(misc_cg_root()->css); > + > + rcu_read_lock(); > + css_for_each_descendant_pre(pos, css_root) { > + if (!css_tryget(pos)) > + break; > + > + rcu_read_unlock(); > + > + epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos)); > + > + spin_lock(&epc_cg->lru.lock); > + ret = list_empty(&epc_cg->lru.reclaimable); > + spin_unlock(&epc_cg->lru.lock); > + > + rcu_read_lock(); > + css_put(pos); > + if (!ret) > + break; > + } > + rcu_read_unlock(); > + return ret; > +} > + > +/** > + * sgx_epc_cgroup_isolate_pages - walk a cgroup tree and separate pages Ditto. > + * @root: root of the tree to start walking > + * @nr_to_scan: The number of pages that need to be isolated > + * @dst: Destination list to hold the isolated pages Not correctly aligned. > + * > + * Walk the cgroup tree and isolate the pages in the hierarchy > + * for reclaiming. > + */ > +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, > + size_t *nr_to_scan, struct list_head *dst) > +{ > + struct cgroup_subsys_state *css_root = NULL; Spurious initialization to NULL. > + struct cgroup_subsys_state *pos = NULL; Ditto. > + struct sgx_epc_cgroup *epc_cg = NULL; Ditto. > + > + if (!*nr_to_scan) > + return; > + > + /* Caller ensure css_root ref acquired */ > + css_root = root ? &root->cg->css : &(misc_cg_root()->css); > + > + rcu_read_lock(); > + css_for_each_descendant_pre(pos, css_root) { > + if (!css_tryget(pos)) > + break; > + rcu_read_unlock(); > + > + epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos)); > + sgx_isolate_epc_pages(&epc_cg->lru, nr_to_scan, dst); > + > + rcu_read_lock(); > + css_put(pos); > + if (!*nr_to_scan) > + break; > + } I'd add an empty line here. > + rcu_read_unlock(); > +} > + > +static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages, > + struct sgx_epc_reclaim_control *rc) > +{ > + /* > + * Ensure sgx_reclaim_pages is called with a minimum and maximum > + * number of pages. Attempting to reclaim only a few pages will > + * often fail and is inefficient, while reclaiming a huge number > + * of pages can result in soft lockups due to holding various > + * locks for an extended duration. 
> + */ > + nr_pages = max(nr_pages, SGX_EPC_RECLAIM_MIN_PAGES); > + nr_pages = min(nr_pages, SGX_NR_TO_SCAN_MAX); > + > + return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age, rc->epc_cg); > +} > + > +static int sgx_epc_cgroup_reclaim_failed(struct sgx_epc_reclaim_control *rc) > +{ > + if (sgx_epc_cgroup_lru_empty(rc->epc_cg)) > + return -ENOMEM; > + > + ++rc->nr_fails; > + if (rc->nr_fails > SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD) > + rc->ignore_age = true; > + > + return 0; > +} > + > +static inline > +void sgx_epc_reclaim_control_init(struct sgx_epc_reclaim_control *rc, > + struct sgx_epc_cgroup *epc_cg) > +{ > + rc->epc_cg = epc_cg; > + rc->nr_fails = 0; > + rc->ignore_age = false; > +} > + > +/* > + * Scheduled by sgx_epc_cgroup_try_charge() to reclaim pages from the > + * cgroup when the cgroup is at/near its maximum capacity > + */ > +static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work) > +{ > + struct sgx_epc_reclaim_control rc; > + struct sgx_epc_cgroup *epc_cg; > + u64 cur, max; > + > + epc_cg = container_of(work, struct sgx_epc_cgroup, reclaim_work); > + > + sgx_epc_reclaim_control_init(&rc, epc_cg); > + > + for (;;) { > + max = sgx_epc_cgroup_max_pages_to_root(epc_cg); > + > + /* > + * Adjust the limit down by one page, the goal is to free up > + * pages for fault allocations, not to simply obey the limit. > + * Conditionally decrementing max also means the cur vs. max > + * check will correctly handle the case where both are zero. > + */ > + if (max) > + max--; > + > + /* > + * Unless the limit is extremely low, in which case forcing > + * reclaim will likely cause thrashing, force the cgroup to > + * reclaim at least once if it's operating *near* its maximum > + * limit by adjusting @max down by half the min reclaim size. > + * This work func is scheduled by sgx_epc_cgroup_try_charge > + * when it cannot directly reclaim due to being in an atomic > + * context, e.g. EPC allocation in a fault handler. Waiting > + * to reclaim until the cgroup is actually at its limit is less > + * performant as it means the faulting task is effectively > + * blocked until a worker makes its way through the global work > + * queue. 
> + */ > + if (max > SGX_NR_TO_SCAN_MAX) > + max -= (SGX_EPC_RECLAIM_MIN_PAGES / 2); > + > + max = min(max, sgx_epc_total_pages); > + cur = sgx_epc_cgroup_page_counter_read(epc_cg); > + if (cur <= max) > + break; > + /* Nothing reclaimable */ > + if (sgx_epc_cgroup_lru_empty(epc_cg)) { > + if (!sgx_epc_cgroup_oom(epc_cg)) > + break; > + > + continue; > + } > + > + if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) { > + if (sgx_epc_cgroup_reclaim_failed(&rc)) > + break; > + } > + } > +} > + > +static int __sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, > + bool reclaim) > +{ > + struct sgx_epc_reclaim_control rc; > + unsigned int nr_empty = 0; > + > + sgx_epc_reclaim_control_init(&rc, epc_cg); > + > + for (;;) { > + if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, > + PAGE_SIZE)) > + break; > + > + if (sgx_epc_cgroup_lru_empty(epc_cg)) > + return -ENOMEM; > + > + if (signal_pending(current)) > + return -ERESTARTSYS; > + > + if (!reclaim) { > + queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work); > + return -EBUSY; > + } > + > + if (!sgx_epc_cgroup_reclaim_pages(1, &rc)) { > + if (sgx_epc_cgroup_reclaim_failed(&rc)) { > + if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD) > + return -ENOMEM; > + schedule(); > + } > + } > + } > + if (epc_cg->cg != misc_cg_root()) > + css_get(&epc_cg->cg->css); > + > + return 0; > +} > + > +/** > + * sgx_epc_cgroup_try_charge - hierarchically try to charge a single EPC page "()" > + * @mm: the mm_struct of the process to charge > + * @reclaim: whether or not synchronous reclaim is allowed > + * > + * Returns EPC cgroup or NULL on success, -errno on failure. > + */ > +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim) > +{ > + struct sgx_epc_cgroup *epc_cg; > + int ret; > + > + if (sgx_epc_cgroup_disabled()) > + return NULL; > + > + epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg()); > + ret = __sgx_epc_cgroup_try_charge(epc_cg, reclaim); > + put_misc_cg(epc_cg->cg); > + > + if (ret) > + return ERR_PTR(ret); > + > + return epc_cg; > +} > + > +/** > + * sgx_epc_cgroup_uncharge - hierarchically uncharge EPC pages "()" > + * @epc_cg: the charged epc cgroup > + */ > +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) > +{ > + if (sgx_epc_cgroup_disabled()) > + return; > + > + misc_cg_uncharge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); > + > + if (epc_cg->cg != misc_cg_root()) > + put_misc_cg(epc_cg->cg); > +} > + > +static bool sgx_epc_cgroup_oom(struct sgx_epc_cgroup *root) > +{ > + struct cgroup_subsys_state *css_root = NULL; > + struct cgroup_subsys_state *pos = NULL; > + struct sgx_epc_cgroup *epc_cg = NULL; Please check also these initializations through. > + bool oom = false; > + > + /* Caller ensure css_root ref acquired */ > + css_root = root ? 
&root->cg->css : &(misc_cg_root()->css); > + > + rcu_read_lock(); > + css_for_each_descendant_pre(pos, css_root) { > + /* skip dead ones */ > + if (!css_tryget(pos)) > + continue; > + > + rcu_read_unlock(); > + > + epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos)); > + oom = sgx_epc_oom(&epc_cg->lru); > + > + rcu_read_lock(); > + css_put(pos); > + if (oom) > + break; > + } > + rcu_read_unlock(); > + return oom; > +} > + > +static void sgx_epc_cgroup_free(struct misc_cg *cg) > +{ > + struct sgx_epc_cgroup *epc_cg; > + > + epc_cg = sgx_epc_cgroup_from_misc_cg(cg); > + cancel_work_sync(&epc_cg->reclaim_work); > + kfree(epc_cg); > +} > + > +static void sgx_epc_cgroup_max_write(struct misc_cg *cg) > +{ > + struct sgx_epc_reclaim_control rc; > + struct sgx_epc_cgroup *epc_cg; > + > + epc_cg = sgx_epc_cgroup_from_misc_cg(cg); > + > + sgx_epc_reclaim_control_init(&rc, epc_cg); > + /* Let the reclaimer to do the work so user is not blocked */ > + queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work); > +} > + > +static int sgx_epc_cgroup_alloc(struct misc_cg *cg) > +{ > + struct sgx_epc_cgroup *epc_cg; > + > + epc_cg = kzalloc(sizeof(*epc_cg), GFP_KERNEL); > + if (!epc_cg) > + return -ENOMEM; > + > + sgx_lru_init(&epc_cg->lru); > + INIT_WORK(&epc_cg->reclaim_work, sgx_epc_cgroup_reclaim_work_func); > + cg->res[MISC_CG_RES_SGX_EPC].misc_cg_alloc = sgx_epc_cgroup_alloc; > + cg->res[MISC_CG_RES_SGX_EPC].misc_cg_free = sgx_epc_cgroup_free; > + cg->res[MISC_CG_RES_SGX_EPC].misc_cg_max_write = sgx_epc_cgroup_max_write; > + cg->res[MISC_CG_RES_SGX_EPC].priv = epc_cg; > + epc_cg->cg = cg; > + return 0; > +} > + > +static int __init sgx_epc_cgroup_init(void) > +{ > + struct misc_cg *cg; > + > + if (!boot_cpu_has(X86_FEATURE_SGX)) > + return 0; > + > + sgx_epc_cg_wq = alloc_workqueue("sgx_epc_cg_wq", > + WQ_UNBOUND | WQ_FREEZABLE, > + WQ_UNBOUND_MAX_ACTIVE); > + BUG_ON(!sgx_epc_cg_wq); > + > + cg = misc_cg_root(); > + BUG_ON(!cg); > + WRITE_ONCE(cg->res[MISC_CG_RES_SGX_EPC].max, U64_MAX); > + atomic64_set(&cg->res[MISC_CG_RES_SGX_EPC].usage, 0UL); > + return sgx_epc_cgroup_alloc(cg); > +} > +subsys_initcall(sgx_epc_cgroup_init); > diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h > new file mode 100644 > index 000000000000..dfc902f4d96f > --- /dev/null > +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h > @@ -0,0 +1,59 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* Copyright(c) 2022 Intel Corporation. 
*/ > +#ifndef _INTEL_SGX_EPC_CGROUP_H_ > +#define _INTEL_SGX_EPC_CGROUP_H_ > + > +#include <asm/sgx.h> > +#include <linux/cgroup.h> > +#include <linux/list.h> > +#include <linux/misc_cgroup.h> > +#include <linux/page_counter.h> > +#include <linux/workqueue.h> > + > +#include "sgx.h" > + > +#ifndef CONFIG_CGROUP_SGX_EPC > +#define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES > +struct sgx_epc_cgroup; > + > +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim) > +{ > + return NULL; > +} > + > +static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) { } > + > +static inline void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, > + size_t *nr_to_scan, > + struct list_head *dst) { } > + > +static inline struct sgx_epc_lru_lists *epc_cg_lru(struct sgx_epc_cgroup *epc_cg) > +{ > + return NULL; > +} > + > +static bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root) > +{ > + return true; > +} > +#else > +struct sgx_epc_cgroup { > + struct misc_cg *cg; > + struct sgx_epc_lru_lists lru; > + struct work_struct reclaim_work; > +}; > + > +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim); > +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg); > +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root); > +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, > + size_t *nr_to_scan, struct list_head *dst); > +static inline struct sgx_epc_lru_lists *epc_cg_lru(struct sgx_epc_cgroup *epc_cg) > +{ > + if (epc_cg) > + return &epc_cg->lru; > + return NULL; > +} > +#endif > + > +#endif /* _INTEL_SGX_EPC_CGROUP_H_ */ > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index 3d396fe5ec09..20de17f4f576 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -6,6 +6,7 @@ > #include <linux/highmem.h> > #include <linux/kthread.h> > #include <linux/miscdevice.h> > +#include <linux/misc_cgroup.h> > #include <linux/node.h> > #include <linux/pagemap.h> > #include <linux/ratelimit.h> > @@ -17,11 +18,9 @@ > #include "driver.h" > #include "encl.h" > #include "encls.h" > -/** > - * Maximum number of pages to scan for reclaiming. 
> - */ > -#define SGX_NR_TO_SCAN_MAX 32 > +#include "epc_cgroup.h" > > +u64 sgx_epc_total_pages; > struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; > static int sgx_nr_epc_sections; > static struct task_struct *ksgxd_tsk; > @@ -36,11 +35,17 @@ static struct sgx_epc_lru_lists sgx_global_lru; > > static inline struct sgx_epc_lru_lists *sgx_lru_lists(struct sgx_epc_page *epc_page) > { > + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC)) > + return epc_cg_lru(epc_page->epc_cg); > + > return &sgx_global_lru; > } > > static inline bool sgx_can_reclaim(void) > { > + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC)) > + return !sgx_epc_cgroup_lru_empty(NULL); > + > return !list_empty(&sgx_global_lru.reclaimable); > } > > @@ -299,14 +304,14 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, > * @nr_to_scan: Number of pages to scan for reclaim > * @dst: Destination list to hold the isolated pages > */ > -void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan, > +void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t *nr_to_scan, > struct list_head *dst) > { > struct sgx_encl_page *encl_page; > struct sgx_epc_page *epc_page; > > spin_lock(&lru->lock); > - for (; nr_to_scan > 0; --nr_to_scan) { > + for (; *nr_to_scan > 0; --(*nr_to_scan)) { > epc_page = list_first_entry_or_null(&lru->reclaimable, struct sgx_epc_page, list); > if (!epc_page) > break; > @@ -331,6 +336,7 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan, > * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers > * @nr_to_scan: Number of EPC pages to scan for reclaim > * @ignore_age: Reclaim a page even if it is young > + * @epc_cg: EPC cgroup from which to reclaim > * > * Take a fixed number of pages from the head of the active page pool and > * reclaim them to the enclave's private shmem files. Skip the pages, which have > @@ -344,7 +350,8 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan, > * problematic as it would increase the lock contention too much, which would > * halt forward progress. > */ > -size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age) > +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age, > + struct sgx_epc_cgroup *epc_cg) > { > struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; > struct sgx_epc_page *epc_page, *tmp; > @@ -355,7 +362,15 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age) > size_t ret; > size_t i; > > - sgx_isolate_epc_pages(&sgx_global_lru, nr_to_scan, &iso); > + /* > + * If a specific cgroup is not being targeted, take from the global > + * list first, even when cgroups are enabled. If there are > + * pages on the global LRU then they should get reclaimed asap. 
> + */ > + if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC) || !epc_cg) > + sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso); > + > + sgx_epc_cgroup_isolate_pages(epc_cg, &nr_to_scan, &iso); > > if (list_empty(&iso)) > return 0; > @@ -422,7 +437,7 @@ static bool sgx_should_reclaim(unsigned long watermark) > void sgx_reclaim_direct(void) > { > if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) > - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); > + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL); > } > > static int ksgxd(void *p) > @@ -445,7 +460,7 @@ static int ksgxd(void *p) > sgx_should_reclaim(SGX_NR_HIGH_PAGES)); > > if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) > - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); > + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL); > > cond_resched(); > } > @@ -599,6 +614,11 @@ int sgx_drop_epc_page(struct sgx_epc_page *page) > struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) > { > struct sgx_epc_page *page; > + struct sgx_epc_cgroup *epc_cg; > + > + epc_cg = sgx_epc_cgroup_try_charge(reclaim); > + if (IS_ERR(epc_cg)) > + return ERR_CAST(epc_cg); > > for ( ; ; ) { > page = __sgx_alloc_epc_page(); > @@ -607,8 +627,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) > break; > } > > - if (!sgx_can_reclaim()) > - return ERR_PTR(-ENOMEM); > + if (!sgx_can_reclaim()) { > + page = ERR_PTR(-ENOMEM); > + break; > + } > > if (!reclaim) { > page = ERR_PTR(-EBUSY); > @@ -620,10 +642,17 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) > break; > } > > - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); > + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL); > cond_resched(); > } > > + if (!IS_ERR(page)) { > + WARN_ON_ONCE(page->epc_cg); > + page->epc_cg = epc_cg; > + } else { > + sgx_epc_cgroup_uncharge(epc_cg); > + } > + > if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) > wake_up(&ksgxd_waitq); > > @@ -646,6 +675,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page) > > WARN_ON_ONCE(page->flags & (SGX_EPC_PAGE_STATE_MASK)); > > + if (page->epc_cg) { > + sgx_epc_cgroup_uncharge(page->epc_cg); > + page->epc_cg = NULL; > + } > + > spin_lock(&node->lock); > > page->encl_page = NULL; > @@ -656,6 +690,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) > page->flags = SGX_EPC_PAGE_FREE; > > spin_unlock(&node->lock); > + > atomic_long_inc(&sgx_nr_free_pages); > } > > @@ -825,6 +860,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size, > section->pages[i].flags = 0; > section->pages[i].encl_page = NULL; > section->pages[i].poison = 0; > + section->pages[i].epc_cg = NULL; > list_add_tail(§ion->pages[i].list, &sgx_dirty_page_list); > } > > @@ -969,6 +1005,7 @@ static void __init arch_update_sysfs_visibility(int nid) {} > static bool __init sgx_page_cache_init(void) > { > u32 eax, ebx, ecx, edx, type; > + u64 capacity = 0; > u64 pa, size; > int nid; > int i; > @@ -1019,6 +1056,7 @@ static bool __init sgx_page_cache_init(void) > > sgx_epc_sections[i].node = &sgx_numa_nodes[nid]; > sgx_numa_nodes[nid].size += size; > + capacity += size; > > sgx_nr_epc_sections++; > } > @@ -1028,6 +1066,9 @@ static bool __init sgx_page_cache_init(void) > return false; > } > > + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, capacity); > + sgx_epc_total_pages = capacity >> PAGE_SHIFT; > + > return true; > } > > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h > index 7e21192b87a8..bf746d2af96d 100644 > --- a/arch/x86/kernel/cpu/sgx/sgx.h > +++ b/arch/x86/kernel/cpu/sgx/sgx.h > @@ -19,6 +19,11 @@ > > #define 
SGX_MAX_EPC_SECTIONS 8 > #define SGX_EEXTEND_BLOCK_SIZE 256 > + > +/* > + * Maximum number of pages to scan for reclaiming. > + */ > +#define SGX_NR_TO_SCAN_MAX 32UL > #define SGX_NR_TO_SCAN 16 > #define SGX_NR_LOW_PAGES 32 > #define SGX_NR_HIGH_PAGES 64 > @@ -70,6 +75,8 @@ enum sgx_epc_page_state { > /* flag for pages owned by a sgx_encl struct */ > #define SGX_EPC_OWNER_ENCL BIT(4) > > +struct sgx_epc_cgroup; > + > struct sgx_epc_page { > unsigned int section; > u16 flags; > @@ -79,6 +86,7 @@ struct sgx_epc_page { > struct sgx_encl *encl; > }; > struct list_head list; > + struct sgx_epc_cgroup *epc_cg; > }; > > static inline void sgx_epc_page_reset_state(struct sgx_epc_page *page) > @@ -127,6 +135,7 @@ struct sgx_epc_section { > struct sgx_numa_node *node; > }; > > +extern u64 sgx_epc_total_pages; > extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; > > static inline unsigned long sgx_get_epc_phys_addr(struct sgx_epc_page *page) > @@ -150,7 +159,8 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page) > } > > /* > - * Contains EPC pages tracked by the reclaimer (ksgxd). > + * Contains EPC pages tracked by the global reclaimer (ksgxd) or an EPC > + * cgroup. > */ > struct sgx_epc_lru_lists { > spinlock_t lock; > @@ -177,8 +187,9 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); > int sgx_drop_epc_page(struct sgx_epc_page *page); > struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); > bool sgx_epc_oom(struct sgx_epc_lru_lists *lrus); > -size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age); > -void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lrus, size_t nr_to_scan, > +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age, > + struct sgx_epc_cgroup *epc_cg); > +void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lrus, size_t *nr_to_scan, > struct list_head *dst); > > void sgx_ipi_cb(void *info); > -- > 2.25.1 BR, Jarkko
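As a quick illustration of the interface described in the cover letter above, a minimal sketch of how the "sgx_epc" resource would be driven through the misc controller files. This assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup; the "sgx_test" group name and the 64 MiB limit are illustrative only and not taken from the patch:

  # Total EPC on the system, reported as "sgx_epc <bytes>":
  cat /sys/fs/cgroup/misc.capacity

  # Enable the misc controller for child groups and create one:
  echo "+misc" > /sys/fs/cgroup/cgroup.subtree_control
  mkdir /sys/fs/cgroup/sgx_test

  # Cap EPC usage of tasks in sgx_test at 64 MiB (value in bytes):
  echo "sgx_epc 67108864" > /sys/fs/cgroup/sgx_test/misc.max

  # EPC currently charged to the group:
  grep sgx_epc /sys/fs/cgroup/sgx_test/misc.current

misc.max takes "<resource> <bytes>" (or "<resource> max" to remove the limit), and usage appears in misc.current under the same "sgx_epc" key.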