diff mbox series

[v5,11/18] x86/sgx: store unreclaimable pages in LRU lists

Message ID 20230923030657.16148-12-haitao.huang@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Add Cgroup support for SGX EPC memory | expand

Commit Message

Haitao Huang Sept. 23, 2023, 3:06 a.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com>

When an OOM event occurs, all pages associated with an enclave will need
to be freed, including pages that are not currently tracked by the
cgroup LRU lists.

Add a new "unreclaimable" list to the sgx_epc_lru_lists struct and
update the "sgx_record/drop_epc_pages()" functions for adding/removing
VA and SECS pages to/from this "unreclaimable" list.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Co-developed-by: Haitao Huang <haitao.huang@linux.intel.com>
Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
V4:
- Updates for patch reordering.
- Revised commit messages.
- Revised comments for the list.

V3:
- Removed tracking virtual EPC pages in unreclaimable list as host
kernel does not reclaim them. The EPC cgroups implemented later only
blocks allocating for a guest if the limit is reached by returning
-ENOMEM from sgx_alloc_epc_page() called by virt_epc, and does nothing
else. Therefore, no need to track those in LRU lists.
---
 arch/x86/kernel/cpu/sgx/encl.c  | 2 ++
 arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
 arch/x86/kernel/cpu/sgx/main.c  | 3 +++
 arch/x86/kernel/cpu/sgx/sgx.h   | 8 +++++++-
 4 files changed, 13 insertions(+), 1 deletion(-)

Comments

Huang, Kai Sept. 27, 2023, 11:57 a.m. UTC | #1
On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> When an OOM event occurs, all pages associated with an enclave will need
> to be freed, including pages that are not currently tracked by the
> cgroup LRU lists.

What are "cgroup LRU lists"?

> 
> Add a new "unreclaimable" list to the sgx_epc_lru_lists struct and
> update the "sgx_record/drop_epc_pages()" functions for adding/removing
> VA and SECS pages to/from this "unreclaimable" list.

Sorry I don't follow the logic between the two paragraphs.

What is the exact problem?  How does the new "unreclaimable" list solve the
problem?
Huang, Kai Sept. 28, 2023, 9:41 a.m. UTC | #2
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -746,6 +746,7 @@ void sgx_encl_release(struct kref *ref)
>  	xa_destroy(&encl->page_array);
>  
>  	if (!encl->secs_child_cnt && encl->secs.epc_page) {
> +		sgx_drop_epc_page(encl->secs.epc_page);
>  		sgx_encl_free_epc_page(encl->secs.epc_page);
>  		encl->secs.epc_page = NULL;
>  	}

The "record" of SECS/VA pages should be done together with this.  I see no
reason why the "record" and "drop" are separated into different patches.
Haitao Huang Oct. 3, 2023, 5:15 a.m. UTC | #3
On Thu, 28 Sep 2023 04:41:33 -0500, Huang, Kai <kai.huang@intel.com> wrote:

>
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -746,6 +746,7 @@ void sgx_encl_release(struct kref *ref)
>>  	xa_destroy(&encl->page_array);
>>
>>  	if (!encl->secs_child_cnt && encl->secs.epc_page) {
>> +		sgx_drop_epc_page(encl->secs.epc_page);
>>  		sgx_encl_free_epc_page(encl->secs.epc_page);
>>  		encl->secs.epc_page = NULL;
>>  	}
>
> The "record" of SECS/VA pages should be done together with this.  I see  
> no
> reason why the "record" and "drop" are separated into different patches.

"record" of SECS/VA pages are done in this patch. Before nothing done in  
"record" for them because no tracking LRU lists for them. Now they are  
tracked.

Thanks
Haitao
Haitao Huang Oct. 3, 2023, 5:42 a.m. UTC | #4
On Wed, 27 Sep 2023 06:57:18 -0500, Huang, Kai <kai.huang@intel.com> wrote:

> On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
>> From: Sean Christopherson <sean.j.christopherson@intel.com>
>>
>> When an OOM event occurs, all pages associated with an enclave will need
>> to be freed, including pages that are not currently tracked by the
>> cgroup LRU lists.
>
> What are "cgroup LRU lists"?
>
Will reword it. At them moment there is only one global sgx_epc_lru_lists.
>>
>> Add a new "unreclaimable" list to the sgx_epc_lru_lists struct and
>> update the "sgx_record/drop_epc_pages()" functions for adding/removing
>> VA and SECS pages to/from this "unreclaimable" list.
>
> Sorry I don't follow the logic between the two paragraphs.
>
> What is the exact problem?  How does the new "unreclaimable" list solve  
> the
> problem?
>
>
Currently they are not tracked in a list managed by the ksgxd (future  
cgroup worker). So add a list to track them.
Thanks
Haitao
Huang, Kai Oct. 3, 2023, 8:12 p.m. UTC | #5
On Tue, 2023-10-03 at 00:15 -0500, Haitao Huang wrote:
> On Thu, 28 Sep 2023 04:41:33 -0500, Huang, Kai <kai.huang@intel.com> wrote:
> 
> > 
> > > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > > @@ -746,6 +746,7 @@ void sgx_encl_release(struct kref *ref)
> > >  	xa_destroy(&encl->page_array);
> > > 
> > >  	if (!encl->secs_child_cnt && encl->secs.epc_page) {
> > > +		sgx_drop_epc_page(encl->secs.epc_page);
> > >  		sgx_encl_free_epc_page(encl->secs.epc_page);
> > >  		encl->secs.epc_page = NULL;
> > >  	}
> > 
> > The "record" of SECS/VA pages should be done together with this.  I see  
> > no
> > reason why the "record" and "drop" are separated into different patches.
> 
> "record" of SECS/VA pages are done in this patch. Before nothing done in  
> "record" for them because no tracking LRU lists for them. Now they are  
> tracked.
> 
> 

I was talking about calling sgx_record_epc_page() for SECS/VA:

@@ -113,6 +113,9 @@ static int sgx_encl_create(struct sgx_encl *encl, struct
sgx_secs *secs)
 	encl->attributes = secs->attributes;
 	encl->attributes_mask = SGX_ATTR_UNPRIV_MASK;
 
+	sgx_record_epc_page(encl->secs.epc_page,
+			    SGX_EPC_PAGE_UNRECLAIMABLE);

This piece of code *literally* does the record.
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index da1657813fce..a8617e6a4b4e 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -746,6 +746,7 @@  void sgx_encl_release(struct kref *ref)
 	xa_destroy(&encl->page_array);
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
+		sgx_drop_epc_page(encl->secs.epc_page);
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
@@ -754,6 +755,7 @@  void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
+		sgx_drop_epc_page(va_page->epc_page);
 		sgx_encl_free_epc_page(va_page->epc_page);
 		kfree(va_page);
 	}
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index cd338e93acc1..50ddd8988452 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -48,6 +48,7 @@  void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 	encl->page_cnt--;
 
 	if (va_page) {
+		sgx_drop_epc_page(va_page->epc_page);
 		sgx_encl_free_epc_page(va_page->epc_page);
 		list_del(&va_page->list);
 		kfree(va_page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ed813288af44..f3a3ed894616 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -268,6 +268,7 @@  static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 			goto out;
 
 		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
+		sgx_drop_epc_page(encl->secs.epc_page);
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
@@ -510,6 +511,8 @@  void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 	page->flags |= flags;
 	if (sgx_epc_page_reclaimable(flags))
 		list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	else
+		list_add_tail(&page->list, &sgx_global_lru.unreclaimable);
 	spin_unlock(&sgx_global_lru.lock);
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 51aba1cd1937..337747bef7c2 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -152,17 +152,23 @@  static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 }
 
 /*
- * Tracks EPC pages reclaimable by the reclaimer (ksgxd).
+ * Contains EPC pages tracked by the reclaimer (ksgxd).
  */
 struct sgx_epc_lru_lists {
 	spinlock_t lock;
 	struct list_head reclaimable;
+	/*
+	 * Tracks SECS, VA pages,etc., pages only freeable after all its
+	 * dependent reclaimables are freed.
+	 */
+	struct list_head unreclaimable;
 };
 
 static inline void sgx_lru_init(struct sgx_epc_lru_lists *lrus)
 {
 	spin_lock_init(&lrus->lock);
 	INIT_LIST_HEAD(&lrus->reclaimable);
+	INIT_LIST_HEAD(&lrus->unreclaimable);
 }
 
 struct sgx_epc_page *__sgx_alloc_epc_page(void);