Message ID | 20230923030657.16148-14-haitao.huang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add Cgroup support for SGX EPC memory | expand |
On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote: > From: Sean Christopherson <sean.j.christopherson@intel.com> > > Adjust and expose the top-level reclaim function as > sgx_reclaim_epc_pages() for use by the upcoming EPC cgroup, which will > initiate reclaim to enforce the max limit. > > Make these adjustments to the function signature. > > 1) To take a parameter that specifies the number of pages to scan for > reclaiming. Define a max value of 32, but scan 16 in the case for the > global reclaimer (ksgxd). The EPC cgroup will use it to specify a > desired number of pages to be reclaimed up to the max value of 32. > > 2) To take a flag to force reclaiming a page regardless of its age. The > EPC cgroup will use the flag to enforce its limits by draining the > reclaimable lists before resorting to other measures, e.g. forcefully > kill enclaves. > > 3) Return the number of reclaimed pages. The EPC cgroup will use the > result to track reclaiming progress and escalate to a more forceful > reclaiming mode, e.g., calling this function with the flag to ignore age > of pages. > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > Co-developed-by: Kristen Carlson Accardi <kristen@linux.intel.com> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> > Co-developed-by: Haitao Huang <haitao.huang@linux.intel.com> > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> > Cc: Sean Christopherson <seanjc@google.com> > --- > V4: > - Combined the 3 patches that made the individual changes to the > function signature. > - Removed 'high' limit in commit message. > --- > arch/x86/kernel/cpu/sgx/main.c | 31 +++++++++++++++++++++---------- > arch/x86/kernel/cpu/sgx/sgx.h | 1 + > 2 files changed, 22 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index 3b875ab4dcd0..4e1a3e038db5 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -18,6 +18,11 @@ > #include "encl.h" > #include "encls.h" > > +/* > + * Maximum number of pages to scan for reclaiming. > + */ > +#define SGX_NR_TO_SCAN_MAX 32 > + > struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; > static int sgx_nr_epc_sections; > static struct task_struct *ksgxd_tsk; > @@ -279,7 +284,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, > mutex_unlock(&encl->lock); > } > > -/* > +/** > + * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers > + * @nr_to_scan: Number of EPC pages to scan for reclaim > + * @ignore_age: Reclaim a page even if it is young > + * > * Take a fixed number of pages from the head of the active page pool and > * reclaim them to the enclave's private shmem files. Skip the pages, which have > * been accessed since the last scan. Move those pages to the tail of active > @@ -292,15 +301,14 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, > * problematic as it would increase the lock contention too much, which would > * halt forward progress. > */ > -static void sgx_reclaim_pages(void) > +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age) 'size_t' looks odd. Any reason to use it? Given you only scan 32 at maximum, seems 'int' is good enough? > { > - struct sgx_backing backing[SGX_NR_TO_SCAN]; > + struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; > struct sgx_epc_page *epc_page, *tmp; > struct sgx_encl_page *encl_page; > pgoff_t page_index; > LIST_HEAD(iso); > - int ret; > - int i; > + size_t ret, i; > > spin_lock(&sgx_global_lru.lock); > for (i = 0; i < SGX_NR_TO_SCAN; i++) { The function comment says * @nr_to_scan: Number of EPC pages to scan for reclaim But I don't see it is even used, if my eye isn't deceiving me? > @@ -326,13 +334,14 @@ static void sgx_reclaim_pages(void) > spin_unlock(&sgx_global_lru.lock); > > if (list_empty(&iso)) > - return; > + return 0; > > i = 0; > list_for_each_entry_safe(epc_page, tmp, &iso, list) { > encl_page = epc_page->encl_page; > > - if (!sgx_reclaimer_age(epc_page)) > + if (i == SGX_NR_TO_SCAN_MAX || i == nr_to_scan? And should we have a if (nr_to_scan < SGX_NR_TO_SCAN_MAX) return 0; at the very beginning of this function? > + (!ignore_age && !sgx_reclaimer_age(epc_page))) > goto skip; > > page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); > @@ -371,6 +380,8 @@ static void sgx_reclaim_pages(void) > > sgx_free_epc_page(epc_page); > } > + > + return i; > } > I found this function a little bit odd, given the mixing of 'nr_to_scan', SGX_NR_TO_SCAN and SGX_NR_TO_SCAN_MAX. From the changelog: 1) To take a parameter that specifies the number of pages to scan for reclaiming. Define a max value of 32, but scan 16 in the case for the global reclaimer (ksgxd). It appears we want to make this function to scan @nr_to_scan for cgroup, but still want to scan a fixed value for ksgxd, which is SGX_NR_TO_SCAN. And @nr_to_scan can be larger than SGX_NR_TO_SCAN but smaller than SGX_NR_TO_SCAN_MAX. Putting behind the mystery of why above is needed, to achieve it, is it more clear if we do below? int __sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) { struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; ... if (nr_to_scan > SGX_NR_TO_SCAN_MAX) return 0; for (i = 0; i < nr_to_scan; i++) { ... } return reclaimed; } /* This is for ksgxd() */ int sgx_reclaim_epc_page(void) { return __sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); } EPC cgroup calls __sgx_reclaim_epc_pages() directly, or introduce another wrapper.
On Thu, 05 Oct 2023 07:24:12 -0500, Huang, Kai <kai.huang@intel.com> wrote: > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote: >> From: Sean Christopherson <sean.j.christopherson@intel.com> >> >> Adjust and expose the top-level reclaim function as >> sgx_reclaim_epc_pages() for use by the upcoming EPC cgroup, which will >> initiate reclaim to enforce the max limit. >> >> Make these adjustments to the function signature. >> >> 1) To take a parameter that specifies the number of pages to scan for >> reclaiming. Define a max value of 32, but scan 16 in the case for the >> global reclaimer (ksgxd). The EPC cgroup will use it to specify a >> desired number of pages to be reclaimed up to the max value of 32. >> >> 2) To take a flag to force reclaiming a page regardless of its age. The >> EPC cgroup will use the flag to enforce its limits by draining the >> reclaimable lists before resorting to other measures, e.g. forcefully >> kill enclaves. >> >> 3) Return the number of reclaimed pages. The EPC cgroup will use the >> result to track reclaiming progress and escalate to a more forceful >> reclaiming mode, e.g., calling this function with the flag to ignore age >> of pages. >> >> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> >> Co-developed-by: Kristen Carlson Accardi <kristen@linux.intel.com> >> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> >> Co-developed-by: Haitao Huang <haitao.huang@linux.intel.com> >> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> >> Cc: Sean Christopherson <seanjc@google.com> >> --- >> V4: >> - Combined the 3 patches that made the individual changes to the >> function signature. >> - Removed 'high' limit in commit message. >> --- >> arch/x86/kernel/cpu/sgx/main.c | 31 +++++++++++++++++++++---------- >> arch/x86/kernel/cpu/sgx/sgx.h | 1 + >> 2 files changed, 22 insertions(+), 10 deletions(-) >> >> diff --git a/arch/x86/kernel/cpu/sgx/main.c >> b/arch/x86/kernel/cpu/sgx/main.c >> index 3b875ab4dcd0..4e1a3e038db5 100644 >> --- a/arch/x86/kernel/cpu/sgx/main.c >> +++ b/arch/x86/kernel/cpu/sgx/main.c >> @@ -18,6 +18,11 @@ >> #include "encl.h" >> #include "encls.h" >> >> +/* >> + * Maximum number of pages to scan for reclaiming. >> + */ >> +#define SGX_NR_TO_SCAN_MAX 32 >> + >> struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; >> static int sgx_nr_epc_sections; >> static struct task_struct *ksgxd_tsk; >> @@ -279,7 +284,11 @@ static void sgx_reclaimer_write(struct >> sgx_epc_page *epc_page, >> mutex_unlock(&encl->lock); >> } >> >> -/* >> +/** >> + * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers >> + * @nr_to_scan: Number of EPC pages to scan for reclaim >> + * @ignore_age: Reclaim a page even if it is young >> + * >> * Take a fixed number of pages from the head of the active page pool >> and >> * reclaim them to the enclave's private shmem files. Skip the pages, >> which have >> * been accessed since the last scan. Move those pages to the tail of >> active >> @@ -292,15 +301,14 @@ static void sgx_reclaimer_write(struct >> sgx_epc_page *epc_page, >> * problematic as it would increase the lock contention too much, >> which would >> * halt forward progress. >> */ >> -static void sgx_reclaim_pages(void) >> +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age) > > 'size_t' looks odd. Any reason to use it? > > Given you only scan 32 at maximum, seems 'int' is good enough? > Initially was int. Jarkko was suggesting ssize_t. I changed to size_t as this function will never return negative. >> { >> - struct sgx_backing backing[SGX_NR_TO_SCAN]; >> + struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; >> struct sgx_epc_page *epc_page, *tmp; >> struct sgx_encl_page *encl_page; >> pgoff_t page_index; >> LIST_HEAD(iso); >> - int ret; >> - int i; >> + size_t ret, i; >> >> spin_lock(&sgx_global_lru.lock); >> for (i = 0; i < SGX_NR_TO_SCAN; i++) { > This should be nr_to_scan It was missed during some rebase and reordering operations. > The function comment says > > * @nr_to_scan: Number of EPC pages to scan for reclaim > > But I don't see it is even used, if my eye isn't deceiving me? > >> @@ -326,13 +334,14 @@ static void sgx_reclaim_pages(void) >> spin_unlock(&sgx_global_lru.lock); >> >> if (list_empty(&iso)) >> - return; >> + return 0; >> >> i = 0; >> list_for_each_entry_safe(epc_page, tmp, &iso, list) { >> encl_page = epc_page->encl_page; >> >> - if (!sgx_reclaimer_age(epc_page)) >> + if (i == SGX_NR_TO_SCAN_MAX || > > i == nr_to_scan? > Not needed if above for statement fixed for nr_to_scan. Anything above MAX will be skipped and put back to LRU. > And should we have a > > if (nr_to_scan < SGX_NR_TO_SCAN_MAX) > return 0; > > at the very beginning of this function? > In final version caller to make sure not call with nr_to_scan not larger than SGX_NR_TO_SCAN_MAX >> + (!ignore_age && !sgx_reclaimer_age(epc_page))) >> goto skip; >> >> page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); >> @@ -371,6 +380,8 @@ static void sgx_reclaim_pages(void) >> >> sgx_free_epc_page(epc_page); >> } >> + >> + return i; >> } >> > > I found this function a little bit odd, given the mixing of 'nr_to_scan', > SGX_NR_TO_SCAN and SGX_NR_TO_SCAN_MAX. > > From the changelog: > > 1) To take a parameter that specifies the number of pages to scan for > reclaiming. Define a max value of 32, but scan 16 in the case for the > global reclaimer (ksgxd). > > It appears we want to make this function to scan @nr_to_scan for cgroup, > but > still want to scan a fixed value for ksgxd, which is SGX_NR_TO_SCAN. And > @nr_to_scan can be larger than SGX_NR_TO_SCAN but smaller than > SGX_NR_TO_SCAN_MAX. > > Putting behind the mystery of why above is needed, to achieve it, is it > more > clear if we do below? > > int __sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) > { > struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; > ... > > if (nr_to_scan > SGX_NR_TO_SCAN_MAX) > return 0; We could set nr_to_scan to MAX but since this is code internal to driver, maybe just make sure callers don't call with bigger numbers. > > for (i = 0; i < nr_to_scan; i++) { > ... > } > yes > return reclaimed; > } > > /* This is for ksgxd() */ > int sgx_reclaim_epc_page(void) > { > return __sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); > } Some maintainers may prefer no wrapping. Thanks Haitao
> > > > > > -/* > > > +/** > > > + * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers > > > + * @nr_to_scan: Number of EPC pages to scan for reclaim > > > + * @ignore_age: Reclaim a page even if it is young > > > + * > > > * Take a fixed number of pages from the head of the active page pool > > > and > > > * reclaim them to the enclave's private shmem files. Skip the pages, > > > which have > > > * been accessed since the last scan. Move those pages to the tail of > > > active > > > @@ -292,15 +301,14 @@ static void sgx_reclaimer_write(struct > > > sgx_epc_page *epc_page, > > > * problematic as it would increase the lock contention too much, > > > which would > > > * halt forward progress. > > > */ > > > -static void sgx_reclaim_pages(void) > > > +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age) > > > > 'size_t' looks odd. Any reason to use it? > > > > Given you only scan 32 at maximum, seems 'int' is good enough? > > > > Initially was int. > Jarkko was suggesting ssize_t. I changed to size_t as this function will > never return negative. Then 'unsigned int'. We are talking about 32 at max here. size_t is more suitable for bytes, but we are dealing with number of pages. Maybe Jarkko could comment why size_t is better. [...] > > > > > i = 0; > > > list_for_each_entry_safe(epc_page, tmp, &iso, list) { > > > encl_page = epc_page->encl_page; > > > > > > - if (!sgx_reclaimer_age(epc_page)) > > > + if (i == SGX_NR_TO_SCAN_MAX || > > > > i == nr_to_scan? > > > Not needed if above for statement fixed for nr_to_scan. > Anything above MAX will be skipped and put back to LRU. I believe using nr_to_scan is more logically correct. [...] > > > > I found this function a little bit odd, given the mixing of 'nr_to_scan', > > SGX_NR_TO_SCAN and SGX_NR_TO_SCAN_MAX. > > > > From the changelog: > > > > 1) To take a parameter that specifies the number of pages to scan for > > reclaiming. Define a max value of 32, but scan 16 in the case for the > > global reclaimer (ksgxd). > > > > It appears we want to make this function to scan @nr_to_scan for cgroup, > > but > > still want to scan a fixed value for ksgxd, which is SGX_NR_TO_SCAN. And > > @nr_to_scan can be larger than SGX_NR_TO_SCAN but smaller than > > SGX_NR_TO_SCAN_MAX. > > > > Putting behind the mystery of why above is needed, to achieve it, is it > > more > > clear if we do below? > > > > int __sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) > > { > > struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; > > ... > > > > if (nr_to_scan > SGX_NR_TO_SCAN_MAX) > > return 0; > > We could set nr_to_scan to MAX but since this is code internal to driver, > maybe just make sure callers don't call with bigger numbers. Please add this check, using WARN_ON_ONCE() if it's better. Then the code is much easier to review. > > > > > for (i = 0; i < nr_to_scan; i++) { > > ... > > } > > > > yes please fix this up, then ... > > > return reclaimed; > > } > > > > /* This is for ksgxd() */ > > int sgx_reclaim_epc_page(void) > > { > > return __sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); > > } > > Some maintainers may prefer no wrapping. > ... OK.
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 3b875ab4dcd0..4e1a3e038db5 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -18,6 +18,11 @@ #include "encl.h" #include "encls.h" +/* + * Maximum number of pages to scan for reclaiming. + */ +#define SGX_NR_TO_SCAN_MAX 32 + struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; static int sgx_nr_epc_sections; static struct task_struct *ksgxd_tsk; @@ -279,7 +284,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, mutex_unlock(&encl->lock); } -/* +/** + * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers + * @nr_to_scan: Number of EPC pages to scan for reclaim + * @ignore_age: Reclaim a page even if it is young + * * Take a fixed number of pages from the head of the active page pool and * reclaim them to the enclave's private shmem files. Skip the pages, which have * been accessed since the last scan. Move those pages to the tail of active @@ -292,15 +301,14 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, * problematic as it would increase the lock contention too much, which would * halt forward progress. */ -static void sgx_reclaim_pages(void) +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age) { - struct sgx_backing backing[SGX_NR_TO_SCAN]; + struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; struct sgx_epc_page *epc_page, *tmp; struct sgx_encl_page *encl_page; pgoff_t page_index; LIST_HEAD(iso); - int ret; - int i; + size_t ret, i; spin_lock(&sgx_global_lru.lock); for (i = 0; i < SGX_NR_TO_SCAN; i++) { @@ -326,13 +334,14 @@ static void sgx_reclaim_pages(void) spin_unlock(&sgx_global_lru.lock); if (list_empty(&iso)) - return; + return 0; i = 0; list_for_each_entry_safe(epc_page, tmp, &iso, list) { encl_page = epc_page->encl_page; - if (!sgx_reclaimer_age(epc_page)) + if (i == SGX_NR_TO_SCAN_MAX || + (!ignore_age && !sgx_reclaimer_age(epc_page))) goto skip; page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); @@ -371,6 +380,8 @@ static void sgx_reclaim_pages(void) sgx_free_epc_page(epc_page); } + + return i; } static bool sgx_should_reclaim(unsigned long watermark) @@ -387,7 +398,7 @@ static bool sgx_should_reclaim(unsigned long watermark) void sgx_reclaim_direct(void) { if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) - sgx_reclaim_pages(); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); } static int ksgxd(void *p) @@ -410,7 +421,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_pages(); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); cond_resched(); } @@ -582,7 +593,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - sgx_reclaim_pages(); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); cond_resched(); } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 6c0bfdc209c0..7e7f1f36d31e 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -179,6 +179,7 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); int sgx_drop_epc_page(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); bool sgx_epc_oom(struct sgx_epc_lru_lists *lrus); +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age); void sgx_ipi_cb(void *info);