Message ID | 06a5f478d3bfaa57954954c82dd5d4040450171d.1666130846.git.reinette.chatre@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/sgx: Reduce delay and interference of enclave release | expand |
On Tue, Oct 18, 2022 at 03:42:47PM -0700, Reinette Chatre wrote: > commit 8795359e35bc ("x86/sgx: Silence softlockup detection when > releasing large enclaves") introduced a cond_resched() during enclave > release where the EREMOVE instruction is applied to every 4k enclave > page. Giving other tasks an opportunity to run while tearing down a > large enclave placates the soft lockup detector but Iqbal found > that the fix causes a 25% performance degradation of a workload > run using Gramine. > > Gramine maintains a 1:1 mapping between processes and SGX enclaves. > That means if a workload in an enclave creates a subprocess then > Gramine creates a duplicate enclave for that subprocess to run in. > The consequence is that the release of the enclave used to run > the subprocess can impact the performance of the workload that is > run in the original enclave, especially in large enclaves when > SGX2 is not in use. > > The workload run by Iqbal behaves as follows: > Create enclave (enclave "A") > /* Initialize workload in enclave "A" */ > Create enclave (enclave "B") > /* Run subprocess in enclave "B" and send result to enclave "A" */ > Release enclave (enclave "B") > /* Run workload in enclave "A" */ > Release enclave (enclave "A") > > The performance impact of releasing enclave "B" in the above scenario > is amplified when there is a lot of SGX memory and the enclave size > matches the SGX memory. When there is 128GB SGX memory and an enclave > size of 128GB, from the time enclave "B" starts the 128GB SGX memory > is oversubscribed with a combined demand for 256GB from the two > enclaves. > > Before commit 8795359e35bc ("x86/sgx: Silence softlockup detection when > releasing large enclaves") enclave release was done in a tight loop > without giving other tasks a chance to run. Even though the system > experienced soft lockups the workload (run in enclave "A") obtained > good performance numbers because when the workload started running > there was no interference. > > Commit 8795359e35bc ("x86/sgx: Silence softlockup detection when > releasing large enclaves") gave other tasks opportunity to run while an > enclave is released. The impact of this in this scenario is that while > enclave "B" is released and needing to access each page that belongs > to it in order to run the SGX EREMOVE instruction on it, enclave "A" > is attempting to run the workload needing to access the enclave > pages that belong to it. This causes a lot of swapping due to the > demand for the oversubscribed SGX memory. Longer latencies are > experienced by the workload in enclave "A" while enclave "B" is > released. > > Improve the performance of enclave release while still avoiding the > soft lockup detector with two enhancements: > - Only call cond_resched() after XA_CHECK_SCHED iterations. > - Use the xarray advanced API to keep the xarray locked for > XA_CHECK_SCHED iterations instead of locking and unlocking > at every iteration. > > This batching solution is copied from sgx_encl_may_map() that > also iterates through all enclave pages using this technique. > > With this enhancement the workload experiences a 5% > performance degradation when compared to a kernel without > commit 8795359e35bc ("x86/sgx: Silence softlockup detection when > releasing large enclaves"), an improvement to the reported 25% > degradation, while still placating the soft lockup detector. > > Scenarios with poor performance are still possible even with these > enhancements. For example, short workloads creating sub processes > while running in large enclaves. Further performance improvements > are pursued in user space through avoiding to create duplicate enclaves > for certain sub processes, and using SGX2 that will do lazy allocation > of pages as needed so enclaves created for sub processes start quickly > and release quickly. > > Fixes: 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves") > Reported-by: Md Iqbal Hossain <md.iqbal.hossain@intel.com> > Tested-by: Md Iqbal Hossain <md.iqbal.hossain@intel.com> > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> > --- > > I do not know if this qualifies as stable material. > > arch/x86/kernel/cpu/sgx/encl.c | 22 ++++++++++++++++++---- > 1 file changed, 18 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c > index 1ec20807de1e..f7365c278525 100644 > --- a/arch/x86/kernel/cpu/sgx/encl.c > +++ b/arch/x86/kernel/cpu/sgx/encl.c > @@ -682,9 +682,12 @@ void sgx_encl_release(struct kref *ref) > struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount); > struct sgx_va_page *va_page; > struct sgx_encl_page *entry; > - unsigned long index; > + unsigned long count = 0; > + > + XA_STATE(xas, &encl->page_array, PFN_DOWN(encl->base)); > > - xa_for_each(&encl->page_array, index, entry) { > + xas_lock(&xas); > + xas_for_each(&xas, entry, PFN_DOWN(encl->base + encl->size - 1)) { I would add to declarations: unsigned long nr_pages = PFN_DOWN(encl->base + encl->size - 1); Makes this more readable. > if (entry->epc_page) { > /* > * The page and its radix tree entry cannot be freed > @@ -699,9 +702,20 @@ void sgx_encl_release(struct kref *ref) > } > > kfree(entry); > - /* Invoke scheduler to prevent soft lockups. */ > - cond_resched(); > + /* > + * Invoke scheduler on every XA_CHECK_SCHED iteration > + * to prevent soft lockups. > + */ > + if (!(++count % XA_CHECK_SCHED)) { > + xas_pause(&xas); > + xas_unlock(&xas); > + > + cond_resched(); > + > + xas_lock(&xas); > + } > } WARN_ON(count != nr_pages); > + xas_unlock(&xas); > > xa_destroy(&encl->page_array); > > -- > 2.34.1 > BR, Jarkko
Hi Jarkko, On 10/23/2022 1:06 PM, Jarkko Sakkinen wrote: > On Tue, Oct 18, 2022 at 03:42:47PM -0700, Reinette Chatre wrote: ... >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c >> index 1ec20807de1e..f7365c278525 100644 >> --- a/arch/x86/kernel/cpu/sgx/encl.c >> +++ b/arch/x86/kernel/cpu/sgx/encl.c >> @@ -682,9 +682,12 @@ void sgx_encl_release(struct kref *ref) >> struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount); >> struct sgx_va_page *va_page; >> struct sgx_encl_page *entry; >> - unsigned long index; >> + unsigned long count = 0; >> + >> + XA_STATE(xas, &encl->page_array, PFN_DOWN(encl->base)); >> >> - xa_for_each(&encl->page_array, index, entry) { >> + xas_lock(&xas); >> + xas_for_each(&xas, entry, PFN_DOWN(encl->base + encl->size - 1)) { > > I would add to declarations: > > unsigned long nr_pages = PFN_DOWN(encl->base + encl->size - 1); > > Makes this more readable. Will do, but I prefer to name it "max_page_index" or something related instead. "nr_pages" implies "number of pages" to me, which is not what PFN_DOWN(encl->base + encl->size - 1) represents. What is represented is the highest possible index of a page in page_array, where an index is the pfn of a page. > >> if (entry->epc_page) { >> /* >> * The page and its radix tree entry cannot be freed >> @@ -699,9 +702,20 @@ void sgx_encl_release(struct kref *ref) >> } >> >> kfree(entry); >> - /* Invoke scheduler to prevent soft lockups. */ >> - cond_resched(); >> + /* >> + * Invoke scheduler on every XA_CHECK_SCHED iteration >> + * to prevent soft lockups. >> + */ >> + if (!(++count % XA_CHECK_SCHED)) { >> + xas_pause(&xas); >> + xas_unlock(&xas); >> + >> + cond_resched(); >> + >> + xas_lock(&xas); >> + } >> } > > WARN_ON(count != nr_pages); > nr_pages as assigned in your example does not represent a count of the enclave pages but instead a pfn index into the page_array. Comparing it to count, the number of removed enclave pages that are not being held by reclaimer, is not appropriate. This check would be problematic even if we create a "nr_pages" from the range of possible indices. This is because of how enclave sizes are required to be power-of-two that makes it likely for there to be indices without pages associated with it. >> + xas_unlock(&xas); >> >> xa_destroy(&encl->page_array); >> >> -- >> 2.34.1 >> Reinette
On Mon, Oct 24, 2022 at 11:56:39AM -0700, Reinette Chatre wrote: > Hi Jarkko, > > On 10/23/2022 1:06 PM, Jarkko Sakkinen wrote: > > On Tue, Oct 18, 2022 at 03:42:47PM -0700, Reinette Chatre wrote: > > ... > > >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c > >> index 1ec20807de1e..f7365c278525 100644 > >> --- a/arch/x86/kernel/cpu/sgx/encl.c > >> +++ b/arch/x86/kernel/cpu/sgx/encl.c > >> @@ -682,9 +682,12 @@ void sgx_encl_release(struct kref *ref) > >> struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount); > >> struct sgx_va_page *va_page; > >> struct sgx_encl_page *entry; > >> - unsigned long index; > >> + unsigned long count = 0; > >> + > >> + XA_STATE(xas, &encl->page_array, PFN_DOWN(encl->base)); > >> > >> - xa_for_each(&encl->page_array, index, entry) { > >> + xas_lock(&xas); > >> + xas_for_each(&xas, entry, PFN_DOWN(encl->base + encl->size - 1)) { > > > > I would add to declarations: > > > > unsigned long nr_pages = PFN_DOWN(encl->base + encl->size - 1); > > > > Makes this more readable. > > Will do, but I prefer to name it "max_page_index" or something related instead. > "nr_pages" implies "number of pages" to me, which is not what > PFN_DOWN(encl->base + encl->size - 1) represents. What is represented is the > highest possible index of a page in page_array, where an index is the > pfn of a page. Yeah, makes sense. > > > > >> if (entry->epc_page) { > >> /* > >> * The page and its radix tree entry cannot be freed > >> @@ -699,9 +702,20 @@ void sgx_encl_release(struct kref *ref) > >> } > >> > >> kfree(entry); > >> - /* Invoke scheduler to prevent soft lockups. */ > >> - cond_resched(); > >> + /* > >> + * Invoke scheduler on every XA_CHECK_SCHED iteration > >> + * to prevent soft lockups. > >> + */ > >> + if (!(++count % XA_CHECK_SCHED)) { > >> + xas_pause(&xas); > >> + xas_unlock(&xas); > >> + > >> + cond_resched(); > >> + > >> + xas_lock(&xas); > >> + } > >> } > > > > WARN_ON(count != nr_pages); > > > > nr_pages as assigned in your example does not represent a count of the > enclave pages but instead a pfn index into the page_array. Comparing it > to count, the number of removed enclave pages that are not being held > by reclaimer, is not appropriate. > > This check would be problematic even if we create a "nr_pages" from > the range of possible indices. This is because of how enclave sizes are > required to be power-of-two that makes it likely for there to be indices > without pages associated with it. Ok. > > >> + xas_unlock(&xas); > >> > >> xa_destroy(&encl->page_array); > >> > >> -- > >> 2.34.1 > >> > > Reinette BR, Jarkko
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 1ec20807de1e..f7365c278525 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -682,9 +682,12 @@ void sgx_encl_release(struct kref *ref) struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount); struct sgx_va_page *va_page; struct sgx_encl_page *entry; - unsigned long index; + unsigned long count = 0; + + XA_STATE(xas, &encl->page_array, PFN_DOWN(encl->base)); - xa_for_each(&encl->page_array, index, entry) { + xas_lock(&xas); + xas_for_each(&xas, entry, PFN_DOWN(encl->base + encl->size - 1)) { if (entry->epc_page) { /* * The page and its radix tree entry cannot be freed @@ -699,9 +702,20 @@ void sgx_encl_release(struct kref *ref) } kfree(entry); - /* Invoke scheduler to prevent soft lockups. */ - cond_resched(); + /* + * Invoke scheduler on every XA_CHECK_SCHED iteration + * to prevent soft lockups. + */ + if (!(++count % XA_CHECK_SCHED)) { + xas_pause(&xas); + xas_unlock(&xas); + + cond_resched(); + + xas_lock(&xas); + } } + xas_unlock(&xas); xa_destroy(&encl->page_array);