Message ID | 20220304122852.563475-1-jarkko@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] x86: Add SGX_IOC_ENCLAVE_AUGMENT_PAGES | expand |
On Fri, 2022-03-04 at 14:28 +0200, Jarkko Sakkinen wrote: > With SGX1 an enclave needs to be created with its maximum memory demands > allocated. Pages cannot be added to an enclave after it is initialized. > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add pages > to an initialized enclave. With SGX2 the enclave still needs to set aside > address space for its maximum memory demands during enclave creation, but > all pages need not be added before enclave initialization. Pages can be > added during enclave runtime. > > Add support for dynamically adding pages to an initialized enclave with > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of > pages. Do not enforce any particular permissions from kernel, like is done > for the pages added during the pre-initialization phase, as enclave > controls the final permissions and content for these pages by issuing > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data and > permissions). > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page > fault handler because it allows to have O(1) number of kernel-enclave round > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case > when a page fault handler EAUG single page at a time. > > Cc: Reinette Chatre <reinette.chatre@intel.com> > Cc: Nathaniel McCallum <nathaniel@profian.com> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > --- > Is contained in sgx2-v2.1 branch of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git I created sgx2-v2.2 branch, which has #PF EAUG removed. I also moved selftests to the tail in the patch sets so that it is easier to update them reflecting these and future changes. Having them intervened makes things just complicated. I focus now to implement mmap() for Enarx with this, so no kselftest update just yet. Roughly the sequence in Enarx is: 1. Enclave traps on syscall (opcode). 2. Host jumps to shim expection handler. 3. Enclave copies the mmap() arguments to a buffer outside the enclave. 4. Enclave exists back to the host. 5. Host performs EAUG to the mmap range. 6. Host performs mmap() to the mmap range, which succeeds given that vm_max_prot_bits is RWX (i.e. disabled for dynamic pages). 7. Host jumps back to the enclave and execution continues there in the mmap handler. 8. mmap handler does a series of EACCEPTCOPY operations for the range with given permissions and empty page as the input data. Some details might differ a bit but this gives the basic idea. I like the fact the roud-trips are kind of in control and not variable, and it is pretty easy to use to implement the basic syscall behaviour. This has of course some corner cases but my sequence describes the main flow anyway. Take it or leave it but this does give at least for me a sound way to implement my workload. I'll use this up until my changes have been inducted to the original patch set, or it starts to look sane with other solutions. The original patch set simply does not work for us at all. BR, Jarkko
On Fri, 04 Mar 2022 06:28:52 -0600, Jarkko Sakkinen <jarkko@kernel.org> wrote: > With SGX1 an enclave needs to be created with its maximum memory demands > allocated. Pages cannot be added to an enclave after it is initialized. > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add > pages > to an initialized enclave. With SGX2 the enclave still needs to set aside > address space for its maximum memory demands during enclave creation, but > all pages need not be added before enclave initialization. Pages can be > added during enclave runtime. > > Add support for dynamically adding pages to an initialized enclave with > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of > pages. Do not enforce any particular permissions from kernel, like is > done > for the pages added during the pre-initialization phase, as enclave > controls the final permissions and content for these pages by issuing > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data > and > permissions). > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page > fault handler because it allows to have O(1) number of kernel-enclave > round > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the > case > when a page fault handler EAUG single page at a time. > > Cc: Reinette Chatre <reinette.chatre@intel.com> > Cc: Nathaniel McCallum <nathaniel@profian.com> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > --- > Is contained in sgx2-v2.1 branch of > git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git > --- > arch/x86/include/uapi/asm/sgx.h | 14 +++ > arch/x86/kernel/cpu/sgx/ioctl.c | 159 ++++++++++++++++++++++++++++++++ > 2 files changed, 173 insertions(+) > > diff --git a/arch/x86/include/uapi/asm/sgx.h > b/arch/x86/include/uapi/asm/sgx.h > index c4e0326d281d..2b3a606e78fe 100644 > --- a/arch/x86/include/uapi/asm/sgx.h > +++ b/arch/x86/include/uapi/asm/sgx.h > @@ -35,6 +35,8 @@ enum sgx_page_flags { > _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_modt) > #define SGX_IOC_ENCLAVE_REMOVE_PAGES \ > _IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages) > +#define SGX_IOC_ENCLAVE_AUGMENT_PAGES \ > + _IOWR(SGX_MAGIC, 0x09, struct sgx_enclave_augment_pages) > /** > * struct sgx_enclave_create - parameter structure for the > @@ -138,6 +140,18 @@ struct sgx_enclave_remove_pages { > __u64 count; > }; > +/** > + * struct sgx_enclave_augment_pages - parameter structure for the > %SGX_IOC_ENCLAVE_ADD_PAGE ioctl > + * @offset: starting page offset > + * @length: length of the data (multiple of the page size) > + * @count: number of bytes added (multiple of the page size) > + */ > +struct sgx_enclave_augment_pages { > + __u64 offset; > + __u64 length; > + __u64 count; > +}; > + As I stated in another thread, we need a mechanism to allow EAUG page lazily, e.g., on #PF. Can we add a field here to indicate that? > struct sgx_enclave_run; > /** > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c > b/arch/x86/kernel/cpu/sgx/ioctl.c > index 3ad4320ff6ae..1cb410d97ac7 100644 > --- a/arch/x86/kernel/cpu/sgx/ioctl.c > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c > @@ -1320,6 +1320,162 @@ static long sgx_ioc_enclave_remove_pages(struct > sgx_encl *encl, > return ret; > } > +/* > + * Performa EAUG operation for an enclave. > + */ > +static int sgx_encl_augment_page(struct sgx_encl *encl, unsigned long > offset) > +{ > + struct sgx_pageinfo pginfo = {0}; > + struct sgx_encl_page *encl_page; > + struct sgx_epc_page *epc_page; > + struct sgx_va_page *va_page; > + u64 secinfo_flags; > + int ret; > + > + /* > + * Ignore internal permission checking for dynamically added pages. > + * They matter only for data added during the pre-initialization phase. > + * The enclave decides the permissions by the means of EACCEPT, > + * EACCEPTCOPY and EMODPE. > + */ > + secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X; > + encl_page = sgx_encl_page_alloc(encl, offset, secinfo_flags); > + if (IS_ERR(encl_page)) > + return PTR_ERR(encl_page); > + > + epc_page = sgx_alloc_epc_page(encl_page, true); > + if (IS_ERR(epc_page)) { > + ret = PTR_ERR(epc_page); > + goto err_alloc_epc_page; > + } > + > + va_page = sgx_encl_grow(encl); > + if (IS_ERR(va_page)) { > + ret = PTR_ERR(va_page); > + goto err_grow; > + } > + > + mutex_lock(&encl->lock); > + > + /* > + * Adding to encl->va_pages must be done under encl->lock. Ditto for > + * deleting (via sgx_encl_shrink()) in the error path. > + */ > + if (va_page) > + list_add(&va_page->list, &encl->va_pages); > + > + /* > + * Insert prior to EADD in case of OOM. EADD modifies MRENCLAVE, i.e. > + * can't be gracefully unwound, while failure on EADD/EXTEND is limited > + * to userspace errors (or kernel/hardware bugs). > + */ > + ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc), > + encl_page, GFP_KERNEL); > + > + /* > + * If ret == -EBUSY then page was created in another flow while > + * running without encl->lock > + */ > + if (ret) > + goto err_xa_insert; > + > + pginfo.secs = (unsigned > long)sgx_get_epc_virt_addr(encl->secs.epc_page); > + pginfo.addr = encl_page->desc & PAGE_MASK; > + pginfo.metadata = 0; > + > + ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page)); > + if (ret) > + goto err_eaug; > + > + encl_page->encl = encl; > + encl_page->epc_page = epc_page; > + encl_page->type = SGX_PAGE_TYPE_REG; > + encl->secs_child_cnt++; > + > + sgx_mark_page_reclaimable(encl_page->epc_page); > + > + mutex_unlock(&encl->lock); > + > + return 0; > + > +err_eaug: > + xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc)); > + > +err_xa_insert: > + sgx_encl_shrink(encl, va_page); > + mutex_unlock(&encl->lock); > + > +err_grow: > + sgx_encl_free_epc_page(epc_page); > + > +err_alloc_epc_page: > + kfree(encl_page); > + > + return VM_FAULT_SIGBUS; > +} > + > +/** > + * sgx_ioc_enclave_augment_pages() - The handler for > %SGX_IOC_ENCLAVE_AUGMENT_PAGES > + * @encl: an enclave pointer > + * @arg: a user pointer to a struct sgx_enclave_augment_pages instance > + * > + * Request to augment pages to an initialized enclave. The pages must be > + * acknowledged by the enclave by issuing either ENCLU[EACCEPT] (for a > + * trivial empty data page) or ENCLU[EACCEPTCOPY] (for a page with > arbitrary > + * permissions and user provided data). > + * > + * mmap() protection bits for augmented pages can be arbitrary, i.e. > their > + * permissions are only capped by the VMA, not by EPCM permissions. > This is > + * a constraint because enclave can internally control the final EPCM > versions, > + * for which the kernel has no say. > + * > + * Return: > + * - 0: Success. > + * - -EACCES: The source page is located in a noexec partition. > + * - -ENOMEM: Out of EPC pages. > + * - -EINTR: The call was interrupted before data was processed. > + * - -EIO: EAUG failed. > + * - -errno: POSIX error. > + */ > +static long sgx_ioc_enclave_augment_pages(struct sgx_encl *encl, void > __user *user_arg) > +{ > + struct sgx_enclave_augment_pages arg; > + unsigned long c; > + int ret; > + > + if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) > + return -EINVAL; > + > + if (copy_from_user(&arg, user_arg, sizeof(arg))) > + return -EFAULT; > + > + if (sgx_validate_offset_length(encl, arg.offset, arg.length)) > + return -EINVAL; > + > + for (c = 0 ; c < arg.length; c += PAGE_SIZE) { > + if (signal_pending(current)) { > + if (!c) > + ret = -ERESTARTSYS; > + > + break; > + } > + > + if (need_resched()) > + cond_resched(); > + > + ret = sgx_encl_augment_page(encl, arg.offset + c); > + if (ret) > + break; > + } > + > + arg.count = c; > + > + if (copy_to_user(user_arg, &arg, sizeof(arg))) > + return -EFAULT; > + > + return ret; > +} > + > long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) > { > struct sgx_encl *encl = filep->private_data; > @@ -1350,6 +1506,9 @@ long sgx_ioctl(struct file *filep, unsigned int > cmd, unsigned long arg) > case SGX_IOC_ENCLAVE_REMOVE_PAGES: > ret = sgx_ioc_enclave_remove_pages(encl, (void __user *)arg); > break; > + case SGX_IOC_ENCLAVE_AUGMENT_PAGES: > + ret = sgx_ioc_enclave_augment_pages(encl, (void __user *)arg); > + break; > default: > ret = -ENOIOCTLCMD; > break;
On 3/4/22 04:28, Jarkko Sakkinen wrote: > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page > fault handler because it allows to have O(1) number of kernel-enclave round > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case > when a page fault handler EAUG single page at a time. So this is basically an optimization? It's MADV_WILLNEED or MAP_POPULATE to the cost of avoid future faults?
On Fri, Mar 04, 2022 at 10:27:58AM -0600, Haitao Huang wrote: > On Fri, 04 Mar 2022 06:28:52 -0600, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > With SGX1 an enclave needs to be created with its maximum memory demands > > allocated. Pages cannot be added to an enclave after it is initialized. > > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add > > pages > > to an initialized enclave. With SGX2 the enclave still needs to set aside > > address space for its maximum memory demands during enclave creation, but > > all pages need not be added before enclave initialization. Pages can be > > added during enclave runtime. > > > > Add support for dynamically adding pages to an initialized enclave with > > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of > > pages. Do not enforce any particular permissions from kernel, like is > > done > > for the pages added during the pre-initialization phase, as enclave > > controls the final permissions and content for these pages by issuing > > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data > > and > > permissions). > > > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page > > fault handler because it allows to have O(1) number of kernel-enclave > > round > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the > > case > > when a page fault handler EAUG single page at a time. > > > > Cc: Reinette Chatre <reinette.chatre@intel.com> > > Cc: Nathaniel McCallum <nathaniel@profian.com> > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > --- > > Is contained in sgx2-v2.1 branch of > > git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git > > --- > > arch/x86/include/uapi/asm/sgx.h | 14 +++ > > arch/x86/kernel/cpu/sgx/ioctl.c | 159 ++++++++++++++++++++++++++++++++ > > 2 files changed, 173 insertions(+) > > > > diff --git a/arch/x86/include/uapi/asm/sgx.h > > b/arch/x86/include/uapi/asm/sgx.h > > index c4e0326d281d..2b3a606e78fe 100644 > > --- a/arch/x86/include/uapi/asm/sgx.h > > +++ b/arch/x86/include/uapi/asm/sgx.h > > @@ -35,6 +35,8 @@ enum sgx_page_flags { > > _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_modt) > > #define SGX_IOC_ENCLAVE_REMOVE_PAGES \ > > _IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages) > > +#define SGX_IOC_ENCLAVE_AUGMENT_PAGES \ > > + _IOWR(SGX_MAGIC, 0x09, struct sgx_enclave_augment_pages) > > /** > > * struct sgx_enclave_create - parameter structure for the > > @@ -138,6 +140,18 @@ struct sgx_enclave_remove_pages { > > __u64 count; > > }; > > +/** > > + * struct sgx_enclave_augment_pages - parameter structure for the > > %SGX_IOC_ENCLAVE_ADD_PAGE ioctl > > + * @offset: starting page offset > > + * @length: length of the data (multiple of the page size) > > + * @count: number of bytes added (multiple of the page size) > > + */ > > +struct sgx_enclave_augment_pages { > > + __u64 offset; > > + __u64 length; > > + __u64 count; > > +}; > > + > > As I stated in another thread, we need a mechanism to allow EAUG page > lazily, e.g., on #PF. Can we add a field here to indicate that? ioctl *does not* prevent lazy behaviour where, or if, it makes sense. For growing memory (e.g. MAP_GROWSDOWN) you should just take advantage of the vDSO's exception handling mechanism and call the ioctl on demand. For a high-performance user space you still want to be also do minimum round trip "batch jobs" where they are possible. We exactly have the whole vDSO framework to service the on-demand needs while still having full control of the execution. EAUG in the #PF handler is all about being flakky and loosing all robustness. BR, Jarkko
On Fri, Mar 04, 2022 at 09:08:20AM -0800, Dave Hansen wrote: > On 3/4/22 04:28, Jarkko Sakkinen wrote: > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page > > fault handler because it allows to have O(1) number of kernel-enclave round > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case > > when a page fault handler EAUG single page at a time. > > So this is basically an optimization? It's MADV_WILLNEED or > MAP_POPULATE to the cost of avoid future faults? Yes. So the idea would be that based on these the #PF handler would have more smartness, and it would do a batch of EAUG's? That could be possibly acceptable but I also had other concern. I would like to see this: 1. Removal of vm_run_prot_bits. 2. Use RWX vm_max_prot_bits for EAUG'd pages. During run-time kernel controls PTE's, and enclave has full control of the EPCM (EACCEPT, EACCEPTCOPY, EMODPE). By creating artificial limitations how to operate with these, it can limit various optimizations in the user space code. E.g. a syscall shim can require clever co-operation between in-enclave opcodes and what you do with the kernel in various situations. RWX sounds provocative yes, but here it means only the limits where kernel can set its PTE's and nothing else, not that page table is filled with RWX pages, and enclave dictates what is in EPCM, and that's how it actually should be (e.g. you can sometimes deliver mmap() without ever going out of the enclave with EMODPE). If MADV_WILLNEED/MAP_POPULATE approach is combined with this what I discussed here, then I think we could have solution to write an efficient memory management shims. BR, Jarkko
On Sat, Mar 05, 2022 at 04:17:53AM +0200, Jarkko Sakkinen wrote: > On Fri, Mar 04, 2022 at 09:08:20AM -0800, Dave Hansen wrote: > > On 3/4/22 04:28, Jarkko Sakkinen wrote: > > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page > > > fault handler because it allows to have O(1) number of kernel-enclave round > > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case > > > when a page fault handler EAUG single page at a time. > > > > So this is basically an optimization? It's MADV_WILLNEED or > > MAP_POPULATE to the cost of avoid future faults? > > Yes. > > So the idea would be that based on these the #PF handler would have more > smartness, and it would do a batch of EAUG's? > > That could be possibly acceptable but I also had other concern. > > I would like to see this: > > 1. Removal of vm_run_prot_bits. > 2. Use RWX vm_max_prot_bits for EAUG'd pages. > > During run-time kernel controls PTE's, and enclave has full control of the > EPCM (EACCEPT, EACCEPTCOPY, EMODPE). By creating artificial limitations how > to operate with these, it can limit various optimizations in the user space > code. E.g. a syscall shim can require clever co-operation between in-enclave > opcodes and what you do with the kernel in various situations. > > RWX sounds provocative yes, but here it means only the limits where kernel > can set its PTE's and nothing else, not that page table is filled with RWX > pages, and enclave dictates what is in EPCM, and that's how it actually > should be (e.g. you can sometimes deliver mmap() without ever going out > of the enclave with EMODPE). > > If MADV_WILLNEED/MAP_POPULATE approach is combined with this what I > discussed here, then I think we could have solution to write an efficient > memory management shims. Do you already have a rough idea what needs to be done? I can take anyway a look but just in case you had processed this further, please tell what you have. BR, Jarkko
On Sat, Mar 05, 2022 at 04:32:12AM +0200, Jarkko Sakkinen wrote: > On Sat, Mar 05, 2022 at 04:17:53AM +0200, Jarkko Sakkinen wrote: > > On Fri, Mar 04, 2022 at 09:08:20AM -0800, Dave Hansen wrote: > > > On 3/4/22 04:28, Jarkko Sakkinen wrote: > > > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page > > > > fault handler because it allows to have O(1) number of kernel-enclave round > > > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case > > > > when a page fault handler EAUG single page at a time. > > > > > > So this is basically an optimization? It's MADV_WILLNEED or > > > MAP_POPULATE to the cost of avoid future faults? > > > > Yes. > > > > So the idea would be that based on these the #PF handler would have more > > smartness, and it would do a batch of EAUG's? > > > > That could be possibly acceptable but I also had other concern. > > > > I would like to see this: > > > > 1. Removal of vm_run_prot_bits. > > 2. Use RWX vm_max_prot_bits for EAUG'd pages. > > > > During run-time kernel controls PTE's, and enclave has full control of the > > EPCM (EACCEPT, EACCEPTCOPY, EMODPE). By creating artificial limitations how > > to operate with these, it can limit various optimizations in the user space > > code. E.g. a syscall shim can require clever co-operation between in-enclave > > opcodes and what you do with the kernel in various situations. > > > > RWX sounds provocative yes, but here it means only the limits where kernel > > can set its PTE's and nothing else, not that page table is filled with RWX > > pages, and enclave dictates what is in EPCM, and that's how it actually > > should be (e.g. you can sometimes deliver mmap() without ever going out > > of the enclave with EMODPE). > > > > If MADV_WILLNEED/MAP_POPULATE approach is combined with this what I > > discussed here, then I think we could have solution to write an efficient > > memory management shims. > > Do you already have a rough idea what needs to be done? I can take anyway a > look but just in case you had processed this further, please tell what you > have. MAP_POPULATE would almost exact match to the ioctl, so I guess that could really work. I can give this a shot. Does MAP_POPULATE "come through" the driver's mmap callback? BR, Jarkko
On Fri, 04 Mar 2022 19:26:12 -0600, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Fri, Mar 04, 2022 at 10:27:58AM -0600, Haitao Huang wrote: >> On Fri, 04 Mar 2022 06:28:52 -0600, Jarkko Sakkinen <jarkko@kernel.org> >> wrote: >> >> > With SGX1 an enclave needs to be created with its maximum memory >> demands >> > allocated. Pages cannot be added to an enclave after it is >> initialized. >> > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add >> > pages >> > to an initialized enclave. With SGX2 the enclave still needs to set >> aside >> > address space for its maximum memory demands during enclave creation, >> but >> > all pages need not be added before enclave initialization. Pages can >> be >> > added during enclave runtime. >> > >> > Add support for dynamically adding pages to an initialized enclave >> with >> > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range >> of >> > pages. Do not enforce any particular permissions from kernel, like is >> > done >> > for the pages added during the pre-initialization phase, as enclave >> > controls the final permissions and content for these pages by issuing >> > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data >> > and >> > permissions). >> > >> > Explicit EAUG ioctl is a better choice than an implicit EAUG from a >> page >> > fault handler because it allows to have O(1) number of kernel-enclave >> > round >> > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the >> > case >> > when a page fault handler EAUG single page at a time. >> > >> > Cc: Reinette Chatre <reinette.chatre@intel.com> >> > Cc: Nathaniel McCallum <nathaniel@profian.com> >> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> >> > --- >> > Is contained in sgx2-v2.1 branch of >> > git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git >> > --- >> > arch/x86/include/uapi/asm/sgx.h | 14 +++ >> > arch/x86/kernel/cpu/sgx/ioctl.c | 159 >> ++++++++++++++++++++++++++++++++ >> > 2 files changed, 173 insertions(+) >> > >> > diff --git a/arch/x86/include/uapi/asm/sgx.h >> > b/arch/x86/include/uapi/asm/sgx.h >> > index c4e0326d281d..2b3a606e78fe 100644 >> > --- a/arch/x86/include/uapi/asm/sgx.h >> > +++ b/arch/x86/include/uapi/asm/sgx.h >> > @@ -35,6 +35,8 @@ enum sgx_page_flags { >> > _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_modt) >> > #define SGX_IOC_ENCLAVE_REMOVE_PAGES \ >> > _IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages) >> > +#define SGX_IOC_ENCLAVE_AUGMENT_PAGES \ >> > + _IOWR(SGX_MAGIC, 0x09, struct sgx_enclave_augment_pages) >> > /** >> > * struct sgx_enclave_create - parameter structure for the >> > @@ -138,6 +140,18 @@ struct sgx_enclave_remove_pages { >> > __u64 count; >> > }; >> > +/** >> > + * struct sgx_enclave_augment_pages - parameter structure for the >> > %SGX_IOC_ENCLAVE_ADD_PAGE ioctl >> > + * @offset: starting page offset >> > + * @length: length of the data (multiple of the page size) >> > + * @count: number of bytes added (multiple of the page size) >> > + */ >> > +struct sgx_enclave_augment_pages { >> > + __u64 offset; >> > + __u64 length; >> > + __u64 count; >> > +}; >> > + >> >> As I stated in another thread, we need a mechanism to allow EAUG page >> lazily, e.g., on #PF. Can we add a field here to indicate that? > > ioctl *does not* prevent lazy behaviour where, or if, it makes sense. > > For growing memory (e.g. MAP_GROWSDOWN) you should just take advantage of > the vDSO's exception handling mechanism and call the ioctl on demand. > > For a high-performance user space you still want to be also do minimum > round trip "batch jobs" where they are possible. > Looks like you are pursuing MAP_POPULATE to optimize out the O(N) trips. Just for my understanding of your proposal in case this ever comes back. For the on-demand case, this ioctl is required for for each #PF. That's extra round trip compared to automatic kernel EAUG on #PF. > We exactly have the whole vDSO framework to service the on-demand needs > while still having full control of the execution. EAUG in the #PF handler > is all about being flakky and loosing all robustness. > Again, you keep saying EAUG in the #PF handler is not good. So far the only concrete thing I hear is extra O(N) round trips which can be optimized out.
On Fri, 04 Mar 2022 07:50:43 -0600, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Fri, 2022-03-04 at 14:28 +0200, Jarkko Sakkinen wrote: >> With SGX1 an enclave needs to be created with its maximum memory demands >> allocated. Pages cannot be added to an enclave after it is initialized. >> SGX2 introduces a new function, ENCLS[EAUG], that can be used to add >> pages >> to an initialized enclave. With SGX2 the enclave still needs to set >> aside >> address space for its maximum memory demands during enclave creation, >> but >> all pages need not be added before enclave initialization. Pages can be >> added during enclave runtime. >> >> Add support for dynamically adding pages to an initialized enclave with >> SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of >> pages. Do not enforce any particular permissions from kernel, like is >> done >> for the pages added during the pre-initialization phase, as enclave >> controls the final permissions and content for these pages by issuing >> either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data >> and >> permissions). >> >> Explicit EAUG ioctl is a better choice than an implicit EAUG from a page >> fault handler because it allows to have O(1) number of kernel-enclave >> round >> trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the >> case >> when a page fault handler EAUG single page at a time. >> >> Cc: Reinette Chatre <reinette.chatre@intel.com> >> Cc: Nathaniel McCallum <nathaniel@profian.com> >> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> >> --- >> Is contained in sgx2-v2.1 branch of >> git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git > > I created sgx2-v2.2 branch, which has #PF EAUG removed. I > also moved selftests to the tail in the patch sets so that > it is easier to update them reflecting these and future > changes. Having them intervened makes things just complicated. > > I focus now to implement mmap() for Enarx with this, so no > kselftest update just yet. > > Roughly the sequence in Enarx is: > > 1. Enclave traps on syscall (opcode). > 2. Host jumps to shim expection handler. > 3. Enclave copies the mmap() arguments to a buffer outside > the enclave. > 4. Enclave exists back to the host. > 5. Host performs EAUG to the mmap range. > 6. Host performs mmap() to the mmap range, which succeeds > given that vm_max_prot_bits is RWX (i.e. disabled for > dynamic pages). > 7. Host jumps back to the enclave and execution continues > there in the mmap handler. > 8. mmap handler does a series of EACCEPTCOPY operations for > the range with given permissions and empty page as the > input data. > EACCEPTCOPY will require target pages with RW in PTE. So you would need to make mprotect to change PTE permissions afterwards depending on your target permissions. Without knowing much your context, if your intent is to EACCEPTCOPY(EPCM.RX/EPCM.R, EPCM.pending), then I don't see how the page can be used later without making it RW again first, and copy real data into it. So these empty EACCEPTCOPYs may be better just EACCEPTs(EPCM.RW, EPCM.pending). Then after copy real data into the pages, you do EMODPE/EMODPR as needed. EACCEPTCOPY would make more sense when you already have data to be copied into the EAUG'd but pending EPC pages. > Some details might differ a bit but this gives the basic idea. > I like the fact the roud-trips are kind of in control and not > variable, and it is pretty easy to use to implement the basic > syscall behaviour. This has of course some corner cases but > my sequence describes the main flow anyway. > > Take it or leave it but this does give at least for me a sound > way to implement my workload. I'll use this up until my changes > have been inducted to the original patch set, or it starts to > look sane with other solutions. The original patch set simply > does not work for us at all. > > BR, Jarkko
On Sun, Mar 06, 2022 at 07:38:40AM -0600, Haitao Huang wrote: > On Fri, 04 Mar 2022 19:26:12 -0600, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Fri, Mar 04, 2022 at 10:27:58AM -0600, Haitao Huang wrote: > > > On Fri, 04 Mar 2022 06:28:52 -0600, Jarkko Sakkinen <jarkko@kernel.org> > > > wrote: > > > > > > > With SGX1 an enclave needs to be created with its maximum memory > > > demands > > > > allocated. Pages cannot be added to an enclave after it is > > > initialized. > > > > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add > > > > pages > > > > to an initialized enclave. With SGX2 the enclave still needs to > > > set aside > > > > address space for its maximum memory demands during enclave > > > creation, but > > > > all pages need not be added before enclave initialization. Pages > > > can be > > > > added during enclave runtime. > > > > > > > > Add support for dynamically adding pages to an initialized enclave > > > with > > > > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given > > > range of > > > > pages. Do not enforce any particular permissions from kernel, like is > > > > done > > > > for the pages added during the pre-initialization phase, as enclave > > > > controls the final permissions and content for these pages by issuing > > > > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data > > > > and > > > > permissions). > > > > > > > > Explicit EAUG ioctl is a better choice than an implicit EAUG from > > > a page > > > > fault handler because it allows to have O(1) number of kernel-enclave > > > > round > > > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the > > > > case > > > > when a page fault handler EAUG single page at a time. > > > > > > > > Cc: Reinette Chatre <reinette.chatre@intel.com> > > > > Cc: Nathaniel McCallum <nathaniel@profian.com> > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > --- > > > > Is contained in sgx2-v2.1 branch of > > > > git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git > > > > --- > > > > arch/x86/include/uapi/asm/sgx.h | 14 +++ > > > > arch/x86/kernel/cpu/sgx/ioctl.c | 159 > > > ++++++++++++++++++++++++++++++++ > > > > 2 files changed, 173 insertions(+) > > > > > > > > diff --git a/arch/x86/include/uapi/asm/sgx.h > > > > b/arch/x86/include/uapi/asm/sgx.h > > > > index c4e0326d281d..2b3a606e78fe 100644 > > > > --- a/arch/x86/include/uapi/asm/sgx.h > > > > +++ b/arch/x86/include/uapi/asm/sgx.h > > > > @@ -35,6 +35,8 @@ enum sgx_page_flags { > > > > _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_modt) > > > > #define SGX_IOC_ENCLAVE_REMOVE_PAGES \ > > > > _IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages) > > > > +#define SGX_IOC_ENCLAVE_AUGMENT_PAGES \ > > > > + _IOWR(SGX_MAGIC, 0x09, struct sgx_enclave_augment_pages) > > > > /** > > > > * struct sgx_enclave_create - parameter structure for the > > > > @@ -138,6 +140,18 @@ struct sgx_enclave_remove_pages { > > > > __u64 count; > > > > }; > > > > +/** > > > > + * struct sgx_enclave_augment_pages - parameter structure for the > > > > %SGX_IOC_ENCLAVE_ADD_PAGE ioctl > > > > + * @offset: starting page offset > > > > + * @length: length of the data (multiple of the page size) > > > > + * @count: number of bytes added (multiple of the page size) > > > > + */ > > > > +struct sgx_enclave_augment_pages { > > > > + __u64 offset; > > > > + __u64 length; > > > > + __u64 count; > > > > +}; > > > > + > > > > > > As I stated in another thread, we need a mechanism to allow EAUG page > > > lazily, e.g., on #PF. Can we add a field here to indicate that? > > > > ioctl *does not* prevent lazy behaviour where, or if, it makes sense. > > > > For growing memory (e.g. MAP_GROWSDOWN) you should just take advantage of > > the vDSO's exception handling mechanism and call the ioctl on demand. > > > > For a high-performance user space you still want to be also do minimum > > round trip "batch jobs" where they are possible. > > > > Looks like you are pursuing MAP_POPULATE to optimize out the O(N) trips. > Just for my understanding of your proposal in case this ever comes back. For > the on-demand case, this ioctl is required for for each #PF. That's extra > round trip compared to automatic kernel EAUG on #PF. Point taken. I think we could have both. It would then work quite a lot like for normal memory. BR, Jarkko
On Sun, Mar 06, 2022 at 09:20:11AM -0600, Haitao Huang wrote: > On Fri, 04 Mar 2022 07:50:43 -0600, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Fri, 2022-03-04 at 14:28 +0200, Jarkko Sakkinen wrote: > > > With SGX1 an enclave needs to be created with its maximum memory demands > > > allocated. Pages cannot be added to an enclave after it is initialized. > > > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add > > > pages > > > to an initialized enclave. With SGX2 the enclave still needs to set > > > aside > > > address space for its maximum memory demands during enclave > > > creation, but > > > all pages need not be added before enclave initialization. Pages can be > > > added during enclave runtime. > > > > > > Add support for dynamically adding pages to an initialized enclave with > > > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of > > > pages. Do not enforce any particular permissions from kernel, like > > > is done > > > for the pages added during the pre-initialization phase, as enclave > > > controls the final permissions and content for these pages by issuing > > > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary > > > data and > > > permissions). > > > > > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page > > > fault handler because it allows to have O(1) number of > > > kernel-enclave round > > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in > > > the case > > > when a page fault handler EAUG single page at a time. > > > > > > Cc: Reinette Chatre <reinette.chatre@intel.com> > > > Cc: Nathaniel McCallum <nathaniel@profian.com> > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > > --- > > > Is contained in sgx2-v2.1 branch of > > > git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git > > > > I created sgx2-v2.2 branch, which has #PF EAUG removed. I > > also moved selftests to the tail in the patch sets so that > > it is easier to update them reflecting these and future > > changes. Having them intervened makes things just complicated. > > > > I focus now to implement mmap() for Enarx with this, so no > > kselftest update just yet. > > > > Roughly the sequence in Enarx is: > > > > 1. Enclave traps on syscall (opcode). > > 2. Host jumps to shim expection handler. > > 3. Enclave copies the mmap() arguments to a buffer outside > > the enclave. > > 4. Enclave exists back to the host. > > 5. Host performs EAUG to the mmap range. > > 6. Host performs mmap() to the mmap range, which succeeds > > given that vm_max_prot_bits is RWX (i.e. disabled for > > dynamic pages). > > 7. Host jumps back to the enclave and execution continues > > there in the mmap handler. > > 8. mmap handler does a series of EACCEPTCOPY operations for > > the range with given permissions and empty page as the > > input data. > > > EACCEPTCOPY will require target pages with RW in PTE. So you would need to > make mprotect to change PTE permissions afterwards depending on your target > permissions. EACCEPTCOPY afaik does depend on EPCM permissions, not PTE permissions. > Without knowing much your context, if your intent is to > EACCEPTCOPY(EPCM.RX/EPCM.R, EPCM.pending), then I don't see how the page can > be used later without making it RW again first, and copy real data into it. > So these empty EACCEPTCOPYs may be better just EACCEPTs(EPCM.RW, > EPCM.pending). Then after copy real data into the pages, you do > EMODPE/EMODPR as needed. I take this point other than we could potentially use EACCEPTCOPY to implement semantics for e.g. a mapped file. But yes sometimes you will need take W out bit EMODPR, you're right. The key problem in the SGX2 patch set is not EPCM side but PTE side. In order to effectively adjust PTE permissions, vm_max_prot_bits should be RWX. They make sense only for the static signed content at most. If that is changed in the current patch set, then it should be fairly easy to tune both PTE and EPCM permissions based on situation. BR, Jarkko
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h index c4e0326d281d..2b3a606e78fe 100644 --- a/arch/x86/include/uapi/asm/sgx.h +++ b/arch/x86/include/uapi/asm/sgx.h @@ -35,6 +35,8 @@ enum sgx_page_flags { _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_modt) #define SGX_IOC_ENCLAVE_REMOVE_PAGES \ _IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages) +#define SGX_IOC_ENCLAVE_AUGMENT_PAGES \ + _IOWR(SGX_MAGIC, 0x09, struct sgx_enclave_augment_pages) /** * struct sgx_enclave_create - parameter structure for the @@ -138,6 +140,18 @@ struct sgx_enclave_remove_pages { __u64 count; }; +/** + * struct sgx_enclave_augment_pages - parameter structure for the %SGX_IOC_ENCLAVE_ADD_PAGE ioctl + * @offset: starting page offset + * @length: length of the data (multiple of the page size) + * @count: number of bytes added (multiple of the page size) + */ +struct sgx_enclave_augment_pages { + __u64 offset; + __u64 length; + __u64 count; +}; + struct sgx_enclave_run; /** diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 3ad4320ff6ae..1cb410d97ac7 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -1320,6 +1320,162 @@ static long sgx_ioc_enclave_remove_pages(struct sgx_encl *encl, return ret; } +/* + * Performa EAUG operation for an enclave. + */ +static int sgx_encl_augment_page(struct sgx_encl *encl, unsigned long offset) +{ + struct sgx_pageinfo pginfo = {0}; + struct sgx_encl_page *encl_page; + struct sgx_epc_page *epc_page; + struct sgx_va_page *va_page; + u64 secinfo_flags; + int ret; + + /* + * Ignore internal permission checking for dynamically added pages. + * They matter only for data added during the pre-initialization phase. + * The enclave decides the permissions by the means of EACCEPT, + * EACCEPTCOPY and EMODPE. + */ + secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X; + encl_page = sgx_encl_page_alloc(encl, offset, secinfo_flags); + if (IS_ERR(encl_page)) + return PTR_ERR(encl_page); + + epc_page = sgx_alloc_epc_page(encl_page, true); + if (IS_ERR(epc_page)) { + ret = PTR_ERR(epc_page); + goto err_alloc_epc_page; + } + + va_page = sgx_encl_grow(encl); + if (IS_ERR(va_page)) { + ret = PTR_ERR(va_page); + goto err_grow; + } + + mutex_lock(&encl->lock); + + /* + * Adding to encl->va_pages must be done under encl->lock. Ditto for + * deleting (via sgx_encl_shrink()) in the error path. + */ + if (va_page) + list_add(&va_page->list, &encl->va_pages); + + /* + * Insert prior to EADD in case of OOM. EADD modifies MRENCLAVE, i.e. + * can't be gracefully unwound, while failure on EADD/EXTEND is limited + * to userspace errors (or kernel/hardware bugs). + */ + ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc), + encl_page, GFP_KERNEL); + + /* + * If ret == -EBUSY then page was created in another flow while + * running without encl->lock + */ + if (ret) + goto err_xa_insert; + + pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page); + pginfo.addr = encl_page->desc & PAGE_MASK; + pginfo.metadata = 0; + + ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page)); + if (ret) + goto err_eaug; + + encl_page->encl = encl; + encl_page->epc_page = epc_page; + encl_page->type = SGX_PAGE_TYPE_REG; + encl->secs_child_cnt++; + + sgx_mark_page_reclaimable(encl_page->epc_page); + + mutex_unlock(&encl->lock); + + return 0; + +err_eaug: + xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc)); + +err_xa_insert: + sgx_encl_shrink(encl, va_page); + mutex_unlock(&encl->lock); + +err_grow: + sgx_encl_free_epc_page(epc_page); + +err_alloc_epc_page: + kfree(encl_page); + + return VM_FAULT_SIGBUS; +} + +/** + * sgx_ioc_enclave_augment_pages() - The handler for %SGX_IOC_ENCLAVE_AUGMENT_PAGES + * @encl: an enclave pointer + * @arg: a user pointer to a struct sgx_enclave_augment_pages instance + * + * Request to augment pages to an initialized enclave. The pages must be + * acknowledged by the enclave by issuing either ENCLU[EACCEPT] (for a + * trivial empty data page) or ENCLU[EACCEPTCOPY] (for a page with arbitrary + * permissions and user provided data). + * + * mmap() protection bits for augmented pages can be arbitrary, i.e. their + * permissions are only capped by the VMA, not by EPCM permissions. This is + * a constraint because enclave can internally control the final EPCM versions, + * for which the kernel has no say. + * + * Return: + * - 0: Success. + * - -EACCES: The source page is located in a noexec partition. + * - -ENOMEM: Out of EPC pages. + * - -EINTR: The call was interrupted before data was processed. + * - -EIO: EAUG failed. + * - -errno: POSIX error. + */ +static long sgx_ioc_enclave_augment_pages(struct sgx_encl *encl, void __user *user_arg) +{ + struct sgx_enclave_augment_pages arg; + unsigned long c; + int ret; + + if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) + return -EINVAL; + + if (copy_from_user(&arg, user_arg, sizeof(arg))) + return -EFAULT; + + if (sgx_validate_offset_length(encl, arg.offset, arg.length)) + return -EINVAL; + + for (c = 0 ; c < arg.length; c += PAGE_SIZE) { + if (signal_pending(current)) { + if (!c) + ret = -ERESTARTSYS; + + break; + } + + if (need_resched()) + cond_resched(); + + ret = sgx_encl_augment_page(encl, arg.offset + c); + if (ret) + break; + } + + arg.count = c; + + if (copy_to_user(user_arg, &arg, sizeof(arg))) + return -EFAULT; + + return ret; +} + long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) { struct sgx_encl *encl = filep->private_data; @@ -1350,6 +1506,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) case SGX_IOC_ENCLAVE_REMOVE_PAGES: ret = sgx_ioc_enclave_remove_pages(encl, (void __user *)arg); break; + case SGX_IOC_ENCLAVE_AUGMENT_PAGES: + ret = sgx_ioc_enclave_augment_pages(encl, (void __user *)arg); + break; default: ret = -ENOIOCTLCMD; break;
With SGX1 an enclave needs to be created with its maximum memory demands allocated. Pages cannot be added to an enclave after it is initialized. SGX2 introduces a new function, ENCLS[EAUG], that can be used to add pages to an initialized enclave. With SGX2 the enclave still needs to set aside address space for its maximum memory demands during enclave creation, but all pages need not be added before enclave initialization. Pages can be added during enclave runtime. Add support for dynamically adding pages to an initialized enclave with SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of pages. Do not enforce any particular permissions from kernel, like is done for the pages added during the pre-initialization phase, as enclave controls the final permissions and content for these pages by issuing either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data and permissions). Explicit EAUG ioctl is a better choice than an implicit EAUG from a page fault handler because it allows to have O(1) number of kernel-enclave round trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case when a page fault handler EAUG single page at a time. Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Nathaniel McCallum <nathaniel@profian.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> --- Is contained in sgx2-v2.1 branch of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git --- arch/x86/include/uapi/asm/sgx.h | 14 +++ arch/x86/kernel/cpu/sgx/ioctl.c | 159 ++++++++++++++++++++++++++++++++ 2 files changed, 173 insertions(+)