[RFC] x86: Add SGX_IOC_ENCLAVE_AUGMENT_PAGES

Message ID	20220304122852.563475-1-jarkko@kernel.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-sgx-owner@kernel.org> From: Jarkko Sakkinen <jarkko@kernel.org> To: linux-sgx@vger.kernel.org Cc: Jarkko Sakkinen <jarkko@kernel.org>, Reinette Chatre <reinette.chatre@intel.com>, Nathaniel McCallum <nathaniel@profian.com>, Dave Hansen <dave.hansen@linux.intel.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), "H. Peter Anvin" <hpa@zytor.com>, linux-kernel@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)) Subject: [PATCH RFC] x86: Add SGX_IOC_ENCLAVE_AUGMENT_PAGES Date: Fri, 4 Mar 2022 14:28:52 +0200 Message-Id: <20220304122852.563475-1-jarkko@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[RFC] x86: Add SGX_IOC_ENCLAVE_AUGMENT_PAGES \| expand [RFC] x86: Add SGX_IOC_ENCLAVE_AUGMENT_PAGES

Jarkko Sakkinen March 4, 2022, 12:28 p.m. UTC

With SGX1 an enclave needs to be created with its maximum memory demands
allocated. Pages cannot be added to an enclave after it is initialized.
SGX2 introduces a new function, ENCLS[EAUG], that can be used to add pages
to an initialized enclave. With SGX2 the enclave still needs to set aside
address space for its maximum memory demands during enclave creation, but
all pages need not be added before enclave initialization. Pages can be
added during enclave runtime.

Add support for dynamically adding pages to an initialized enclave with
SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of
pages. Do not enforce any particular permissions from kernel, like is done
for the pages added during the pre-initialization phase, as enclave
controls the final permissions and content for these pages by issuing
either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data and
permissions).

Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
fault handler because it allows to have O(1) number of kernel-enclave round
trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case
when a page fault handler EAUG single page at a time.

Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Nathaniel McCallum <nathaniel@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
Is contained in sgx2-v2.1 branch of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
---
 arch/x86/include/uapi/asm/sgx.h |  14 +++
 arch/x86/kernel/cpu/sgx/ioctl.c | 159 ++++++++++++++++++++++++++++++++
 2 files changed, 173 insertions(+)

Jarkko Sakkinen March 4, 2022, 1:50 p.m. UTC | #1

On Fri, 2022-03-04 at 14:28 +0200, Jarkko Sakkinen wrote:
> With SGX1 an enclave needs to be created with its maximum memory demands
> allocated. Pages cannot be added to an enclave after it is initialized.
> SGX2 introduces a new function, ENCLS[EAUG], that can be used to add pages
> to an initialized enclave. With SGX2 the enclave still needs to set aside
> address space for its maximum memory demands during enclave creation, but
> all pages need not be added before enclave initialization. Pages can be
> added during enclave runtime.
> 
> Add support for dynamically adding pages to an initialized enclave with
> SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of
> pages. Do not enforce any particular permissions from kernel, like is done
> for the pages added during the pre-initialization phase, as enclave
> controls the final permissions and content for these pages by issuing
> either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data and
> permissions).
> 
> Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
> fault handler because it allows to have O(1) number of kernel-enclave round
> trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case
> when a page fault handler EAUG single page at a time.
> 
> Cc: Reinette Chatre <reinette.chatre@intel.com>
> Cc: Nathaniel McCallum <nathaniel@profian.com>
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
> Is contained in sgx2-v2.1 branch of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git

I created sgx2-v2.2 branch, which has #PF EAUG removed. I
also moved selftests to the tail in the patch sets so that
it is easier to update them reflecting these and future
changes. Having them intervened makes things just complicated.

I focus now to implement mmap() for Enarx with this, so no
kselftest update just yet.

Roughly the sequence in Enarx is:

1. Enclave traps on syscall (opcode).
2. Host jumps to shim expection handler.
3. Enclave copies the mmap() arguments to a buffer outside
   the enclave.
4. Enclave exists back to the host.
5. Host performs EAUG to the mmap range.
6. Host performs mmap() to the mmap range, which succeeds
   given that vm_max_prot_bits is RWX (i.e. disabled for
   dynamic pages).
7. Host jumps back to the enclave and execution continues
   there in the mmap handler.
8. mmap handler does a series of EACCEPTCOPY operations for
   the range with given permissions and empty page as the
   input data.

Some details might differ a bit but this gives the basic idea.
I like the fact the roud-trips are kind of in control and not
variable, and it is pretty easy to use to implement the basic
syscall behaviour. This has of course some corner cases but
my sequence describes the main flow anyway.

Take it or leave it but this does give at least for me a sound
way to implement my workload. I'll use this up until my changes
have been inducted to the original patch set, or it starts to
look sane with other solutions. The original patch set simply
does not work for us at all.

BR, Jarkko

Haitao Huang March 4, 2022, 4:27 p.m. UTC | #2

On Fri, 04 Mar 2022 06:28:52 -0600, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> With SGX1 an enclave needs to be created with its maximum memory demands
> allocated. Pages cannot be added to an enclave after it is initialized.
> SGX2 introduces a new function, ENCLS[EAUG], that can be used to add  
> pages
> to an initialized enclave. With SGX2 the enclave still needs to set aside
> address space for its maximum memory demands during enclave creation, but
> all pages need not be added before enclave initialization. Pages can be
> added during enclave runtime.
>
> Add support for dynamically adding pages to an initialized enclave with
> SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of
> pages. Do not enforce any particular permissions from kernel, like is  
> done
> for the pages added during the pre-initialization phase, as enclave
> controls the final permissions and content for these pages by issuing
> either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data  
> and
> permissions).
>
> Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
> fault handler because it allows to have O(1) number of kernel-enclave  
> round
> trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the  
> case
> when a page fault handler EAUG single page at a time.
>
> Cc: Reinette Chatre <reinette.chatre@intel.com>
> Cc: Nathaniel McCallum <nathaniel@profian.com>
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
> Is contained in sgx2-v2.1 branch of  
> git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
> ---
>  arch/x86/include/uapi/asm/sgx.h |  14 +++
>  arch/x86/kernel/cpu/sgx/ioctl.c | 159 ++++++++++++++++++++++++++++++++
>  2 files changed, 173 insertions(+)
>
> diff --git a/arch/x86/include/uapi/asm/sgx.h  
> b/arch/x86/include/uapi/asm/sgx.h
> index c4e0326d281d..2b3a606e78fe 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -35,6 +35,8 @@ enum sgx_page_flags {
>  	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_modt)
>  #define SGX_IOC_ENCLAVE_REMOVE_PAGES \
>  	_IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages)
> +#define SGX_IOC_ENCLAVE_AUGMENT_PAGES \
> +	_IOWR(SGX_MAGIC, 0x09, struct sgx_enclave_augment_pages)
> /**
>   * struct sgx_enclave_create - parameter structure for the
> @@ -138,6 +140,18 @@ struct sgx_enclave_remove_pages {
>  	__u64 count;
>  };
> +/**
> + * struct sgx_enclave_augment_pages - parameter structure for the  
> %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
> + * @offset:	starting page offset
> + * @length:	length of the data (multiple of the page size)
> + * @count:	number of bytes added (multiple of the page size)
> + */
> +struct sgx_enclave_augment_pages {
> +	__u64 offset;
> +	__u64 length;
> +	__u64 count;
> +};
> +

As I stated in another thread, we need a mechanism to allow EAUG page  
lazily, e.g., on #PF. Can we add a field here to indicate that?

>  struct sgx_enclave_run;
> /**
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c  
> b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 3ad4320ff6ae..1cb410d97ac7 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -1320,6 +1320,162 @@ static long sgx_ioc_enclave_remove_pages(struct  
> sgx_encl *encl,
>  	return ret;
>  }
> +/*
> + * Performa EAUG operation for an enclave.
> + */
> +static int sgx_encl_augment_page(struct sgx_encl *encl, unsigned long  
> offset)
> +{
> +	struct sgx_pageinfo pginfo = {0};
> +	struct sgx_encl_page *encl_page;
> +	struct sgx_epc_page *epc_page;
> +	struct sgx_va_page *va_page;
> +	u64 secinfo_flags;
> +	int ret;
> +
> +	/*
> +	 * Ignore internal permission checking for dynamically added pages.
> +	 * They matter only for data added during the pre-initialization phase.
> +	 * The enclave decides the permissions by the means of EACCEPT,
> +	 * EACCEPTCOPY and EMODPE.
> +	 */
> +	secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
> +	encl_page = sgx_encl_page_alloc(encl, offset, secinfo_flags);
> +	if (IS_ERR(encl_page))
> +		return PTR_ERR(encl_page);
> +
> +	epc_page = sgx_alloc_epc_page(encl_page, true);
> +	if (IS_ERR(epc_page)) {
> +		ret = PTR_ERR(epc_page);
> +		goto err_alloc_epc_page;
> +	}
> +
> +	va_page = sgx_encl_grow(encl);
> +	if (IS_ERR(va_page)) {
> +		ret = PTR_ERR(va_page);
> +		goto err_grow;
> +	}
> +
> +	mutex_lock(&encl->lock);
> +
> +	/*
> +	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
> +	 * deleting (via sgx_encl_shrink()) in the error path.
> +	 */
> +	if (va_page)
> +		list_add(&va_page->list, &encl->va_pages);
> +
> +	/*
> +	 * Insert prior to EADD in case of OOM.  EADD modifies MRENCLAVE, i.e.
> +	 * can't be gracefully unwound, while failure on EADD/EXTEND is limited
> +	 * to userspace errors (or kernel/hardware bugs).
> +	 */
> +	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
> +			encl_page, GFP_KERNEL);
> +
> +	/*
> +	 * If ret == -EBUSY then page was created in another flow while
> +	 * running without encl->lock
> +	 */
> +	if (ret)
> +		goto err_xa_insert;
> +
> +	pginfo.secs = (unsigned  
> long)sgx_get_epc_virt_addr(encl->secs.epc_page);
> +	pginfo.addr = encl_page->desc & PAGE_MASK;
> +	pginfo.metadata = 0;
> +
> +	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
> +	if (ret)
> +		goto err_eaug;
> +
> +	encl_page->encl = encl;
> +	encl_page->epc_page = epc_page;
> +	encl_page->type = SGX_PAGE_TYPE_REG;
> +	encl->secs_child_cnt++;
> +
> +	sgx_mark_page_reclaimable(encl_page->epc_page);
> +
> +	mutex_unlock(&encl->lock);
> +
> +	return 0;
> +
> +err_eaug:
> +	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
> +
> +err_xa_insert:
> +	sgx_encl_shrink(encl, va_page);
> +	mutex_unlock(&encl->lock);
> +
> +err_grow:
> +	sgx_encl_free_epc_page(epc_page);
> +
> +err_alloc_epc_page:
> +	kfree(encl_page);
> +
> +	return VM_FAULT_SIGBUS;
> +}
> +
> +/**
> + * sgx_ioc_enclave_augment_pages() - The handler for  
> %SGX_IOC_ENCLAVE_AUGMENT_PAGES
> + * @encl:       an enclave pointer
> + * @arg:	a user pointer to a struct sgx_enclave_augment_pages instance
> + *
> + * Request to augment pages to an initialized enclave. The pages must be
> + * acknowledged by the enclave by issuing either ENCLU[EACCEPT] (for a
> + * trivial empty data page) or ENCLU[EACCEPTCOPY] (for a page with  
> arbitrary
> + * permissions and user provided data).
> + *
> + * mmap() protection bits for augmented pages can be arbitrary, i.e.  
> their
> + * permissions are only capped by the VMA, not by EPCM permissions.  
> This is
> + * a constraint because enclave can internally control the final EPCM  
> versions,
> + * for which the kernel has no say.
> + *
> + * Return:
> + * - 0:		Success.
> + * - -EACCES:	The source page is located in a noexec partition.
> + * - -ENOMEM:	Out of EPC pages.
> + * - -EINTR:	The call was interrupted before data was processed.
> + * - -EIO:	EAUG failed.
> + * - -errno:	POSIX error.
> + */
> +static long sgx_ioc_enclave_augment_pages(struct sgx_encl *encl, void  
> __user *user_arg)
> +{
> +	struct sgx_enclave_augment_pages arg;
> +	unsigned long c;
> +	int ret;
> +
> +	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&arg, user_arg, sizeof(arg)))
> +		return -EFAULT;
> +
> +	if (sgx_validate_offset_length(encl, arg.offset, arg.length))
> +		return -EINVAL;
> +
> +	for (c = 0 ; c < arg.length; c += PAGE_SIZE) {
> +		if (signal_pending(current)) {
> +			if (!c)
> +				ret = -ERESTARTSYS;
> +
> +			break;
> +		}
> +
> +		if (need_resched())
> +			cond_resched();
> +
> +		ret = sgx_encl_augment_page(encl, arg.offset + c);
> +		if (ret)
> +			break;
> +	}
> +
> +	arg.count = c;
> +
> +	if (copy_to_user(user_arg, &arg, sizeof(arg)))
> +		return -EFAULT;
> +
> +	return ret;
> +}
> +
>  long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  {
>  	struct sgx_encl *encl = filep->private_data;
> @@ -1350,6 +1506,9 @@ long sgx_ioctl(struct file *filep, unsigned int  
> cmd, unsigned long arg)
>  	case SGX_IOC_ENCLAVE_REMOVE_PAGES:
>  		ret = sgx_ioc_enclave_remove_pages(encl, (void __user *)arg);
>  		break;
> +	case SGX_IOC_ENCLAVE_AUGMENT_PAGES:
> +		ret = sgx_ioc_enclave_augment_pages(encl, (void __user *)arg);
> +		break;
>  	default:
>  		ret = -ENOIOCTLCMD;
>  		break;

Dave Hansen March 4, 2022, 5:08 p.m. UTC | #3

On 3/4/22 04:28, Jarkko Sakkinen wrote:
> Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
> fault handler because it allows to have O(1) number of kernel-enclave round
> trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case
> when a page fault handler EAUG single page at a time.

So this is basically an optimization?  It's MADV_WILLNEED or
MAP_POPULATE to the cost of avoid future faults?

Jarkko Sakkinen March 5, 2022, 1:26 a.m. UTC | #4

On Fri, Mar 04, 2022 at 10:27:58AM -0600, Haitao Huang wrote:
> On Fri, 04 Mar 2022 06:28:52 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > With SGX1 an enclave needs to be created with its maximum memory demands
> > allocated. Pages cannot be added to an enclave after it is initialized.
> > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
> > pages
> > to an initialized enclave. With SGX2 the enclave still needs to set aside
> > address space for its maximum memory demands during enclave creation, but
> > all pages need not be added before enclave initialization. Pages can be
> > added during enclave runtime.
> > 
> > Add support for dynamically adding pages to an initialized enclave with
> > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of
> > pages. Do not enforce any particular permissions from kernel, like is
> > done
> > for the pages added during the pre-initialization phase, as enclave
> > controls the final permissions and content for these pages by issuing
> > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data
> > and
> > permissions).
> > 
> > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
> > fault handler because it allows to have O(1) number of kernel-enclave
> > round
> > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the
> > case
> > when a page fault handler EAUG single page at a time.
> > 
> > Cc: Reinette Chatre <reinette.chatre@intel.com>
> > Cc: Nathaniel McCallum <nathaniel@profian.com>
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > ---
> > Is contained in sgx2-v2.1 branch of
> > git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
> > ---
> >  arch/x86/include/uapi/asm/sgx.h |  14 +++
> >  arch/x86/kernel/cpu/sgx/ioctl.c | 159 ++++++++++++++++++++++++++++++++
> >  2 files changed, 173 insertions(+)
> > 
> > diff --git a/arch/x86/include/uapi/asm/sgx.h
> > b/arch/x86/include/uapi/asm/sgx.h
> > index c4e0326d281d..2b3a606e78fe 100644
> > --- a/arch/x86/include/uapi/asm/sgx.h
> > +++ b/arch/x86/include/uapi/asm/sgx.h
> > @@ -35,6 +35,8 @@ enum sgx_page_flags {
> >  	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_modt)
> >  #define SGX_IOC_ENCLAVE_REMOVE_PAGES \
> >  	_IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages)
> > +#define SGX_IOC_ENCLAVE_AUGMENT_PAGES \
> > +	_IOWR(SGX_MAGIC, 0x09, struct sgx_enclave_augment_pages)
> > /**
> >   * struct sgx_enclave_create - parameter structure for the
> > @@ -138,6 +140,18 @@ struct sgx_enclave_remove_pages {
> >  	__u64 count;
> >  };
> > +/**
> > + * struct sgx_enclave_augment_pages - parameter structure for the
> > %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
> > + * @offset:	starting page offset
> > + * @length:	length of the data (multiple of the page size)
> > + * @count:	number of bytes added (multiple of the page size)
> > + */
> > +struct sgx_enclave_augment_pages {
> > +	__u64 offset;
> > +	__u64 length;
> > +	__u64 count;
> > +};
> > +
> 
> As I stated in another thread, we need a mechanism to allow EAUG page
> lazily, e.g., on #PF. Can we add a field here to indicate that?

ioctl *does not* prevent lazy behaviour where, or if, it makes sense.

For growing memory (e.g. MAP_GROWSDOWN) you should just take advantage of
the vDSO's exception handling mechanism and call the ioctl on demand.

For a high-performance user space you still want to be also do minimum
round trip "batch jobs" where they are possible.

We exactly have the whole vDSO framework to service the on-demand needs
while still having full control of the execution. EAUG in the #PF handler
is all about being flakky and loosing all robustness.

BR, Jarkko

Jarkko Sakkinen March 5, 2022, 2:17 a.m. UTC | #5

On Fri, Mar 04, 2022 at 09:08:20AM -0800, Dave Hansen wrote:
> On 3/4/22 04:28, Jarkko Sakkinen wrote:
> > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
> > fault handler because it allows to have O(1) number of kernel-enclave round
> > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case
> > when a page fault handler EAUG single page at a time.
> 
> So this is basically an optimization?  It's MADV_WILLNEED or
> MAP_POPULATE to the cost of avoid future faults?

Yes.

So the idea would be that based on these the #PF handler would have more
smartness, and it would do a batch of EAUG's?

That could be possibly acceptable but I also had other concern.

I would like to see this:

1. Removal of vm_run_prot_bits.
2. Use RWX vm_max_prot_bits for EAUG'd pages.

During run-time kernel controls PTE's, and enclave has full control of the
EPCM (EACCEPT, EACCEPTCOPY, EMODPE). By creating artificial limitations how
to operate with these, it can limit various optimizations in the user space
code. E.g. a syscall shim can require clever co-operation between in-enclave
opcodes and what you do with the kernel in various situations.

RWX sounds provocative yes, but here it means only the limits where kernel
can set its PTE's and nothing else, not that page table is filled with RWX
pages, and enclave dictates what is in EPCM, and that's how it actually
should be (e.g. you can sometimes deliver mmap() without ever going out
of the enclave with EMODPE).

If MADV_WILLNEED/MAP_POPULATE approach is combined with this what I
discussed here, then I think we could have solution to write an efficient
memory management shims.

BR, Jarkko

Jarkko Sakkinen March 5, 2022, 2:32 a.m. UTC | #6

On Sat, Mar 05, 2022 at 04:17:53AM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 04, 2022 at 09:08:20AM -0800, Dave Hansen wrote:
> > On 3/4/22 04:28, Jarkko Sakkinen wrote:
> > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
> > > fault handler because it allows to have O(1) number of kernel-enclave round
> > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case
> > > when a page fault handler EAUG single page at a time.
> > 
> > So this is basically an optimization?  It's MADV_WILLNEED or
> > MAP_POPULATE to the cost of avoid future faults?
> 
> Yes.
> 
> So the idea would be that based on these the #PF handler would have more
> smartness, and it would do a batch of EAUG's?
> 
> That could be possibly acceptable but I also had other concern.
> 
> I would like to see this:
> 
> 1. Removal of vm_run_prot_bits.
> 2. Use RWX vm_max_prot_bits for EAUG'd pages.
> 
> During run-time kernel controls PTE's, and enclave has full control of the
> EPCM (EACCEPT, EACCEPTCOPY, EMODPE). By creating artificial limitations how
> to operate with these, it can limit various optimizations in the user space
> code. E.g. a syscall shim can require clever co-operation between in-enclave
> opcodes and what you do with the kernel in various situations.
> 
> RWX sounds provocative yes, but here it means only the limits where kernel
> can set its PTE's and nothing else, not that page table is filled with RWX
> pages, and enclave dictates what is in EPCM, and that's how it actually
> should be (e.g. you can sometimes deliver mmap() without ever going out
> of the enclave with EMODPE).
> 
> If MADV_WILLNEED/MAP_POPULATE approach is combined with this what I
> discussed here, then I think we could have solution to write an efficient
> memory management shims.

Do you already have a rough idea what needs to be done? I can take anyway a
look but just in case you had processed this further, please tell what you
have.

BR, Jarkko

Jarkko Sakkinen March 5, 2022, 2:55 a.m. UTC | #7

On Sat, Mar 05, 2022 at 04:32:12AM +0200, Jarkko Sakkinen wrote:
> On Sat, Mar 05, 2022 at 04:17:53AM +0200, Jarkko Sakkinen wrote:
> > On Fri, Mar 04, 2022 at 09:08:20AM -0800, Dave Hansen wrote:
> > > On 3/4/22 04:28, Jarkko Sakkinen wrote:
> > > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
> > > > fault handler because it allows to have O(1) number of kernel-enclave round
> > > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the case
> > > > when a page fault handler EAUG single page at a time.
> > > 
> > > So this is basically an optimization?  It's MADV_WILLNEED or
> > > MAP_POPULATE to the cost of avoid future faults?
> > 
> > Yes.
> > 
> > So the idea would be that based on these the #PF handler would have more
> > smartness, and it would do a batch of EAUG's?
> > 
> > That could be possibly acceptable but I also had other concern.
> > 
> > I would like to see this:
> > 
> > 1. Removal of vm_run_prot_bits.
> > 2. Use RWX vm_max_prot_bits for EAUG'd pages.
> > 
> > During run-time kernel controls PTE's, and enclave has full control of the
> > EPCM (EACCEPT, EACCEPTCOPY, EMODPE). By creating artificial limitations how
> > to operate with these, it can limit various optimizations in the user space
> > code. E.g. a syscall shim can require clever co-operation between in-enclave
> > opcodes and what you do with the kernel in various situations.
> > 
> > RWX sounds provocative yes, but here it means only the limits where kernel
> > can set its PTE's and nothing else, not that page table is filled with RWX
> > pages, and enclave dictates what is in EPCM, and that's how it actually
> > should be (e.g. you can sometimes deliver mmap() without ever going out
> > of the enclave with EMODPE).
> > 
> > If MADV_WILLNEED/MAP_POPULATE approach is combined with this what I
> > discussed here, then I think we could have solution to write an efficient
> > memory management shims.
> 
> Do you already have a rough idea what needs to be done? I can take anyway a
> look but just in case you had processed this further, please tell what you
> have.

MAP_POPULATE would almost exact match to the ioctl, so I guess that could
really work. I can give this a shot.

Does MAP_POPULATE "come through" the driver's mmap callback?

BR, Jarkko

Haitao Huang March 6, 2022, 1:38 p.m. UTC | #8

On Fri, 04 Mar 2022 19:26:12 -0600, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Fri, Mar 04, 2022 at 10:27:58AM -0600, Haitao Huang wrote:
>> On Fri, 04 Mar 2022 06:28:52 -0600, Jarkko Sakkinen <jarkko@kernel.org>
>> wrote:
>>
>> > With SGX1 an enclave needs to be created with its maximum memory  
>> demands
>> > allocated. Pages cannot be added to an enclave after it is  
>> initialized.
>> > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
>> > pages
>> > to an initialized enclave. With SGX2 the enclave still needs to set  
>> aside
>> > address space for its maximum memory demands during enclave creation,  
>> but
>> > all pages need not be added before enclave initialization. Pages can  
>> be
>> > added during enclave runtime.
>> >
>> > Add support for dynamically adding pages to an initialized enclave  
>> with
>> > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range  
>> of
>> > pages. Do not enforce any particular permissions from kernel, like is
>> > done
>> > for the pages added during the pre-initialization phase, as enclave
>> > controls the final permissions and content for these pages by issuing
>> > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data
>> > and
>> > permissions).
>> >
>> > Explicit EAUG ioctl is a better choice than an implicit EAUG from a  
>> page
>> > fault handler because it allows to have O(1) number of kernel-enclave
>> > round
>> > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the
>> > case
>> > when a page fault handler EAUG single page at a time.
>> >
>> > Cc: Reinette Chatre <reinette.chatre@intel.com>
>> > Cc: Nathaniel McCallum <nathaniel@profian.com>
>> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
>> > ---
>> > Is contained in sgx2-v2.1 branch of
>> > git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
>> > ---
>> >  arch/x86/include/uapi/asm/sgx.h |  14 +++
>> >  arch/x86/kernel/cpu/sgx/ioctl.c | 159  
>> ++++++++++++++++++++++++++++++++
>> >  2 files changed, 173 insertions(+)
>> >
>> > diff --git a/arch/x86/include/uapi/asm/sgx.h
>> > b/arch/x86/include/uapi/asm/sgx.h
>> > index c4e0326d281d..2b3a606e78fe 100644
>> > --- a/arch/x86/include/uapi/asm/sgx.h
>> > +++ b/arch/x86/include/uapi/asm/sgx.h
>> > @@ -35,6 +35,8 @@ enum sgx_page_flags {
>> >  	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_modt)
>> >  #define SGX_IOC_ENCLAVE_REMOVE_PAGES \
>> >  	_IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages)
>> > +#define SGX_IOC_ENCLAVE_AUGMENT_PAGES \
>> > +	_IOWR(SGX_MAGIC, 0x09, struct sgx_enclave_augment_pages)
>> > /**
>> >   * struct sgx_enclave_create - parameter structure for the
>> > @@ -138,6 +140,18 @@ struct sgx_enclave_remove_pages {
>> >  	__u64 count;
>> >  };
>> > +/**
>> > + * struct sgx_enclave_augment_pages - parameter structure for the
>> > %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
>> > + * @offset:	starting page offset
>> > + * @length:	length of the data (multiple of the page size)
>> > + * @count:	number of bytes added (multiple of the page size)
>> > + */
>> > +struct sgx_enclave_augment_pages {
>> > +	__u64 offset;
>> > +	__u64 length;
>> > +	__u64 count;
>> > +};
>> > +
>>
>> As I stated in another thread, we need a mechanism to allow EAUG page
>> lazily, e.g., on #PF. Can we add a field here to indicate that?
>
> ioctl *does not* prevent lazy behaviour where, or if, it makes sense.
>
> For growing memory (e.g. MAP_GROWSDOWN) you should just take advantage of
> the vDSO's exception handling mechanism and call the ioctl on demand.
>
> For a high-performance user space you still want to be also do minimum
> round trip "batch jobs" where they are possible.
>

Looks like you are pursuing MAP_POPULATE to optimize out the O(N) trips.
Just for my understanding of your proposal in case this ever comes back.  
For the on-demand case,  this ioctl is required for for each #PF. That's  
extra round trip compared to automatic kernel EAUG on #PF.


> We exactly have the whole vDSO framework to service the on-demand needs
> while still having full control of the execution. EAUG in the #PF handler
> is all about being flakky and loosing all robustness.
>

Again, you keep saying EAUG in the #PF handler is not good. So far the  
only concrete thing I hear is extra O(N) round trips which can be  
optimized out.

Haitao Huang March 6, 2022, 3:20 p.m. UTC | #9

On Fri, 04 Mar 2022 07:50:43 -0600, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Fri, 2022-03-04 at 14:28 +0200, Jarkko Sakkinen wrote:
>> With SGX1 an enclave needs to be created with its maximum memory demands
>> allocated. Pages cannot be added to an enclave after it is initialized.
>> SGX2 introduces a new function, ENCLS[EAUG], that can be used to add  
>> pages
>> to an initialized enclave. With SGX2 the enclave still needs to set  
>> aside
>> address space for its maximum memory demands during enclave creation,  
>> but
>> all pages need not be added before enclave initialization. Pages can be
>> added during enclave runtime.
>>
>> Add support for dynamically adding pages to an initialized enclave with
>> SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of
>> pages. Do not enforce any particular permissions from kernel, like is  
>> done
>> for the pages added during the pre-initialization phase, as enclave
>> controls the final permissions and content for these pages by issuing
>> either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data  
>> and
>> permissions).
>>
>> Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
>> fault handler because it allows to have O(1) number of kernel-enclave  
>> round
>> trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the  
>> case
>> when a page fault handler EAUG single page at a time.
>>
>> Cc: Reinette Chatre <reinette.chatre@intel.com>
>> Cc: Nathaniel McCallum <nathaniel@profian.com>
>> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
>> ---
>> Is contained in sgx2-v2.1 branch of  
>> git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
>
> I created sgx2-v2.2 branch, which has #PF EAUG removed. I
> also moved selftests to the tail in the patch sets so that
> it is easier to update them reflecting these and future
> changes. Having them intervened makes things just complicated.
>
> I focus now to implement mmap() for Enarx with this, so no
> kselftest update just yet.
>
> Roughly the sequence in Enarx is:
>
> 1. Enclave traps on syscall (opcode).
> 2. Host jumps to shim expection handler.
> 3. Enclave copies the mmap() arguments to a buffer outside
>    the enclave.
> 4. Enclave exists back to the host.
> 5. Host performs EAUG to the mmap range.
> 6. Host performs mmap() to the mmap range, which succeeds
>    given that vm_max_prot_bits is RWX (i.e. disabled for
>    dynamic pages).
> 7. Host jumps back to the enclave and execution continues
>    there in the mmap handler.
> 8. mmap handler does a series of EACCEPTCOPY operations for
>    the range with given permissions and empty page as the
>    input data.
>
EACCEPTCOPY will require target pages with RW in PTE. So you would need to  
make mprotect to change PTE permissions afterwards depending on your  
target permissions.

Without knowing much your context, if your intent is to  
EACCEPTCOPY(EPCM.RX/EPCM.R, EPCM.pending), then I don't see how the page  
can be used later without making it RW again first, and copy real data  
into it.
So these empty EACCEPTCOPYs may be better just EACCEPTs(EPCM.RW,  
EPCM.pending). Then after copy real data into the pages, you do  
EMODPE/EMODPR as needed.


EACCEPTCOPY would make more sense when you already have data to be copied  
into the EAUG'd but pending EPC pages.

> Some details might differ a bit but this gives the basic idea.
> I like the fact the roud-trips are kind of in control and not
> variable, and it is pretty easy to use to implement the basic
> syscall behaviour. This has of course some corner cases but
> my sequence describes the main flow anyway.
>
> Take it or leave it but this does give at least for me a sound
> way to implement my workload. I'll use this up until my changes
> have been inducted to the original patch set, or it starts to
> look sane with other solutions. The original patch set simply
> does not work for us at all.
>
> BR, Jarkko

Jarkko Sakkinen March 6, 2022, 4:18 p.m. UTC | #10

On Sun, Mar 06, 2022 at 07:38:40AM -0600, Haitao Huang wrote:
> On Fri, 04 Mar 2022 19:26:12 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Fri, Mar 04, 2022 at 10:27:58AM -0600, Haitao Huang wrote:
> > > On Fri, 04 Mar 2022 06:28:52 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> > > wrote:
> > > 
> > > > With SGX1 an enclave needs to be created with its maximum memory
> > > demands
> > > > allocated. Pages cannot be added to an enclave after it is
> > > initialized.
> > > > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
> > > > pages
> > > > to an initialized enclave. With SGX2 the enclave still needs to
> > > set aside
> > > > address space for its maximum memory demands during enclave
> > > creation, but
> > > > all pages need not be added before enclave initialization. Pages
> > > can be
> > > > added during enclave runtime.
> > > >
> > > > Add support for dynamically adding pages to an initialized enclave
> > > with
> > > > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given
> > > range of
> > > > pages. Do not enforce any particular permissions from kernel, like is
> > > > done
> > > > for the pages added during the pre-initialization phase, as enclave
> > > > controls the final permissions and content for these pages by issuing
> > > > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary data
> > > > and
> > > > permissions).
> > > >
> > > > Explicit EAUG ioctl is a better choice than an implicit EAUG from
> > > a page
> > > > fault handler because it allows to have O(1) number of kernel-enclave
> > > > round
> > > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in the
> > > > case
> > > > when a page fault handler EAUG single page at a time.
> > > >
> > > > Cc: Reinette Chatre <reinette.chatre@intel.com>
> > > > Cc: Nathaniel McCallum <nathaniel@profian.com>
> > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > ---
> > > > Is contained in sgx2-v2.1 branch of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
> > > > ---
> > > >  arch/x86/include/uapi/asm/sgx.h |  14 +++
> > > >  arch/x86/kernel/cpu/sgx/ioctl.c | 159
> > > ++++++++++++++++++++++++++++++++
> > > >  2 files changed, 173 insertions(+)
> > > >
> > > > diff --git a/arch/x86/include/uapi/asm/sgx.h
> > > > b/arch/x86/include/uapi/asm/sgx.h
> > > > index c4e0326d281d..2b3a606e78fe 100644
> > > > --- a/arch/x86/include/uapi/asm/sgx.h
> > > > +++ b/arch/x86/include/uapi/asm/sgx.h
> > > > @@ -35,6 +35,8 @@ enum sgx_page_flags {
> > > >  	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_modt)
> > > >  #define SGX_IOC_ENCLAVE_REMOVE_PAGES \
> > > >  	_IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages)
> > > > +#define SGX_IOC_ENCLAVE_AUGMENT_PAGES \
> > > > +	_IOWR(SGX_MAGIC, 0x09, struct sgx_enclave_augment_pages)
> > > > /**
> > > >   * struct sgx_enclave_create - parameter structure for the
> > > > @@ -138,6 +140,18 @@ struct sgx_enclave_remove_pages {
> > > >  	__u64 count;
> > > >  };
> > > > +/**
> > > > + * struct sgx_enclave_augment_pages - parameter structure for the
> > > > %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
> > > > + * @offset:	starting page offset
> > > > + * @length:	length of the data (multiple of the page size)
> > > > + * @count:	number of bytes added (multiple of the page size)
> > > > + */
> > > > +struct sgx_enclave_augment_pages {
> > > > +	__u64 offset;
> > > > +	__u64 length;
> > > > +	__u64 count;
> > > > +};
> > > > +
> > > 
> > > As I stated in another thread, we need a mechanism to allow EAUG page
> > > lazily, e.g., on #PF. Can we add a field here to indicate that?
> > 
> > ioctl *does not* prevent lazy behaviour where, or if, it makes sense.
> > 
> > For growing memory (e.g. MAP_GROWSDOWN) you should just take advantage of
> > the vDSO's exception handling mechanism and call the ioctl on demand.
> > 
> > For a high-performance user space you still want to be also do minimum
> > round trip "batch jobs" where they are possible.
> > 
> 
> Looks like you are pursuing MAP_POPULATE to optimize out the O(N) trips.
> Just for my understanding of your proposal in case this ever comes back. For
> the on-demand case,  this ioctl is required for for each #PF. That's extra
> round trip compared to automatic kernel EAUG on #PF.

Point taken.

I think we could have both. It would then work quite a lot like for
normal memory.

BR, Jarkko

Jarkko Sakkinen March 6, 2022, 4:30 p.m. UTC | #11

On Sun, Mar 06, 2022 at 09:20:11AM -0600, Haitao Huang wrote:
> On Fri, 04 Mar 2022 07:50:43 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Fri, 2022-03-04 at 14:28 +0200, Jarkko Sakkinen wrote:
> > > With SGX1 an enclave needs to be created with its maximum memory demands
> > > allocated. Pages cannot be added to an enclave after it is initialized.
> > > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
> > > pages
> > > to an initialized enclave. With SGX2 the enclave still needs to set
> > > aside
> > > address space for its maximum memory demands during enclave
> > > creation, but
> > > all pages need not be added before enclave initialization. Pages can be
> > > added during enclave runtime.
> > > 
> > > Add support for dynamically adding pages to an initialized enclave with
> > > SGX_IOC_ENCLAVE_AUGMENT_PAGES, which performs EAUG's to a given range of
> > > pages. Do not enforce any particular permissions from kernel, like
> > > is done
> > > for the pages added during the pre-initialization phase, as enclave
> > > controls the final permissions and content for these pages by issuing
> > > either ENCLU[EACCEPT] (empty RW) or ENCLU[EACCEPTCOPY] (arbitrary
> > > data and
> > > permissions).
> > > 
> > > Explicit EAUG ioctl is a better choice than an implicit EAUG from a page
> > > fault handler because it allows to have O(1) number of
> > > kernel-enclave round
> > > trips for EAUG-EACCEPT{COPY} process, instead of O(n), as it is in
> > > the case
> > > when a page fault handler EAUG single page at a time.
> > > 
> > > Cc: Reinette Chatre <reinette.chatre@intel.com>
> > > Cc: Nathaniel McCallum <nathaniel@profian.com>
> > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > ---
> > > Is contained in sgx2-v2.1 branch of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
> > 
> > I created sgx2-v2.2 branch, which has #PF EAUG removed. I
> > also moved selftests to the tail in the patch sets so that
> > it is easier to update them reflecting these and future
> > changes. Having them intervened makes things just complicated.
> > 
> > I focus now to implement mmap() for Enarx with this, so no
> > kselftest update just yet.
> > 
> > Roughly the sequence in Enarx is:
> > 
> > 1. Enclave traps on syscall (opcode).
> > 2. Host jumps to shim expection handler.
> > 3. Enclave copies the mmap() arguments to a buffer outside
> >    the enclave.
> > 4. Enclave exists back to the host.
> > 5. Host performs EAUG to the mmap range.
> > 6. Host performs mmap() to the mmap range, which succeeds
> >    given that vm_max_prot_bits is RWX (i.e. disabled for
> >    dynamic pages).
> > 7. Host jumps back to the enclave and execution continues
> >    there in the mmap handler.
> > 8. mmap handler does a series of EACCEPTCOPY operations for
> >    the range with given permissions and empty page as the
> >    input data.
> > 
> EACCEPTCOPY will require target pages with RW in PTE. So you would need to
> make mprotect to change PTE permissions afterwards depending on your target
> permissions.

EACCEPTCOPY afaik does depend on EPCM permissions, not PTE permissions.

> Without knowing much your context, if your intent is to
> EACCEPTCOPY(EPCM.RX/EPCM.R, EPCM.pending), then I don't see how the page can
> be used later without making it RW again first, and copy real data into it.
> So these empty EACCEPTCOPYs may be better just EACCEPTs(EPCM.RW,
> EPCM.pending). Then after copy real data into the pages, you do
> EMODPE/EMODPR as needed.

I take this point other than we could potentially use EACCEPTCOPY to
implement semantics for e.g. a mapped file. But yes sometimes you will
need take W out bit EMODPR, you're right.

The key problem in the SGX2 patch set is not EPCM side but PTE side.
In order to effectively adjust PTE permissions, vm_max_prot_bits
should be RWX. They make sense only for the static signed content
at most.

If that is changed in the current patch set, then it should be fairly
easy to tune both PTE and EPCM permissions based on situation.

BR, Jarkko

[RFC] x86: Add SGX_IOC_ENCLAVE_AUGMENT_PAGES

Commit Message

Comments

Patch