
[v3] x86/sgx: Do not limit EAUG'd pages by pre-initialization policy

Message ID 20220304033918.361495-1-jarkko@kernel.org (mailing list archive)
State New, archived
Series [v3] x86/sgx: Do not limit EAUG'd pages by pre-initialization policy

Commit Message

Jarkko Sakkinen March 4, 2022, 3:39 a.m. UTC
Pre-initialization policy is meant for EADD'd pages because they are
part of the enclave identity. It is good practice not to allow the
permissions to be touched after initialization, and doing so provides
guarantees to e.g. LSMs about the enclave.

For EAUG'd pages it should be sufficient to let mmap(), mprotect() and
the SGX opcodes control the permissions. Thus, effectively disable the
pre-initialization policy by setting vm_max_prot_bits to RWX.
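
As an illustration (not part of this patch), a minimal userspace sketch
of the intended flow; the enclave fd, base address and range are
hypothetical, and the enclave side (EACCEPT/EMODPE) is only indicated in
comments:

#include <stdint.h>
#include <sys/mman.h>

static void grow_heap(int enclave_fd, uint64_t encl_base,
		      uint64_t heap_off, size_t len)
{
	/* Map the dynamic range RW; touching it faults pages in via EAUG. */
	void *heap = mmap((void *)(encl_base + heap_off), len,
			  PROT_READ | PROT_WRITE,
			  MAP_SHARED | MAP_FIXED, enclave_fd, 0);
	if (heap == MAP_FAILED)
		return;

	/* The enclave must EACCEPT each page before it can be used. */

	/*
	 * If the enclave later relaxes a page with ENCLU[EMODPE], the host
	 * only needs mprotect() to expose the new permission.
	 */
	mprotect(heap, len, PROT_READ | PROT_EXEC);
}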

Then, remove vm_run_prot_bits. For EADD'd pages the ceiling stays where
it was set during construction, and for EAUG'd pages we simply do not
care. This hard-to-keep-in-sync variable adds only a layer of complexity
and nothing else.

With vm_run_prot_bits gone, SGX_IOC_ENCLAVE_RELAX_PERMISSIONS does
absolutely nothing, so it can be safely removed.

Link: https://lore.kernel.org/linux-sgx/YiFkDVhBBYj9zo1N@iki.fi/T/#t
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Nathaniel McCallum <nathaniel@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v3:
Remove SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.
v2:
Remove vm_run_prot_bits.
---
 arch/x86/include/uapi/asm/sgx.h |  19 ----
 arch/x86/kernel/cpu/sgx/encl.c  |  17 +---
 arch/x86/kernel/cpu/sgx/encl.h  |   1 -
 arch/x86/kernel/cpu/sgx/ioctl.c | 168 +-------------------------------
 4 files changed, 9 insertions(+), 196 deletions(-)

Comments

Reinette Chatre March 4, 2022, 7:09 p.m. UTC | #1
Hi Jarkko,

On 3/3/2022 7:39 PM, Jarkko Sakkinen wrote:
> Pre-initialization policy is meant for EADD'd pages because they are
> part of the enclave identity. It is good practice not to allow the
> permissions to be touched after initialization, and doing so provides
> guarantees to e.g. LSMs about the enclave.

I disagree. There are scenarios where it is indeed good practice to
modify the permissions after initialization.  For example, pages that
may be used for relocatable code can start with RWX permissions but once
the pages have been populated with the code they should be able to restrict
permissions to RX only. It is not good practice to require RWX permission
over their entire lifetime. Ideally pages should only have the lowest
permissions possible. 

Supporting the modification of permissions after initialization enables
the security conscious enclave owner to support the security
principle of least privilege.

> 
> For EAUG'd pages it should be sufficient to let mmap(), mprotect() and
> the SGX opcodes control the permissions. Thus, effectively disable the
> pre-initialization policy by setting vm_max_prot_bits to RWX.

Setting vm_max_prot_bits of EAUG pages to RWX results in dynamically
added (RW) pages allowing RWX PTEs!

Please correct me if I am wrong, but I do not see how mmap() and
mprotect() will control permissions here. Will these flows be able to
prevent an RWX mapping of dynamically added RW EPCM pages? From what
I understand, SGX controls the permissions here with sgx_encl_may_map(),
called during mmap() and mprotect(), and it will allow such mappings
after this patch. An RWX mapping (VMA) with RWX PTEs of RW EPCM memory
will be allowed, correct?

That means an enclave using EAUG to expand its heap will have
enclave pages with RW EPCM permissions, yet could have RWX PTEs pointing
to them after the user creates an RWX mmap().

From what I understand this circumvents the kernel's security mechanisms.

This also breaks one of the original rules of SGX as per:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/sgx.rst#n74

"EPCM permissions are separate from the normal page tables.  This prevents the
kernel from, for instance, allowing writes to data which an enclave wishes to
remain read-only."

With these changes the kernel allows all dynamically added pages to be
executable, even when the enclave wishes them not to be.
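
To make the concern concrete: after this patch the per-page test in
sgx_encl_may_map() reduces to the mask check sketched below (stand-alone
illustration, not kernel code; the VM_* values are simplified stand-ins),
and with vm_max_prot_bits set to RWX it can never refuse a mapping:

#include <stdio.h>

#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

static int may_map(unsigned long vm_max_prot_bits, unsigned long vm_prot_bits)
{
	/* Same shape as the kernel test: is any requested bit above the max? */
	return (~vm_max_prot_bits & vm_prot_bits) ? -13 /* -EACCES */ : 0;
}

int main(void)
{
	/* EAUG'd page after this patch: the ceiling is RWX ... */
	unsigned long eaug_max = VM_READ | VM_WRITE | VM_EXEC;

	/* ... so an RWX mmap()/mprotect() passes although EPCM is only RW. */
	printf("%d\n", may_map(eaug_max, VM_READ | VM_WRITE | VM_EXEC));

	return 0;
}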

> Then, remove vm_run_prot_bits. For EADD'd pages the ceiling stays where
> it was set during construction, and for EAUG'd pages we simply do not
> care. This hard-to-keep-in-sync variable adds only a layer of
> complexity and nothing else.
> 
> With vm_run_prot_bits gone, SGX_IOC_ENCLAVE_RELAX_PERMISSIONS does
> absolutely nothing, so it can be safely removed.

Removing vm_run_prot_bits cripples SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS.
With this removal, SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS will still modify
the EPCM permissions, but the PTE and VMA permissions will continue
to allow the maximum access possible for the page, whether the enclave
page supports that access or not.
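
A rough host-side sketch of that asymmetry (hypothetical offsets and
addresses; assuming struct sgx_enclave_restrict_perm has the same
offset/length/secinfo layout as the relax variant shown in the patch
below):

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <asm/sgx.h>

static void restrict_epcm_only(int enclave_fd, void *page_addr,
			       uint64_t page_off, uint64_t secinfo_r)
{
	struct sgx_enclave_restrict_perm params = {
		.offset  = page_off,
		.length  = 4096,        /* one page */
		.secinfo = secinfo_r,   /* EPCM permission restricted to R */
	};

	/* The EPCM permission is tightened ... */
	ioctl(enclave_fd, SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS, &params);

	/*
	 * ... but nothing lowers the VMA/PTE ceiling: this still succeeds
	 * as long as the page's vm_max_prot_bits includes write.
	 */
	mprotect(page_addr, 4096, PROT_READ | PROT_WRITE);
}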

I find it risky to circumvent the kernel's security mechanisms and
I am not comfortable signing off on this. Or am I just not understanding it
correctly?

Reinette
Jarkko Sakkinen March 5, 2022, 1:59 a.m. UTC | #2
On Fri, Mar 04, 2022 at 11:09:36AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 3/3/2022 7:39 PM, Jarkko Sakkinen wrote:
> > Pre-initialization policy is meant for EADD'd pages because they are
> > part of the enclave identity. It is good practice not to allow the
> > permissions to be touched after initialization, and doing so provides
> > guarantees to e.g. LSMs about the enclave.
> 
> I disagree. There are scenarios where it is indeed good practice to
> modify the permissions after initialization.  For example, pages that
> may be used for relocatable code can start with RWX permissions but once
> the pages have been populated with the code they should be able to restrict
> permissions to RX only. It is not good practice to require RWX permission
> over their entire lifetime. Ideally pages should only have the lowest
> permissions possible. 

The only permissions the kernel has real control over are the PTE
permissions once the enclave has been initialized.

You are introducing an artificial limitation with vm_run_prot_bits that
makes e.g. EMODPE more costly for no good reason, and an in-kernel
variable has nothing to do with the actual permissions. Those live in
the EPCM.
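
As a rough host-side sketch of that extra step (hypothetical offsets and
addresses; the relax struct layout is taken from the uapi hunk removed
below, so this only compiles against a pre-patch header):

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <asm/sgx.h>

static void expose_emodpe_result(int enclave_fd, void *page_addr,
				 uint64_t page_off, uint64_t secinfo)
{
	/* The enclave has already run ENCLU[EMODPE] on the page. */

	/* Needed only while vm_run_prot_bits exists: */
	struct sgx_enclave_relax_perm relax = {
		.offset  = page_off,
		.length  = 4096,        /* one page */
		.secinfo = secinfo,
	};
	ioctl(enclave_fd, SGX_IOC_ENCLAVE_RELAX_PERMISSIONS, &relax);

	/* Sufficient once the in-kernel copy is gone: */
	mprotect(page_addr, 4096, PROT_READ | PROT_WRITE | PROT_EXEC);
}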

> Supporting the modification of permissions after initialization enables
> the security conscious enclave owner to support the security
> principle of least privilege.

1. The kernel controls the PTE permissions.
2. The enclave controls the EPCM permissions.

vm_run_prot_bits does not make anything more secure.

BR, Jarkko

Patch

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index db969a2a1874..6cfdd89d3076 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -29,8 +29,6 @@  enum sgx_page_flags {
 	_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
 #define SGX_IOC_VEPC_REMOVE_ALL \
 	_IO(SGX_MAGIC, 0x04)
-#define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
-	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
 #define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
 	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
 #define SGX_IOC_ENCLAVE_MODIFY_TYPE \
@@ -84,23 +82,6 @@  struct sgx_enclave_provision {
 	__u64 fd;
 };
 
-/**
- * struct sgx_enclave_relax_perm - parameters for ioctl
- *                                 %SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
- * @offset:	starting page offset (page aligned relative to enclave base
- *		address defined in SECS)
- * @length:	length of memory (multiple of the page size)
- * @secinfo:	address for the SECINFO data containing the new permission bits
- *		for pages in range described by @offset and @length
- * @count:	(output) bytes successfully changed (multiple of page size)
- */
-struct sgx_enclave_relax_perm {
-	__u64 offset;
-	__u64 length;
-	__u64 secinfo;
-	__u64 count;
-};
-
 /**
  * struct sgx_enclave_restrict_perm - parameters for ioctl
  *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 5fe7189eac9d..c350fb987a11 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -200,15 +200,8 @@  static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	encl_page->desc = addr;
 	encl_page->encl = encl;
 
-	/*
-	 * Adding a regular page that is architecturally allowed to only
-	 * be created with RW permissions.
-	 * TBD: Interface with user space policy to support max permissions
-	 * of RWX.
-	 */
-	prot = PROT_READ | PROT_WRITE;
-	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
-	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
+	prot = PROT_READ | PROT_WRITE | PROT_EXEC;
+	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
 
 	epc_page = sgx_alloc_epc_page(encl_page, true);
 	if (IS_ERR(epc_page)) {
@@ -338,7 +331,7 @@  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 	 * exceed the VMA permissions.
 	 */
 	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
-	page_prot_bits = entry->vm_run_prot_bits & vm_prot_bits;
+	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
 	/*
 	 * Add VM_SHARED so that PTE is made writable right away if VMA
 	 * and EPCM are writable (no COW in SGX).
@@ -391,7 +384,7 @@  static vm_fault_t sgx_vma_pfn_mkwrite(struct vm_fault *vmf)
 		goto out;
 	}
 
-	if (!(entry->vm_run_prot_bits & VM_WRITE))
+	if (!(entry->vm_max_prot_bits & VM_WRITE))
 		ret = VM_FAULT_SIGBUS;
 
 out:
@@ -459,7 +452,7 @@  int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 	mutex_lock(&encl->lock);
 	xas_lock(&xas);
 	xas_for_each(&xas, page, PFN_DOWN(end - 1)) {
-		if (~page->vm_run_prot_bits & vm_prot_bits) {
+		if (~page->vm_max_prot_bits & vm_prot_bits) {
 			ret = -EACCES;
 			break;
 		}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 1b6ce1da7c92..241e302e7a72 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -28,7 +28,6 @@ 
 struct sgx_encl_page {
 	unsigned long desc;
 	unsigned long vm_max_prot_bits:8;
-	unsigned long vm_run_prot_bits:8;
 	enum sgx_page_type type:16;
 	struct sgx_epc_page *epc_page;
 	struct sgx_encl *encl;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index d8c3c07badb3..3ad4320ff6ae 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -198,12 +198,6 @@  static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
 	/* Calculate maximum of the VM flags for the page. */
 	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
 
-	/*
-	 * At time of allocation, the runtime protection bits are the same
-	 * as the maximum protection bits.
-	 */
-	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
-
 	return encl_page;
 }
 
@@ -710,85 +704,6 @@  static unsigned long vm_prot_from_secinfo(u64 secinfo_perm)
 	return vm_prot;
 }
 
-/**
- * sgx_enclave_relax_perm() - Update OS after permissions relaxed by enclave
- * @encl:	Enclave to which the pages belong.
- * @modp:	Checked parameters from user on which pages need modifying.
- * @secinfo_perm: New validated permission bits.
- *
- * Return:
- * - 0:		Success.
- * - -errno:	Otherwise.
- */
-static long sgx_enclave_relax_perm(struct sgx_encl *encl,
-				   struct sgx_enclave_relax_perm *modp,
-				   u64 secinfo_perm)
-{
-	struct sgx_encl_page *entry;
-	unsigned long vm_prot;
-	unsigned long addr;
-	unsigned long c;
-	int ret;
-
-	vm_prot = vm_prot_from_secinfo(secinfo_perm);
-
-	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
-		addr = encl->base + modp->offset + c;
-
-		mutex_lock(&encl->lock);
-
-		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
-		if (!entry) {
-			ret = -EFAULT;
-			goto out_unlock;
-		}
-
-		/*
-		 * Changing EPCM permissions is only supported on regular
-		 * SGX pages.
-		 */
-		if (entry->type != SGX_PAGE_TYPE_REG) {
-			ret = -EINVAL;
-			goto out_unlock;
-		}
-
-		/*
-		 * Do not accept permissions that are more relaxed
-		 * than vetted permissions.
-		 * If this check fails then EPCM permissions may be more
-		 * relaxed that what would be allowed by the kernel via
-		 * PTEs.
-		 */
-		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
-			ret = -EPERM;
-			goto out_unlock;
-		}
-
-		/*
-		 * Change runtime protection before zapping PTEs to ensure
-		 * any new #PF uses new permissions.
-		 */
-		entry->vm_run_prot_bits = vm_prot;
-
-		mutex_unlock(&encl->lock);
-		/*
-		 * Do not keep encl->lock because of dependency on
-		 * mmap_lock acquired in sgx_zap_enclave_ptes().
-		 */
-		sgx_zap_enclave_ptes(encl, addr);
-	}
-
-	ret = 0;
-	goto out;
-
-out_unlock:
-	mutex_unlock(&encl->lock);
-out:
-	modp->count = c;
-
-	return ret;
-}
-
 /*
  * Ensure enclave is ready for SGX2 functions. Readiness is checked
  * by ensuring the hardware supports SGX2 and the enclave is initialized
@@ -835,65 +750,6 @@  static int sgx_perm_from_user_secinfo(void __user *_secinfo, u64 *secinfo_perm)
 	return 0;
 }
 
-/**
- * sgx_ioc_enclave_relax_perm() - handler for
- *                                %SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
- * @encl:	an enclave pointer
- * @arg:	userspace pointer to a &struct sgx_enclave_relax_perm instance
- *
- * SGX2 distinguishes between relaxing and restricting the enclave page
- * permissions maintained by the hardware (EPCM permissions) of pages
- * belonging to an initialized enclave (after %SGX_IOC_ENCLAVE_INIT).
- *
- * EPCM permissions can be relaxed anytime directly from within the enclave
- * with no visibility from the kernel. This is accomplished with
- * ENCLU[EMODPE] run from within the enclave. Accessing pages with
- * the new, relaxed permissions requires the kernel to update the PTE
- * to handle the subsequent #PF correctly.
- *
- * Enclave page permissions are not allowed to exceed the
- * maximum vetted permissions maintained in
- * &struct sgx_encl_page->vm_max_prot_bits. If the enclave
- * exceeds these permissions by running ENCLU[EMODPE] from within the enclave
- * the kernel will prevent access to the pages via PTE and
- * VMA permissions.
- *
- * Return:
- * - 0:		Success
- * - -errno:	Otherwise
- */
-static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
-{
-	struct sgx_enclave_relax_perm params;
-	u64 secinfo_perm;
-	long ret;
-
-	ret = sgx_ioc_sgx2_ready(encl);
-	if (ret)
-		return ret;
-
-	if (copy_from_user(&params, arg, sizeof(params)))
-		return -EFAULT;
-
-	if (sgx_validate_offset_length(encl, params.offset, params.length))
-		return -EINVAL;
-
-	ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
-					 &secinfo_perm);
-	if (ret)
-		return ret;
-
-	if (params.count)
-		return -EINVAL;
-
-	ret = sgx_enclave_relax_perm(encl, &params, secinfo_perm);
-
-	if (copy_to_user(arg, &params, sizeof(params)))
-		return -EFAULT;
-
-	return ret;
-}
-
 /*
  * Some SGX functions require that no cached linear-to-physical address
  * mappings are present before they can succeed. Collaborate with
@@ -946,9 +802,9 @@  static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
 				      struct sgx_enclave_restrict_perm *modp,
 				      u64 secinfo_perm)
 {
-	unsigned long vm_prot, run_prot_restore;
 	struct sgx_encl_page *entry;
 	struct sgx_secinfo secinfo;
+	unsigned long vm_prot;
 	unsigned long addr;
 	unsigned long c;
 	void *epc_virt;
@@ -1002,14 +858,6 @@  static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
 			goto out_unlock;
 		}
 
-		/*
-		 * Change runtime protection before zapping PTEs to ensure
-		 * any new #PF uses new permissions. EPCM permissions (if
-		 * needed) not changed yet.
-		 */
-		run_prot_restore = entry->vm_run_prot_bits;
-		entry->vm_run_prot_bits = vm_prot;
-
 		mutex_unlock(&encl->lock);
 		/*
 		 * Do not keep encl->lock because of dependency on
@@ -1033,12 +881,12 @@  static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
 			pr_err_once("EMODPR encountered exception %d\n",
 				    ENCLS_TRAPNR(ret));
 			ret = -EFAULT;
-			goto out_prot_restore;
+			goto out_reclaim;
 		}
 		if (encls_failed(ret)) {
 			modp->result = ret;
 			ret = -EFAULT;
-			goto out_prot_restore;
+			goto out_reclaim;
 		}
 
 		ret = sgx_enclave_etrack(encl);
@@ -1054,8 +902,6 @@  static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
 	ret = 0;
 	goto out;
 
-out_prot_restore:
-	entry->vm_run_prot_bits = run_prot_restore;
 out_reclaim:
 	sgx_mark_page_reclaimable(entry->epc_page);
 out_unlock:
@@ -1136,7 +982,7 @@  static long sgx_enclave_modt(struct sgx_encl *encl,
 			     struct sgx_enclave_modt *modt,
 			     enum sgx_page_type page_type)
 {
-	unsigned long max_prot_restore, run_prot_restore;
+	unsigned long max_prot_restore;
 	struct sgx_encl_page *entry;
 	struct sgx_secinfo secinfo;
 	unsigned long prot;
@@ -1182,7 +1028,6 @@  static long sgx_enclave_modt(struct sgx_encl *encl,
 		}
 
 		max_prot_restore = entry->vm_max_prot_bits;
-		run_prot_restore = entry->vm_run_prot_bits;
 
 		/*
 		 * Once a regular page becomes a TCS page it cannot be
@@ -1200,7 +1045,6 @@  static long sgx_enclave_modt(struct sgx_encl *encl,
 			}
 			prot = PROT_READ | PROT_WRITE;
 			entry->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
-			entry->vm_run_prot_bits = entry->vm_max_prot_bits;
 
 			/*
 			 * Prevent page from being reclaimed while mutex
@@ -1262,7 +1106,6 @@  static long sgx_enclave_modt(struct sgx_encl *encl,
 
 out_entry_changed:
 	entry->vm_max_prot_bits = max_prot_restore;
-	entry->vm_run_prot_bits = run_prot_restore;
 out_unlock:
 	mutex_unlock(&encl->lock);
 out:
@@ -1498,9 +1341,6 @@  long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	case SGX_IOC_ENCLAVE_PROVISION:
 		ret = sgx_ioc_enclave_provision(encl, (void __user *)arg);
 		break;
-	case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
-		ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
-		break;
 	case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
 		ret = sgx_ioc_enclave_restrict_perm(encl, (void __user *)arg);
 		break;