Message ID | 20190617222438.2080-5-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | security: x86/sgx: SGX vs. LSM, round 3 | expand |
On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote:
> + __u32 flags;
This should be changed to secinfo_flags_mask containing a mask of the
allowed bits for the secinfo flags because of two obvious reasons:
1. Protection flags are used mainly with syscalls and contain also other
things than just the permissions that do not apply in this context.
2. Having a mask for all secinfo flags is more future proof.
With the protection flags you end up reserving bits forever for things
that we will never have any use for (e.g. PROT_SEM).
Looking the change you convert 'flags' (wondering why it isn't called
'prot') to VM flags, which means that you essentially gain absolutely
nothing and loose some potential versatility as a side-effect by doing
that.
/Jarkko
On Wed, Jun 19, 2019 at 05:43:11PM +0300, Jarkko Sakkinen wrote: > On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote: > > + __u32 flags; > > This should be changed to secinfo_flags_mask containing a mask of the > allowed bits for the secinfo flags because of two obvious reasons: > > 1. Protection flags are used mainly with syscalls and contain also other > things than just the permissions that do not apply in this context. > 2. Having a mask for all secinfo flags is more future proof. > > With the protection flags you end up reserving bits forever for things > that we will never have any use for (e.g. PROT_SEM). > > Looking the change you convert 'flags' (wondering why it isn't called > 'prot') to VM flags, which means that you essentially gain absolutely > nothing and loose some potential versatility as a side-effect by doing > that. Ah, I see where you're coming from. My intent was that supported flags would be SGX specific, not generic PROT_* flags. I.e. bits 2:0 are used for PROT_{READ,WRITE,EXEC}, bit 7 can be used for SGX_ZERO_PAGE, etc... I have two objections to 'secinfo_flags_mask': - A full SECINFO mask is problematic for literally every other bit/field currently defined in SECINFO.FLAGS, e.g. masking PAGE_TYPE, PENDING and MODIFIED adds no value that I can think of, but would require the kernel do to weird things like reject page types and EMODPR requests (due to their PENDING/MODIFIED interaction). - The kernel doesn't actually restrict SECINFO based on the param, it's restricting VM_MAY* flags in the vmas. 'secinfo_flags_mask' implies the kernel is somehow masking SECINFO. What about something like this? /** * struct sgx_enclave_add_page - parameter structure for the * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl * @addr: address within the ELRANGE * @src: address for the page data * @secinfo: address for the SECINFO data * @mrmask: bitmask for the measured 256 byte chunks * @prot: maximal PROT_{READ,WRITE,EXEC} permissions for the page */ struct sgx_enclave_add_page { __u64 addr; __u64 src; __u64 secinfo; __u16 mrmask; __u8 prot; __u8 pad; __u64[2] reserved; };
On Wed, Jun 19, 2019 at 08:20:18AM -0700, Sean Christopherson wrote: > On Wed, Jun 19, 2019 at 05:43:11PM +0300, Jarkko Sakkinen wrote: > > On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote: > > > + __u32 flags; > > > > This should be changed to secinfo_flags_mask containing a mask of the > > allowed bits for the secinfo flags because of two obvious reasons: > > > > 1. Protection flags are used mainly with syscalls and contain also other > > things than just the permissions that do not apply in this context. > > 2. Having a mask for all secinfo flags is more future proof. > > > > With the protection flags you end up reserving bits forever for things > > that we will never have any use for (e.g. PROT_SEM). > > > > Looking the change you convert 'flags' (wondering why it isn't called > > 'prot') to VM flags, which means that you essentially gain absolutely > > nothing and loose some potential versatility as a side-effect by doing > > that. > > Ah, I see where you're coming from. My intent was that supported flags > would be SGX specific, not generic PROT_* flags. I.e. bits 2:0 are used > for PROT_{READ,WRITE,EXEC}, bit 7 can be used for SGX_ZERO_PAGE, etc... > > I have two objections to 'secinfo_flags_mask': > > - A full SECINFO mask is problematic for literally every other bit/field > currently defined in SECINFO.FLAGS, e.g. masking PAGE_TYPE, PENDING > and MODIFIED adds no value that I can think of, but would require the > kernel do to weird things like reject page types and EMODPR requests > (due to their PENDING/MODIFIED interaction). You're probably right that in practice it would hard to do much with EMODT. > - The kernel doesn't actually restrict SECINFO based on the param, it's > restricting VM_MAY* flags in the vmas. 'secinfo_flags_mask' implies > the kernel is somehow masking SECINFO. > > What about something like this? > > /** > * struct sgx_enclave_add_page - parameter structure for the > * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl > * @addr: address within the ELRANGE > * @src: address for the page data > * @secinfo: address for the SECINFO data > * @mrmask: bitmask for the measured 256 byte chunks > * @prot: maximal PROT_{READ,WRITE,EXEC} permissions for the page > */ > struct sgx_enclave_add_page { > __u64 addr; > __u64 src; > __u64 secinfo; > __u16 mrmask; > __u8 prot; > __u8 pad; > __u64[2] reserved; > }; LGTM but why it isn't like: __u16 mrmask; __u8 prot; __u8 reserved[5]; /Jarkko
On Fri, Jun 21, 2019 at 01:17:02AM +0300, Jarkko Sakkinen wrote: > On Wed, Jun 19, 2019 at 08:20:18AM -0700, Sean Christopherson wrote: > > On Wed, Jun 19, 2019 at 05:43:11PM +0300, Jarkko Sakkinen wrote: > > > On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote: > > > > + __u32 flags; > > > > > > This should be changed to secinfo_flags_mask containing a mask of the > > > allowed bits for the secinfo flags because of two obvious reasons: > > > > > > 1. Protection flags are used mainly with syscalls and contain also other > > > things than just the permissions that do not apply in this context. > > > 2. Having a mask for all secinfo flags is more future proof. > > > > > > With the protection flags you end up reserving bits forever for things > > > that we will never have any use for (e.g. PROT_SEM). > > > > > > Looking the change you convert 'flags' (wondering why it isn't called > > > 'prot') to VM flags, which means that you essentially gain absolutely > > > nothing and loose some potential versatility as a side-effect by doing > > > that. > > > > Ah, I see where you're coming from. My intent was that supported flags > > would be SGX specific, not generic PROT_* flags. I.e. bits 2:0 are used > > for PROT_{READ,WRITE,EXEC}, bit 7 can be used for SGX_ZERO_PAGE, etc... > > > > I have two objections to 'secinfo_flags_mask': > > > > - A full SECINFO mask is problematic for literally every other bit/field > > currently defined in SECINFO.FLAGS, e.g. masking PAGE_TYPE, PENDING > > and MODIFIED adds no value that I can think of, but would require the > > kernel do to weird things like reject page types and EMODPR requests > > (due to their PENDING/MODIFIED interaction). > > You're probably right that in practice it would hard to do much with > EMODT. > > > - The kernel doesn't actually restrict SECINFO based on the param, it's > > restricting VM_MAY* flags in the vmas. 'secinfo_flags_mask' implies > > the kernel is somehow masking SECINFO. > > > > What about something like this? > > > > /** > > * struct sgx_enclave_add_page - parameter structure for the > > * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl > > * @addr: address within the ELRANGE > > * @src: address for the page data > > * @secinfo: address for the SECINFO data > > * @mrmask: bitmask for the measured 256 byte chunks > > * @prot: maximal PROT_{READ,WRITE,EXEC} permissions for the page > > */ > > struct sgx_enclave_add_page { > > __u64 addr; > > __u64 src; > > __u64 secinfo; > > __u16 mrmask; > > __u8 prot; > > __u8 pad; > > __u64[2] reserved; > > }; > > LGTM but why it isn't like: > > __u16 mrmask; > __u8 prot; > __u8 reserved[5]; Because math is hard :-) Though I think we'd want __u16 mrmask __u8 prot __u8 pad[5]; with an optional __u64 reserved[?]; "pad" can be ignored, e.g. doesn't need to be explicitly zeroed by userspace, whereas "reserved" requires explicit zeroing and probably an associated kernel check.
On Sun, 2019-07-07 at 12:08 -0700, Sean Christopherson wrote: > LGTM but why it isn't like: > > > > __u16 mrmask; > > __u8 prot; > > __u8 reserved[5]; > > Because math is hard :-) Though I think we'd want > > __u16 mrmask > __u8 prot > __u8 pad[5]; > > with an optional > > __u64 reserved[?]; > > "pad" can be ignored, e.g. doesn't need to be explicitly zeroed by > userspace, whereas "reserved" requires explicit zeroing and probably an > associated kernel check. OK, cool with the change itself. Still need to get a better idea how these make sense in architectural sense. Things that would help with overall picture: 1. We have to figure out how this can be useful when LSM's are not used. That gives at least some evidence that the security model is overally good if it makes sense with and without LSM. Right now this looks like dead functionality if not coupled with an LSM. 2. Probably some "user story" type of examples would help with the discussion overall [1] i.e. how one would use this for her own good. [1] Probably many of the folks who work x86 tree have ignored major part of the discussion. Somehow these should be brought to nutshell so that anyone can get whatever the model is. Anyone should get it basically. /Jarkko
On Mon, Jul 08, 2019 at 06:23:46PM +0300, Jarkko Sakkinen wrote: > On Sun, 2019-07-07 at 12:08 -0700, Sean Christopherson wrote: > > LGTM but why it isn't like: > > > > > > __u16 mrmask; > > > __u8 prot; > > > __u8 reserved[5]; > > > > Because math is hard :-) Though I think we'd want > > > > __u16 mrmask > > __u8 prot > > __u8 pad[5]; > > > > with an optional > > > > __u64 reserved[?]; > > > > "pad" can be ignored, e.g. doesn't need to be explicitly zeroed by > > userspace, whereas "reserved" requires explicit zeroing and probably an > > associated kernel check. > > OK, cool with the change itself. Still need to get a better idea > how these make sense in architectural sense. > > Things that would help with overall picture: > > 1. We have to figure out how this can be useful when LSM's are not used. Declaring PROT_EXEC is useful to enforce noexec filesystems. Beyond that, I am not aware of any meaningful use case. > That gives at least some evidence that the security model is overally > good if it makes sense with and without LSM. Right now this looks like > dead functionality if not coupled with an LSM. I agree that it's effectively dead functionality without LSMs, but keep in mind that this LSM rat hole was opened specifically because SGX could be used to circumvent existing LSM security policies. In other words, the purpose of the UAPI extension is to achieve LSM compatibility without incurring significant complexity in the LSM subsystem. > 2. Probably some "user story" type of examples would help with the > discussion overall [1] i.e. how one would use this for > her own good. The compelling story is Andy's original concern that userspace could circumvent existing security policies by running code in an enclave. AIUI, closing the LSM loophole is the minimal requirement to get SGX upstreamed. The extensive discussion has largely been focused on ensuring that whatever mechanism is used to close the loophole will play nice with future SGX functionality and/or LSM security policies. > [1] Probably many of the folks who work x86 tree have ignored major > part of the discussion. Somehow these should be brought to > nutshell so that anyone can get whatever the model is. Anyone > should get it basically. > > /Jarkko >
On Mon, Jul 08, 2019 at 09:19:32AM -0700, Sean Christopherson wrote: > > 2. Probably some "user story" type of examples would help with the > > discussion overall [1] i.e. how one would use this for > > her own good. > > The compelling story is Andy's original concern that userspace could > circumvent existing security policies by running code in an enclave. > > AIUI, closing the LSM loophole is the minimal requirement to get SGX > upstreamed. The extensive discussion has largely been focused on > ensuring that whatever mechanism is used to close the loophole will > play nice with future SGX functionality and/or LSM security policies. OK, might be getting here where I fall out of the wagon so: Doesn't Andy's example anyway require a process that has privileges to make pages executable i.e. it could run arbitrary code even without an enclave? /Jarkko
On Tue, Jul 09, 2019 at 07:06:34PM +0300, Jarkko Sakkinen wrote: > On Mon, Jul 08, 2019 at 09:19:32AM -0700, Sean Christopherson wrote: > > > 2. Probably some "user story" type of examples would help with the > > > discussion overall [1] i.e. how one would use this for > > > her own good. > > > > The compelling story is Andy's original concern that userspace could > > circumvent existing security policies by running code in an enclave. > > > > AIUI, closing the LSM loophole is the minimal requirement to get SGX > > upstreamed. The extensive discussion has largely been focused on > > ensuring that whatever mechanism is used to close the loophole will > > play nice with future SGX functionality and/or LSM security policies. > > OK, might be getting here where I fall out of the wagon so: > > Doesn't Andy's example anyway require a process that has privileges to > make pages executable i.e. it could run arbitrary code even without an > enclave? Ah, no. He did raise that concern, but it'd only be an issue if the enclave fd were backed by an anon inode, in which case all enclaves would need EXECMEM in order to gain PROT_EXEC on EPC. Because the fd is backed /dev/sgx/enclave, userspace just needs FILE__EXECUTE on /dev/sgx/enclave.
On Wed, Jul 10, 2019 at 10:25 AM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Tue, Jul 09, 2019 at 07:06:34PM +0300, Jarkko Sakkinen wrote: > > On Mon, Jul 08, 2019 at 09:19:32AM -0700, Sean Christopherson wrote: > > > > 2. Probably some "user story" type of examples would help with the > > > > discussion overall [1] i.e. how one would use this for > > > > her own good. > > > > > > The compelling story is Andy's original concern that userspace could > > > circumvent existing security policies by running code in an enclave. > > > > > > AIUI, closing the LSM loophole is the minimal requirement to get SGX > > > upstreamed. The extensive discussion has largely been focused on > > > ensuring that whatever mechanism is used to close the loophole will > > > play nice with future SGX functionality and/or LSM security policies. > > > > OK, might be getting here where I fall out of the wagon so: > > > > Doesn't Andy's example anyway require a process that has privileges to > > make pages executable i.e. it could run arbitrary code even without an > > enclave? > > Ah, no. He did raise that concern, but it'd only be an issue if the > enclave fd were backed by an anon inode, in which case all enclaves would > need EXECMEM in order to gain PROT_EXEC on EPC. Because the fd is backed > /dev/sgx/enclave, userspace just needs FILE__EXECUTE on /dev/sgx/enclave. I would say it differently: regardless of exactly how /dev/sgx/enclave is wired up under the hood, we want a way that a process can be granted permission to usefully run enclaves without being granted permission to execute whatever bytes of code it wants. Preferably without requiring LSMs to maintain some form of enclave signature whitelist. This is pretty much the only hard requirement I see. We really could achieve this, in a somewhat limited form, by saying that LSMs can approve or reject the SIGSTRUCT. But doing it that way is a bit nasty as we've noticed, for a few reasons. Several of you have raised objections to requiring SIGSTRUCT to come from a .sigstruct file. We also need to worry about a SIGSTRUCT that refers to an enclave that forgot to measure its text. And we need to worry about SGX2. So this whole messy exercise boils down to: a bunch of security policy authors think that EXECMEM and similar are not to be given out lightly. How to we allow policies like that to be compatible with SGX?
On Mon, Jul 15, 2019 at 03:29:23PM -0700, Andy Lutomirski wrote: > I would say it differently: regardless of exactly how /dev/sgx/enclave > is wired up under the hood, we want a way that a process can be > granted permission to usefully run enclaves without being granted > permission to execute whatever bytes of code it wants. Preferably > without requiring LSMs to maintain some form of enclave signature > whitelist. Would it be better to have a signer whitelist instead or some combination? E.g. you could whiteliste either by signer or enclave signature. /Jarkko
On Thu, Aug 1, 2019 at 9:38 AM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Mon, Jul 15, 2019 at 03:29:23PM -0700, Andy Lutomirski wrote: > > I would say it differently: regardless of exactly how /dev/sgx/enclave > > is wired up under the hood, we want a way that a process can be > > granted permission to usefully run enclaves without being granted > > permission to execute whatever bytes of code it wants. Preferably > > without requiring LSMs to maintain some form of enclave signature > > whitelist. > > Would it be better to have a signer whitelist instead or some > combination? E.g. you could whiteliste either by signer or > enclave signature. > I'm not sure, and also don't really think we need to commit to an answer right now. I do think that the eventual solution should be more flexible than just whitelisting the signers. In particular, it should be possible to make secure enclaves, open-source or otherwise, that are reproducibly buildable. This more or less requires that the signing private key not be a secret, which means that no one would want to whitelist the signing key. The enclave would be trusted, and would seal data, on the basis of its MRENCLAVE, and the policy, if any, would want to whitelist the MRENCLAVE or perhaps the whole SIGSTRUCT. But my overall point is that it should be possible to have a conherent policy that allows any enclave whatsoever to run but that still respects EXECMEM and such.
On Sun, Aug 04, 2019 at 03:20:24PM -0700, Andy Lutomirski wrote: > On Thu, Aug 1, 2019 at 9:38 AM Jarkko Sakkinen > <jarkko.sakkinen@linux.intel.com> wrote: > > > > On Mon, Jul 15, 2019 at 03:29:23PM -0700, Andy Lutomirski wrote: > > > I would say it differently: regardless of exactly how /dev/sgx/enclave > > > is wired up under the hood, we want a way that a process can be > > > granted permission to usefully run enclaves without being granted > > > permission to execute whatever bytes of code it wants. Preferably > > > without requiring LSMs to maintain some form of enclave signature > > > whitelist. > > > > Would it be better to have a signer whitelist instead or some > > combination? E.g. you could whiteliste either by signer or > > enclave signature. > > > > I'm not sure, and also don't really think we need to commit to an > answer right now. I do think that the eventual solution should be > more flexible than just whitelisting the signers. In particular, it > should be possible to make secure enclaves, open-source or otherwise, > that are reproducibly buildable. This more or less requires that the > signing private key not be a secret, which means that no one would > want to whitelist the signing key. The enclave would be trusted, and > would seal data, on the basis of its MRENCLAVE, and the policy, if > any, would want to whitelist the MRENCLAVE or perhaps the whole > SIGSTRUCT. > > But my overall point is that it should be possible to have a conherent > policy that allows any enclave whatsoever to run but that still > respects EXECMEM and such. So could kernel embed a fixed signing key that would be made available through sysfs for signing? Already have one for my selftest. /Jarkko
On Mon, Aug 5, 2019 at 1:51 PM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Sun, Aug 04, 2019 at 03:20:24PM -0700, Andy Lutomirski wrote: > > On Thu, Aug 1, 2019 at 9:38 AM Jarkko Sakkinen > > <jarkko.sakkinen@linux.intel.com> wrote: > > > > > > On Mon, Jul 15, 2019 at 03:29:23PM -0700, Andy Lutomirski wrote: > > > > I would say it differently: regardless of exactly how /dev/sgx/enclave > > > > is wired up under the hood, we want a way that a process can be > > > > granted permission to usefully run enclaves without being granted > > > > permission to execute whatever bytes of code it wants. Preferably > > > > without requiring LSMs to maintain some form of enclave signature > > > > whitelist. > > > > > > Would it be better to have a signer whitelist instead or some > > > combination? E.g. you could whiteliste either by signer or > > > enclave signature. > > > > > > > I'm not sure, and also don't really think we need to commit to an > > answer right now. I do think that the eventual solution should be > > more flexible than just whitelisting the signers. In particular, it > > should be possible to make secure enclaves, open-source or otherwise, > > that are reproducibly buildable. This more or less requires that the > > signing private key not be a secret, which means that no one would > > want to whitelist the signing key. The enclave would be trusted, and > > would seal data, on the basis of its MRENCLAVE, and the policy, if > > any, would want to whitelist the MRENCLAVE or perhaps the whole > > SIGSTRUCT. > > > > But my overall point is that it should be possible to have a conherent > > policy that allows any enclave whatsoever to run but that still > > respects EXECMEM and such. > > So could kernel embed a fixed signing key that would be made available > through sysfs for signing? Already have one for my selftest. > Do you mean a public and private key? I was imagining that someone would just create a key pair and publish it for the case of SGX programs that don't depend on MRSIGNER. There doesn't have to be just one. But I may be misunderstanding you.
On Mon, Aug 05, 2019 at 02:30:22PM -0700, Andy Lutomirski wrote: > On Mon, Aug 5, 2019 at 1:51 PM Jarkko Sakkinen > <jarkko.sakkinen@linux.intel.com> wrote: > > > > On Sun, Aug 04, 2019 at 03:20:24PM -0700, Andy Lutomirski wrote: > > > On Thu, Aug 1, 2019 at 9:38 AM Jarkko Sakkinen > > > <jarkko.sakkinen@linux.intel.com> wrote: > > > > > > > > On Mon, Jul 15, 2019 at 03:29:23PM -0700, Andy Lutomirski wrote: > > > > > I would say it differently: regardless of exactly how /dev/sgx/enclave > > > > > is wired up under the hood, we want a way that a process can be > > > > > granted permission to usefully run enclaves without being granted > > > > > permission to execute whatever bytes of code it wants. Preferably > > > > > without requiring LSMs to maintain some form of enclave signature > > > > > whitelist. > > > > > > > > Would it be better to have a signer whitelist instead or some > > > > combination? E.g. you could whiteliste either by signer or > > > > enclave signature. > > > > > > > > > > I'm not sure, and also don't really think we need to commit to an > > > answer right now. I do think that the eventual solution should be > > > more flexible than just whitelisting the signers. In particular, it > > > should be possible to make secure enclaves, open-source or otherwise, > > > that are reproducibly buildable. This more or less requires that the > > > signing private key not be a secret, which means that no one would > > > want to whitelist the signing key. The enclave would be trusted, and > > > would seal data, on the basis of its MRENCLAVE, and the policy, if > > > any, would want to whitelist the MRENCLAVE or perhaps the whole > > > SIGSTRUCT. > > > > > > But my overall point is that it should be possible to have a conherent > > > policy that allows any enclave whatsoever to run but that still > > > respects EXECMEM and such. > > > > So could kernel embed a fixed signing key that would be made available > > through sysfs for signing? Already have one for my selftest. > > > > Do you mean a public and private key? I was imagining that someone > would just create a key pair and publish it for the case of SGX > programs that don't depend on MRSIGNER. There doesn't have to be just > one. > > But I may be misunderstanding you. Aa, OK, got you. I actually misunderstood you. /Jarkko
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h index 6dba9f282232..4144242ab690 100644 --- a/arch/x86/include/uapi/asm/sgx.h +++ b/arch/x86/include/uapi/asm/sgx.h @@ -35,15 +35,16 @@ struct sgx_enclave_create { * @src: address for the page data * @secinfo: address for the SECINFO data * @mrmask: bitmask for the measured 256 byte chunks + * @flags: flags, e.g. PROT_{READ,WRITE,EXEC} */ struct sgx_enclave_add_page { __u64 addr; __u64 src; __u64 secinfo; - __u64 mrmask; + __u32 mrmask; + __u32 flags; }; - /** * struct sgx_enclave_init - parameter structure for the * %SGX_IOC_ENCLAVE_INIT ioctl diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c index 3552d642b26f..8e95e45411f2 100644 --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c @@ -2,6 +2,7 @@ // Copyright(c) 2016-19 Intel Corporation. #include <asm/mman.h> +#include <linux/mman.h> #include <linux/delay.h> #include <linux/file.h> #include <linux/hashtable.h> @@ -235,7 +236,8 @@ static int sgx_validate_secs(const struct sgx_secs *secs, } static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, - unsigned long addr) + unsigned long addr, + unsigned long prot) { struct sgx_encl_page *encl_page; int ret; @@ -247,6 +249,7 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, return ERR_PTR(-ENOMEM); encl_page->desc = addr; encl_page->encl = encl; + encl_page->vm_prot_bits = calc_vm_prot_bits(prot, 0); ret = radix_tree_insert(&encl->page_tree, PFN_DOWN(encl_page->desc), encl_page); if (ret) { @@ -517,7 +520,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, void *data, struct sgx_secinfo *secinfo, - unsigned int mrmask) + unsigned int mrmask, unsigned long prot) { u64 page_type = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK; struct sgx_encl_page *encl_page; @@ -543,7 +546,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, goto out; } - encl_page = sgx_encl_page_alloc(encl, addr); + encl_page = sgx_encl_page_alloc(encl, addr, prot); if (IS_ERR(encl_page)) { ret = PTR_ERR(encl_page); goto out; @@ -584,6 +587,7 @@ static long sgx_ioc_enclave_add_page(struct file *filep, void __user *arg) struct sgx_enclave_add_page addp; struct sgx_secinfo secinfo; struct page *data_page; + unsigned long prot; void *data; int ret; @@ -605,7 +609,10 @@ static long sgx_ioc_enclave_add_page(struct file *filep, void __user *arg) goto out; } - ret = sgx_encl_add_page(encl, addp.addr, data, &secinfo, addp.mrmask); + prot = addp.flags & (PROT_READ | PROT_WRITE | PROT_EXEC); + + ret = sgx_encl_add_page(encl, addp.addr, data, &secinfo, addp.mrmask, + prot); if (ret) goto out; diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c index d7de4d9aea87..cfc348b44ffb 100644 --- a/arch/x86/kernel/cpu/sgx/driver/main.c +++ b/arch/x86/kernel/cpu/sgx/driver/main.c @@ -79,17 +79,66 @@ static int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm) return 0; } +/* + * Returns the AND of VM_{READ,WRITE,EXEC} permissions across all pages + * covered by the specific VMA. A non-existent (or yet to be added) enclave + * page is considered to have no RWX permissions, i.e. is inaccessible. + */ +static unsigned long sgx_allowed_rwx(struct sgx_encl *encl, + struct vm_area_struct *vma) +{ + unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC; + unsigned long idx, idx_start, idx_end; + struct sgx_encl_page *page; + + idx_start = PFN_DOWN(vma->vm_start); + idx_end = PFN_DOWN(vma->vm_end - 1); + + for (idx = idx_start; idx <= idx_end; ++idx) { + /* + * No need to take encl->lock, vm_prot_bits is set prior to + * insertion and never changes, and racing with adding pages is + * a userspace bug. + */ + rcu_read_lock(); + page = radix_tree_lookup(&encl->page_tree, idx); + rcu_read_unlock(); + + /* Do not allow R|W|X to a non-existent page. */ + if (!page) + allowed_rwx = 0; + else + allowed_rwx &= page->vm_prot_bits; + if (!allowed_rwx) + break; + } + + return allowed_rwx; +} + static int sgx_mmap(struct file *file, struct vm_area_struct *vma) { struct sgx_encl *encl = file->private_data; + unsigned long allowed_rwx; int ret; + allowed_rwx = sgx_allowed_rwx(encl, vma); + if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx) + return -EACCES; + if (!sgx_encl_get_mm(encl, vma->vm_mm)) { ret = sgx_encl_mm_add(encl, vma->vm_mm); if (ret) return ret; } + if (!(allowed_rwx & VM_READ)) + vma->vm_flags &= ~VM_MAYREAD; + if (!(allowed_rwx & VM_WRITE)) + vma->vm_flags &= ~VM_MAYWRITE; + if (!(allowed_rwx & VM_EXEC)) + vma->vm_flags &= ~VM_MAYEXEC; + vma->vm_ops = &sgx_vm_ops; vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO; vma->vm_private_data = encl; diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index 7b339334d875..5cb0f2653d4c 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -41,6 +41,7 @@ enum sgx_encl_page_desc { struct sgx_encl_page { unsigned long desc; + unsigned long vm_prot_bits; struct sgx_epc_page *epc_page; struct sgx_va_page *va_page; struct sgx_encl *encl; diff --git a/tools/testing/selftests/x86/sgx/main.c b/tools/testing/selftests/x86/sgx/main.c index e2265f841fb0..e14a34adc0e4 100644 --- a/tools/testing/selftests/x86/sgx/main.c +++ b/tools/testing/selftests/x86/sgx/main.c @@ -2,6 +2,7 @@ // Copyright(c) 2016-18 Intel Corporation. #include <elf.h> +#include <errno.h> #include <fcntl.h> #include <stdbool.h> #include <stdio.h> @@ -18,6 +19,8 @@ #include "../../../../../arch/x86/kernel/cpu/sgx/arch.h" #include "../../../../../arch/x86/include/uapi/asm/sgx.h" +#define PAGE_SIZE 4096 + static const uint64_t MAGIC = 0x1122334455667788ULL; struct vdso_symtab { @@ -135,8 +138,7 @@ static bool encl_create(int dev_fd, unsigned long bin_size, for (secs->size = 4096; secs->size < bin_size; ) secs->size <<= 1; - base = mmap(NULL, secs->size, PROT_READ | PROT_WRITE | PROT_EXEC, - MAP_SHARED, dev_fd, 0); + base = mmap(NULL, secs->size, PROT_NONE, MAP_SHARED, dev_fd, 0); if (base == MAP_FAILED) { perror("mmap"); return false; @@ -147,7 +149,7 @@ static bool encl_create(int dev_fd, unsigned long bin_size, ioc.src = (unsigned long)secs; rc = ioctl(dev_fd, SGX_IOC_ENCLAVE_CREATE, &ioc); if (rc) { - fprintf(stderr, "ECREATE failed rc=%d.\n", rc); + fprintf(stderr, "ECREATE failed rc=%d, err=%d.\n", rc, errno); munmap(base, secs->size); return false; } @@ -160,8 +162,14 @@ static bool encl_add_page(int dev_fd, unsigned long addr, void *data, { struct sgx_enclave_add_page ioc; struct sgx_secinfo secinfo; + unsigned long prot; int rc; + if (flags == SGX_SECINFO_TCS) + prot = PROT_READ | PROT_WRITE; + else + prot = PROT_READ | PROT_WRITE | PROT_EXEC; + memset(&secinfo, 0, sizeof(secinfo)); secinfo.flags = flags; @@ -169,6 +177,7 @@ static bool encl_add_page(int dev_fd, unsigned long addr, void *data, ioc.mrmask = 0xFFFF; ioc.addr = addr; ioc.src = (uint64_t)data; + ioc.flags = prot; rc = ioctl(dev_fd, SGX_IOC_ENCLAVE_ADD_PAGE, &ioc); if (rc) { @@ -184,6 +193,7 @@ static bool encl_load(struct sgx_secs *secs, unsigned long bin_size) struct sgx_enclave_init ioc; uint64_t offset; uint64_t flags; + void *addr; int dev_fd; int rc; @@ -215,6 +225,22 @@ static bool encl_load(struct sgx_secs *secs, unsigned long bin_size) goto out_map; } + addr = mmap((void *)secs->base, PAGE_SIZE, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_FIXED, dev_fd, 0); + if (addr == MAP_FAILED) { + fprintf(stderr, "mmap() failed on TCS, errno=%d.\n", errno); + return false; + } + + addr = mmap((void *)(secs->base + PAGE_SIZE), bin_size - PAGE_SIZE, + PROT_READ | PROT_WRITE | PROT_EXEC, + MAP_SHARED | MAP_FIXED, dev_fd, 0); + if (addr == MAP_FAILED) { + fprintf(stderr, "mmap() failed, errno=%d.\n", errno); + return false; + } + + close(dev_fd); return true; out_map:
Existing Linux Security Module policies restrict userspace's ability to map memory, e.g. may require priveleged permissions to map a page that is simultaneously writable and executable. Said permissions are often tied to the file which backs the mapped memory, i.e. vm_file. For reasons explained below, SGX does not allow LSMs to enforce policies using existing LSM hooks such as file_mprotect(). Explicitly track the protection bits for an enclave page (separate from the vma/pte bits) and require userspace to explicit define a page's protection bits when the page is added to the enclave. Enclave page protection bits paves the way to adding security_enclave_load() LSM hook as an SGX equivalent to security_file_mprotect(), e.g. SGX can pass the page's protection bits and source vma to LSMs. The source vma will allow LSMs to tie permissions to files, e.g. the file containing the enclave's code and initial data, and the protection bits will allow LSMs to make decisions based on the capabilities of the process, e.g. if a process is allowed to load unmeasured code or load code from anonymous memory. Due to the nature of the Enclave Page Cache, and because the EPC is manually managed by SGX, all enclave vmas are backed by the same file, i.e. /dev/sgx/enclave. Specifically, a single file allows SGX to use file op hooks to move pages in/out of the EPC. Furthermore, EPC pages for any given enclave are fundamentally shared between processes, i.e. CoW semantics are not possible with EPC pages due to hardware restrictions such as 1:1 mappings between virtual and physical addresses (within the enclave). Lastly, all real world enclaves will need read, write and execute permissions to EPC pages. As a result, SGX does not play nice with existing LSM behavior as it is impossible to apply policies to enclaves with reasonable granularity, e.g. an LSM can deny access to EPC altogether, but can't deny potentially unwanted behavior such as mapping pages WX, loading code from anonymous memory, loading unmeasured code, etc... For example, because all (practical) enclaves need RW pages for data and RX pages for code, SELinux's existing policies will require all enclaves to have FILE__READ, FILE__WRITE and FILE__EXECUTE permissions on /dev/sgx/enclave. Witholding FILE__WRITE or FILE__EXECUTE in an attempt to deny RW->RX or RWX would prevent running *any* enclave, even those that cleanly separate RW and RX pages. And because /dev/sgx/enclave requires MAP_SHARED, the anonymous/CoW checks that would trigger FILE__EXECMOD or PROCESS__EXECMEM permissions will never fire. Taking protection bits has a second use in that it can be used to prevent loading an enclave from a noexec file system. On SGX2 hardware, regardless of kernel support for SGX2, userspace could EADD a page from a noexec path using read-only permissions and later mprotect() and ENCLU[EMODPE] the page to gain execute permissions. By requiring the enclave's page protections up front, SGX will be able to enforce noexec paths when building enclaves. To prevent userspace from circumventing the allowed protections, do not allow PROT_{READ,WRITE,EXEC} mappings to an enclave without an associated enclave page, i.e. prevent creating a mapping with unchecked protection bits. Many alternatives[1][2] have been explored, most notably the concept of having SGX check (at load time) and save the permissions of the enclave loader. The permissions would then be enforced by SGX at run time, e.g. via mmap()/mprotect() hooks of some form. The basic functionality of pre-checking permissions is relatively straightforward, but supporting LSM auditing is complex and fraught with pitfalls. If auditing is done at the time of denial then the audit logs will potentially show a large number of false positives. Auditing when the denial is enforced, e.g. at mprotect(), suffers from its own problems, e.g.: - Requires LSMs to pre-generate audit messages so that they can be replayed by SGX when the denial is actually enforced. - System changes can result in stale audit messages, e.g. if files are removed from the system, an LSM profile is modified, etc... - A process could log what is essentially a false positive denial, e.g. if the current process has the requisite capability but the original enclave loader did not. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/include/uapi/asm/sgx.h | 5 +-- arch/x86/kernel/cpu/sgx/driver/ioctl.c | 15 +++++--- arch/x86/kernel/cpu/sgx/driver/main.c | 49 ++++++++++++++++++++++++++ arch/x86/kernel/cpu/sgx/encl.h | 1 + tools/testing/selftests/x86/sgx/main.c | 32 +++++++++++++++-- 5 files changed, 93 insertions(+), 9 deletions(-)