Message ID | 20200331114432.7593-3-jarkko.sakkinen@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Migrate enclave mapping to an anonymous inode | expand |
On Tue, Mar 31, 2020 at 4:44 AM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > When creating an enclave attach it to an anonymous file. This prepares the > code to have a separate interface at runtime, which can be published to the > user space after the enclave has been fully initialized. This isn't an objection per se, but I can't shake the feeling that this seems ridiculous. This changes the type of object returned by open() because, without this change, the old type was problematic. So I have some questions: - Can sgx just ignore the fs noexec option on the chardev inode's fs instead? - Would SELinux users *want* to put a useful label on the inode? if so, can they still accomplish whatever they were trying to accomplish with this patch applied? --Andy
On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote: > On Tue, Mar 31, 2020 at 4:44 AM Jarkko Sakkinen > <jarkko.sakkinen@linux.intel.com> wrote: > > > > When creating an enclave attach it to an anonymous file. This prepares the > > code to have a separate interface at runtime, which can be published to the > > user space after the enclave has been fully initialized. > > This isn't an objection per se, but I can't shake the feeling that > this seems ridiculous. This changes the type of object returned by > open() because, without this change, the old type was problematic. > > So I have some questions: > > - Can sgx just ignore the fs noexec option on the chardev inode's fs instead? Not easily. do_mmap() rejects PROT_EXEC based on noexec without checking VM_MAYEXEC (because it's the one that configures VM_MAYEXEC). SGX could add VM_MAYEXEC back in during its ->mmap hook, but it would mean userspace would never be able to do mmap() w/ PROT_EXEC, i.e. would always have to mmap() then mprotect(). I don't see any way to a dodge the check without doing something nasty. if (path_noexec(&file->f_path)) { if (vm_flags & VM_EXEC) return -EPERM; vm_flags &= ~VM_MAYEXEC; } > - Would SELinux users *want* to put a useful label on the inode? Probably? EXECMEM is likely the biggest point of contention. I assume users would also want to do ioctl() whitelisting, etc... I can't think of a use case where someone would want to lock down the SGX ioctls, but I can see someone wanting to ensure the enclave fd can only be used for SGX stuff. > if so, can they still accomplish whatever they were trying to accomplish > with this patch applied? Not at this time. There's the "secure anon inode" series floating around that would theoretically remedy that, but that's pure happenstance and not something we want to rely on. It wasn't mentioned in the cover letter, but this will effectively require PROCESS_EXECMEM (in SELinux) for all processes that _run_ enclaves. It would allow processes that only _build_ enclaves to avoid PROCESS_EXECMEM, as they wouldn't need to actually map the enclave. Jarkko's contention is enclaves _should_ require EXECMEM and that it's ok to require EXECMEM on the runtime so long as the builder can run without EXECMEM. If EXECMEM is a sticking point, one way to dodge it would be to add a helper to allow SELinux to detect enclave files. It'd be ugly, but simple. That doesn't solve the generic labeling issue though. It also begs the question of why hacking SELinux but not do_mmap() would be acceptable. If you have any ideas for fixing the noexec issue without resorting to an anon inode, we're all ears. > > --Andy
On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote: > On Tue, Mar 31, 2020 at 4:44 AM Jarkko Sakkinen > <jarkko.sakkinen@linux.intel.com> wrote: > > > > When creating an enclave attach it to an anonymous file. This prepares the > > code to have a separate interface at runtime, which can be published to the > > user space after the enclave has been fully initialized. > > This isn't an objection per se, but I can't shake the feeling that > this seems ridiculous. This changes the type of object returned by > open() because, without this change, the old type was problematic. > > So I have some questions: > > - Can sgx just ignore the fs noexec option on the chardev inode's fs instead? It's not SGX that nak's. It's mm level decision. > - Would SELinux users *want* to put a useful label on the inode? if > so, can they still accomplish whatever they were trying to accomplish > with this patch applied? What this does is that by default SGX is essentially blocked from anything, which I think is sane. I think from security perspective the EXECMEM requirement brings more clarity than does harm. I'm sure that with this approach it is possible to integrate SGX to software packages such as Kubernetes and end users will most likely take into use through something like Kubernetes. /Jarkko
On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote: > > If EXECMEM is a sticking point, one way to dodge it would be to add a > helper to allow SELinux to detect enclave files. It'd be ugly, but simple. > That doesn't solve the generic labeling issue though. It also begs the > question of why hacking SELinux but not do_mmap() would be acceptable. > > If you have any ideas for fixing the noexec issue without resorting to an > anon inode, we're all ears. Hmm. Maybe teach udev to put /dev/sgx on a different fs and bind-mount it? Or make /dev/sgx be an actual filesystem? Or just mount /dev with exec enabled?
On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote: > On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote: > > > > If EXECMEM is a sticking point, one way to dodge it would be to add a > > helper to allow SELinux to detect enclave files. It'd be ugly, but simple. > > That doesn't solve the generic labeling issue though. It also begs the > > question of why hacking SELinux but not do_mmap() would be acceptable. > > > > If you have any ideas for fixing the noexec issue without resorting to an > > anon inode, we're all ears. > > Hmm. Maybe teach udev to put /dev/sgx on a different fs and > bind-mount it? Or make /dev/sgx be an actual filesystem? Or just > mount /dev with exec enabled? I'm not forseeing how the last option could work out as it is distro's choice. Casey, do you think we could use securityfs for this or do you have some other recommendation? I'm just asking you because you've used securityfs a lot. /Jarkko
On Fri, Apr 03, 2020 at 09:56:32AM +0300, Jarkko Sakkinen wrote: > On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote: > > On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson > > <sean.j.christopherson@intel.com> wrote: > > > > > > On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote: > > > > > > If EXECMEM is a sticking point, one way to dodge it would be to add a > > > helper to allow SELinux to detect enclave files. It'd be ugly, but simple. > > > That doesn't solve the generic labeling issue though. It also begs the > > > question of why hacking SELinux but not do_mmap() would be acceptable. > > > > > > If you have any ideas for fixing the noexec issue without resorting to an > > > anon inode, we're all ears. > > > > Hmm. Maybe teach udev to put /dev/sgx on a different fs and > > bind-mount it? Or make /dev/sgx be an actual filesystem? Or just > > mount /dev with exec enabled? > > I'm not forseeing how the last option could work out as it is distro's > choice. > > Casey, do you think we could use securityfs for this or do you have some > other recommendation? I'm just asking you because you've used securityfs > a lot. I'll squash 1/4 from this patch set since it is purely a fix. /Jarkko
On 4/2/2020 11:56 PM, Jarkko Sakkinen wrote: > On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote: >> On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson >> <sean.j.christopherson@intel.com> wrote: >>> On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote: >>> >>> If EXECMEM is a sticking point, one way to dodge it would be to add a >>> helper to allow SELinux to detect enclave files. It'd be ugly, but simple. >>> That doesn't solve the generic labeling issue though. It also begs the >>> question of why hacking SELinux but not do_mmap() would be acceptable. >>> >>> If you have any ideas for fixing the noexec issue without resorting to an >>> anon inode, we're all ears. >> Hmm. Maybe teach udev to put /dev/sgx on a different fs and >> bind-mount it? Or make /dev/sgx be an actual filesystem? Or just >> mount /dev with exec enabled? > I'm not forseeing how the last option could work out as it is distro's > choice. > > Casey, do you think we could use securityfs for this or do you have some > other recommendation? I'm just asking you because you've used securityfs > a lot. I don't know how well securityfs works when mounted in a container, but otherwise it would seem like a viable option. On the other hand, pseudo filesystems are pretty easy to write, so /sys/fs/sgxfs wouldn't be a bad choice, either. > > /Jarkko
On Fri, Apr 03, 2020 at 07:35:16AM -0700, Casey Schaufler wrote: > > On 4/2/2020 11:56 PM, Jarkko Sakkinen wrote: > > On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote: > >> On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson > >> <sean.j.christopherson@intel.com> wrote: > >>> On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote: > >>> > >>> If EXECMEM is a sticking point, one way to dodge it would be to add a > >>> helper to allow SELinux to detect enclave files. It'd be ugly, but simple. > >>> That doesn't solve the generic labeling issue though. It also begs the > >>> question of why hacking SELinux but not do_mmap() would be acceptable. > >>> > >>> If you have any ideas for fixing the noexec issue without resorting to an > >>> anon inode, we're all ears. > >> Hmm. Maybe teach udev to put /dev/sgx on a different fs and > >> bind-mount it? Or make /dev/sgx be an actual filesystem? Or just > >> mount /dev with exec enabled? > > I'm not forseeing how the last option could work out as it is distro's > > choice. > > > > Casey, do you think we could use securityfs for this or do you have some > > other recommendation? I'm just asking you because you've used securityfs > > a lot. > > I don't know how well securityfs works when mounted in a container, > but otherwise it would seem like a viable option. On the other hand, > pseudo filesystems are pretty easy to write, so /sys/fs/sgxfs wouldn't > be a bad choice, either. Ugh, sorry, forgot for a while that smackfs is independent fs. How does smackfs interact with namespaces? /Jarkko
On 4/3/2020 8:30 AM, Jarkko Sakkinen wrote: > On Fri, Apr 03, 2020 at 07:35:16AM -0700, Casey Schaufler wrote: >> On 4/2/2020 11:56 PM, Jarkko Sakkinen wrote: >>> On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote: >>>> On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson >>>> <sean.j.christopherson@intel.com> wrote: >>>>> On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote: >>>>> >>>>> If EXECMEM is a sticking point, one way to dodge it would be to add a >>>>> helper to allow SELinux to detect enclave files. It'd be ugly, but simple. >>>>> That doesn't solve the generic labeling issue though. It also begs the >>>>> question of why hacking SELinux but not do_mmap() would be acceptable. >>>>> >>>>> If you have any ideas for fixing the noexec issue without resorting to an >>>>> anon inode, we're all ears. >>>> Hmm. Maybe teach udev to put /dev/sgx on a different fs and >>>> bind-mount it? Or make /dev/sgx be an actual filesystem? Or just >>>> mount /dev with exec enabled? >>> I'm not forseeing how the last option could work out as it is distro's >>> choice. >>> >>> Casey, do you think we could use securityfs for this or do you have some >>> other recommendation? I'm just asking you because you've used securityfs >>> a lot. >> I don't know how well securityfs works when mounted in a container, >> but otherwise it would seem like a viable option. On the other hand, >> pseudo filesystems are pretty easy to write, so /sys/fs/sgxfs wouldn't >> be a bad choice, either. > Ugh, sorry, forgot for a while that smackfs is independent fs. > > How does smackfs interact with namespaces? Smack attributes are global. Aside from privilege issues, namespaces ignore and are ignored by Smack. > > /Jarkko
On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote: > > How does smackfs interact with namespaces? > > Smack attributes are global. Aside from privilege issues, namespaces > ignore and are ignored by Smack. Okay. For SGX, I foresee things as: 1. Existing files are global. 2. If a policy of any kind is ever added it needs to be *per container*. I'm not sure whether PID or user namespace is the right choice here, but does not matter right now as the feature is not in the queue. To summarize: 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision' are not "different sames"). 2. The files probably will have heterogeneous visibility requirements. I think based on these premises own file system would be a more decent choice than populating /dev. Beside, SGX hasn't been a driver for a while. Andy, what do you think of this? /Jarkko
> On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote: >>> How does smackfs interact with namespaces? >> >> Smack attributes are global. Aside from privilege issues, namespaces >> ignore and are ignored by Smack. > > Okay. > > For SGX, I foresee things as: > > 1. Existing files are global. > 2. If a policy of any kind is ever added it needs to be *per container*. > I'm not sure whether PID or user namespace is the right choice here, > but does not matter right now as the feature is not in the queue. > > To summarize: > > 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision' > are not "different sames"). > 2. The files probably will have heterogeneous visibility requirements. > > I think based on these premises own file system would be a more decent > choice than populating /dev. Beside, SGX hasn't been a driver for a > while. > > Andy, what do you think of this? Probably okay. There are two semantic questions you’ll have to address, though: - What happens if you mount sgxfs twice? Do you get two copies that can diverge from each other, or do you get two views of the same thing? - Can it be instantiated from outside the root initns? It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev? > > /Jarkko
This appears to originate in Debian Rationale: https://salsa.debian.org/kernel-team/initramfs-tools/-/merge_requests/9 Interestingly, they claim mmap(/dev/zero) is special-cased? Can we do the same for SGX? Some allowances were made in https://salsa.debian.org/kernel-team/initramfs-tools/-/commit/d6c6eeca3540d18f5bce95b5ffcb1823ab3050ea Including those people in this conversation. Ben, Towi: for context, see https://lore.kernel.org/linux-sgx/20200319142434.GA11305@linux.intel.com/T/ and https://lore.kernel.org/linux-sgx/20200401084511.GE17325@linux.intel.com/T/ -- Jethro Beekman | Fortanix On 2020-04-04 05:54, Andy Lutomirski wrote: > > >> On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: >> >> On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote: >>>> How does smackfs interact with namespaces? >>> >>> Smack attributes are global. Aside from privilege issues, namespaces >>> ignore and are ignored by Smack. >> >> Okay. >> >> For SGX, I foresee things as: >> >> 1. Existing files are global. >> 2. If a policy of any kind is ever added it needs to be *per container*. >> I'm not sure whether PID or user namespace is the right choice here, >> but does not matter right now as the feature is not in the queue. >> >> To summarize: >> >> 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision' >> are not "different sames"). >> 2. The files probably will have heterogeneous visibility requirements. >> >> I think based on these premises own file system would be a more decent >> choice than populating /dev. Beside, SGX hasn't been a driver for a >> while. >> >> Andy, what do you think of this? > > Probably okay. There are two semantic questions you’ll have to address, though: > > - What happens if you mount sgxfs twice? Do you get two copies that can diverge from each other, or do you get two views of the same thing? > > - Can it be instantiated from outside the root initns? > > It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev? > >> >> /Jarkko
On 4.4.2020 8.46, Jethro Beekman wrote: > This appears to originate in Debian > > Rationale: https://salsa.debian.org/kernel-team/initramfs-tools/-/merge_requests/9 > > Interestingly, they claim mmap(/dev/zero) is special-cased? Can we do the same for SGX? > > Some allowances were made in https://salsa.debian.org/kernel-team/initramfs-tools/-/commit/d6c6eeca3540d18f5bce95b5ffcb1823ab3050ea > > Including those people in this conversation. > > Ben, Towi: for context, see https://lore.kernel.org/linux-sgx/20200319142434.GA11305@linux.intel.com/T/ and https://lore.kernel.org/linux-sgx/20200401084511.GE17325@linux.intel.com/T/ > > -- > Jethro Beekman | Fortanix > > On 2020-04-04 05:54, Andy Lutomirski wrote: >> >> >>> On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: >>> >>> On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote: >>>>> How does smackfs interact with namespaces? >>>> >>>> Smack attributes are global. Aside from privilege issues, namespaces >>>> ignore and are ignored by Smack. >>> >>> Okay. >>> >>> For SGX, I foresee things as: >>> >>> 1. Existing files are global. >>> 2. If a policy of any kind is ever added it needs to be *per container*. >>> I'm not sure whether PID or user namespace is the right choice here, >>> but does not matter right now as the feature is not in the queue. >>> >>> To summarize: >>> >>> 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision' >>> are not "different sames"). >>> 2. The files probably will have heterogeneous visibility requirements. >>> >>> I think based on these premises own file system would be a more decent >>> choice than populating /dev. Beside, SGX hasn't been a driver for a >>> while. >>> >>> Andy, what do you think of this? >> >> Probably okay. There are two semantic questions you’ll have to address, though: >> >> - What happens if you mount sgxfs twice? Do you get two copies that can diverge from each other, or do you get two views of the same thing? >> >> - Can it be instantiated from outside the root initns? >> >> It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev? >> >>> >>> /Jarkko > My goal is to block executing binaries directly from /dev and using the file descriptors from device files to avoid EXECMEM, without relying on MACs. If the SGX device can be used to make new executable mappings, it should honor noexec for /dev. Then initramfs should make a similar exception as with v86d and grant exec to /dev. I think sgxfs should also honor noexec but it probably does not make sense to mount it so. With an ioctl to SGX device the caller can change previously writable memory to executable. It should be able to trigger PROCESS__EXECMEM check for SELinux, so perhaps LSM hook for mprotect should be called: https://github.com/intel/linux-sgx-driver/blob/95eaa6f6693cd86c35e10a22b4f8e483373c987c/sgx_ioctl.c#L254 Also SELinux reference policy doesn't know yet about SGX. Can the SGX enclave access physical memory, kernel memory or bypass MMU somehow even for the thread? If it can bypass SELinux protections, access should be conditional to boolean allow_raw_memory_access. -Topi
On Sat, Apr 04, 2020 at 10:27:53AM +0300, Topi Miettinen wrote: > On 4.4.2020 8.46, Jethro Beekman wrote: > > This appears to originate in Debian > > > > Rationale: https://salsa.debian.org/kernel-team/initramfs-tools/-/merge_requests/9 > > > > Interestingly, they claim mmap(/dev/zero) is special-cased? Can we do the same for SGX? > > > > Some allowances were made in https://salsa.debian.org/kernel-team/initramfs-tools/-/commit/d6c6eeca3540d18f5bce95b5ffcb1823ab3050ea > > > > Including those people in this conversation. > > > > Ben, Towi: for context, see https://lore.kernel.org/linux-sgx/20200319142434.GA11305@linux.intel.com/T/ and https://lore.kernel.org/linux-sgx/20200401084511.GE17325@linux.intel.com/T/ > > > > -- > > Jethro Beekman | Fortanix > > > > On 2020-04-04 05:54, Andy Lutomirski wrote: > > > > > > > > > > On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > > > > > > > On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote: > > > > > > How does smackfs interact with namespaces? > > > > > > > > > > Smack attributes are global. Aside from privilege issues, namespaces > > > > > ignore and are ignored by Smack. > > > > > > > > Okay. > > > > > > > > For SGX, I foresee things as: > > > > > > > > 1. Existing files are global. > > > > 2. If a policy of any kind is ever added it needs to be *per container*. > > > > I'm not sure whether PID or user namespace is the right choice here, > > > > but does not matter right now as the feature is not in the queue. > > > > > > > > To summarize: > > > > > > > > 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision' > > > > are not "different sames"). > > > > 2. The files probably will have heterogeneous visibility requirements. > > > > > > > > I think based on these premises own file system would be a more decent > > > > choice than populating /dev. Beside, SGX hasn't been a driver for a > > > > while. > > > > > > > > Andy, what do you think of this? > > > > > > Probably okay. There are two semantic questions you’ll have to address, though: > > > > > > - What happens if you mount sgxfs twice? Do you get two copies that can diverge from each other, or do you get two views of the same thing? > > > > > > - Can it be instantiated from outside the root initns? > > > > > > It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev? > > > > > > > > > > > /Jarkko > > > > My goal is to block executing binaries directly from /dev and using the file > descriptors from device files to avoid EXECMEM, without relying on MACs. If > the SGX device can be used to make new executable mappings, it should honor > noexec for /dev. Then initramfs should make a similar exception as with v86d > and grant exec to /dev. I think sgxfs should also honor noexec but it > probably does not make sense to mount it so. Just to bring context: the whole sgxfs thing is something that we are planning to do circulate the configuration issue with /dev i.e. move device nodes as file nodes to a custom fs. That feels somewhat counter productive and does not improve security in any possible way. I guess we keep our stuff in /dev where it logically belongs anyway and go through the configuration route then. > With an ioctl to SGX device the caller can change previously writable memory > to executable. It should be able to trigger PROCESS__EXECMEM check for > SELinux, so perhaps LSM hook for mprotect should be called: > https://github.com/intel/linux-sgx-driver/blob/95eaa6f6693cd86c35e10a22b4f8e483373c987c/sgx_ioctl.c#L254 You are looking at wrong thing. That is OOT driver that has diverged from the kernel code base over four years ago. This the latest code: https://github.com/jsakkine-intel/linux-sgx/tree/master/arch/x86/kernel/cpu/sgx But just look at the call pattern kselftest's main program should do: https://github.com/jsakkine-intel/linux-sgx/blob/master/tools/testing/selftests/sgx/main.c I.e. 1. Reserve by any means possible (usually anonymous mmap) an address range. 2. Construct enclave and initialize (no mapping involved). Up until this point everything works with or without noexec. Then: 3. mmap() regions. Internally kernel checks for mmap() and mprotect() that address ranges are opaque and requested permissions do not surpass the permissions that were assigned to the enclave pages. > Also SELinux reference policy doesn't know yet about SGX. Can the SGX > enclave access physical memory, kernel memory or bypass MMU somehow even for > the thread? If it can bypass SELinux protections, access should be > conditional to boolean allow_raw_memory_access. It can only do normal memory accesses within process address space. It is a submode for ring-3 essentially. SELinux policies cannot exist because the code has not reached yet upstream. For now I think we got what we needed aka now we know that there is a process for getting the exception and do not have to try the customfs route. Thank you for your feedback. Now we know what to do for the moment. /Jarkko
On Fri, Apr 03, 2020 at 08:54:40PM -0700, Andy Lutomirski wrote: > > > > On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > > > On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote: > >>> How does smackfs interact with namespaces? > >> > >> Smack attributes are global. Aside from privilege issues, namespaces > >> ignore and are ignored by Smack. > > > > Okay. > > > > For SGX, I foresee things as: > > > > 1. Existing files are global. > > 2. If a policy of any kind is ever added it needs to be *per container*. > > I'm not sure whether PID or user namespace is the right choice here, > > but does not matter right now as the feature is not in the queue. > > > > To summarize: > > > > 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision' > > are not "different sames"). > > 2. The files probably will have heterogeneous visibility requirements. > > > > I think based on these premises own file system would be a more decent > > choice than populating /dev. Beside, SGX hasn't been a driver for a > > while. > > > > Andy, what do you think of this? > > Probably okay. There are two semantic questions you’ll have to address, though: > > - What happens if you mount sgxfs twice? Do you get two copies that can diverge from each other, or do you get two views of the same thing? > > - Can it be instantiated from outside the root initns? > > It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev? I'm retreating this given that we have reasonable means to drive exception to the /dev configuration. Thanks Jethro for helping with this one! /Jarkko
On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? -- Jethro Beekman | Fortanix
On 6.4.2020 9.42, Jethro Beekman wrote: > On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. > > I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow? Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems. -Topi
> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote: > > On 6.4.2020 9.42, Jethro Beekman wrote: >> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. >> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? > > Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow? > > Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems. > > It’s *possible* to create a tmpfs, create the sgx nodes on it, bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs. I don’t know whether udev would be willing to support such a thing.
On 2020-04-06 18:44, Andy Lutomirski wrote: > >> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote: >> >> On 6.4.2020 9.42, Jethro Beekman wrote: >>> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. >>> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? >> >> Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow? >> >> Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems. >> >> > > It’s *possible* to create a tmpfs, create the sgx nodes on it, bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs. > > I don’t know whether udev would be willing to support such a thing. > It doesn't even need to be in a temporary location, you can just mount it directly at /dev/sgx -- Jethro Beekman | Fortanix
On Mon, Apr 06, 2020 at 02:01:38PM +0300, Topi Miettinen wrote: > On 6.4.2020 9.42, Jethro Beekman wrote: > > On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. > > > > I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? > > Intel does not control the whole market yet, does AMD also offer SGX or > similar? Will SGX be also available for consumer devices? Are distros going > to enable SGX, will it benefit their users somehow? It has a strong user base, yes. That's the whole reason for upstreaming it (like always). It has been available on all CPUs since Skylake. /Jarkko
On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote: > > > On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote: > > > > On 6.4.2020 9.42, Jethro Beekman wrote: > >> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. > >> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? > > > > Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow? > > > > Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems. > > > > > > It’s *possible* to create a tmpfs, create the sgx nodes on it, > bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs. > > I don’t know whether udev would be willing to support such a thing. sgxfs is somewhat trivial to implement and has one stakeholder less to worry about. It is not really a huge stretch. Overally, I think it is something that we could live with. At least it is something that does not step on others toes. Haitao: If we go with sgxfs route, then you can for the moment do what Andy suggested: bind mount it to /dev/sgx. /Jarkko
On Mon, Apr 06, 2020 at 09:55:34PM +0300, Jarkko Sakkinen wrote: > On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote: > > > > > On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote: > > > > > > On 6.4.2020 9.42, Jethro Beekman wrote: > > >> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. > > >> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? > > > > > > Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow? > > > > > > Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems. > > > > > > > > > > It’s *possible* to create a tmpfs, create the sgx nodes on it, > > bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs. > > > > I don’t know whether udev would be willing to support such a thing. > > sgxfs is somewhat trivial to implement and has one stakeholder less to > worry about. It is not really a huge stretch. > > Overally, I think it is something that we could live with. At least it > is something that does not step on others toes. > > Haitao: If we go with sgxfs route, then you can for the moment do what > Andy suggested: bind mount it to /dev/sgx. I'll head on and write the patch. Thanks everyone for all the feedback. I don't think this patch set was a waste of time. It is easier to reflect things when you have the code rather than pure speculation. /Jarkko
> On Apr 6, 2020, at 11:55 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote: >> >>>> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote: >>> >>> On 6.4.2020 9.42, Jethro Beekman wrote: >>>> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. >>>> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? >>> >>> Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow? >>> >>> Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems. >>> >>> >> >> It’s *possible* to create a tmpfs, create the sgx nodes on it, >> bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs. >> >> I don’t know whether udev would be willing to support such a thing. > > sgxfs is somewhat trivial to implement and has one stakeholder less to > worry about. It is not really a huge stretch. > > Overally, I think it is something that we could live with. At least it > is something that does not step on others toes. > > Haitao: If we go with sgxfs route, then you can for the moment do what > Andy suggested: bind mount it to /dev/sgx. That also needs userspace support. I’ll start a thread on the udev list. > > /Jarkko
On Mon, Apr 06, 2020 at 12:53:41PM -0700, Andy Lutomirski wrote: > > sgxfs is somewhat trivial to implement and has one stakeholder less to > > worry about. It is not really a huge stretch. > > > > Overally, I think it is something that we could live with. At least it > > is something that does not step on others toes. > > > > Haitao: If we go with sgxfs route, then you can for the moment do what > > Andy suggested: bind mount it to /dev/sgx. > > That also needs userspace support. > > I’ll start a thread on the udev list. Haven't written any code yet but just by drafting the idea I already pinpointed something. You would add roughly this to the existing codebase: * struct file_system_type sgxfs * struct fs_context_operations sgxfs_ctx_ops * int sgxfs_get_tree() * int sgxfs_fill_super() * const struct tree_descr sgxfs_files[] The files would be defined as like this: static const struct tree_descr sgxfs_files[] { [SGXFS_ENCLAVE] = { "enclave", &sgxfs_encl_ops, /* ? */ }, [SGXFS_PROVISION] = { "provision", &sgxfs_encl_ops, /* ? */ }, }; The permissions would need to be completely static, which feels nasty from maintainability viewpoint :-( And I see no gain anywhere. In my opinion udev defining the whole /dev as noexec has zero technical merits. It is same as they would say that "we don't trust our own database". There are no real security benefits as long as dev nodes are configured correctly. /Jarkko
On Mon, Apr 6, 2020 at 2:24 PM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Mon, Apr 06, 2020 at 12:53:41PM -0700, Andy Lutomirski wrote: > > > sgxfs is somewhat trivial to implement and has one stakeholder less to > > > worry about. It is not really a huge stretch. > > > > > > Overally, I think it is something that we could live with. At least it > > > is something that does not step on others toes. > > > > > > Haitao: If we go with sgxfs route, then you can for the moment do what > > > Andy suggested: bind mount it to /dev/sgx. > > > > That also needs userspace support. > > > > I’ll start a thread on the udev list. > > Haven't written any code yet but just by drafting the idea I already > pinpointed something. > > You would add roughly this to the existing codebase: > > * struct file_system_type sgxfs > * struct fs_context_operations sgxfs_ctx_ops > * int sgxfs_get_tree() > * int sgxfs_fill_super() > * const struct tree_descr sgxfs_files[] > > The files would be defined as like this: > > static const struct tree_descr sgxfs_files[] { > [SGXFS_ENCLAVE] = { "enclave", &sgxfs_encl_ops, /* ? */ }, > [SGXFS_PROVISION] = { "provision", &sgxfs_encl_ops, /* ? */ }, > }; > > The permissions would need to be completely static, which feels nasty > from maintainability viewpoint :-( And I see no gain anywhere. > > In my opinion udev defining the whole /dev as noexec has zero technical > merits. It is same as they would say that "we don't trust our own > database". There are no real security benefits as long as dev nodes are > configured correctly. I tend to agree, but it could be seen as a sort-of-valid hardening measure. I think the best bet is for you to ignore this and let userspace figure it out.
On Mon, Apr 06, 2020 at 04:18:20PM -0700, Andy Lutomirski wrote: > On Mon, Apr 6, 2020 at 2:24 PM Jarkko Sakkinen > <jarkko.sakkinen@linux.intel.com> wrote: > > > > On Mon, Apr 06, 2020 at 12:53:41PM -0700, Andy Lutomirski wrote: > > > > sgxfs is somewhat trivial to implement and has one stakeholder less to > > > > worry about. It is not really a huge stretch. > > > > > > > > Overally, I think it is something that we could live with. At least it > > > > is something that does not step on others toes. > > > > > > > > Haitao: If we go with sgxfs route, then you can for the moment do what > > > > Andy suggested: bind mount it to /dev/sgx. > > > > > > That also needs userspace support. > > > > > > I’ll start a thread on the udev list. > > > > Haven't written any code yet but just by drafting the idea I already > > pinpointed something. > > > > You would add roughly this to the existing codebase: > > > > * struct file_system_type sgxfs > > * struct fs_context_operations sgxfs_ctx_ops > > * int sgxfs_get_tree() > > * int sgxfs_fill_super() > > * const struct tree_descr sgxfs_files[] > > > > The files would be defined as like this: > > > > static const struct tree_descr sgxfs_files[] { > > [SGXFS_ENCLAVE] = { "enclave", &sgxfs_encl_ops, /* ? */ }, > > [SGXFS_PROVISION] = { "provision", &sgxfs_encl_ops, /* ? */ }, > > }; > > > > The permissions would need to be completely static, which feels nasty > > from maintainability viewpoint :-( And I see no gain anywhere. > > > > In my opinion udev defining the whole /dev as noexec has zero technical > > merits. It is same as they would say that "we don't trust our own > > database". There are no real security benefits as long as dev nodes are > > configured correctly. > > I tend to agree, but it could be seen as a sort-of-valid hardening measure. Bets are not great anyway to revert it so lets just consider it as a constraint then. > I think the best bet is for you to ignore this and let userspace figure it out. Yeah. You could put statically 0700 permissions and then let systemd to set the legit ones but it just feels a way too big hammer to do a new fs just to sort out a simple permission issue. /Jarkko
On 2020-04-07 01:18, Andy Lutomirski wrote: > On Mon, Apr 6, 2020 at 2:24 PM Jarkko Sakkinen > <jarkko.sakkinen@linux.intel.com> wrote: >> >> On Mon, Apr 06, 2020 at 12:53:41PM -0700, Andy Lutomirski wrote: >>>> sgxfs is somewhat trivial to implement and has one stakeholder less to >>>> worry about. It is not really a huge stretch. >>>> >>>> Overally, I think it is something that we could live with. At least it >>>> is something that does not step on others toes. >>>> >>>> Haitao: If we go with sgxfs route, then you can for the moment do what >>>> Andy suggested: bind mount it to /dev/sgx. >>> >>> That also needs userspace support. >>> >>> I’ll start a thread on the udev list. >> >> Haven't written any code yet but just by drafting the idea I already >> pinpointed something. >> >> You would add roughly this to the existing codebase: >> >> * struct file_system_type sgxfs >> * struct fs_context_operations sgxfs_ctx_ops >> * int sgxfs_get_tree() >> * int sgxfs_fill_super() >> * const struct tree_descr sgxfs_files[] >> >> The files would be defined as like this: >> >> static const struct tree_descr sgxfs_files[] { >> [SGXFS_ENCLAVE] = { "enclave", &sgxfs_encl_ops, /* ? */ }, >> [SGXFS_PROVISION] = { "provision", &sgxfs_encl_ops, /* ? */ }, >> }; >> >> The permissions would need to be completely static, which feels nasty >> from maintainability viewpoint :-( And I see no gain anywhere. >> >> In my opinion udev defining the whole /dev as noexec has zero technical >> merits. It is same as they would say that "we don't trust our own >> database". There are no real security benefits as long as dev nodes are >> configured correctly. > > I tend to agree, but it could be seen as a sort-of-valid hardening measure. Yeah it seems to me `noexec` is overly broad in not allowing mmap(PROT_EXEC). I'd expect it to disallow only execve(), which is fine for /dev. -- Jethro Beekman | Fortanix
On 7.4.2020 0.24, Jarkko Sakkinen wrote: > In my opinion udev defining the whole /dev as noexec has zero technical > merits. It is same as they would say that "we don't trust our own > database". There are no real security benefits as long as dev nodes are > configured correctly. The threat is not that the device nodes would have execute permissions, but that a malicious entity with write access to /dev would create a new executable and run it, or rather, trick another (perhaps more privileged or more vulnerable) entity to do so. The malicious entity does not need any capabilities and it can be constrained by any number of typical seccomp filters which just don't block such basic system calls as open(), write(), [f]chmod() and close(). It simply needs to have UID 0 (possibly something else, like suitable GID could also be sufficient for some subdirectories) and write access to /dev (or its subdirectories) in its mount namespace. My philosophy is that "trust" means confidence that an action will not be done even when there's no control over it. "Control" means that it's possible to make active decision on whether the action can or cannot be allowed to be done. Trust in security mindset is a weak thing, control is stronger, but the strongest case is when you don't need trust nor control: the action simply can't ever happen because it's impossible or always forbidden. This idea is shown in such famous principles as "least privilege", "need to know" or compartmentalization. If the additional privilege of exec is not needed, it should not exist. -Topi
Please correct me if I'm wrong, but isn't it the goal of SGX to let a (suitably privileged) process designate some of its memory areas as part of SGX enclave? If so, why don't you simply add a system call to do so, such as int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags); like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like existing PROT_SAO/PROT_SEM? -Topi
On Tue, Apr 07, 2020 at 11:48:10AM +0300, Topi Miettinen wrote: > On 7.4.2020 0.24, Jarkko Sakkinen wrote: > > In my opinion udev defining the whole /dev as noexec has zero technical > > merits. It is same as they would say that "we don't trust our own > > database". There are no real security benefits as long as dev nodes are > > configured correctly. > > The threat is not that the device nodes would have execute permissions, but > that a malicious entity with write access to /dev would create a new > executable and run it, or rather, trick another (perhaps more privileged or > more vulnerable) entity to do so. The malicious entity does not need any > capabilities and it can be constrained by any number of typical seccomp > filters which just don't block such basic system calls as open(), write(), > [f]chmod() and close(). It simply needs to have UID 0 (possibly something > else, like suitable GID could also be sufficient for some subdirectories) > and write access to /dev (or its subdirectories) in its mount namespace. > > My philosophy is that "trust" means confidence that an action will not be > done even when there's no control over it. "Control" means that it's > possible to make active decision on whether the action can or cannot be > allowed to be done. Trust in security mindset is a weak thing, control is > stronger, but the strongest case is when you don't need trust nor control: > the action simply can't ever happen because it's impossible or always > forbidden. This idea is shown in such famous principles as "least > privilege", "need to know" or compartmentalization. If the additional > privilege of exec is not needed, it should not exist. > > -Topi I get the threat scenario, thanks. The problem (as Jethro correctly pointed out) with noexec /dev is somewhat broad. Thank you anyway for taking time describing the threat scenario. /Jarkko
On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote: > Please correct me if I'm wrong, but isn't it the goal of SGX to let a > (suitably privileged) process designate some of its memory areas as part of > SGX enclave? If so, why don't you simply add a system call to do so, such as > > int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags); > > like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like > existing PROT_SAO/PROT_SEM? > > -Topi New syscalls is always the last resort path, especially if they are associated with an arch. PROT_SGX sounds something worth of consideration. Another idea to throw would be noexec_dev mount option that would allow exec *only* for the device nodes (zero analysis done on feasibility). /Jarkko
On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote: > On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote: > > Please correct me if I'm wrong, but isn't it the goal of SGX to let a > > (suitably privileged) process designate some of its memory areas as part of > > SGX enclave? If so, why don't you simply add a system call to do so, such as > > > > int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags); > > > > like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like > > existing PROT_SAO/PROT_SEM? > > > > -Topi > > New syscalls is always the last resort path, especially if they are > associated with an arch. > > PROT_SGX sounds something worth of consideration. > > Another idea to throw would be noexec_dev mount option that would allow > exec *only* for the device nodes (zero analysis done on feasibility). The 2nd proposal has the merit that it would scale above SGX and does not give additional strengths to the adversary in the context of the threat scenario. /Jarkko
On Tue, Apr 07, 2020 at 07:59:00PM +0300, Jarkko Sakkinen wrote: > On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote: > > On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote: > > > Please correct me if I'm wrong, but isn't it the goal of SGX to let a > > > (suitably privileged) process designate some of its memory areas as part of > > > SGX enclave? If so, why don't you simply add a system call to do so, such as > > > > > > int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags); > > > > > > like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like > > > existing PROT_SAO/PROT_SEM? > > > > > > -Topi > > > > New syscalls is always the last resort path, especially if they are > > associated with an arch. > > > > PROT_SGX sounds something worth of consideration. > > > > Another idea to throw would be noexec_dev mount option that would allow > > exec *only* for the device nodes (zero analysis done on feasibility). > > The 2nd proposal has the merit that it would scale above SGX and > does not give additional strengths to the adversary in the context > of the threat scenario. Or. Why couldn't kernel just disallow anything but device files to be created to devtmpfs unconditionally? Then noexec would not be needed in the first place. /Jarkko
On 7.4.2020 21.04, Jarkko Sakkinen wrote: > On Tue, Apr 07, 2020 at 07:59:00PM +0300, Jarkko Sakkinen wrote: >> On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote: >>> On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote: >>>> Please correct me if I'm wrong, but isn't it the goal of SGX to let a >>>> (suitably privileged) process designate some of its memory areas as part of >>>> SGX enclave? If so, why don't you simply add a system call to do so, such as >>>> >>>> int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags); >>>> >>>> like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like >>>> existing PROT_SAO/PROT_SEM? >>>> >>>> -Topi >>> >>> New syscalls is always the last resort path, especially if they are >>> associated with an arch. >>> >>> PROT_SGX sounds something worth of consideration. >>> >>> Another idea to throw would be noexec_dev mount option that would allow >>> exec *only* for the device nodes (zero analysis done on feasibility). >> >> The 2nd proposal has the merit that it would scale above SGX and >> does not give additional strengths to the adversary in the context >> of the threat scenario. > > Or. > > Why couldn't kernel just disallow anything but device files to be > created to devtmpfs unconditionally? It seems to be just a regular tmpfs but with direct access to add device nodes when drivers are loaded. Nice idea, maybe something like that could be done with SELinux policy. -Topi
On Tue, Apr 07, 2020 at 10:54:46PM +0300, Topi Miettinen wrote: > On 7.4.2020 21.04, Jarkko Sakkinen wrote: > > On Tue, Apr 07, 2020 at 07:59:00PM +0300, Jarkko Sakkinen wrote: > > > On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote: > > > > On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote: > > > > > Please correct me if I'm wrong, but isn't it the goal of SGX to let a > > > > > (suitably privileged) process designate some of its memory areas as part of > > > > > SGX enclave? If so, why don't you simply add a system call to do so, such as > > > > > > > > > > int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags); > > > > > > > > > > like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like > > > > > existing PROT_SAO/PROT_SEM? > > > > > > > > > > -Topi > > > > > > > > New syscalls is always the last resort path, especially if they are > > > > associated with an arch. > > > > > > > > PROT_SGX sounds something worth of consideration. > > > > > > > > Another idea to throw would be noexec_dev mount option that would allow > > > > exec *only* for the device nodes (zero analysis done on feasibility). > > > > > > The 2nd proposal has the merit that it would scale above SGX and > > > does not give additional strengths to the adversary in the context > > > of the threat scenario. > > > > Or. > > > > Why couldn't kernel just disallow anything but device files to be > > created to devtmpfs unconditionally? > > It seems to be just a regular tmpfs but with direct access to add device > nodes when drivers are loaded. Nice idea, maybe something like that could be > done with SELinux policy. Anyway, thank you for all the feedback. What starts to be obvious is that we don't do anything in code level in SGX particular but instead workaround something around /dev. /Jarkko
On Wed, Apr 08, 2020 at 04:40:49PM +0300, Jarkko Sakkinen wrote: > What starts to be obvious is that we don't do anything in code level > in SGX particular but instead workaround something around /dev. Can you summarize the plan going forward? Thanks!
On 8.4.2020 16.40, Jarkko Sakkinen wrote: > On Tue, Apr 07, 2020 at 10:54:46PM +0300, Topi Miettinen wrote: >> On 7.4.2020 21.04, Jarkko Sakkinen wrote: >>> On Tue, Apr 07, 2020 at 07:59:00PM +0300, Jarkko Sakkinen wrote: >>>> On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote: >>>>> On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote: >>>>>> Please correct me if I'm wrong, but isn't it the goal of SGX to let a >>>>>> (suitably privileged) process designate some of its memory areas as part of >>>>>> SGX enclave? If so, why don't you simply add a system call to do so, such as >>>>>> >>>>>> int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags); >>>>>> >>>>>> like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like >>>>>> existing PROT_SAO/PROT_SEM? >>>>>> >>>>>> -Topi >>>>> >>>>> New syscalls is always the last resort path, especially if they are >>>>> associated with an arch. >>>>> >>>>> PROT_SGX sounds something worth of consideration. >>>>> >>>>> Another idea to throw would be noexec_dev mount option that would allow >>>>> exec *only* for the device nodes (zero analysis done on feasibility). >>>> >>>> The 2nd proposal has the merit that it would scale above SGX and >>>> does not give additional strengths to the adversary in the context >>>> of the threat scenario. >>> >>> Or. >>> >>> Why couldn't kernel just disallow anything but device files to be >>> created to devtmpfs unconditionally? >> >> It seems to be just a regular tmpfs but with direct access to add device >> nodes when drivers are loaded. Nice idea, maybe something like that could be >> done with SELinux policy. > > Anyway, thank you for all the feedback. > > What starts to be obvious is that we don't do anything in code level > in SGX particular but instead workaround something around /dev. If you take the /dev/sgx path, perhaps you could use KVM as a reference. It uses a similar special device /dev/kvm, works well with noexec /dev but still it can be used to do much more complex stuff than SGX. -Topi
On Thu, Apr 09, 2020 at 12:15:36AM +0300, Topi Miettinen wrote: > On 8.4.2020 16.40, Jarkko Sakkinen wrote: > >What starts to be obvious is that we don't do anything in code level > >in SGX particular but instead workaround something around /dev. > > If you take the /dev/sgx path, perhaps you could use KVM as a reference. It > uses a similar special device /dev/kvm, works well with noexec /dev but > still it can be used to do much more complex stuff than SGX. But userspace doesn't need to mmap() /dev/kvm with PROT_EXEC, that's the rub. KVM uses anon inodes for VMs, vCPUs, etc..., but doing that on SGX runs afould of SELinux's PROCESS_EXECMEM, again due to mmap() PROT_EXEC.
On Wed, Apr 08, 2020 at 07:56:36AM -0700, Sean Christopherson wrote: > On Wed, Apr 08, 2020 at 04:40:49PM +0300, Jarkko Sakkinen wrote: > > What starts to be obvious is that we don't do anything in code level > > in SGX particular but instead workaround something around /dev. > > Can you summarize the plan going forward? Thanks! Sure. The summary is that the permission problem should be solved outside of SGX. Doing an FS is somewhat big hammer to sort out the permission problem. It will also make configuration more clunky i.e. kernel defines some static permissions and then let say the systemd unit file will overwrite them with some other. It is somewhat more sound to have the udev db to contain the permissions. Admitting that we don't have exact solutions I think we have enough knowledge to say that things can be refined to work for SGX outside of SGX. I've been playing with the idea to add something like SB_I_DEVMMAPEXEC [1] to superblock flags. This would be set by devtmpfs (drivers/base/devtmpfs.c) initialization code. Then do_mmap_pgoff should check this flag from superblock and that we are mapping a device node. If these premises are fulfilled, then it would allow mmap(). This will make exec work when mapping device nodes but does not allow attacker to put executable to /dev and run them in the threat scenario described by Topi [2]. Andy also mentioned that he was going mail to udev ML but have not checked that at this point. [1] For explicit control of this new behavior also mnt flag is needed. [2] https://patchwork.kernel.org/patch/11467637/#23269417 /Jarkko
On 2020-04-06 21:53, Andy Lutomirski wrote: > > >> On Apr 6, 2020, at 11:55 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: >> >> On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote: >>> >>>>> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote: >>>> >>>> On 6.4.2020 9.42, Jethro Beekman wrote: >>>>> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. >>>>> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? >>>> >>>> Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow? >>>> >>>> Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems. >>>> >>>> >>> >>> It’s *possible* to create a tmpfs, create the sgx nodes on it, >>> bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs. >>> >>> I don’t know whether udev would be willing to support such a thing. >> >> sgxfs is somewhat trivial to implement and has one stakeholder less to >> worry about. It is not really a huge stretch. >> >> Overally, I think it is something that we could live with. At least it >> is something that does not step on others toes. >> >> Haitao: If we go with sgxfs route, then you can for the moment do what >> Andy suggested: bind mount it to /dev/sgx. > > That also needs userspace support. > > I’ll start a thread on the udev list. Andy, can you send a link to this thread? -- Jethro Beekman | Fortanix
On Wed, Nov 18, 2020 at 11:23 PM Jethro Beekman <jethro@fortanix.com> wrote: > > On 2020-04-06 21:53, Andy Lutomirski wrote: > > > > > >> On Apr 6, 2020, at 11:55 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > >> > >> On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote: > >>> > >>>>> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote: > >>>> > >>>> On 6.4.2020 9.42, Jethro Beekman wrote: > >>>>> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev. > >>>>> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all? > >>>> > >>>> Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow? > >>>> > >>>> Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems. > >>>> > >>>> > >>> > >>> It’s *possible* to create a tmpfs, create the sgx nodes on it, > >>> bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs. > >>> > >>> I don’t know whether udev would be willing to support such a thing. > >> > >> sgxfs is somewhat trivial to implement and has one stakeholder less to > >> worry about. It is not really a huge stretch. > >> > >> Overally, I think it is something that we could live with. At least it > >> is something that does not step on others toes. > >> > >> Haitao: If we go with sgxfs route, then you can for the moment do what > >> Andy suggested: bind mount it to /dev/sgx. > > > > That also needs userspace support. > > > > I’ll start a thread on the udev list. > > Andy, can you send a link to this thread? > Hmm, I may never have sent the email. Let me do this right now.
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c index 997a7f4117c5..1c825ef957db 100644 --- a/arch/x86/kernel/cpu/sgx/driver.c +++ b/arch/x86/kernel/cpu/sgx/driver.c @@ -6,6 +6,7 @@ #include <linux/mman.h> #include <linux/security.h> #include <linux/suspend.h> +#include <linux/anon_inodes.h> #include <asm/traps.h> #include "driver.h" #include "encl.h" @@ -21,35 +22,7 @@ u64 sgx_attributes_reserved_mask; u64 sgx_xfrm_reserved_mask = ~0x3; u32 sgx_xsave_size_tbl[64]; -static int sgx_open(struct inode *inode, struct file *file) -{ - struct sgx_encl *encl; - int ret; - - encl = kzalloc(sizeof(*encl), GFP_KERNEL); - if (!encl) - return -ENOMEM; - - atomic_set(&encl->flags, 0); - kref_init(&encl->refcount); - INIT_LIST_HEAD(&encl->va_pages); - INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL); - mutex_init(&encl->lock); - INIT_LIST_HEAD(&encl->mm_list); - spin_lock_init(&encl->mm_lock); - - ret = init_srcu_struct(&encl->srcu); - if (ret) { - kfree(encl); - return ret; - } - - file->private_data = encl; - - return 0; -} - -static int sgx_release(struct inode *inode, struct file *file) +static int sgx_encl_file_release(struct inode *inode, struct file *file) { struct sgx_encl *encl = file->private_data; struct sgx_encl_mm *encl_mm; @@ -84,6 +57,68 @@ static int sgx_release(struct inode *inode, struct file *file) return 0; } +static const struct file_operations sgx_encl_file_fops = { + .owner = THIS_MODULE, + .release = sgx_encl_file_release, +}; + +static int sgx_open(struct inode *inode, struct file *file) +{ + struct file *encl_file = NULL; + struct sgx_encl *encl = NULL; + int ret; + + encl = kzalloc(sizeof(*encl), GFP_KERNEL); + if (!encl) { + ret = -ENOMEM; + goto err; + } + + encl_file = anon_inode_getfile("[sgx]", &sgx_encl_file_fops, encl, + O_RDWR); + if (IS_ERR(encl_file)) { + ret = PTR_ERR(encl_file); + goto err; + } + + ret = init_srcu_struct(&encl->srcu); + if (ret) + goto err; + + atomic_set(&encl->flags, 0); + kref_init(&encl->refcount); + INIT_LIST_HEAD(&encl->va_pages); + INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL); + mutex_init(&encl->lock); + INIT_LIST_HEAD(&encl->mm_list); + spin_lock_init(&encl->mm_lock); + + file->private_data = encl_file; + + return 0; + +err: + if (encl_file) + fput(encl_file); + + kfree(encl); + return ret; +} + +static int sgx_encl_dev_release(struct inode *inode, struct file *file) +{ + struct file *encl_file = file->private_data; + + /* + * Can be NULL when the enclave file has been handed over to the + * user space. + */ + if (encl_file) + fput(encl_file); + + return 0; +} + #ifdef CONFIG_COMPAT static long sgx_compat_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) @@ -94,7 +129,8 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd, static int sgx_mmap(struct file *file, struct vm_area_struct *vma) { - struct sgx_encl *encl = file->private_data; + struct file *encl_file = file->private_data; + struct sgx_encl *encl = encl_file->private_data; int ret; ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end, @@ -128,10 +164,10 @@ static unsigned long sgx_get_unmapped_area(struct file *file, return current->mm->get_unmapped_area(file, addr, len, pgoff, flags); } -static const struct file_operations sgx_encl_fops = { +static const struct file_operations sgx_encl_dev_fops = { .owner = THIS_MODULE, .open = sgx_open, - .release = sgx_release, + .release = sgx_encl_dev_release, .unlocked_ioctl = sgx_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = sgx_compat_ioctl, @@ -148,7 +184,7 @@ static struct miscdevice sgx_dev_enclave = { .minor = MISC_DYNAMIC_MINOR, .name = "enclave", .nodename = "sgx/enclave", - .fops = &sgx_encl_fops, + .fops = &sgx_encl_dev_fops, }; static struct miscdevice sgx_dev_provision = { diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 3af0596530a8..891aa9395907 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -766,7 +766,8 @@ static long sgx_ioc_enclave_set_attribute(struct sgx_encl *encl, long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) { - struct sgx_encl *encl = filep->private_data; + struct file *encl_file = filep->private_data; + struct sgx_encl *encl = encl_file->private_data; int ret, encl_flags; encl_flags = atomic_fetch_or(SGX_ENCL_IOCTL, &encl->flags);
When creating an enclave attach it to an anonymous file. This prepares the code to have a separate interface at runtime, which can be published to the user space after the enclave has been fully initialized. Cc: luto@kernel.org Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Haitao Huang <haitao.huang@linux.intel.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> --- arch/x86/kernel/cpu/sgx/driver.c | 102 +++++++++++++++++++++---------- arch/x86/kernel/cpu/sgx/ioctl.c | 3 +- 2 files changed, 71 insertions(+), 34 deletions(-)