diff mbox series

[2/4] x86/sgx: Put enclaves into anonymous files

Message ID 20200331114432.7593-3-jarkko.sakkinen@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Migrate enclave mapping to an anonymous inode | expand

Commit Message

Jarkko Sakkinen March 31, 2020, 11:44 a.m. UTC
When creating an enclave attach it to an anonymous file. This prepares the
code to have a separate interface at runtime, which can be published to the
user space after the enclave has been fully initialized.

Cc: luto@kernel.org
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Haitao Huang <haitao.huang@linux.intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/driver.c | 102 +++++++++++++++++++++----------
 arch/x86/kernel/cpu/sgx/ioctl.c  |   3 +-
 2 files changed, 71 insertions(+), 34 deletions(-)

Comments

Andy Lutomirski March 31, 2020, 5:39 p.m. UTC | #1
On Tue, Mar 31, 2020 at 4:44 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> When creating an enclave attach it to an anonymous file. This prepares the
> code to have a separate interface at runtime, which can be published to the
> user space after the enclave has been fully initialized.

This isn't an objection per se, but I can't shake the feeling that
this seems ridiculous.  This changes the type of object returned by
open() because, without this change, the old type was problematic.

So I have some questions:

 - Can sgx just ignore the fs noexec option on the chardev inode's fs instead?

 - Would SELinux users *want* to put a useful label on the inode?  if
so, can they still accomplish whatever they were trying to accomplish
with this patch applied?

--Andy
Sean Christopherson April 1, 2020, 12:24 a.m. UTC | #2
On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote:
> On Tue, Mar 31, 2020 at 4:44 AM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > When creating an enclave attach it to an anonymous file. This prepares the
> > code to have a separate interface at runtime, which can be published to the
> > user space after the enclave has been fully initialized.
> 
> This isn't an objection per se, but I can't shake the feeling that
> this seems ridiculous.  This changes the type of object returned by
> open() because, without this change, the old type was problematic.
> 
> So I have some questions:
> 
>  - Can sgx just ignore the fs noexec option on the chardev inode's fs instead?

Not easily.  do_mmap() rejects PROT_EXEC based on noexec without checking
VM_MAYEXEC (because it's the one that configures VM_MAYEXEC).  SGX could add
VM_MAYEXEC back in during its ->mmap hook, but it would mean userspace would
never be able to do mmap() w/ PROT_EXEC, i.e. would always have to mmap()
then mprotect().  I don't see any way to a dodge the check without doing
something nasty.

                        if (path_noexec(&file->f_path)) {
                                if (vm_flags & VM_EXEC)
                                        return -EPERM;
                                vm_flags &= ~VM_MAYEXEC;
                        }

>  - Would SELinux users *want* to put a useful label on the inode?

Probably?  EXECMEM is likely the biggest point of contention.  I assume
users would also want to do ioctl() whitelisting, etc...  I can't think of
a use case where someone would want to lock down the SGX ioctls, but I can
see someone wanting to ensure the enclave fd can only be used for SGX stuff.

> if so, can they still accomplish whatever they were trying to accomplish
> with this patch applied?

Not at this time.  There's the "secure anon inode" series floating around
that would theoretically remedy that, but that's pure happenstance and not
something we want to rely on.

It wasn't mentioned in the cover letter, but this will effectively require
PROCESS_EXECMEM (in SELinux) for all processes that _run_ enclaves.  It
would allow processes that only _build_ enclaves to avoid PROCESS_EXECMEM,
as they wouldn't need to actually map the enclave.  Jarkko's contention is
enclaves _should_ require EXECMEM and that it's ok to require EXECMEM on
the runtime so long as the builder can run without EXECMEM.

If EXECMEM is a sticking point, one way to dodge it would be to add a
helper to allow SELinux to detect enclave files.  It'd be ugly, but simple.
That doesn't solve the generic labeling issue though.  It also begs the
question of why hacking SELinux but not do_mmap() would be acceptable.

If you have any ideas for fixing the noexec issue without resorting to an
anon inode, we're all ears.

> 
> --Andy
Jarkko Sakkinen April 1, 2020, 8:45 a.m. UTC | #3
On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote:
> On Tue, Mar 31, 2020 at 4:44 AM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > When creating an enclave attach it to an anonymous file. This prepares the
> > code to have a separate interface at runtime, which can be published to the
> > user space after the enclave has been fully initialized.
> 
> This isn't an objection per se, but I can't shake the feeling that
> this seems ridiculous.  This changes the type of object returned by
> open() because, without this change, the old type was problematic.
> 
> So I have some questions:
> 
>  - Can sgx just ignore the fs noexec option on the chardev inode's fs instead?

It's not SGX that nak's. It's mm level decision.

>  - Would SELinux users *want* to put a useful label on the inode?  if
> so, can they still accomplish whatever they were trying to accomplish
> with this patch applied?

What this does is that by default SGX is essentially blocked from
anything, which I think is sane. I think from security perspective
the EXECMEM requirement brings more clarity than does harm.

I'm sure that with this approach it is possible to integrate SGX to
software packages such as Kubernetes and end users will most likely
take into use through something like Kubernetes.

/Jarkko
Andy Lutomirski April 2, 2020, 9:41 p.m. UTC | #4
On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote:
>
> If EXECMEM is a sticking point, one way to dodge it would be to add a
> helper to allow SELinux to detect enclave files.  It'd be ugly, but simple.
> That doesn't solve the generic labeling issue though.  It also begs the
> question of why hacking SELinux but not do_mmap() would be acceptable.
>
> If you have any ideas for fixing the noexec issue without resorting to an
> anon inode, we're all ears.

Hmm.  Maybe teach udev to put /dev/sgx on a different fs and
bind-mount it?  Or make /dev/sgx be an actual filesystem?  Or just
mount /dev with exec enabled?
Jarkko Sakkinen April 3, 2020, 6:56 a.m. UTC | #5
On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote:
> On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote:
> >
> > If EXECMEM is a sticking point, one way to dodge it would be to add a
> > helper to allow SELinux to detect enclave files.  It'd be ugly, but simple.
> > That doesn't solve the generic labeling issue though.  It also begs the
> > question of why hacking SELinux but not do_mmap() would be acceptable.
> >
> > If you have any ideas for fixing the noexec issue without resorting to an
> > anon inode, we're all ears.
> 
> Hmm.  Maybe teach udev to put /dev/sgx on a different fs and
> bind-mount it?  Or make /dev/sgx be an actual filesystem?  Or just
> mount /dev with exec enabled?

I'm not forseeing how the last option could work out as it is distro's
choice.

Casey, do you think we could use securityfs for this or do you have some
other recommendation? I'm just asking you because you've used securityfs
a lot.

/Jarkko
Jarkko Sakkinen April 3, 2020, 6:59 a.m. UTC | #6
On Fri, Apr 03, 2020 at 09:56:32AM +0300, Jarkko Sakkinen wrote:
> On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote:
> > On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote:
> > >
> > > If EXECMEM is a sticking point, one way to dodge it would be to add a
> > > helper to allow SELinux to detect enclave files.  It'd be ugly, but simple.
> > > That doesn't solve the generic labeling issue though.  It also begs the
> > > question of why hacking SELinux but not do_mmap() would be acceptable.
> > >
> > > If you have any ideas for fixing the noexec issue without resorting to an
> > > anon inode, we're all ears.
> > 
> > Hmm.  Maybe teach udev to put /dev/sgx on a different fs and
> > bind-mount it?  Or make /dev/sgx be an actual filesystem?  Or just
> > mount /dev with exec enabled?
> 
> I'm not forseeing how the last option could work out as it is distro's
> choice.
> 
> Casey, do you think we could use securityfs for this or do you have some
> other recommendation? I'm just asking you because you've used securityfs
> a lot.

I'll squash 1/4 from this patch set since it is purely a fix.

/Jarkko
Casey Schaufler April 3, 2020, 2:35 p.m. UTC | #7
On 4/2/2020 11:56 PM, Jarkko Sakkinen wrote:
> On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote:
>> On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson
>> <sean.j.christopherson@intel.com> wrote:
>>> On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote:
>>>
>>> If EXECMEM is a sticking point, one way to dodge it would be to add a
>>> helper to allow SELinux to detect enclave files.  It'd be ugly, but simple.
>>> That doesn't solve the generic labeling issue though.  It also begs the
>>> question of why hacking SELinux but not do_mmap() would be acceptable.
>>>
>>> If you have any ideas for fixing the noexec issue without resorting to an
>>> anon inode, we're all ears.
>> Hmm.  Maybe teach udev to put /dev/sgx on a different fs and
>> bind-mount it?  Or make /dev/sgx be an actual filesystem?  Or just
>> mount /dev with exec enabled?
> I'm not forseeing how the last option could work out as it is distro's
> choice.
>
> Casey, do you think we could use securityfs for this or do you have some
> other recommendation? I'm just asking you because you've used securityfs
> a lot.

I don't know how well securityfs works when mounted in a container,
but otherwise it would seem like a viable option. On the other hand,
pseudo filesystems are pretty easy to write, so /sys/fs/sgxfs wouldn't
be a bad choice, either.

>
> /Jarkko
Jarkko Sakkinen April 3, 2020, 3:30 p.m. UTC | #8
On Fri, Apr 03, 2020 at 07:35:16AM -0700, Casey Schaufler wrote:
> 
> On 4/2/2020 11:56 PM, Jarkko Sakkinen wrote:
> > On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote:
> >> On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson
> >> <sean.j.christopherson@intel.com> wrote:
> >>> On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote:
> >>>
> >>> If EXECMEM is a sticking point, one way to dodge it would be to add a
> >>> helper to allow SELinux to detect enclave files.  It'd be ugly, but simple.
> >>> That doesn't solve the generic labeling issue though.  It also begs the
> >>> question of why hacking SELinux but not do_mmap() would be acceptable.
> >>>
> >>> If you have any ideas for fixing the noexec issue without resorting to an
> >>> anon inode, we're all ears.
> >> Hmm.  Maybe teach udev to put /dev/sgx on a different fs and
> >> bind-mount it?  Or make /dev/sgx be an actual filesystem?  Or just
> >> mount /dev with exec enabled?
> > I'm not forseeing how the last option could work out as it is distro's
> > choice.
> >
> > Casey, do you think we could use securityfs for this or do you have some
> > other recommendation? I'm just asking you because you've used securityfs
> > a lot.
> 
> I don't know how well securityfs works when mounted in a container,
> but otherwise it would seem like a viable option. On the other hand,
> pseudo filesystems are pretty easy to write, so /sys/fs/sgxfs wouldn't
> be a bad choice, either.

Ugh, sorry, forgot for a while that smackfs is independent fs.

How does smackfs interact with namespaces?

/Jarkko
Casey Schaufler April 3, 2020, 3:50 p.m. UTC | #9
On 4/3/2020 8:30 AM, Jarkko Sakkinen wrote:
> On Fri, Apr 03, 2020 at 07:35:16AM -0700, Casey Schaufler wrote:
>> On 4/2/2020 11:56 PM, Jarkko Sakkinen wrote:
>>> On Thu, Apr 02, 2020 at 02:41:39PM -0700, Andy Lutomirski wrote:
>>>> On Tue, Mar 31, 2020 at 5:24 PM Sean Christopherson
>>>> <sean.j.christopherson@intel.com> wrote:
>>>>> On Tue, Mar 31, 2020 at 10:39:38AM -0700, Andy Lutomirski wrote:
>>>>>
>>>>> If EXECMEM is a sticking point, one way to dodge it would be to add a
>>>>> helper to allow SELinux to detect enclave files.  It'd be ugly, but simple.
>>>>> That doesn't solve the generic labeling issue though.  It also begs the
>>>>> question of why hacking SELinux but not do_mmap() would be acceptable.
>>>>>
>>>>> If you have any ideas for fixing the noexec issue without resorting to an
>>>>> anon inode, we're all ears.
>>>> Hmm.  Maybe teach udev to put /dev/sgx on a different fs and
>>>> bind-mount it?  Or make /dev/sgx be an actual filesystem?  Or just
>>>> mount /dev with exec enabled?
>>> I'm not forseeing how the last option could work out as it is distro's
>>> choice.
>>>
>>> Casey, do you think we could use securityfs for this or do you have some
>>> other recommendation? I'm just asking you because you've used securityfs
>>> a lot.
>> I don't know how well securityfs works when mounted in a container,
>> but otherwise it would seem like a viable option. On the other hand,
>> pseudo filesystems are pretty easy to write, so /sys/fs/sgxfs wouldn't
>> be a bad choice, either.
> Ugh, sorry, forgot for a while that smackfs is independent fs.
>
> How does smackfs interact with namespaces?

Smack attributes are global. Aside from privilege issues, namespaces
ignore and are ignored by Smack.

>
> /Jarkko
Jarkko Sakkinen April 3, 2020, 10:08 p.m. UTC | #10
On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote:
> > How does smackfs interact with namespaces?
> 
> Smack attributes are global. Aside from privilege issues, namespaces
> ignore and are ignored by Smack.

Okay.

For SGX, I foresee things as:

1. Existing files are global.
2. If a policy of any kind is ever added it needs to be *per container*.
   I'm not sure whether PID or user namespace is the right choice here,
   but does not matter right now as the feature is not in the queue.

To summarize:

1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision'
   are not "different sames").
2. The files probably will have heterogeneous visibility requirements.

I think based on these premises own file system would be a more decent
choice than populating /dev. Beside, SGX hasn't been a driver for a
while.

Andy, what do you think of this?

/Jarkko
Andy Lutomirski April 4, 2020, 3:54 a.m. UTC | #11
> On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote:
>>> How does smackfs interact with namespaces?
>> 
>> Smack attributes are global. Aside from privilege issues, namespaces
>> ignore and are ignored by Smack.
> 
> Okay.
> 
> For SGX, I foresee things as:
> 
> 1. Existing files are global.
> 2. If a policy of any kind is ever added it needs to be *per container*.
>   I'm not sure whether PID or user namespace is the right choice here,
>   but does not matter right now as the feature is not in the queue.
> 
> To summarize:
> 
> 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision'
>   are not "different sames").
> 2. The files probably will have heterogeneous visibility requirements.
> 
> I think based on these premises own file system would be a more decent
> choice than populating /dev. Beside, SGX hasn't been a driver for a
> while.
> 
> Andy, what do you think of this?

Probably okay.  There are two semantic questions you’ll have to address, though:

- What happens if you mount sgxfs twice?  Do you get two copies that can diverge from each other, or do you get two views of the same thing?

- Can it be instantiated from outside the root initns?

It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev?

> 
> /Jarkko
Jethro Beekman April 4, 2020, 5:46 a.m. UTC | #12
This appears to originate in Debian

Rationale: https://salsa.debian.org/kernel-team/initramfs-tools/-/merge_requests/9

Interestingly, they claim mmap(/dev/zero) is special-cased? Can we do the same for SGX?

Some allowances were made in https://salsa.debian.org/kernel-team/initramfs-tools/-/commit/d6c6eeca3540d18f5bce95b5ffcb1823ab3050ea

Including those people in this conversation.

Ben, Towi: for context, see https://lore.kernel.org/linux-sgx/20200319142434.GA11305@linux.intel.com/T/ and https://lore.kernel.org/linux-sgx/20200401084511.GE17325@linux.intel.com/T/

--
Jethro Beekman | Fortanix

On 2020-04-04 05:54, Andy Lutomirski wrote:
> 
> 
>> On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>
>> On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote:
>>>> How does smackfs interact with namespaces?
>>>
>>> Smack attributes are global. Aside from privilege issues, namespaces
>>> ignore and are ignored by Smack.
>>
>> Okay.
>>
>> For SGX, I foresee things as:
>>
>> 1. Existing files are global.
>> 2. If a policy of any kind is ever added it needs to be *per container*.
>>   I'm not sure whether PID or user namespace is the right choice here,
>>   but does not matter right now as the feature is not in the queue.
>>
>> To summarize:
>>
>> 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision'
>>   are not "different sames").
>> 2. The files probably will have heterogeneous visibility requirements.
>>
>> I think based on these premises own file system would be a more decent
>> choice than populating /dev. Beside, SGX hasn't been a driver for a
>> while.
>>
>> Andy, what do you think of this?
> 
> Probably okay.  There are two semantic questions you’ll have to address, though:
> 
> - What happens if you mount sgxfs twice?  Do you get two copies that can diverge from each other, or do you get two views of the same thing?
> 
> - Can it be instantiated from outside the root initns?
> 
> It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev?
> 
>>
>> /Jarkko
Topi Miettinen April 4, 2020, 7:27 a.m. UTC | #13
On 4.4.2020 8.46, Jethro Beekman wrote:
> This appears to originate in Debian
> 
> Rationale: https://salsa.debian.org/kernel-team/initramfs-tools/-/merge_requests/9
> 
> Interestingly, they claim mmap(/dev/zero) is special-cased? Can we do the same for SGX?
> 
> Some allowances were made in https://salsa.debian.org/kernel-team/initramfs-tools/-/commit/d6c6eeca3540d18f5bce95b5ffcb1823ab3050ea
> 
> Including those people in this conversation.
> 
> Ben, Towi: for context, see https://lore.kernel.org/linux-sgx/20200319142434.GA11305@linux.intel.com/T/ and https://lore.kernel.org/linux-sgx/20200401084511.GE17325@linux.intel.com/T/
> 
> --
> Jethro Beekman | Fortanix
> 
> On 2020-04-04 05:54, Andy Lutomirski wrote:
>>
>>
>>> On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>
>>> On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote:
>>>>> How does smackfs interact with namespaces?
>>>>
>>>> Smack attributes are global. Aside from privilege issues, namespaces
>>>> ignore and are ignored by Smack.
>>>
>>> Okay.
>>>
>>> For SGX, I foresee things as:
>>>
>>> 1. Existing files are global.
>>> 2. If a policy of any kind is ever added it needs to be *per container*.
>>>    I'm not sure whether PID or user namespace is the right choice here,
>>>    but does not matter right now as the feature is not in the queue.
>>>
>>> To summarize:
>>>
>>> 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision'
>>>    are not "different sames").
>>> 2. The files probably will have heterogeneous visibility requirements.
>>>
>>> I think based on these premises own file system would be a more decent
>>> choice than populating /dev. Beside, SGX hasn't been a driver for a
>>> while.
>>>
>>> Andy, what do you think of this?
>>
>> Probably okay.  There are two semantic questions you’ll have to address, though:
>>
>> - What happens if you mount sgxfs twice?  Do you get two copies that can diverge from each other, or do you get two views of the same thing?
>>
>> - Can it be instantiated from outside the root initns?
>>
>> It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev?
>>
>>>
>>> /Jarkko
> 

My goal is to block executing binaries directly from /dev and using the 
file descriptors from device files to avoid EXECMEM, without relying on 
MACs. If the SGX device can be used to make new executable mappings, it 
should honor noexec for /dev. Then initramfs should make a similar 
exception as with v86d and grant exec to /dev. I think sgxfs should also 
honor noexec but it probably does not make sense to mount it so.

With an ioctl to SGX device the caller can change previously writable 
memory to executable. It should be able to trigger PROCESS__EXECMEM 
check for SELinux, so perhaps LSM hook for mprotect should be called:
https://github.com/intel/linux-sgx-driver/blob/95eaa6f6693cd86c35e10a22b4f8e483373c987c/sgx_ioctl.c#L254

Also SELinux reference policy doesn't know yet about SGX. Can the SGX 
enclave access physical memory, kernel memory or bypass MMU somehow even 
for the thread? If it can bypass SELinux protections, access should be 
conditional to boolean allow_raw_memory_access.

-Topi
Jarkko Sakkinen April 4, 2020, 9:20 a.m. UTC | #14
On Sat, Apr 04, 2020 at 10:27:53AM +0300, Topi Miettinen wrote:
> On 4.4.2020 8.46, Jethro Beekman wrote:
> > This appears to originate in Debian
> > 
> > Rationale: https://salsa.debian.org/kernel-team/initramfs-tools/-/merge_requests/9
> > 
> > Interestingly, they claim mmap(/dev/zero) is special-cased? Can we do the same for SGX?
> > 
> > Some allowances were made in https://salsa.debian.org/kernel-team/initramfs-tools/-/commit/d6c6eeca3540d18f5bce95b5ffcb1823ab3050ea
> > 
> > Including those people in this conversation.
> > 
> > Ben, Towi: for context, see https://lore.kernel.org/linux-sgx/20200319142434.GA11305@linux.intel.com/T/ and https://lore.kernel.org/linux-sgx/20200401084511.GE17325@linux.intel.com/T/
> > 
> > --
> > Jethro Beekman | Fortanix
> > 
> > On 2020-04-04 05:54, Andy Lutomirski wrote:
> > > 
> > > 
> > > > On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > > > 
> > > > On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote:
> > > > > > How does smackfs interact with namespaces?
> > > > > 
> > > > > Smack attributes are global. Aside from privilege issues, namespaces
> > > > > ignore and are ignored by Smack.
> > > > 
> > > > Okay.
> > > > 
> > > > For SGX, I foresee things as:
> > > > 
> > > > 1. Existing files are global.
> > > > 2. If a policy of any kind is ever added it needs to be *per container*.
> > > >    I'm not sure whether PID or user namespace is the right choice here,
> > > >    but does not matter right now as the feature is not in the queue.
> > > > 
> > > > To summarize:
> > > > 
> > > > 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision'
> > > >    are not "different sames").
> > > > 2. The files probably will have heterogeneous visibility requirements.
> > > > 
> > > > I think based on these premises own file system would be a more decent
> > > > choice than populating /dev. Beside, SGX hasn't been a driver for a
> > > > while.
> > > > 
> > > > Andy, what do you think of this?
> > > 
> > > Probably okay.  There are two semantic questions you’ll have to address, though:
> > > 
> > > - What happens if you mount sgxfs twice?  Do you get two copies that can diverge from each other, or do you get two views of the same thing?
> > > 
> > > - Can it be instantiated from outside the root initns?
> > > 
> > > It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev?
> > > 
> > > > 
> > > > /Jarkko
> > 
> 
> My goal is to block executing binaries directly from /dev and using the file
> descriptors from device files to avoid EXECMEM, without relying on MACs. If
> the SGX device can be used to make new executable mappings, it should honor
> noexec for /dev. Then initramfs should make a similar exception as with v86d
> and grant exec to /dev. I think sgxfs should also honor noexec but it
> probably does not make sense to mount it so.

Just to bring context: the whole sgxfs thing is something that we are
planning to do circulate the configuration issue with /dev i.e. move
device nodes as file nodes to a custom fs. That feels somewhat counter
productive and does not improve security in any possible way.

I guess we keep our stuff in /dev where it logically belongs anyway
and go through the configuration route then.

> With an ioctl to SGX device the caller can change previously writable memory
> to executable. It should be able to trigger PROCESS__EXECMEM check for
> SELinux, so perhaps LSM hook for mprotect should be called:
> https://github.com/intel/linux-sgx-driver/blob/95eaa6f6693cd86c35e10a22b4f8e483373c987c/sgx_ioctl.c#L254

You are looking at wrong thing. That is OOT driver that has diverged
from the kernel code base over four years ago.

This the latest code:

https://github.com/jsakkine-intel/linux-sgx/tree/master/arch/x86/kernel/cpu/sgx

But just look at the call pattern kselftest's main program should do:

https://github.com/jsakkine-intel/linux-sgx/blob/master/tools/testing/selftests/sgx/main.c

I.e.

1. Reserve by any means possible (usually anonymous mmap) an address
   range.
2. Construct enclave and initialize (no mapping involved).

Up until this point everything works with or without noexec.

Then:

3. mmap() regions.

Internally kernel checks for mmap() and mprotect() that address ranges
are opaque and requested permissions do not surpass the permissions that
were assigned to the enclave pages.

> Also SELinux reference policy doesn't know yet about SGX. Can the SGX
> enclave access physical memory, kernel memory or bypass MMU somehow even for
> the thread? If it can bypass SELinux protections, access should be
> conditional to boolean allow_raw_memory_access.

It can only do normal memory accesses within process address space. It
is a submode for ring-3 essentially.

SELinux policies cannot exist because the code has not reached yet
upstream. For now I think we got what we needed aka now we know that
there is a process for getting the exception and do not have to try the
customfs route.

Thank you for your feedback. Now we know what to do for the moment.

/Jarkko
Jarkko Sakkinen April 4, 2020, 9:22 a.m. UTC | #15
On Fri, Apr 03, 2020 at 08:54:40PM -0700, Andy Lutomirski wrote:
> 
> 
> > On Apr 3, 2020, at 3:08 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > 
> > On Fri, Apr 03, 2020 at 08:50:08AM -0700, Casey Schaufler wrote:
> >>> How does smackfs interact with namespaces?
> >> 
> >> Smack attributes are global. Aside from privilege issues, namespaces
> >> ignore and are ignored by Smack.
> > 
> > Okay.
> > 
> > For SGX, I foresee things as:
> > 
> > 1. Existing files are global.
> > 2. If a policy of any kind is ever added it needs to be *per container*.
> >   I'm not sure whether PID or user namespace is the right choice here,
> >   but does not matter right now as the feature is not in the queue.
> > 
> > To summarize:
> > 
> > 1. We have a heterogeneous set of files (i.e. 'enclave' and 'provision'
> >   are not "different sames").
> > 2. The files probably will have heterogeneous visibility requirements.
> > 
> > I think based on these premises own file system would be a more decent
> > choice than populating /dev. Beside, SGX hasn't been a driver for a
> > while.
> > 
> > Andy, what do you think of this?
> 
> Probably okay.  There are two semantic questions you’ll have to address, though:
> 
> - What happens if you mount sgxfs twice?  Do you get two copies that can diverge from each other, or do you get two views of the same thing?
> 
> - Can it be instantiated from outside the root initns?
> 
> It’s certainly conceptually simpler to stick with device nodes. Why exactly is Ubuntu noexecing /dev?

I'm retreating this given that we have reasonable means to drive
exception to the /dev configuration.

Thanks Jethro for helping with this one!

/Jarkko
Jethro Beekman April 6, 2020, 6:42 a.m. UTC | #16
On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.

I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?

--
Jethro Beekman | Fortanix
Topi Miettinen April 6, 2020, 11:01 a.m. UTC | #17
On 6.4.2020 9.42, Jethro Beekman wrote:
> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.
> 
> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?

Intel does not control the whole market yet, does AMD also offer SGX or 
similar? Will SGX be also available for consumer devices? Are distros 
going to enable SGX, will it benefit their users somehow?

Perhaps the sgxfs approach or something else (system call?) would be 
better after all in order to not force exec just because of one device. 
/dev is usually writable, so allowing exec means breaking the W^X 
principle for filesystems.

-Topi
Andy Lutomirski April 6, 2020, 4:44 p.m. UTC | #18
> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote:
> 
> On 6.4.2020 9.42, Jethro Beekman wrote:
>> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.
>> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?
> 
> Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow?
> 
> Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems.
> 
> 

It’s *possible* to create a tmpfs, create the sgx nodes on it, bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs.

I don’t know whether udev would be willing to support such a thing.
Jethro Beekman April 6, 2020, 5:17 p.m. UTC | #19
On 2020-04-06 18:44, Andy Lutomirski wrote:
> 
>> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote:
>>
>> On 6.4.2020 9.42, Jethro Beekman wrote:
>>> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.
>>> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?
>>
>> Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow?
>>
>> Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems.
>>
>>
> 
> It’s *possible* to create a tmpfs, create the sgx nodes on it, bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs.
> 
> I don’t know whether udev would be willing to support such a thing.
> 

It doesn't even need to be in a temporary location, you can just mount it directly at /dev/sgx

--
Jethro Beekman | Fortanix
Jarkko Sakkinen April 6, 2020, 6:47 p.m. UTC | #20
On Mon, Apr 06, 2020 at 02:01:38PM +0300, Topi Miettinen wrote:
> On 6.4.2020 9.42, Jethro Beekman wrote:
> > On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.
> > 
> > I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?
> 
> Intel does not control the whole market yet, does AMD also offer SGX or
> similar? Will SGX be also available for consumer devices? Are distros going
> to enable SGX, will it benefit their users somehow?

It has a strong user base, yes. That's the whole reason for upstreaming it
(like always). It has been available on all CPUs since Skylake.

/Jarkko
Jarkko Sakkinen April 6, 2020, 6:55 p.m. UTC | #21
On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote:
> 
> > On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote:
> > 
> > On 6.4.2020 9.42, Jethro Beekman wrote:
> >> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.
> >> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?
> > 
> > Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow?
> > 
> > Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems.
> > 
> > 
> 
> It’s *possible* to create a tmpfs, create the sgx nodes on it,
> bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs.
> 
> I don’t know whether udev would be willing to support such a thing.

sgxfs is somewhat trivial to implement and has one stakeholder less to
worry about. It is not really a huge stretch.

Overally, I think it is something that we could live with. At least it
is something that does not step on others toes.

Haitao: If we go with sgxfs route, then you can for the moment do what
Andy suggested: bind mount it to /dev/sgx.

/Jarkko
Jarkko Sakkinen April 6, 2020, 7:01 p.m. UTC | #22
On Mon, Apr 06, 2020 at 09:55:34PM +0300, Jarkko Sakkinen wrote:
> On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote:
> > 
> > > On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote:
> > > 
> > > On 6.4.2020 9.42, Jethro Beekman wrote:
> > >> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.
> > >> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?
> > > 
> > > Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow?
> > > 
> > > Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems.
> > > 
> > > 
> > 
> > It’s *possible* to create a tmpfs, create the sgx nodes on it,
> > bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs.
> > 
> > I don’t know whether udev would be willing to support such a thing.
> 
> sgxfs is somewhat trivial to implement and has one stakeholder less to
> worry about. It is not really a huge stretch.
> 
> Overally, I think it is something that we could live with. At least it
> is something that does not step on others toes.
> 
> Haitao: If we go with sgxfs route, then you can for the moment do what
> Andy suggested: bind mount it to /dev/sgx.

I'll head on and write the patch.

Thanks everyone for all the feedback. I don't think this patch set was a
waste of time. It is easier to reflect things when you have the code
rather than pure speculation.

/Jarkko
Andy Lutomirski April 6, 2020, 7:53 p.m. UTC | #23
> On Apr 6, 2020, at 11:55 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote:
>> 
>>>> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote:
>>> 
>>> On 6.4.2020 9.42, Jethro Beekman wrote:
>>>> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.
>>>> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?
>>> 
>>> Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow?
>>> 
>>> Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems.
>>> 
>>> 
>> 
>> It’s *possible* to create a tmpfs, create the sgx nodes on it,
>> bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs.
>> 
>> I don’t know whether udev would be willing to support such a thing.
> 
> sgxfs is somewhat trivial to implement and has one stakeholder less to
> worry about. It is not really a huge stretch.
> 
> Overally, I think it is something that we could live with. At least it
> is something that does not step on others toes.
> 
> Haitao: If we go with sgxfs route, then you can for the moment do what
> Andy suggested: bind mount it to /dev/sgx.

That also needs userspace support.

I’ll start a thread on the udev list.

> 
> /Jarkko
Jarkko Sakkinen April 6, 2020, 9:24 p.m. UTC | #24
On Mon, Apr 06, 2020 at 12:53:41PM -0700, Andy Lutomirski wrote:
> > sgxfs is somewhat trivial to implement and has one stakeholder less to
> > worry about. It is not really a huge stretch.
> > 
> > Overally, I think it is something that we could live with. At least it
> > is something that does not step on others toes.
> > 
> > Haitao: If we go with sgxfs route, then you can for the moment do what
> > Andy suggested: bind mount it to /dev/sgx.
> 
> That also needs userspace support.
> 
> I’ll start a thread on the udev list.

Haven't written any code yet but just by drafting the idea I already
pinpointed something.

You would add roughly this to the existing codebase:

* struct file_system_type sgxfs
* struct fs_context_operations sgxfs_ctx_ops
* int sgxfs_get_tree()
* int sgxfs_fill_super()
* const struct tree_descr sgxfs_files[]

The files would be defined as like this:

static const struct tree_descr sgxfs_files[] {
	[SGXFS_ENCLAVE] = { "enclave", &sgxfs_encl_ops, /* ? */ },
	[SGXFS_PROVISION] = { "provision", &sgxfs_encl_ops, /* ? */ },
};

The permissions would need to be completely static, which feels nasty
from maintainability viewpoint :-( And I see no gain anywhere.

In my opinion udev defining the whole /dev as noexec has zero technical
merits. It is same as they would say that "we don't trust our own
database". There are no real security benefits as long as dev nodes are
configured correctly.

/Jarkko
Andy Lutomirski April 6, 2020, 11:18 p.m. UTC | #25
On Mon, Apr 6, 2020 at 2:24 PM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> On Mon, Apr 06, 2020 at 12:53:41PM -0700, Andy Lutomirski wrote:
> > > sgxfs is somewhat trivial to implement and has one stakeholder less to
> > > worry about. It is not really a huge stretch.
> > >
> > > Overally, I think it is something that we could live with. At least it
> > > is something that does not step on others toes.
> > >
> > > Haitao: If we go with sgxfs route, then you can for the moment do what
> > > Andy suggested: bind mount it to /dev/sgx.
> >
> > That also needs userspace support.
> >
> > I’ll start a thread on the udev list.
>
> Haven't written any code yet but just by drafting the idea I already
> pinpointed something.
>
> You would add roughly this to the existing codebase:
>
> * struct file_system_type sgxfs
> * struct fs_context_operations sgxfs_ctx_ops
> * int sgxfs_get_tree()
> * int sgxfs_fill_super()
> * const struct tree_descr sgxfs_files[]
>
> The files would be defined as like this:
>
> static const struct tree_descr sgxfs_files[] {
>         [SGXFS_ENCLAVE] = { "enclave", &sgxfs_encl_ops, /* ? */ },
>         [SGXFS_PROVISION] = { "provision", &sgxfs_encl_ops, /* ? */ },
> };
>
> The permissions would need to be completely static, which feels nasty
> from maintainability viewpoint :-( And I see no gain anywhere.
>
> In my opinion udev defining the whole /dev as noexec has zero technical
> merits. It is same as they would say that "we don't trust our own
> database". There are no real security benefits as long as dev nodes are
> configured correctly.

I tend to agree, but it could be seen as a sort-of-valid hardening measure.

I think the best bet is for you to ignore this and let userspace figure it out.
Jarkko Sakkinen April 6, 2020, 11:48 p.m. UTC | #26
On Mon, Apr 06, 2020 at 04:18:20PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 6, 2020 at 2:24 PM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > On Mon, Apr 06, 2020 at 12:53:41PM -0700, Andy Lutomirski wrote:
> > > > sgxfs is somewhat trivial to implement and has one stakeholder less to
> > > > worry about. It is not really a huge stretch.
> > > >
> > > > Overally, I think it is something that we could live with. At least it
> > > > is something that does not step on others toes.
> > > >
> > > > Haitao: If we go with sgxfs route, then you can for the moment do what
> > > > Andy suggested: bind mount it to /dev/sgx.
> > >
> > > That also needs userspace support.
> > >
> > > I’ll start a thread on the udev list.
> >
> > Haven't written any code yet but just by drafting the idea I already
> > pinpointed something.
> >
> > You would add roughly this to the existing codebase:
> >
> > * struct file_system_type sgxfs
> > * struct fs_context_operations sgxfs_ctx_ops
> > * int sgxfs_get_tree()
> > * int sgxfs_fill_super()
> > * const struct tree_descr sgxfs_files[]
> >
> > The files would be defined as like this:
> >
> > static const struct tree_descr sgxfs_files[] {
> >         [SGXFS_ENCLAVE] = { "enclave", &sgxfs_encl_ops, /* ? */ },
> >         [SGXFS_PROVISION] = { "provision", &sgxfs_encl_ops, /* ? */ },
> > };
> >
> > The permissions would need to be completely static, which feels nasty
> > from maintainability viewpoint :-( And I see no gain anywhere.
> >
> > In my opinion udev defining the whole /dev as noexec has zero technical
> > merits. It is same as they would say that "we don't trust our own
> > database". There are no real security benefits as long as dev nodes are
> > configured correctly.
> 
> I tend to agree, but it could be seen as a sort-of-valid hardening measure.

Bets are not great anyway to revert it so lets just consider it as a
constraint then.

> I think the best bet is for you to ignore this and let userspace figure it out.

Yeah. You could put statically 0700 permissions and then let systemd to
set the legit ones but it just feels a way too big hammer to do a new fs
just to sort out a simple permission issue.

/Jarkko
Jethro Beekman April 7, 2020, 7:15 a.m. UTC | #27
On 2020-04-07 01:18, Andy Lutomirski wrote:
> On Mon, Apr 6, 2020 at 2:24 PM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
>>
>> On Mon, Apr 06, 2020 at 12:53:41PM -0700, Andy Lutomirski wrote:
>>>> sgxfs is somewhat trivial to implement and has one stakeholder less to
>>>> worry about. It is not really a huge stretch.
>>>>
>>>> Overally, I think it is something that we could live with. At least it
>>>> is something that does not step on others toes.
>>>>
>>>> Haitao: If we go with sgxfs route, then you can for the moment do what
>>>> Andy suggested: bind mount it to /dev/sgx.
>>>
>>> That also needs userspace support.
>>>
>>> I’ll start a thread on the udev list.
>>
>> Haven't written any code yet but just by drafting the idea I already
>> pinpointed something.
>>
>> You would add roughly this to the existing codebase:
>>
>> * struct file_system_type sgxfs
>> * struct fs_context_operations sgxfs_ctx_ops
>> * int sgxfs_get_tree()
>> * int sgxfs_fill_super()
>> * const struct tree_descr sgxfs_files[]
>>
>> The files would be defined as like this:
>>
>> static const struct tree_descr sgxfs_files[] {
>>         [SGXFS_ENCLAVE] = { "enclave", &sgxfs_encl_ops, /* ? */ },
>>         [SGXFS_PROVISION] = { "provision", &sgxfs_encl_ops, /* ? */ },
>> };
>>
>> The permissions would need to be completely static, which feels nasty
>> from maintainability viewpoint :-( And I see no gain anywhere.
>>
>> In my opinion udev defining the whole /dev as noexec has zero technical
>> merits. It is same as they would say that "we don't trust our own
>> database". There are no real security benefits as long as dev nodes are
>> configured correctly.
> 
> I tend to agree, but it could be seen as a sort-of-valid hardening measure.

Yeah it seems to me `noexec` is overly broad in not allowing mmap(PROT_EXEC). I'd expect it to disallow only execve(), which is fine for /dev.

--
Jethro Beekman | Fortanix
Topi Miettinen April 7, 2020, 8:48 a.m. UTC | #28
On 7.4.2020 0.24, Jarkko Sakkinen wrote:
> In my opinion udev defining the whole /dev as noexec has zero technical
> merits. It is same as they would say that "we don't trust our own
> database". There are no real security benefits as long as dev nodes are
> configured correctly.

The threat is not that the device nodes would have execute permissions, 
but that a malicious entity with write access to /dev would create a new 
executable and run it, or rather, trick another (perhaps more privileged 
or more vulnerable) entity to do so. The malicious entity does not need 
any capabilities and it can be constrained by any number of typical 
seccomp filters which just don't block such basic system calls as 
open(), write(), [f]chmod() and close(). It simply needs to have UID 0 
(possibly something else, like suitable GID could also be sufficient for 
some subdirectories) and write access to /dev (or its subdirectories) in 
its mount namespace.

My philosophy is that "trust" means confidence that an action will not 
be done even when there's no control over it. "Control" means that it's 
possible to make active decision on whether the action can or cannot be 
allowed to be done. Trust in security mindset is a weak thing, control 
is stronger, but the strongest case is when you don't need trust nor 
control: the action simply can't ever happen because it's impossible or 
always forbidden. This idea is shown in such famous principles as "least 
privilege", "need to know" or compartmentalization. If the additional 
privilege of exec is not needed, it should not exist.

-Topi
Topi Miettinen April 7, 2020, 9:04 a.m. UTC | #29
Please correct me if I'm wrong, but isn't it the goal of SGX to let a 
(suitably privileged) process designate some of its memory areas as part 
of SGX enclave? If so, why don't you simply add a system call to do so, 
such as

int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags);

like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like 
existing PROT_SAO/PROT_SEM?

-Topi
Jarkko Sakkinen April 7, 2020, 4:52 p.m. UTC | #30
On Tue, Apr 07, 2020 at 11:48:10AM +0300, Topi Miettinen wrote:
> On 7.4.2020 0.24, Jarkko Sakkinen wrote:
> > In my opinion udev defining the whole /dev as noexec has zero technical
> > merits. It is same as they would say that "we don't trust our own
> > database". There are no real security benefits as long as dev nodes are
> > configured correctly.
> 
> The threat is not that the device nodes would have execute permissions, but
> that a malicious entity with write access to /dev would create a new
> executable and run it, or rather, trick another (perhaps more privileged or
> more vulnerable) entity to do so. The malicious entity does not need any
> capabilities and it can be constrained by any number of typical seccomp
> filters which just don't block such basic system calls as open(), write(),
> [f]chmod() and close(). It simply needs to have UID 0 (possibly something
> else, like suitable GID could also be sufficient for some subdirectories)
> and write access to /dev (or its subdirectories) in its mount namespace.
> 
> My philosophy is that "trust" means confidence that an action will not be
> done even when there's no control over it. "Control" means that it's
> possible to make active decision on whether the action can or cannot be
> allowed to be done. Trust in security mindset is a weak thing, control is
> stronger, but the strongest case is when you don't need trust nor control:
> the action simply can't ever happen because it's impossible or always
> forbidden. This idea is shown in such famous principles as "least
> privilege", "need to know" or compartmentalization. If the additional
> privilege of exec is not needed, it should not exist.
> 
> -Topi

I get the threat scenario, thanks.

The problem (as Jethro correctly pointed out) with noexec /dev is
somewhat broad.

Thank you anyway for taking time describing the threat scenario.

/Jarkko
Jarkko Sakkinen April 7, 2020, 4:57 p.m. UTC | #31
On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote:
> Please correct me if I'm wrong, but isn't it the goal of SGX to let a
> (suitably privileged) process designate some of its memory areas as part of
> SGX enclave? If so, why don't you simply add a system call to do so, such as
> 
> int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags);
> 
> like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like
> existing PROT_SAO/PROT_SEM?
> 
> -Topi

New syscalls is always the last resort path, especially if they are
associated with an arch.

PROT_SGX sounds something worth of consideration.

Another idea to throw would be noexec_dev mount option that would allow
exec *only* for the device nodes (zero analysis done on feasibility).

/Jarkko
Jarkko Sakkinen April 7, 2020, 4:59 p.m. UTC | #32
On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote:
> On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote:
> > Please correct me if I'm wrong, but isn't it the goal of SGX to let a
> > (suitably privileged) process designate some of its memory areas as part of
> > SGX enclave? If so, why don't you simply add a system call to do so, such as
> > 
> > int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags);
> > 
> > like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like
> > existing PROT_SAO/PROT_SEM?
> > 
> > -Topi
> 
> New syscalls is always the last resort path, especially if they are
> associated with an arch.
> 
> PROT_SGX sounds something worth of consideration.
> 
> Another idea to throw would be noexec_dev mount option that would allow
> exec *only* for the device nodes (zero analysis done on feasibility).

The 2nd proposal has the merit that it would scale above SGX and
does not give additional strengths to the adversary in the context
of the threat scenario.

/Jarkko
Jarkko Sakkinen April 7, 2020, 6:04 p.m. UTC | #33
On Tue, Apr 07, 2020 at 07:59:00PM +0300, Jarkko Sakkinen wrote:
> On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote:
> > On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote:
> > > Please correct me if I'm wrong, but isn't it the goal of SGX to let a
> > > (suitably privileged) process designate some of its memory areas as part of
> > > SGX enclave? If so, why don't you simply add a system call to do so, such as
> > > 
> > > int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags);
> > > 
> > > like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like
> > > existing PROT_SAO/PROT_SEM?
> > > 
> > > -Topi
> > 
> > New syscalls is always the last resort path, especially if they are
> > associated with an arch.
> > 
> > PROT_SGX sounds something worth of consideration.
> > 
> > Another idea to throw would be noexec_dev mount option that would allow
> > exec *only* for the device nodes (zero analysis done on feasibility).
> 
> The 2nd proposal has the merit that it would scale above SGX and
> does not give additional strengths to the adversary in the context
> of the threat scenario.

Or.

Why couldn't kernel just disallow anything but device files to be
created to devtmpfs unconditionally?

Then noexec would not be needed in the first place.

/Jarkko
Topi Miettinen April 7, 2020, 7:54 p.m. UTC | #34
On 7.4.2020 21.04, Jarkko Sakkinen wrote:
> On Tue, Apr 07, 2020 at 07:59:00PM +0300, Jarkko Sakkinen wrote:
>> On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote:
>>> On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote:
>>>> Please correct me if I'm wrong, but isn't it the goal of SGX to let a
>>>> (suitably privileged) process designate some of its memory areas as part of
>>>> SGX enclave? If so, why don't you simply add a system call to do so, such as
>>>>
>>>> int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags);
>>>>
>>>> like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like
>>>> existing PROT_SAO/PROT_SEM?
>>>>
>>>> -Topi
>>>
>>> New syscalls is always the last resort path, especially if they are
>>> associated with an arch.
>>>
>>> PROT_SGX sounds something worth of consideration.
>>>
>>> Another idea to throw would be noexec_dev mount option that would allow
>>> exec *only* for the device nodes (zero analysis done on feasibility).
>>
>> The 2nd proposal has the merit that it would scale above SGX and
>> does not give additional strengths to the adversary in the context
>> of the threat scenario.
> 
> Or.
> 
> Why couldn't kernel just disallow anything but device files to be
> created to devtmpfs unconditionally?

It seems to be just a regular tmpfs but with direct access to add device 
nodes when drivers are loaded. Nice idea, maybe something like that 
could be done with SELinux policy.

-Topi
Jarkko Sakkinen April 8, 2020, 1:40 p.m. UTC | #35
On Tue, Apr 07, 2020 at 10:54:46PM +0300, Topi Miettinen wrote:
> On 7.4.2020 21.04, Jarkko Sakkinen wrote:
> > On Tue, Apr 07, 2020 at 07:59:00PM +0300, Jarkko Sakkinen wrote:
> > > On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote:
> > > > On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote:
> > > > > Please correct me if I'm wrong, but isn't it the goal of SGX to let a
> > > > > (suitably privileged) process designate some of its memory areas as part of
> > > > > SGX enclave? If so, why don't you simply add a system call to do so, such as
> > > > > 
> > > > > int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags);
> > > > > 
> > > > > like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like
> > > > > existing PROT_SAO/PROT_SEM?
> > > > > 
> > > > > -Topi
> > > > 
> > > > New syscalls is always the last resort path, especially if they are
> > > > associated with an arch.
> > > > 
> > > > PROT_SGX sounds something worth of consideration.
> > > > 
> > > > Another idea to throw would be noexec_dev mount option that would allow
> > > > exec *only* for the device nodes (zero analysis done on feasibility).
> > > 
> > > The 2nd proposal has the merit that it would scale above SGX and
> > > does not give additional strengths to the adversary in the context
> > > of the threat scenario.
> > 
> > Or.
> > 
> > Why couldn't kernel just disallow anything but device files to be
> > created to devtmpfs unconditionally?
> 
> It seems to be just a regular tmpfs but with direct access to add device
> nodes when drivers are loaded. Nice idea, maybe something like that could be
> done with SELinux policy.

Anyway, thank you for all the feedback.

What starts to be obvious is that we don't do anything in code level
in SGX particular but instead workaround something around /dev.

/Jarkko
Sean Christopherson April 8, 2020, 2:56 p.m. UTC | #36
On Wed, Apr 08, 2020 at 04:40:49PM +0300, Jarkko Sakkinen wrote:
> What starts to be obvious is that we don't do anything in code level
> in SGX particular but instead workaround something around /dev.

Can you summarize the plan going forward?  Thanks!
Topi Miettinen April 8, 2020, 9:15 p.m. UTC | #37
On 8.4.2020 16.40, Jarkko Sakkinen wrote:
> On Tue, Apr 07, 2020 at 10:54:46PM +0300, Topi Miettinen wrote:
>> On 7.4.2020 21.04, Jarkko Sakkinen wrote:
>>> On Tue, Apr 07, 2020 at 07:59:00PM +0300, Jarkko Sakkinen wrote:
>>>> On Tue, Apr 07, 2020 at 07:57:08PM +0300, Jarkko Sakkinen wrote:
>>>>> On Tue, Apr 07, 2020 at 12:04:58PM +0300, Topi Miettinen wrote:
>>>>>> Please correct me if I'm wrong, but isn't it the goal of SGX to let a
>>>>>> (suitably privileged) process designate some of its memory areas as part of
>>>>>> SGX enclave? If so, why don't you simply add a system call to do so, such as
>>>>>>
>>>>>> int sgx_mprotect(void *start, size_t length, int prot, u64 sgx_flags);
>>>>>>
>>>>>> like existing pkey_mprotect()? Or add a flag PROT_SGX to mprotect() like
>>>>>> existing PROT_SAO/PROT_SEM?
>>>>>>
>>>>>> -Topi
>>>>>
>>>>> New syscalls is always the last resort path, especially if they are
>>>>> associated with an arch.
>>>>>
>>>>> PROT_SGX sounds something worth of consideration.
>>>>>
>>>>> Another idea to throw would be noexec_dev mount option that would allow
>>>>> exec *only* for the device nodes (zero analysis done on feasibility).
>>>>
>>>> The 2nd proposal has the merit that it would scale above SGX and
>>>> does not give additional strengths to the adversary in the context
>>>> of the threat scenario.
>>>
>>> Or.
>>>
>>> Why couldn't kernel just disallow anything but device files to be
>>> created to devtmpfs unconditionally?
>>
>> It seems to be just a regular tmpfs but with direct access to add device
>> nodes when drivers are loaded. Nice idea, maybe something like that could be
>> done with SELinux policy.
> 
> Anyway, thank you for all the feedback.
> 
> What starts to be obvious is that we don't do anything in code level
> in SGX particular but instead workaround something around /dev.

If you take the /dev/sgx path, perhaps you could use KVM as a reference. 
It uses a similar special device /dev/kvm, works well with noexec /dev 
but still it can be used to do much more complex stuff than SGX.

-Topi
Sean Christopherson April 8, 2020, 9:29 p.m. UTC | #38
On Thu, Apr 09, 2020 at 12:15:36AM +0300, Topi Miettinen wrote:
> On 8.4.2020 16.40, Jarkko Sakkinen wrote:
> >What starts to be obvious is that we don't do anything in code level
> >in SGX particular but instead workaround something around /dev.
> 
> If you take the /dev/sgx path, perhaps you could use KVM as a reference. It
> uses a similar special device /dev/kvm, works well with noexec /dev but
> still it can be used to do much more complex stuff than SGX.

But userspace doesn't need to mmap() /dev/kvm with PROT_EXEC, that's the
rub.  KVM uses anon inodes for VMs, vCPUs, etc..., but doing that on SGX
runs afould of SELinux's PROCESS_EXECMEM, again due to mmap() PROT_EXEC.
Jarkko Sakkinen April 9, 2020, 6:39 p.m. UTC | #39
On Wed, Apr 08, 2020 at 07:56:36AM -0700, Sean Christopherson wrote:
> On Wed, Apr 08, 2020 at 04:40:49PM +0300, Jarkko Sakkinen wrote:
> > What starts to be obvious is that we don't do anything in code level
> > in SGX particular but instead workaround something around /dev.
> 
> Can you summarize the plan going forward?  Thanks!

Sure.

The summary is that the permission problem should be solved outside of
SGX. Doing an FS is somewhat big hammer to sort out the permission
problem. It will also make configuration more clunky i.e.  kernel
defines some static permissions and then let say the systemd unit file
will overwrite them with some other. It is somewhat more sound to have
the udev db to contain the permissions.

Admitting that we don't have exact solutions I think we have enough
knowledge to say that things can be refined to work for SGX outside of
SGX.

I've been playing with the idea to add something like SB_I_DEVMMAPEXEC [1]
to superblock flags. This would be set by devtmpfs
(drivers/base/devtmpfs.c) initialization code. Then do_mmap_pgoff should
check this flag from superblock and that we are mapping a device node.
If these premises are fulfilled, then it would allow mmap().

This will make exec work when mapping device nodes but does not allow
attacker to put executable to /dev and run them in the threat scenario
described by Topi [2].

Andy also mentioned that he was going mail to udev ML but have not
checked that at this point.

[1] For explicit control of this new behavior also mnt flag is needed.
[2] https://patchwork.kernel.org/patch/11467637/#23269417

/Jarkko
Jethro Beekman Nov. 19, 2020, 7:23 a.m. UTC | #40
On 2020-04-06 21:53, Andy Lutomirski wrote:
> 
> 
>> On Apr 6, 2020, at 11:55 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>
>> On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote:
>>>
>>>>> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote:
>>>>
>>>> On 6.4.2020 9.42, Jethro Beekman wrote:
>>>>> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.
>>>>> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?
>>>>
>>>> Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow?
>>>>
>>>> Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems.
>>>>
>>>>
>>>
>>> It’s *possible* to create a tmpfs, create the sgx nodes on it,
>>> bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs.
>>>
>>> I don’t know whether udev would be willing to support such a thing.
>>
>> sgxfs is somewhat trivial to implement and has one stakeholder less to
>> worry about. It is not really a huge stretch.
>>
>> Overally, I think it is something that we could live with. At least it
>> is something that does not step on others toes.
>>
>> Haitao: If we go with sgxfs route, then you can for the moment do what
>> Andy suggested: bind mount it to /dev/sgx.
> 
> That also needs userspace support.
> 
> I’ll start a thread on the udev list.

Andy, can you send a link to this thread?

--
Jethro Beekman | Fortanix
Andy Lutomirski Nov. 19, 2020, 4:09 p.m. UTC | #41
On Wed, Nov 18, 2020 at 11:23 PM Jethro Beekman <jethro@fortanix.com> wrote:
>
> On 2020-04-06 21:53, Andy Lutomirski wrote:
> >
> >
> >> On Apr 6, 2020, at 11:55 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> >>
> >> On Mon, Apr 06, 2020 at 09:44:19AM -0700, Andy Lutomirski wrote:
> >>>
> >>>>> On Apr 6, 2020, at 4:01 AM, Topi Miettinen <toiwoton@gmail.com> wrote:
> >>>>
> >>>> On 6.4.2020 9.42, Jethro Beekman wrote:
> >>>>> On 2020-04-04 09:27, Topi Miettinen wrote> Then initramfs should make a similar exception as with v86d and grant exec to /dev.
> >>>>> I'm not sure this is a reasonable approach. Expect most devices with an Intel processor will have the SGX device going forward. Then, no one is using noexec, so why have this logic at all?
> >>>>
> >>>> Intel does not control the whole market yet, does AMD also offer SGX or similar? Will SGX be also available for consumer devices? Are distros going to enable SGX, will it benefit their users somehow?
> >>>>
> >>>> Perhaps the sgxfs approach or something else (system call?) would be better after all in order to not force exec just because of one device. /dev is usually writable, so allowing exec means breaking the W^X principle for filesystems.
> >>>>
> >>>>
> >>>
> >>> It’s *possible* to create a tmpfs, create the sgx nodes on it,
> >>> bind-mount to /dev/sgx/..., and lazy-unmount the tmpfs.
> >>>
> >>> I don’t know whether udev would be willing to support such a thing.
> >>
> >> sgxfs is somewhat trivial to implement and has one stakeholder less to
> >> worry about. It is not really a huge stretch.
> >>
> >> Overally, I think it is something that we could live with. At least it
> >> is something that does not step on others toes.
> >>
> >> Haitao: If we go with sgxfs route, then you can for the moment do what
> >> Andy suggested: bind mount it to /dev/sgx.
> >
> > That also needs userspace support.
> >
> > I’ll start a thread on the udev list.
>
> Andy, can you send a link to this thread?
>

Hmm, I may never have sent the email.  Let me do this right now.
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 997a7f4117c5..1c825ef957db 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -6,6 +6,7 @@ 
 #include <linux/mman.h>
 #include <linux/security.h>
 #include <linux/suspend.h>
+#include <linux/anon_inodes.h>
 #include <asm/traps.h>
 #include "driver.h"
 #include "encl.h"
@@ -21,35 +22,7 @@  u64 sgx_attributes_reserved_mask;
 u64 sgx_xfrm_reserved_mask = ~0x3;
 u32 sgx_xsave_size_tbl[64];
 
-static int sgx_open(struct inode *inode, struct file *file)
-{
-	struct sgx_encl *encl;
-	int ret;
-
-	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
-	if (!encl)
-		return -ENOMEM;
-
-	atomic_set(&encl->flags, 0);
-	kref_init(&encl->refcount);
-	INIT_LIST_HEAD(&encl->va_pages);
-	INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL);
-	mutex_init(&encl->lock);
-	INIT_LIST_HEAD(&encl->mm_list);
-	spin_lock_init(&encl->mm_lock);
-
-	ret = init_srcu_struct(&encl->srcu);
-	if (ret) {
-		kfree(encl);
-		return ret;
-	}
-
-	file->private_data = encl;
-
-	return 0;
-}
-
-static int sgx_release(struct inode *inode, struct file *file)
+static int sgx_encl_file_release(struct inode *inode, struct file *file)
 {
 	struct sgx_encl *encl = file->private_data;
 	struct sgx_encl_mm *encl_mm;
@@ -84,6 +57,68 @@  static int sgx_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static const struct file_operations sgx_encl_file_fops = {
+	.owner			= THIS_MODULE,
+	.release		= sgx_encl_file_release,
+};
+
+static int sgx_open(struct inode *inode, struct file *file)
+{
+	struct file *encl_file = NULL;
+	struct sgx_encl *encl = NULL;
+	int ret;
+
+	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
+	if (!encl) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	encl_file = anon_inode_getfile("[sgx]", &sgx_encl_file_fops, encl,
+				       O_RDWR);
+	if (IS_ERR(encl_file)) {
+		ret = PTR_ERR(encl_file);
+		goto err;
+	}
+
+	ret = init_srcu_struct(&encl->srcu);
+	if (ret)
+		goto err;
+
+	atomic_set(&encl->flags, 0);
+	kref_init(&encl->refcount);
+	INIT_LIST_HEAD(&encl->va_pages);
+	INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL);
+	mutex_init(&encl->lock);
+	INIT_LIST_HEAD(&encl->mm_list);
+	spin_lock_init(&encl->mm_lock);
+
+	file->private_data = encl_file;
+
+	return 0;
+
+err:
+	if (encl_file)
+		fput(encl_file);
+
+	kfree(encl);
+	return ret;
+}
+
+static int sgx_encl_dev_release(struct inode *inode, struct file *file)
+{
+	struct file *encl_file = file->private_data;
+
+	/*
+	 * Can be NULL when the enclave file has been handed over to the
+	 * user space.
+	 */
+	if (encl_file)
+		fput(encl_file);
+
+	return 0;
+}
+
 #ifdef CONFIG_COMPAT
 static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
 			      unsigned long arg)
@@ -94,7 +129,8 @@  static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
 
 static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	struct sgx_encl *encl = file->private_data;
+	struct file *encl_file = file->private_data;
+	struct sgx_encl *encl = encl_file->private_data;
 	int ret;
 
 	ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end,
@@ -128,10 +164,10 @@  static unsigned long sgx_get_unmapped_area(struct file *file,
 	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
 }
 
-static const struct file_operations sgx_encl_fops = {
+static const struct file_operations sgx_encl_dev_fops = {
 	.owner			= THIS_MODULE,
 	.open			= sgx_open,
-	.release		= sgx_release,
+	.release		= sgx_encl_dev_release,
 	.unlocked_ioctl		= sgx_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl		= sgx_compat_ioctl,
@@ -148,7 +184,7 @@  static struct miscdevice sgx_dev_enclave = {
 	.minor = MISC_DYNAMIC_MINOR,
 	.name = "enclave",
 	.nodename = "sgx/enclave",
-	.fops = &sgx_encl_fops,
+	.fops = &sgx_encl_dev_fops,
 };
 
 static struct miscdevice sgx_dev_provision = {
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 3af0596530a8..891aa9395907 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -766,7 +766,8 @@  static long sgx_ioc_enclave_set_attribute(struct sgx_encl *encl,
 
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
-	struct sgx_encl *encl = filep->private_data;
+	struct file *encl_file = filep->private_data;
+	struct sgx_encl *encl = encl_file->private_data;
 	int ret, encl_flags;
 
 	encl_flags = atomic_fetch_or(SGX_ENCL_IOCTL, &encl->flags);