Message ID: 20181214215729.4221-1-sean.j.christopherson@intel.com (mailing list archive)
Series: x86: Add vDSO exception fixup for SGX
On Fri, Dec 14, 2018 at 01:57:24PM -0800, Sean Christopherson wrote: > __vdso_sgx_enter_enclave() gets another rewrite, this time to strip > it down to the bare minimum and explicitly break compliance with the > x86-64 ABI. Feedback from v4 revealed that __vdso_sgx_enter_enclave() > would need to save (a lot) more than just the non-volatile GPRs to be > compliant with the x86-64 ABI, at which point breaking from the ABI > completely became much more palatable. > > The non-standard ABI also solves the question of "which GPRs should be > marshalled to/from the enclave" by getting out of the way entirely and > letting userspace have free reign (except for RSP, which has a big ol' > DO NOT TOUCH sign on it). Can you share a reference, or is this better documented in the accompanied patches? /Jarkko
On Tue, Dec 18, 2018 at 06:18:15AM +0200, Jarkko Sakkinen wrote: > On Fri, Dec 14, 2018 at 01:57:24PM -0800, Sean Christopherson wrote: > > __vdso_sgx_enter_enclave() gets another rewrite, this time to strip > > it down to the bare minimum and explicitly break compliance with the > > x86-64 ABI. Feedback from v4 revealed that __vdso_sgx_enter_enclave() > > would need to save (a lot) more than just the non-volatile GPRs to be > > compliant with the x86-64 ABI, at which point breaking from the ABI > > completely became much more palatable. > > > > The non-standard ABI also solves the question of "which GPRs should be > > marshalled to/from the enclave" by getting out of the way entirely and > > letting userspace have free reign (except for RSP, which has a big ol' > > DO NOT TOUCH sign on it). > > Can you share a reference, or is this better documented in the > accompanied patches? Patch 5/5 has more details, and v4 of the series has a lot of background info by way of discussion: https://lkml.kernel.org/r/20181213213135.12913-1-sean.j.christopherson@intel.com
On Tue, Dec 18, 2018 at 07:08:15AM -0800, Sean Christopherson wrote: > On Tue, Dec 18, 2018 at 06:18:15AM +0200, Jarkko Sakkinen wrote: > > On Fri, Dec 14, 2018 at 01:57:24PM -0800, Sean Christopherson wrote: > > > __vdso_sgx_enter_enclave() gets another rewrite, this time to strip > > > it down to the bare minimum and explicitly break compliance with the > > > x86-64 ABI. Feedback from v4 revealed that __vdso_sgx_enter_enclave() > > > would need to save (a lot) more than just the non-volatile GPRs to be > > > compliant with the x86-64 ABI, at which point breaking from the ABI > > > completely became much more palatable. > > > > > > The non-standard ABI also solves the question of "which GPRs should be > > > marshalled to/from the enclave" by getting out of the way entirely and > > > letting userspace have free reign (except for RSP, which has a big ol' > > > DO NOT TOUCH sign on it). > > > > Can you share a reference, or is this better documented in the > > accompanied patches? > > Patch 5/5 has more details, and v4 of the series has a lot of background > info by way of discussion: > > https://lkml.kernel.org/r/20181213213135.12913-1-sean.j.christopherson@intel.com Thanks Sean, I will. Have not had yet much time to look at this. /Jarkko
On Wed, Dec 19, 2018 at 06:43:46AM +0200, Jarkko Sakkinen wrote: > On Tue, Dec 18, 2018 at 07:08:15AM -0800, Sean Christopherson wrote: > > On Tue, Dec 18, 2018 at 06:18:15AM +0200, Jarkko Sakkinen wrote: > > > On Fri, Dec 14, 2018 at 01:57:24PM -0800, Sean Christopherson wrote: > > > > __vdso_sgx_enter_enclave() gets another rewrite, this time to strip > > > > it down to the bare minimum and explicitly break compliance with the > > > > x86-64 ABI. Feedback from v4 revealed that __vdso_sgx_enter_enclave() > > > > would need to save (a lot) more than just the non-volatile GPRs to be > > > > compliant with the x86-64 ABI, at which point breaking from the ABI > > > > completely became much more palatable. > > > > > > > > The non-standard ABI also solves the question of "which GPRs should be > > > > marshalled to/from the enclave" by getting out of the way entirely and > > > > letting userspace have free reign (except for RSP, which has a big ol' > > > > DO NOT TOUCH sign on it). > > > > > > Can you share a reference, or is this better documented in the > > > accompanied patches? > > > > Patch 5/5 has more details, and v4 of the series has a lot of background > > info by way of discussion: > > > > https://lkml.kernel.org/r/20181213213135.12913-1-sean.j.christopherson@intel.com > > Thanks Sean, I will. Have not had yet much time to look at this. Might make sense to summarize the design decisions for future patch set versions to the cover letter at least, namely the reasoning why not to follow the ABI. /Jarkko
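[To make the "non-standard ABI" discussion above concrete: since the compiler assumes the System V x86-64 calling convention for ordinary calls, a caller of such a vDSO function has to use assembly (or inline asm) and treat nearly every register as potentially clobbered. The sketch below is purely illustrative; the function name, parameters, and register assignments are assumptions, not the actual contract defined in patch 5/5.]

	#include <stdint.h>

	/*
	 * Hypothetical caller of a vDSO function with a non-standard ABI.
	 * Here %rax is assumed to carry the ENCLU leaf and %rbx the TCS
	 * address; everything else is listed as clobbered because the
	 * enclave may dirty it.  Only %rsp (and %rbp, which inline asm
	 * cannot clobber anyway) is assumed preserved.
	 */
	static long sgx_enter_enclave(void *vdso_entry, void *tcs, long leaf)
	{
		long ret = leaf;	/* leaf in, result out, both in %rax */

		asm volatile("call *%[entry]"
			     : "+a" (ret)
			     : [entry] "m" (vdso_entry), "b" (tcs)
			     : "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10",
			       "r11", "r12", "r13", "r14", "r15", "memory", "cc");
		return ret;
	}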
I have pretty much figured out how to change the driver implementation from VMA-based to file-based. Most of the code in the driver can be reused without enormous changes. I take it as a clue that the architecture is somewhat right that changing the driver this radically does not seem to require any changes to the core. Using an anon inode is the right choice because it is a more robust interface for creating multiple enclaves. The only remaining open question I have for the implementation is the backing storage. From an API perspective the most robust choice would be to revert to using a shmem file. It would then be easy to create a complete construction flow without any dependencies on mm_struct. I do recognize the issue with accounting, but to which process should the backing storage be accounted in this new paradigm anyway? I've attached the new uapi header to this email that I'm going forward with. /Jarkko
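[As a concrete reference for the backing-storage idea above, a minimal sketch of how a per-enclave shmem file can be set up with no ties to any process mm_struct. The struct and function names are placeholders, not the actual driver code.]

	#include <linux/err.h>
	#include <linux/mm.h>
	#include <linux/shmem_fs.h>

	/* Placeholder enclave context; the real driver's struct differs. */
	struct sgx_encl_sketch {
		struct file *backing;
	};

	static int sgx_encl_setup_backing(struct sgx_encl_sketch *encl,
					  unsigned long encl_size)
	{
		struct file *backing;

		/*
		 * A kernel-internal shmem file: swappable backing storage
		 * for reclaimed EPC pages, independent of any user mm_struct.
		 */
		backing = shmem_file_setup("sgx-backing", encl_size, VM_NORESERVE);
		if (IS_ERR(backing))
			return PTR_ERR(backing);

		encl->backing = backing;
		return 0;
	}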
On 2018-12-19 13:28, Jarkko Sakkinen wrote:
> I have pretty much figured out how to change the driver implementation from VMA-based to file-based. Most of the code in the driver can be reused without enormous changes. I take it as a clue that the architecture is somewhat right that changing the driver this radically does not seem to require any changes to the core.

One weird thing is the departure from the normal mmap behavior that the memory mapping persists even if the original fd is closed. (See man mmap: "closing the file descriptor does not unmap the region.")

> Using an anon inode is the right choice because it is a more robust interface for creating multiple enclaves.
>
> The only remaining open question I have for the implementation is the backing storage. From an API perspective the most robust choice would be to revert to using a shmem file. It would then be easy to create a complete construction flow without any dependencies on mm_struct.
>
> I do recognize the issue with accounting, but to which process should the backing storage be accounted in this new paradigm anyway?
>
> I've attached the new uapi header to this email that I'm going forward with.
>
> /Jarkko
>
> sgx.h
>
> /* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> /**
>  * Copyright(c) 2016-18 Intel Corporation.
>  */
> #ifndef _UAPI_ASM_X86_SGX_H
> #define _UAPI_ASM_X86_SGX_H
>
> #include <linux/types.h>
> #include <linux/ioctl.h>
>
> #define SGX_MAGIC 0xA4
>
> #define SGX_IOC_ENCLAVE_CREATE \
>         _IOW(SGX_MAGIC, 0x00, struct sgx_enclave_create)
> #define SGX_IOC_ENCLAVE_ADD_PAGE \
>         _IOW(SGX_MAGIC, 0x01, struct sgx_enclave_add_page)
> #define SGX_IOC_ENCLAVE_INIT \
>         _IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
> #define SGX_IOC_ENCLAVE_SET_ATTRIBUTE \
>         _IOW(SGX_MAGIC, 0x03, struct sgx_enclave_set_attribute)
>
> /* IOCTL return values */
> #define SGX_POWER_LOST_ENCLAVE 0x40000000
>
> /**
>  * struct sgx_enclave_create - parameter structure for the
>  *                             %SGX_IOC_ENCLAVE_CREATE ioctl
>  * @src:        address for the SECS page data
>  * @enclave_fd: file handle to the enclave address space (out)
>  */
> struct sgx_enclave_create {
>         __u64 src;
>         __u64 enclave_fd;
> };
>
> /**
>  * struct sgx_enclave_add_page - parameter structure for the
>  *                               %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
>  * @enclave_fd: file handle to the enclave address space
>  * @src:        address for the page data
>  * @secinfo:    address for the SECINFO data
>  * @mrmask:     bitmask for the measured 256 byte chunks
>  */
> struct sgx_enclave_add_page {
>         __u64 enclave_fd;
>         __u64 src;
>         __u64 secinfo;
>         __u16 mrmask;
> } __attribute__((__packed__));

Wouldn't you just pass enclave_fd as the ioctl fd parameter?

How to specify the address of the page that is being added?

> /**
>  * struct sgx_enclave_init - parameter structure for the
>  *                           %SGX_IOC_ENCLAVE_INIT ioctl
>  * @enclave_fd: file handle to the enclave address space
>  * @sigstruct:  address for the SIGSTRUCT data
>  */
> struct sgx_enclave_init {
>         __u64 enclave_fd;
>         __u64 sigstruct;
> };
>
> /**
>  * struct sgx_enclave_set_attribute - parameter structure for the
>  *                                    %SGX_IOC_ENCLAVE_SET_ATTRIBUTE ioctl
>  * @addr: address within the ELRANGE

Stray parameter in docs

>  * @enclave_fd:   file handle to the enclave address space
>  * @attribute_fd: file handle of the attribute file in the securityfs
>  */
> struct sgx_enclave_set_attribute {
>         __u64 enclave_fd;
>         __u64 attribute_fd;
> };

What is this for?

> #endif /* _UAPI_ASM_X86_SGX_H */

-- Jethro Beekman | Fortanix
On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote: > One weird thing is the departure from the normal mmap behavior that the > memory mapping persists even if the original fd is closed. (See man mmap: > "closing the file descriptor does not unmap the region.") The mmapped region and enclave would be completely disjoint to start with. The enclave driver code would assume that an enclave VMA exists when it maps enclave address space to a process. I.e. VMA would no longer reference to the enclave or vice versa but you would still create an enclave VMA with mmap(). This is IMHO very clear and well-defined semantics. > > struct sgx_enclave_add_page { > > __u64 enclave_fd; > > __u64 src; > > __u64 secinfo; > > __u16 mrmask; > > } __attribute__((__packed__)); > > Wouldn't you just pass enclave_fd as the ioctl fd parameter? I'm still planning to keep the API in the device fd and use enclave_fd as handle to the enclave address space. I don't see any obvious reason to change that behavior. And if we ever add any "global" ioctls, then we would have to define APIs to both fd's, which would become a mess. > How to specify the address of the page that is being added? Yes, that is correct and my bad to remove it (just quickly drafted what I had in mind). /Jarkko
On 2018-12-19 14:41, Jarkko Sakkinen wrote: > On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote: >> One weird thing is the departure from the normal mmap behavior that the >> memory mapping persists even if the original fd is closed. (See man mmap: >> "closing the file descriptor does not unmap the region.") > > The mmapped region and enclave would be completely disjoint to start > with. The enclave driver code would assume that an enclave VMA exists > when it maps enclave address space to a process. > > I.e. VMA would no longer reference to the enclave or vice versa but > you would still create an enclave VMA with mmap(). > > This is IMHO very clear and well-defined semantics. > >>> struct sgx_enclave_add_page { >>> __u64 enclave_fd; >>> __u64 src; >>> __u64 secinfo; >>> __u16 mrmask; >>> } __attribute__((__packed__)); >> >> Wouldn't you just pass enclave_fd as the ioctl fd parameter? > > I'm still planning to keep the API in the device fd and use enclave_fd > as handle to the enclave address space. I don't see any obvious reason > to change that behavior. > > And if we ever add any "global" ioctls, then we would have to define > APIs to both fd's, which would become a mess. > >> How to specify the address of the page that is being added? > > Yes, that is correct and my bad to remove it (just quickly drafted what > I had in mind). So your plan is that to call EADD, userspace has to pass the device fd AND the enclave fd AND the enclave address? That seems a little superfluous. -- Jethro Beekman | Fortanix
On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote: > On 2018-12-19 14:41, Jarkko Sakkinen wrote: > > On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote: > > > One weird thing is the departure from the normal mmap behavior that the > > > memory mapping persists even if the original fd is closed. (See man mmap: > > > "closing the file descriptor does not unmap the region.") > > > > The mmapped region and enclave would be completely disjoint to start > > with. The enclave driver code would assume that an enclave VMA exists > > when it maps enclave address space to a process. > > > > I.e. VMA would no longer reference to the enclave or vice versa but > > you would still create an enclave VMA with mmap(). > > > > This is IMHO very clear and well-defined semantics. > > > > > > struct sgx_enclave_add_page { > > > > __u64 enclave_fd; > > > > __u64 src; > > > > __u64 secinfo; > > > > __u16 mrmask; > > > > } __attribute__((__packed__)); > > > > > > Wouldn't you just pass enclave_fd as the ioctl fd parameter? > > > > I'm still planning to keep the API in the device fd and use enclave_fd > > as handle to the enclave address space. I don't see any obvious reason > > to change that behavior. > > > > And if we ever add any "global" ioctls, then we would have to define > > APIs to both fd's, which would become a mess. > > > > > How to specify the address of the page that is being added? > > > > Yes, that is correct and my bad to remove it (just quickly drafted what > > I had in mind). > > So your plan is that to call EADD, userspace has to pass the device fd AND > the enclave fd AND the enclave address? That seems a little superfluous. If the enclave fd had its own ioctls to add pages etc., two ioctl APIs would already be needed (create on the device fd and the rest on the enclave fd). Which one would be more superfluous? /Jarkko
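[To make the trade-off being debated concrete, a sketch of the EADD call under the device-fd-centric proposal. It assumes the draft sgx.h posted earlier in the thread, plus the addr field that Jarkko acknowledged needs to come back; that field, and the exact constants, are assumptions rather than a final uapi.]

	#include <fcntl.h>
	#include <linux/types.h>
	#include <sys/ioctl.h>

	static int build_enclave(void *secs, __u64 encl_addr, void *page,
				 void *secinfo)
	{
		/* Every ioctl() targets the device fd; the enclave is named
		 * by an enclave_fd member inside the parameter struct. */
		int sgx_fd = open("/dev/sgx", O_RDWR);

		struct sgx_enclave_create ecreate = { .src = (__u64)secs };
		ioctl(sgx_fd, SGX_IOC_ENCLAVE_CREATE, &ecreate);  /* fills ecreate.enclave_fd */

		struct sgx_enclave_add_page eadd = {
			.enclave_fd = ecreate.enclave_fd,
			.addr       = encl_addr,	/* assumed field, per the discussion above */
			.src        = (__u64)page,
			.secinfo    = (__u64)secinfo,
			.mrmask     = 0xffff,
		};
		/* Device fd AND enclave fd AND enclave address, as Jethro notes. */
		return ioctl(sgx_fd, SGX_IOC_ENCLAVE_ADD_PAGE, &eadd);
	}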
On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote: Good morning, I hope everyone is weathering the pre-holiday season well. > On 2018-12-19 13:28, Jarkko Sakkinen wrote: > > * @eclave_fd: file handle to the enclave address space > > * @attribute_fd: file handle of the attribute file in the securityfs > > */ > >struct sgx_enclave_set_attribute { > > __u64 enclave_fd; > > __u64 attribute_fd; > >}; > What is this for? I believe it is a silent response to the issues we were prosecuting 4-5 weeks ago, regarding the requirement for an SGX driver on an FLC hardware platform to have some semblance of policy management to be relevant from a security/privacy perspective. It would have certainly been collegial to include a reference to our discussions and concerns in the changelog. See 364f68f5a3c in Jarkko's next/master. The changeset addresses enclave access to the PROVISION key but is still insufficient to deliver guarantees that are consistent with the SGX security model. In order to achieve that, policy management needs to embrace the use of MRSIGNER values, which is what our SFLC patchset uses. The noted changeset actually implements most of the 'kernel bloat' that our SFLC patchset needs to bolt onto. As of yesterday afternoon next/master still won't initialize a non-trivial enclave. Since there now appears to be a wholesale change in the driver architecture and UAPI we are sitting on the sidelines waiting for an indication all of that has some hope of working before we introduce our approach. Part of SFLC won't be popular but it is driven by clients who are actually paying for SGX security engineering and architectures. > Jethro Beekman | Fortanix Best wishes for a pleasant holiday season to everyone. Dr. Greg As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102 development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Politics is the business of getting power and privilege without possessing merit." -- P.J. O'Rourke
On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote: > On 2018-12-19 14:41, Jarkko Sakkinen wrote: > >On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote: > >>One weird thing is the departure from the normal mmap behavior that the > >>memory mapping persists even if the original fd is closed. (See man mmap: > >>"closing the file descriptor does not unmap the region.")

It won't (be a departure). mmap() on a file grabs a reference to the file, i.e. each VMA keeps a reference to the file. Closing the original enclave fd will only put its reference to the file/enclave, not destroy it outright.

> > > >The mmapped region and enclave would be completely disjoint to start > >with. The enclave driver code would assume that an enclave VMA exists > >when it maps enclave address space to a process. > > > >I.e. VMA would no longer reference to the enclave or vice versa but > >you would still create an enclave VMA with mmap(). > > > >This is IMHO very clear and well-defined semantics. > > > >>>struct sgx_enclave_add_page { > >>> __u64 enclave_fd; > >>> __u64 src; > >>> __u64 secinfo; > >>> __u16 mrmask; > >>>} __attribute__((__packed__)); > >> > >>Wouldn't you just pass enclave_fd as the ioctl fd parameter? > > > >I'm still planning to keep the API in the device fd and use enclave_fd > >as handle to the enclave address space. I don't see any obvious reason > >to change that behavior. > > > >And if we ever add any "global" ioctls, then we would have to define > >APIs to both fd's, which would become a mess. > > > >>How to specify the address of the page that is being added? > > > >Yes, that is correct and my bad to remove it (just quickly drafted what > >I had in mind). > > So your plan is that to call EADD, userspace has to pass the device fd AND > the enclave fd AND the enclave address? That seems a little superfluous.

I agree with Jethro, passing the enclave_fd as a param is obnoxious. And it means the user needs to open /dev/sgx to do anything with an enclave fd, e.g. the enclave fd might be passed to a builder thread, it shouldn't also need the device fd. E.g.:

	sgx_fd = open("/dev/sgx", O_RDWR);
	BUG_ON(sgx_fd < 0);

	enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate);
	BUG_ON(enclave_fd < 0);

	ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd);
	BUG_ON(ret);

	...

	ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit);
	BUG_ON(ret);

	...

	close(enclave_fd);
	close(sgx_fd);

Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes and ioctls for VMs and vCPUs.
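[For reference, the KVM-style pattern Sean points to reduces to the kernel-side sketch below: the device fd exposes a single creation ioctl that mints an anon-inode fd, and all per-object ioctls hang off that fd's file_operations. All names, command numbers, and helpers here are placeholders, not the actual driver.]

	#include <linux/anon_inodes.h>
	#include <linux/fs.h>

	#define SGX_ENCLAVE_CREATE   0x00	/* placeholder numbers */
	#define SGX_ENCLAVE_ADD_PAGE 0x01

	struct sgx_encl;			/* placeholder enclave context */
	extern struct sgx_encl *sgx_encl_alloc(void);
	extern long sgx_encl_add_page(struct sgx_encl *encl, unsigned long arg);

	static long sgx_encl_ioctl(struct file *filp, unsigned int cmd,
				   unsigned long arg)
	{
		/* anon_inode_getfd() stored the enclave in private_data. */
		struct sgx_encl *encl = filp->private_data;

		switch (cmd) {
		case SGX_ENCLAVE_ADD_PAGE:
			return sgx_encl_add_page(encl, arg);
		default:
			return -ENOTTY;
		}
	}

	static const struct file_operations sgx_encl_fops = {
		.owner		= THIS_MODULE,
		.unlocked_ioctl	= sgx_encl_ioctl,
	};

	static long sgx_dev_ioctl(struct file *filp, unsigned int cmd,
				  unsigned long arg)
	{
		switch (cmd) {
		case SGX_ENCLAVE_CREATE:
			/* Mint a new fd backed by an anon inode, a la KVM_CREATE_VM. */
			return anon_inode_getfd("sgx-enclave", &sgx_encl_fops,
						sgx_encl_alloc(),
						O_RDWR | O_CLOEXEC);
		default:
			return -ENOTTY;
		}
	}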
> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > >> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote: > I agree with Jethro, passing the enclave_fd as a param is obnoxious. > And it means the user needs to open /dev/sgx to do anything with an > enclave fd, e.g. the enclave fd might be passed to a builder thread, > it shouldn't also need the device fd. > > E.g.: > > sgx_fd = open("/dev/sgx", O_RDWR); > BUG_ON(sgx_fd < 0); > > enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate); > BUG_ON(enclave_fd < 0); > > ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd); > BUG_ON(ret); > > ... > > ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit); > BUG_ON(ret); > > ... > > close(enclave_fd); > close(sgx_fd); > > > Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes > and ioctls for VMs and vCPUs. Can one of you explain why SGX_ENCLAVE_CREATE is better than just opening a new instance of /dev/sgx for each encalve?
On Wed, Dec 19, 2018 at 06:45:15AM -0800, Sean Christopherson wrote: > I agree with Jethro, passing the enclave_fd as a param is obnoxious. > And it means the user needs to open /dev/sgx to do anything with an > enclave fd, e.g. the enclave fd might be passed to a builder thread, Please note that this is not really a thing that I care that much about at the end of the day, because either approach is straightforward to implement. That is why I asked Jethro which one is more superfluous. > Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes > and ioctls for VMs and vCPUs. I actually grabbed the anon inode code from the in-kernel LE code and started to transform it to this framework just because I was familiar with that snippet (because I wrote it), but yeah, the idea is similar to what is there. /Jarkko
On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > opening a new instance of /dev/sgx for each encalve? I think that fits better to the SCM_RIGHTS scenario, i.e. you could send the enclave to a process that does not necessarily have rights to /dev/sgx. Gives a more robust environment to configure SGX. /Jarkko
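[The SCM_RIGHTS hand-off works the same for an enclave fd as for any other fd; below is a minimal sketch of the sending side over a Unix domain socket. This is standard POSIX fd passing, nothing SGX-specific.]

	#include <string.h>
	#include <sys/socket.h>
	#include <sys/uio.h>

	/* Send an fd (e.g. an enclave fd) to a peer that may lack /dev/sgx access. */
	static ssize_t send_fd(int sock, int fd)
	{
		char dummy = 0;
		struct iovec iov = { .iov_base = &dummy, .iov_len = sizeof(dummy) };
		union {
			char buf[CMSG_SPACE(sizeof(int))];
			struct cmsghdr align;	/* forces proper alignment */
		} u = { { 0 } };
		struct msghdr msg = {
			.msg_iov        = &iov,
			.msg_iovlen     = 1,
			.msg_control    = u.buf,
			.msg_controllen = sizeof(u.buf),
		};
		struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type  = SCM_RIGHTS;
		cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
		memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

		return sendmsg(sock, &msg, 0);
	}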
On Wed, Dec 19, 2018 at 08:43:43AM -0600, Dr. Greg wrote: > I believe it is a silent response to the issues we were prosecuting > 4-5 weeks ago, regarding the requirement for an SGX driver on an FLC > hardware platform to have some semblance of policy management to be > relevant from a security/privacy perspective. It would have certainly > been collegial to include a reference to our discussions and concerns > in the changelog. > > See 364f68f5a3c in Jarkko's next/master. > > The changeset addresses enclave access to the PROVISION key but is > still insufficient to deliver guarantees that are consistent with the > SGX security model. In order to achieve that, policy management needs > to embrace the use of MRSIGNER values, which is what our SFLC patchset > uses. > > The noted changeset actually implements most of the 'kernel bloat' > that our SFLC patchset needs to bolt onto. > > As of yesterday afternoon next/master still won't initialize a > non-trivial enclave. Since there now appears to be a wholesale change > in the driver architecture and UAPI we are sitting on the sidelines > waiting for an indication all of that has some hope of working before > we introduce our approach. > > Part of SFLC won't be popular but it is driven by clients who are > actually paying for SGX security engineering and architectures. How many of these people are actually posting here? /Jarkko
On Wed, Dec 19, 2018 at 10:26 AM Jethro Beekman <jethro@fortanix.com> wrote: > > On 2018-12-19 13:28, Jarkko Sakkinen wrote: > > /** > > * struct sgx_enclave_add_page - parameter structure for the > > * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl > > * @eclave_fd: file handle to the enclave address space > > * @src: address for the page data > > * @secinfo: address for the SECINFO data > > * @mrmask: bitmask for the measured 256 byte chunks > > */ > > struct sgx_enclave_add_page { > > __u64 enclave_fd; > > __u64 src; > > __u64 secinfo; > > __u16 mrmask; > > } __attribute__((__packed__)); > > Wouldn't you just pass enclave_fd as the ioctl fd parameter? > > How to specify the address of the page that is being added? One more comment about the structure: I would generally recommend against packing structures like this. Instead just extend the mrmask member to 64 bits as well. Arnd
On Thu, Dec 20, 2018 at 01:08:46PM +0100, Arnd Bergmann wrote: > On Wed, Dec 19, 2018 at 10:26 AM Jethro Beekman <jethro@fortanix.com> wrote: > > > > On 2018-12-19 13:28, Jarkko Sakkinen wrote: > > > /** > > > * struct sgx_enclave_add_page - parameter structure for the > > > * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl > > > * @eclave_fd: file handle to the enclave address space > > > * @src: address for the page data > > > * @secinfo: address for the SECINFO data > > > * @mrmask: bitmask for the measured 256 byte chunks > > > */ > > > struct sgx_enclave_add_page { > > > __u64 enclave_fd; > > > __u64 src; > > > __u64 secinfo; > > > __u16 mrmask; > > > } __attribute__((__packed__)); > > > > Wouldn't you just pass enclave_fd as the ioctl fd parameter? > > > > How to specify the address of the page that is being added? > > One more comment about the structure: I would generally recommend > against packing structures like this. Instead just extend the mrmask > member to 64 bits as well. Can do. Thanks for the remark. /Jarkko
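[A sketch of the layout Arnd is suggesting: with mrmask widened to 64 bits, and the addr field restored per the earlier discussion, every member is a naturally aligned __u64, so the struct has no padding holes and the packed attribute becomes unnecessary. The addr field is an assumption from the thread, not part of the posted draft.]

	#include <linux/types.h>

	struct sgx_enclave_add_page {
		__u64 enclave_fd;
		__u64 addr;	/* restored per the thread above */
		__u64 src;
		__u64 secinfo;
		__u64 mrmask;	/* widened from __u16; no packing needed */
	};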
On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote: > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > opening a new instance of /dev/sgx for each encalve? > > I think that fits better to the SCM_RIGHTS scenario, i.e. you could send > the enclave to a process that does not necessarily have rights to > /dev/sgx. Gives a more robust environment to configure SGX. My only open question for the implementation is where to swap: if it is a VMA, whose VMA? Please share your views here. Not a blocker for me to work on the implementation, though. I'll use a private shmem file up until there is a better option. This ioctl API discussion is kind of meaningless for me ATM because it does not have that much effect on the internals even if it isn't perfect in v19. Very trivial to change. /Jarkko
On Thu, Dec 20, 2018 at 03:12:13PM +0200, Jarkko Sakkinen wrote: > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote: > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > opening a new instance of /dev/sgx for each encalve? > > > > I think that fits better to the SCM_RIGHTS scenario, i.e. you could send > > the enclave to a process that does not necessarily have rights to > > /dev/sgx. Gives a more robust environment to configure SGX. > > My only open question for the implementation is where to swap: if it is > a VMA, whose VMA? > > Please share your views here. Not a blocker for me to work on the > implementation, though. I'll use a private shmem file up until there > is a better option. > > This ioctl API discussion is kind of meaningless for me ATM because it > does not have that much effect on the internals even if it isn't > perfect in v19. Very trivial to change. Oops, and after sending I realized that I started this thread asking for comments about the API (I think I mentioned swapping too, though) :-) The feedback has been valuable and I gained the required understanding about enclave_fd, but I think the discussion has now saturated into minor details. I appreciate all the feedback so far. Sorry for the somewhat harsh statement. /Jarkko
On Thu, Dec 20, 2018 at 12:34:00PM +0200, Jarkko Sakkinen wrote: Good afternoon to everyone. > On Wed, Dec 19, 2018 at 08:43:43AM -0600, Dr. Greg wrote: > > I believe it is a silent response to the issues we were > > prosecuting 4-5 weeks ago, regarding the requirement for an SGX > > driver on an FLC hardware platform to have some semblance of > > policy management to be relevant from a security/privacy > > perspective. It would have certainly been collegial to include a > > reference to our discussions and concerns in the changelog. > > > > See 364f68f5a3c in Jarkko's next/master. > > > > The changeset addresses enclave access to the PROVISION key but is > > still insufficient to deliver guarantees that are consistent with > > the SGX security model. In order to achieve that, policy > > management needs to embrace the use of MRSIGNER values, which is > > what our SFLC patchset uses. > > > > The noted changeset actually implements most of the 'kernel bloat' > > that our SFLC patchset needs to bolt onto. > > > > As of yesterday afternoon next/master still won't initialize a > > non-trivial enclave. Since there now appears to be a wholesale > > change in the driver architecture and UAPI we are sitting on the > > sidelines waiting for an indication all of that has some hope of > > working before we introduce our approach. > > > > Part of SFLC won't be popular but it is driven by clients who are > > actually paying for SGX security engineering and architectures. > How many of these people are actually posting here? None that I know of. The individuals I was referring to are CISO's and security risk managers of multi-billion dollar corporations and/or 3-letter entities. It has been my own personal observation that they don't have time to post to the Linux Kernel Mailing List. The time they do spend on this technology seems to involve sitting in meetings and making decisions on whether or not to authorize capital expenditure budgets for Intel processors and chipsets, based on whether or not an SGX security stack can definably implement the security controls that are being imposed on their organizations by the government and/or their liability carriers. Such issues may be out of mainstream kernel concerns but hopefully not conceptually elusive with respect to their implications. > /Jarkko Merry Christmas and happy holidays to everyone. Dr. Greg As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102 development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Don't talk unless you can improve the silence." -- George Will
On Thu, Dec 20, 2018 at 04:06:38PM -0600, Dr. Greg wrote: > On Thu, Dec 20, 2018 at 12:34:00PM +0200, Jarkko Sakkinen wrote: > > Good afternoon to everyone. > > > On Wed, Dec 19, 2018 at 08:43:43AM -0600, Dr. Greg wrote: > > > I believe it is a silent response to the issues we were > > > prosecuting 4-5 weeks ago, regarding the requirement for an SGX > > > driver on an FLC hardware platform to have some semblance of > > > policy management to be relevant from a security/privacy > > > perspective. It would have certainly been collegial to include a > > > reference to our discussions and concerns in the changelog. > > > > > > See 364f68f5a3c in Jarkko's next/master. > > > > > > The changeset addresses enclave access to the PROVISION key but is > > > still insufficient to deliver guarantees that are consistent with > > > the SGX security model. In order to achieve that, policy > > > management needs to embrace the use of MRSIGNER values, which is > > > what our SFLC patchset uses. > > > > > > The noted changeset actually implements most of the 'kernel bloat' > > > that our SFLC patchset needs to bolt onto. > > > > > > As of yesterday afternoon next/master still won't initialize a > > > non-trivial enclave. Since there now appears to be a wholesale > > > change in the driver architecture and UAPI we are sitting on the > > > sidelines waiting for an indication all of that has some hope of > > > working before we introduce our approach. > > > > > > Part of SFLC won't be popular but it is driven by clients who are > > > actually paying for SGX security engineering and architectures. > > > How many of these people are actually posting here? > > None that I know of. > > The individuals I was referring to are CISO's and security risk > managers of multi-billion dollar corporations and/or 3-letter > entities. It has been my own personal observation that they don't > have time to post to the Linux Kernel Mailing List. > > The time they do spend on this technology seems to involve sitting in > meetings and making decisions on whether or not to authorize capital > expenditure budgets for Intel processors and chipsets, based on > whether or not an SGX security stack can definably implement the > security controls that are being imposed on their organizations by the > government and/or their liability carriers. > > Such issues may be out of mainstream kernel concerns but hopefully not > conceptually elusive with respect to their implications. Well, the process is anyway what it is. I'm sending v18 and you are free to comment the code change associated with the provision. /Jarkko
On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > On Dec 19, 2018, at 6:45 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > > >> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote: > > > I agree with Jethro, passing the enclave_fd as a param is obnoxious. > > And it means the user needs to open /dev/sgx to do anything with an > > enclave fd, e.g. the enclave fd might be passed to a builder thread, > > it shouldn't also need the device fd. > > > > E.g.: > > > > sgx_fd = open("/dev/sgx", O_RDWR); > > BUG_ON(sgx_fd < 0); > > > > enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate); > > BUG_ON(enclave_fd < 0); > > > > ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd); > > BUG_ON(ret); > > > > ... > > > > ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit); > > BUG_ON(ret); > > > > ... > > > > close(enclave_fd); > > close(sgx_fd); > > > > > > Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes > > and ioctls for VMs and vCPUs. > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > opening a new instance of /dev/sgx for each encalve? Directly associating /dev/sgx with an enclave means /dev/sgx can't be used to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and expose it a VM. Proposed layout in the link below. I'll also respond to Jarkko's question about exposing EPC through /dev/sgx instead of having KVM allocate it on behalf of the VM. https://lkml.kernel.org/r/20181218185349.GC30082@linux.intel.com
> On Dec 21, 2018, at 9:28 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: >>> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >>> >>>>> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote: >>> >>> I agree with Jethro, passing the enclave_fd as a param is obnoxious. >>> And it means the user needs to open /dev/sgx to do anything with an >>> enclave fd, e.g. the enclave fd might be passed to a builder thread, >>> it shouldn't also need the device fd. >>> >>> E.g.: >>> >>> sgx_fd = open("/dev/sgx", O_RDWR); >>> BUG_ON(sgx_fd < 0); >>> >>> enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate); >>> BUG_ON(enclave_fd < 0); >>> >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd); >>> BUG_ON(ret); >>> >>> ... >>> >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit); >>> BUG_ON(ret); >>> >>> ... >>> >>> close(enclave_fd); >>> close(sgx_fd); >>> >>> >>> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes >>> and ioctls for VMs and vCPUs. >> >> Can one of you explain why SGX_ENCLAVE_CREATE is better than just >> opening a new instance of /dev/sgx for each encalve? > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be > used to provide ioctl()'s for other SGX-related needs, e.g. to mmap() > raw EPC and expose it a VM. Proposed layout in the link below. I'll > also respond to Jarkko's question about exposing EPC through /dev/sgx > instead of having KVM allocate it on behalf of the VM. Hmm. I guess this makes some sense. My instinct would be to do it a little differently and have: /dev/sgx/enclave: Each instance is an enclave. /dev/sgx/epc: Used to get raw EPC for KVM. Might have different permissions, perhaps 0660 and group kvm. /dev/sgx/something_else: For when SGX v3 adds something else :)
On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote: > > On Dec 21, 2018, at 9:28 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > >>> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > >>> > >>>>> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote: > >>> > >>> I agree with Jethro, passing the enclave_fd as a param is obnoxious. > >>> And it means the user needs to open /dev/sgx to do anything with an > >>> enclave fd, e.g. the enclave fd might be passed to a builder thread, > >>> it shouldn't also need the device fd. > >>> > >>> E.g.: > >>> > >>> sgx_fd = open("/dev/sgx", O_RDWR); > >>> BUG_ON(sgx_fd < 0); > >>> > >>> enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate); > >>> BUG_ON(enclave_fd < 0); > >>> > >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd); > >>> BUG_ON(ret); > >>> > >>> ... > >>> > >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit); > >>> BUG_ON(ret); > >>> > >>> ... > >>> > >>> close(enclave_fd); > >>> close(sgx_fd); > >>> > >>> > >>> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes > >>> and ioctls for VMs and vCPUs. > >> > >> Can one of you explain why SGX_ENCLAVE_CREATE is better than just > >> opening a new instance of /dev/sgx for each encalve? > > > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be > > used to provide ioctl()'s for other SGX-related needs, e.g. to mmap() > > raw EPC and expose it a VM. Proposed layout in the link below. I'll > > also respond to Jarkko's question about exposing EPC through /dev/sgx > > instead of having KVM allocate it on behalf of the VM. > > Hmm. I guess this makes some sense. My instinct would be to do it a > little differently and have: > > /dev/sgx/enclave: Each instance is an enclave. > > /dev/sgx/epc: Used to get raw EPC for KVM. Might have different > permissions, perhaps 0660 and group kvm. > > /dev/sgx/something_else: For when SGX v3 adds something else :) Mmmm, I like this approach a lot. It would allow userspace to easily manage permissions for each "feature", e.g. give all users access to /dev/sgx/epc but restrict /dev/sgx/enclave. And we could add e.g. /dev/sgx/admin if we wanted to exposed ioctls() that apply to all aspects of SGX. Do you know if /dev/sgx/epc could be dynamically created, e.g. by KVM when the kvm_intel module is loaded? That would seal the deal for me as it'd keep open the option of having KVM handle oversubscription of guest EPC while exposing EPC through /dev/sgx instead of /dev/kvm.
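[On the question of dynamically creating the node: one plausible mechanism, sketched below under stated assumptions rather than taken from the actual driver, is a misc device registered from a module's init path. devtmpfs creates subdirectories for device names containing '/', which is how e.g. /dev/vfio/vfio appears. Whether kvm_intel could sanely own such a node is exactly the open question.]

	#include <linux/fs.h>
	#include <linux/miscdevice.h>
	#include <linux/module.h>

	static const struct file_operations sgx_epc_fops = {
		.owner = THIS_MODULE,
		/* .mmap = sgx_epc_mmap, ... (placeholders) */
	};

	static struct miscdevice sgx_epc_dev = {
		.minor = MISC_DYNAMIC_MINOR,
		.name  = "sgx/epc",	/* devtmpfs creates the sgx/ subdirectory */
		.fops  = &sgx_epc_fops,
		.mode  = 0660,		/* e.g. root:kvm via udev rules */
	};

	static int __init sgx_epc_init(void)
	{
		return misc_register(&sgx_epc_dev);
	}
	module_init(sgx_epc_init);

	static void __exit sgx_epc_exit(void)
	{
		misc_deregister(&sgx_epc_dev);
	}
	module_exit(sgx_epc_exit);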
On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote: > Hmm. I guess this makes some sense. My instinct would be to do it a > little differently and have: > > /dev/sgx/enclave: Each instance is an enclave. > > /dev/sgx/epc: Used to get raw EPC for KVM. Might have different > permissions, perhaps 0660 and group kvm. > > /dev/sgx/something_else: For when SGX v3 adds something else :) How about I call it a draw and say that I will go with "ioctls on the enclave fd", as I'm already making good progress on the v19 implementation, and we will look at it then? It does not look like a fruitful conversation to continue without functional code. I will also update my selftest (as it is part of the patch set) to align with whatever we have so you can immediately run something. And since no one is giving me anything at all on swapping, but instead splitting hairs here, I will lock in (at least for v19) to using shmem again. Sounds like a plan? All the internals will work for whatever mess we want to introduce to the uapi. /Jarkko
On Fri, Dec 21, 2018 at 10:24:54AM -0800, Sean Christopherson wrote: > On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote: > > > On Dec 21, 2018, at 9:28 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > > > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > >>> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > >>> > > >>>>> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote: > > >>> > > >>> I agree with Jethro, passing the enclave_fd as a param is obnoxious. > > >>> And it means the user needs to open /dev/sgx to do anything with an > > >>> enclave fd, e.g. the enclave fd might be passed to a builder thread, > > >>> it shouldn't also need the device fd. > > >>> > > >>> E.g.: > > >>> > > >>> sgx_fd = open("/dev/sgx", O_RDWR); > > >>> BUG_ON(sgx_fd < 0); > > >>> > > >>> enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate); > > >>> BUG_ON(enclave_fd < 0); > > >>> > > >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd); > > >>> BUG_ON(ret); > > >>> > > >>> ... > > >>> > > >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit); > > >>> BUG_ON(ret); > > >>> > > >>> ... > > >>> > > >>> close(enclave_fd); > > >>> close(sgx_fd); > > >>> > > >>> > > >>> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes > > >>> and ioctls for VMs and vCPUs. > > >> > > >> Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > >> opening a new instance of /dev/sgx for each encalve? > > > > > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be > > > used to provide ioctl()'s for other SGX-related needs, e.g. to mmap() > > > raw EPC and expose it a VM. Proposed layout in the link below. I'll > > > also respond to Jarkko's question about exposing EPC through /dev/sgx > > > instead of having KVM allocate it on behalf of the VM. > > > > Hmm. I guess this makes some sense. My instinct would be to do it a > > little differently and have: > > > > /dev/sgx/enclave: Each instance is an enclave. > > > > /dev/sgx/epc: Used to get raw EPC for KVM. Might have different > > permissions, perhaps 0660 and group kvm. > > > > /dev/sgx/something_else: For when SGX v3 adds something else :) > > Mmmm, I like this approach a lot. It would allow userspace to easily > manage permissions for each "feature", e.g. give all users access to > /dev/sgx/epc but restrict /dev/sgx/enclave. > > And we could add e.g. /dev/sgx/admin if we wanted to exposed ioctls() > that apply to all aspects of SGX. > > Do you know if /dev/sgx/epc could be dynamically created, e.g. by > KVM when the kvm_intel module is loaded? That would seal the deal for > me as it'd keep open the option of having KVM handle oversubscription > of guest EPC while exposing EPC through /dev/sgx instead of /dev/kvm. No issues with this approach from my side but won't use it for v19. Please share these comments when reviewing v19. Neither objections to have this interface. /Jarkko
On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote: > /dev/sgx/enclave: Each instance is an enclave. > > /dev/sgx/epc: Used to get raw EPC for KVM. Might have different > permissions, perhaps 0660 and group kvm. > > /dev/sgx/something_else: For when SGX v3 adds something else :) Responding again to this anyway now that I have had time to think about it. Here is how I see it: 1. /dev/sgx/enclave should be /dev/sgx as it is now. 2. /dev/sgx/epc should be something that you'd reach through /dev/kvm. This is essentially a circular dependency. KVM uapi should provide KVM services; now you sprinkle KVM uapi into two subsystems. 3. "something else" is securityfs (e.g. provisioning). That is the kind of stuff it is meant for. I'm sorry, but from my perspective this does not look too good no matter what glasses I put on... /Jarkko
On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote: > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > opening a new instance of /dev/sgx for each encalve? > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send > the enclav to a process that does not have necessarily have rights to > /dev/sgx. Gives more robust environment to configure SGX. Sean, is this why you wanted enclave fd and anon inode and not just use the address space of /dev/sgx? Just taking notes of all observations. I'm not sure what your rationale was (maybe it was somewhere). This was something I made up, and this one is wrong deduction. You can easily get the same benefit with /dev/sgx associated fd representing the enclave. This all means that for v19 I'm going without enclave fd involved with fd to /dev/sgx representing the enclave. No anon inodes will be involved. /Jarkko
On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote: > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote: > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote: > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > > opening a new instance of /dev/sgx for each encalve? > > > > > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send > > > the enclav to a process that does not have necessarily have rights to > > > /dev/sgx. Gives more robust environment to configure SGX. > > > > Sean, is this why you wanted enclave fd and anon inode and not just use > > the address space of /dev/sgx? Just taking notes of all observations. > > I'm not sure what your rationale was (maybe it was somewhere). This was > > something I made up, and this one is wrong deduction. You can easily > > get the same benefit with /dev/sgx associated fd representing the > > enclave. > > > > This all means that for v19 I'm going without enclave fd involved with > > fd to /dev/sgx representing the enclave. No anon inodes will be > > involved. > > Based on these observations I updated the uapi. > > As far as I'm concerned there has to be a solution to do EPC mapping > with a sequence: > > 1. Ping /dev/kvm to do something. > 2. KVM asks SGX core to do something. > 3. SGX core does something. > > I don't care what the something is exactly is, but KVM is the only sane > place for KVM uapi. I would be surprised if KVM maintainers didn't agree > that they don't want to sprinkle KVM uapi to random places in other > subsystems. The one option to consider to do would be to have a device driver for KVM if you really want this e.g. something like /dev/vsgx. With the current knowledge I'm not yet sure why all could not be done just through /dev/kvm. /Jarkko
> On Dec 21, 2018, at 11:24 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote: >>> On Dec 21, 2018, at 9:28 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >>> >>> On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: >>>>> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >>>>> >>>>>>> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote: >>>>> >>>>> I agree with Jethro, passing the enclave_fd as a param is obnoxious. >>>>> And it means the user needs to open /dev/sgx to do anything with an >>>>> enclave fd, e.g. the enclave fd might be passed to a builder thread, >>>>> it shouldn't also need the device fd. >>>>> >>>>> E.g.: >>>>> >>>>> sgx_fd = open("/dev/sgx", O_RDWR); >>>>> BUG_ON(sgx_fd < 0); >>>>> >>>>> enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate); >>>>> BUG_ON(enclave_fd < 0); >>>>> >>>>> ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd); >>>>> BUG_ON(ret); >>>>> >>>>> ... >>>>> >>>>> ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit); >>>>> BUG_ON(ret); >>>>> >>>>> ... >>>>> >>>>> close(enclave_fd); >>>>> close(sgx_fd); >>>>> >>>>> >>>>> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes >>>>> and ioctls for VMs and vCPUs. >>>> >>>> Can one of you explain why SGX_ENCLAVE_CREATE is better than just >>>> opening a new instance of /dev/sgx for each encalve? >>> >>> Directly associating /dev/sgx with an enclave means /dev/sgx can't be >>> used to provide ioctl()'s for other SGX-related needs, e.g. to mmap() >>> raw EPC and expose it a VM. Proposed layout in the link below. I'll >>> also respond to Jarkko's question about exposing EPC through /dev/sgx >>> instead of having KVM allocate it on behalf of the VM. >> >> Hmm. I guess this makes some sense. My instinct would be to do it a >> little differently and have: >> >> /dev/sgx/enclave: Each instance is an enclave. >> >> /dev/sgx/epc: Used to get raw EPC for KVM. Might have different >> permissions, perhaps 0660 and group kvm. >> >> /dev/sgx/something_else: For when SGX v3 adds something else :) > > Mmmm, I like this approach a lot. It would allow userspace to easily > manage permissions for each "feature", e.g. give all users access to > /dev/sgx/epc but restrict /dev/sgx/enclave. > > And we could add e.g. /dev/sgx/admin if we wanted to exposed ioctls() > that apply to all aspects of SGX. > > Do you know if /dev/sgx/epc could be dynamically created, e.g. by > KVM when the kvm_intel module is loaded? Presumably sgx would create a bus and enumerate the devices as needed. Or I suppose these things could be platform or system devices. I’m not really a device model expert, and the one time I tried to implement a character device, I got so disgusted that I wrote a whole library for it. It’s still in limbo somewhere. > That would seal the deal for > me as it'd keep open the option of having KVM handle oversubscription > of guest EPC while exposing EPC through /dev/sgx instead of /dev/kvm.
On Sun, Dec 23, 2018 at 4:52 AM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote: > > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote: > > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote: > > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > > > opening a new instance of /dev/sgx for each encalve? > > > > > > > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send > > > > the enclav to a process that does not have necessarily have rights to > > > > /dev/sgx. Gives more robust environment to configure SGX. > > > > > > Sean, is this why you wanted enclave fd and anon inode and not just use > > > the address space of /dev/sgx? Just taking notes of all observations. > > > I'm not sure what your rationale was (maybe it was somewhere). This was > > > something I made up, and this one is wrong deduction. You can easily > > > get the same benefit with /dev/sgx associated fd representing the > > > enclave. > > > > > > This all means that for v19 I'm going without enclave fd involved with > > > fd to /dev/sgx representing the enclave. No anon inodes will be > > > involved. > > > > Based on these observations I updated the uapi. > > > > As far as I'm concerned there has to be a solution to do EPC mapping > > with a sequence: > > > > 1. Ping /dev/kvm to do something. > > 2. KVM asks SGX core to do something. > > 3. SGX core does something. > > > > I don't care what the something is exactly is, but KVM is the only sane > > place for KVM uapi. I would be surprised if KVM maintainers didn't agree > > that they don't want to sprinkle KVM uapi to random places in other > > subsystems. > > The one option to consider to do would be to have a device driver for > KVM if you really want this e.g. something like /dev/vsgx. With the > current knowledge I'm not yet sure why all could not be done just > through /dev/kvm. > That seems reasonable too. I don't really care about the path to the device node, but it does seem reasonable to me to have it be a separate node entirely from the normal enclave interface.
On Sun, Dec 23, 2018 at 12:42:48PM -0800, Andy Lutomirski wrote: > On Sun, Dec 23, 2018 at 4:52 AM Jarkko Sakkinen > <jarkko.sakkinen@linux.intel.com> wrote: > > > > On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote: > > > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote: > > > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote: > > > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > > > > opening a new instance of /dev/sgx for each encalve? > > > > > > > > > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send > > > > > the enclav to a process that does not have necessarily have rights to > > > > > /dev/sgx. Gives more robust environment to configure SGX. > > > > > > > > Sean, is this why you wanted enclave fd and anon inode and not just use > > > > the address space of /dev/sgx? Just taking notes of all observations. > > > > I'm not sure what your rationale was (maybe it was somewhere). This was > > > > something I made up, and this one is wrong deduction. You can easily > > > > get the same benefit with /dev/sgx associated fd representing the > > > > enclave. > > > > > > > > This all means that for v19 I'm going without enclave fd involved with > > > > fd to /dev/sgx representing the enclave. No anon inodes will be > > > > involved. > > > > > > Based on these observations I updated the uapi. > > > > > > As far as I'm concerned there has to be a solution to do EPC mapping > > > with a sequence: > > > > > > 1. Ping /dev/kvm to do something. > > > 2. KVM asks SGX core to do something. > > > 3. SGX core does something. > > > > > > I don't care what the something is exactly is, but KVM is the only sane > > > place for KVM uapi. I would be surprised if KVM maintainers didn't agree > > > that they don't want to sprinkle KVM uapi to random places in other > > > subsystems. > > > > The one option to consider to do would be to have a device driver for > > KVM if you really want this e.g. something like /dev/vsgx. With the > > current knowledge I'm not yet sure why all could not be done just > > through /dev/kvm. > > > > That seems reasonable too. I don't really care about the path to the > device node, but it does seem reasonable to me to have it be a > separate node entirely from the normal enclave interface. What is the core reason anyway that /dev/kvm is out of the question? /Jarkko
On Sun, Dec 23, 2018 at 12:41:49PM -0800, Andy Lutomirski wrote: > Presumably sgx would create a bus and enumerate the devices as needed. > Or I suppose these things could be platform or system devices. I’m not > really a device model expert, and the one time I tried to implement a > character device, I got so disgusted that I wrote a whole library for > it. It’s still in limbo somewhere. In the driver/main.c I have a fairly good framework for device creation so if there is any reason to add more devices it is dead easy. I just don't see any reason for that. And I already have a bus there for SGX. All of this has been done a long time ago. /Jarkko
On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote: > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote: > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote: > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > > opening a new instance of /dev/sgx for each encalve? > > > > > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send > > > the enclav to a process that does not have necessarily have rights to > > > /dev/sgx. Gives more robust environment to configure SGX. > > > > Sean, is this why you wanted enclave fd and anon inode and not just use > > the address space of /dev/sgx? Just taking notes of all observations. > > I'm not sure what your rationale was (maybe it was somewhere). This was > > something I made up, and this one is wrong deduction. You can easily > > get the same benefit with /dev/sgx associated fd representing the > > enclave. > > > > This all means that for v19 I'm going without enclave fd involved with > > fd to /dev/sgx representing the enclave. No anon inodes will be > > involved. > > Based on these observations I updated the uapi. > > As far as I'm concerned there has to be a solution to do EPC mapping > with a sequence: > > 1. Ping /dev/kvm to do something. > 2. KVM asks SGX core to do something. > 3. SGX core does something. > > I don't care what the something is exactly is, but KVM is the only sane > place for KVM uapi. I would be surprised if KVM maintainers didn't agree > that they don't want to sprinkle KVM uapi to random places in other > subsystems. It's not a KVM uapi. KVM isn't a hypervisor in the traditional sense. The "real" hypervisor lives in userspace, e.g. Qemu, KVM is essentially just a (very fancy) driver for hardware accelerators, e.g. VMX. Qemu for example is fully capable of running an x86 VM without KVM, it's just substantially slower. In terms of guest memory, KVM doesn't care or even know what a particular region of memory represents or what, if anything, is backing a region in the host. There are cases when KVM is made aware of certain aspects of guest memory for performance or functional reasons, e.g. emulated MMIO and encrypted memory, but in all cases the control logic ultimately resides in userspace. SGX is a weird case because ENCLS can't be emulated in software, i.e. exposing SGX to a VM without KVM's help would be difficult. But, it wouldn't be impossible, just slow and ugly. And so, ignoring host oversubscription for the moment, there is no hard requirement that SGX EPC can only be exposed to a VM through KVM. In other words, allocating and exposing EPC to a VM is orthogonal to KVM supporting SGX. Exposing EPC to userspace via /dev/sgx/epc would mean that KVM would handle it like any other guest memory region, and all EPC related code/logic would reside in the SGX subsystem. Oversubscription throws a wrench in the system because ENCLV can only be executed post-VMXON and EPC conflicts generate VMX VM-Exits. But even then, KVM doesn't need to own the EPC uapi, e.g. it can call into the SGX subsystem to handle EPC conflict VM-Exits and the SGX subsystem can wrap ENCLV with exception fixup and forcefully reclaim EPC pages if ENCLV faults. I can't be 100% certain the oversubscription scheme will be sane without actually writing the code, but I'd like to at least keep the option open, i.e. not structure /dev/sgx/ in such a way that adding e.g. /dev/sgx/epc is impossible or ugly.
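[To illustrate what "KVM would handle it like any other guest memory region" means in practice, a userspace sketch: the VMM mmap()s a hypothetical /dev/sgx/epc node and registers the mapping with KVM exactly as it registers ordinary RAM. The device node and slot layout are assumptions from this thread; KVM_SET_USER_MEMORY_REGION is the standard KVM API.]

	#include <fcntl.h>
	#include <linux/kvm.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>

	static int map_guest_epc(int vm_fd, __u32 slot, __u64 guest_epc_base,
				 __u64 epc_size)
	{
		/* Hypothetical node; the EPC looks like ordinary memory to KVM. */
		int epc_fd = open("/dev/sgx/epc", O_RDWR);
		void *epc = mmap(NULL, epc_size, PROT_READ | PROT_WRITE,
				 MAP_SHARED, epc_fd, 0);

		struct kvm_userspace_memory_region region = {
			.slot            = slot,
			.guest_phys_addr = guest_epc_base,
			.memory_size     = epc_size,
			.userspace_addr  = (__u64)epc,
		};

		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
	}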
On Wed, Jan 02, 2019 at 12:47:52PM -0800, Sean Christopherson wrote: > On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote: > > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote: > > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote: > > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote: > > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > > > opening a new instance of /dev/sgx for each encalve? > > > > > > > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send > > > > the enclav to a process that does not have necessarily have rights to > > > > /dev/sgx. Gives more robust environment to configure SGX. > > > > > > Sean, is this why you wanted enclave fd and anon inode and not just use > > > the address space of /dev/sgx? Just taking notes of all observations. > > > I'm not sure what your rationale was (maybe it was somewhere). This was > > > something I made up, and this one is wrong deduction. You can easily > > > get the same benefit with /dev/sgx associated fd representing the > > > enclave. > > > > > > This all means that for v19 I'm going without enclave fd involved with > > > fd to /dev/sgx representing the enclave. No anon inodes will be > > > involved. > > > > Based on these observations I updated the uapi. > > > > As far as I'm concerned there has to be a solution to do EPC mapping > > with a sequence: > > > > 1. Ping /dev/kvm to do something. > > 2. KVM asks SGX core to do something. > > 3. SGX core does something. > > > > I don't care what the something is exactly is, but KVM is the only sane > > place for KVM uapi. I would be surprised if KVM maintainers didn't agree > > that they don't want to sprinkle KVM uapi to random places in other > > subsystems. > > It's not a KVM uapi. > > KVM isn't a hypervisor in the traditional sense. The "real" hypervisor > lives in userspace, e.g. Qemu, KVM is essentially just a (very fancy) > driver for hardware accelerators, e.g. VMX. Qemu for example is fully > capable of running an x86 VM without KVM, it's just substantially slower. > > In terms of guest memory, KVM doesn't care or even know what a particular > region of memory represents or what, if anything, is backing a region in > the host. There are cases when KVM is made aware of certain aspects of > guest memory for performance or functional reasons, e.g. emulated MMIO > and encrypted memory, but in all cases the control logic ultimately > resides in userspace. > > SGX is a weird case because ENCLS can't be emulated in software, i.e. > exposing SGX to a VM without KVM's help would be difficult. But, it > wouldn't be impossible, just slow and ugly. > > And so, ignoring host oversubscription for the moment, there is no hard > requirement that SGX EPC can only be exposed to a VM through KVM. In > other words, allocating and exposing EPC to a VM is orthogonal to KVM > supporting SGX. Exposing EPC to userspace via /dev/sgx/epc would mean > that KVM would handle it like any other guest memory region, and all EPC > related code/logic would reside in the SGX subsystem. I'm fine doing that if it makes sense. I just don't understand why you cannot add ioctls to /dev/kvm for allocating the region. Why isn't that possible? As I said to Andy earlier, adding new device files is easy as everything related to device creation is nicely encapsulated. > Oversubscription throws a wrench in the system because ENCLV can only > be executed post-VMXON and EPC conflicts generate VMX VM-Exits. 
But > even then, KVM doesn't need to own the EPC uapi, e.g. it can call into > the SGX subsystem to handle EPC conflict VM-Exits and the SGX subsystem > can wrap ENCLV with exception fixup and forcefully reclaim EPC pages if > ENCLV faults. If the uapi is *only* for KVM, it should definitely own it. KVM calling the SGX subsystem on a conflict is just KVM using in-kernel APIs provided by the SGX core. > I can't be 100% certain the oversubscription scheme will be sane without > actually writing the code, but I'd like to at least keep the option open, > i.e. not structure /dev/sgx/ in such a way that adding e.g. /dev/sgx/epc > is impossible or ugly. /Jarkko
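For illustration, the kind of in-kernel API surface this implies; every name below is hypothetical, the point is only the direction of the calls (KVM consuming exported SGX-core functions rather than owning a uapi):

    /* Hypothetical SGX-core exports for a KVM consumer; none of these
     * names come from the posted patches. */
    struct sgx_epc_page *sgx_alloc_epc_page(void *owner);
    void sgx_free_epc_page(struct sgx_epc_page *page);

    /* Called by KVM from its EPC-conflict VM-Exit handler.  The SGX
     * core would wrap ENCLV with exception fixup and forcefully
     * reclaim the page if ENCLV faults, as described above. */
    int sgx_epc_handle_conflict(struct sgx_epc_page *page);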
> > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > opening a new instance of /dev/sgx for each encalve? > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be used > to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and > expose it a VM. Proposed layout in the link below. I'll also respond to > Jarkko's question about exposing EPC through /dev/sgx instead of having > KVM allocate it on behalf of the VM. > > https://lkml.kernel.org/r/20181218185349.GC30082@linux.intel.com Hi Sean, Sorry for replying to an old email. But IMHO it is not a must that Qemu needs to open some /dev/sgx and allocate/mmap EPC for the guest's virtual EPC slot; instead, KVM could create a private slot, which is not visible to Qemu, for virtual EPC, and KVM could call the core-SGX EPC allocation API directly. I am not sure what the benefit of allowing userspace to alloc/mmap a raw EPC region is. Userspace is not allowed to touch EPC anyway, except enclave code. To me, having KVM create a private EPC slot is cleaner than exposing /dev/sgx/epc and allowing userspace to map some raw EPC region. Thanks, -Kai
On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote: > > > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > opening a new instance of /dev/sgx for each encalve? > > > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be used > > to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and > > expose it a VM. Proposed layout in the link below. I'll also respond to > > Jarkko's question about exposing EPC through /dev/sgx instead of having > > KVM allocate it on behalf of the VM. > > > > https://lkml.kernel.org/r/20181218185349.GC30082@linux.intel.com > > Hi Sean, > > Sorry for replying to old email. But IMHO it is not a must that Qemu > needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual > EPC slot, instead, KVM could create private slot, which is not visible > to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation > API directly. That's possible, but it has several downsides. - Duplicates a lot of code in KVM for managing memory regions. - Artificially restricts userspace to a single EPC region, unless even more code is duplicated to handle multiple private regions. - Requires additional ioctls() or capabilities to probe EPC support - Does not fit with Qemu/KVM's memory model, e.g. all other types of memory are exposed to a guest through KVM_SET_USER_MEMORY_REGION. - Prevents userspace from debugging a guest's enclave. I'm not saying this is a likely scenario, but I also don't think we should preclude it without good reason. - KVM is now responsible for managing the lifecycle of EPC, e.g. what happens if an EPC cgroup limit is lowered on a running VM and KVM can't gracefully reclaim EPC? The userspace hypervisor should ultimately decide how to handle such an event. - SGX logic is split between SGX and KVM, e.g. VA page management for oversubscription will likely be common to SGX and KVM. From a long term maintenance perspective, this means that changes to the EPC management could potentially need to be Acked by KVM, and vice versa. > I am not sure what's the good of allowing userspace to alloc/mmap a > raw EPC region? Userspace is not allowed to touch EPC anyway, expect > enclave code. > > To me KVM creates private EPC slot is cleaner than exposing /dev/sgx/epc > and allowing userspace to map some raw EPC region. Cleaner in the sense that it's faster to get basic support up and running since there are fewer touchpoints, but there are long term ramifications to cramming EPC management in KVM. And at this point I'm not stating any absolutes, e.g. how EPC will be handled by KVM. What I'm pushing for is to not eliminate the possibility of having the SGX subsystem own all EPC management, e.g. don't tie /dev/sgx to a single enclave. > > Thanks, > -Kai
On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote: > > > > > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > > opening a new instance of /dev/sgx for each encalve? > > > > > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be used > > > to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and > > > expose it a VM. Proposed layout in the link below. I'll also respond to > > > Jarkko's question about exposing EPC through /dev/sgx instead of having > > > KVM allocate it on behalf of the VM. > > > > > > https://lkml.kernel.org/r/20181218185349.GC30082@linux.intel.com > > > > Hi Sean, > > > > Sorry for replying to old email. But IMHO it is not a must that Qemu > > needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual > > EPC slot, instead, KVM could create private slot, which is not visible > > to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation > > API directly. > > That's possible, but it has several downsides. > > - Duplicates a lot of code in KVM for managing memory regions. > - Artificially restricts userspace to a single EPC region, unless > even more code is duplicated to handle multiple private regions. > - Requires additional ioctls() or capabilities to probe EPC support > - Does not fit with Qemu/KVM's memory model, e.g. all other types of > memory are exposed to a guest through KVM_SET_USER_MEMORY_REGION. > - Prevents userspace from debugging a guest's enclave. I'm not saying > this is a likely scenario, but I also don't think we should preclude > it without good reason. > - KVM is now responsible for managing the lifecycle of EPC, e.g. what > happens if an EPC cgroup limit is lowered on a running VM and > KVM can't gracefully reclaim EPC? The userspace hypervisor should > ultimately decide how to handle such an event. > - SGX logic is split between SGX and KVM, e.g. VA page management for > oversubscription will likely be common to SGX and KVM. From a long > term maintenance perspective, this means that changes to the EPC > management could potentially need to be Acked by KVM, and vice versa. > > > I am not sure what's the good of allowing userspace to alloc/mmap a > > raw EPC region? Userspace is not allowed to touch EPC anyway, expect > > enclave code. > > > > To me KVM creates private EPC slot is cleaner than exposing /dev/sgx/epc > > and allowing userspace to map some raw EPC region. > > Cleaner in the sense that it's faster to get basic support up and running > since there are fewer touchpoints, but there are long term ramifications > to cramming EPC management in KVM. > > And at this point I'm not stating any absolutes, e.g. how EPC will be > handled by KVM. What I'm pushing for is to not eliminate the possibility > of having the SGX subsystem own all EPC management, e.g. don't tie > /dev/sgx to a single enclave. I haven't gone and re-read all the relevant SDM bits, so I'll just ask: what, if anything, are the actual semantics of mapping "raw EPC" like this? You can't actually do anything with the mapping from user mode unless you actually get an enclave created and initialized in it and have it mapped at the correct linear address, right? I still think you have the right idea, but it is a bit unusual. 
I do think it makes sense to have QEMU delegate the various ENCLS operations (especially EINIT) to the regular SGX interface, which will mean that VM guests will have exactly the same access controls applied as regular user programs, which is probably what we want. If so, there will need to be a way to get INITTOKEN privilege for the purpose of running non-Linux OSes in the VM, which isn't the end of the world. We might still want the actual ioctl to do EINIT using an actual explicit token to be somehow restricted in a way that strongly discourages its use by anything other than a hypervisor. Or I suppose we could just straight-up ignore the guest-provided init token. --Andy P.S. Is Intel ever going to consider a way to make guests get their own set of keys that are different from the host's keys and other guests' keys?
> On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote: > > > > > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > > opening a new instance of /dev/sgx for each encalve? > > > > > > Directly associating /dev/sgx with an enclave means /dev/sgx can't > > > be used to provide ioctl()'s for other SGX-related needs, e.g. to > > > mmap() raw EPC and expose it a VM. Proposed layout in the link > > > below. I'll also respond to Jarkko's question about exposing EPC > > > through /dev/sgx instead of having KVM allocate it on behalf of the VM. > > > > > > https://lkml.kernel.org/r/20181218185349.GC30082@linux.intel.com > > > > Hi Sean, > > > > Sorry for replying to old email. But IMHO it is not a must that Qemu > > needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual > > EPC slot, instead, KVM could create private slot, which is not visible > > to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation > > API directly. > > That's possible, but it has several downsides. > > - Duplicates a lot of code in KVM for managing memory regions. I don't see why there would be duplicated code. You can simply call __x86_set_memory_region to create a private slot. It is the KVM x86 equivalent of KVM_SET_USER_MEMORY_REGION from userspace. The only difference is that Qemu is not aware of the private slot. > - Artificially restricts userspace to a single EPC region, unless > even more code is duplicated to handle multiple private regions. You can have multiple private slots by calling __x86_set_memory_region for each EPC section. KVM receives the EPC section(s) info from Qemu, via CPUID or a dedicated IOCTL (is this what you are going to add?), and simply creates private EPC slot(s). > - Requires additional ioctls() or capabilities to probe EPC support No. The EPC info comes from Qemu at the beginning (the size is given by parameter, the base is calculated by Qemu), and actually it is Qemu that notifies KVM of the EPC info, so I don't think we require additional ioctls or capabilities here. > - Does not fit with Qemu/KVM's memory model, e.g. all other types of > memory are exposed to a guest through > KVM_SET_USER_MEMORY_REGION. EPC is different. I am not sure whether EPC needs to fit such a model. There are already examples in KVM which use a private slot w/o using KVM_SET_USER_MEMORY_REGION, for example, the APIC access page. > - Prevents userspace from debugging a guest's enclave. I'm not saying > this is a likely scenario, but I also don't think we should preclude > it without good reason. I am not sure how important it is, so I don't know whether this can be a justification. To me we don't need to consider this. Qemu normally doesn't access guest memory unless it has to (i.e., for the device model). > - KVM is now responsible for managing the lifecycle of EPC, e.g. what > happens if an EPC cgroup limit is lowered on a running VM and > KVM can't gracefully reclaim EPC? The userspace hypervisor should > ultimately decide how to handle such an event. Even using KVM_SET_USER_MEMORY_REGION, KVM is also responsible for managing the lifecycle of EPC, no? And by "managing the lifecycle of EPC" I suppose you mean something that doesn't need to be "managing EPC itself", since EPC should always be managed by the core SGX code. I don't see the difference between a private slot and KVM_SET_USER_MEMORY_REGION in terms of how KVM reclaims EPC, or what KVM does when it fails to reclaim EPC. > - SGX logic is split between SGX and KVM, e.g. VA page management for > oversubscription will likely be common to SGX and KVM.
From a long > term maintenance perspective, this means that changes to the EPC > management could potentially need to be Acked by KVM, and vice versa. I think most of the code should be in the core SGX code under x86. KVM should only have the code that is specifically related to virtualization, i.e., ENCLV. VA page allocation should be under core SGX code. KVM might need to call functions such as alloc_va_page, etc., but this is not a problem; there are many other such cases now. And this is not related to private slot vs KVM_SET_USER_MEMORY_REGION. > > > I am not sure what's the good of allowing userspace to alloc/mmap a > > raw EPC region? Userspace is not allowed to touch EPC anyway, expect > > enclave code. > > > > To me KVM creates private EPC slot is cleaner than exposing > > /dev/sgx/epc and allowing userspace to map some raw EPC region. > > Cleaner in the sense that it's faster to get basic support up and running since > there are fewer touchpoints, but there are long term ramifications to > cramming EPC management in KVM. > > And at this point I'm not stating any absolutes, e.g. how EPC will be handled > by KVM. What I'm pushing for is to not eliminate the possibility of having > the SGX subsystem own all EPC management, e.g. don't tie /dev/sgx to a > single enclave. I suppose by "SGX subsystem" you mean here the core SGX code under x86. IMHO EPC should always be managed by such an SGX subsystem, and KVM and the SGX driver are just consumers (i.e., calling the EPC allocation functions, etc.). IMHO /dev/sgx (or whatever /dev/sgx/xxx) should be in the SGX driver, but not the SGX core code. For example, if we don't consider KVM EPC oversubscription here, theoretically we only need the below code in the core SGX code to make KVM SGX work: 1) SGX feature detection. There should be some core structure (such as boot_cpu_data, etc.) where KVM can query SGX capabilities, i.e., whether FLC is available. 2) EPC management. KVM simply calls the EPC management APIs when it needs to. For example, when the EPC slot is created, we should populate all EPC pages, and fail to create the VM if we run out of EPC. 3) Other code to deal with ENCLS, SGX data structures, etc., since we have agreed that even with static EPC allocation, we should trap EINIT. We probably don't even need code to deal with enclaves, if only for KVM. Of course, in order to support KVM EPC oversubscription, we should also include enclave management code in the core SGX code, but this doesn't necessarily mean we should include /dev/sgx/xxx in the core SGX code. It seems there are still lots of design issues on which we haven't got consensus in terms of how the SGX driver should look (or how the SGX stack is supposed to work), but if we can focus on what the core SGX code should really have (w/o involving SGX driver logic), we can at least get the KVM SGX code working. After all, we also have Windows SGX support. Thanks, -Kai
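For contrast, a rough sketch of the private-slot alternative described above. __x86_set_memory_region() is KVM's existing in-kernel helper (used e.g. for the APIC access page); the slot id and function name are made up:

    /* Sketch only: create a guest EPC memslot the userspace VMM never
     * sees.  SGX_EPC_PRIVATE_MEMSLOT is a made-up id in the spirit of
     * APIC_ACCESS_PAGE_PRIVATE_MEMSLOT; __x86_set_memory_region() is
     * the existing in-kernel helper. */
    static int kvm_sgx_init_private_epc(struct kvm *kvm, gpa_t gpa, u32 size)
    {
            return __x86_set_memory_region(kvm, SGX_EPC_PRIVATE_MEMSLOT,
                                           gpa, size);
    }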
On Thu, Jan 03, 2019 at 08:26:35AM -0800, Sean Christopherson wrote: > What I was trying to explain is that the uapi isn't for KVM, it's for > the userspace hypervisor, e.g. Qemu. Qemu will inform KVM of the > resulting guest memory region so that KVM can configure its guest page > tables accordingly, but that is done through KVM's existing memory uapi. OK, now I got it, apologies that it took such a long time :-) Now I see the analogy, e.g. Qemu independently creates VMAs and then feeds those regions to KVM. Similarly, Qemu would create regions for KVM using "/dev/sgx/mem". For me this is perfectly fine now that I understand the reasoning, and it doesn't make my job of implementing the file-based enclave change any more difficult. Thanks for the patience with this... /Jarkko
On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote: > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > Cleaner in the sense that it's faster to get basic support up and running > > since there are fewer touchpoints, but there are long term ramifications > > to cramming EPC management in KVM. > > > > And at this point I'm not stating any absolutes, e.g. how EPC will be > > handled by KVM. What I'm pushing for is to not eliminate the possibility > > of having the SGX subsystem own all EPC management, e.g. don't tie > > /dev/sgx to a single enclave. > > I haven't gone and re-read all the relevant SDM bits, so I'll just > ask: what, if anything, are the actual semantics of mapping "raw EPC" > like this? You can't actually do anything with the mapping from user > mode unless you actually get an enclave created and initialized in it > and have it mapped at the correct linear address, right? I still > think you have the right idea, but it is a bit unusual. Correct, the EPC is inaccessible until a range is "mapped" with ECREATE. But I'd argue that it's not unusual, just different. And really it's not all that different from userspace mmap'ing /dev/sgx/enclave prior to ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to) access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens to consider any faulting EPC address without an associated enclave as illegal, e.g. signals SIGBUS. The /dev/sgx/epc case simply has different semantics for moving pages in and out of the EPC, i.e. different fault and eviction semantics. Yes, this allows the guest kernel to directly access the "raw" EPC, but that's conceptually in line with hardware where privileged software can directly "access" the EPC (or rather, the abort page for all intents and purposes). I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc, but IMO it's not unusual. Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g. /dev/sgx/virtualmachine might be more appropriate. > I do think it makes sense to have QEMU delegate the various ENCLS > operations (especially EINIT) to the regular SGX interface, which will > mean that VM guests will have exactly the same access controls applied > as regular user programs, which is probably what we want. To what end? Except for EINIT, none of the ENCLS leafs are interesting from a permissions perspective. Trapping and re-executing ENCLS leafs is painful, e.g. most leafs have multiple virtual addresses that need to be translated. And routing everything through the regular interface would make SGX even slower than it already is, e.g. every ENCLS would take an additional ~900 cycles just to handle the VM-Exit, and that's not accounting for any additional overhead in the SGX code, e.g. using the regular interface would mean superfluous locks, etc... The only benefit is that it would theoretically allow oversubscribing guest EPC without hardware support, but that would require a lot of code to do the necessary SECS tracking and refcounting. > If so, > there will need to be a way to get INITTOKEN privilege for the purpose > of running non-Linux OSes in the VM, which isn't the end of the world. Couldn't we require the same privilege/capability for VMs and EINIT tokens? I.e. /dev/sgx/virtualmachine can only be opened by a user that can also generate tokens.
> We might still want the actual ioctl to do EINIT using an actual > explicit token to be somehow restricted in a way that strongly > discourages its use by anything other than a hypervisor. Or I suppose > we could just straight-up ignore the guest-provided init token. > > P.S. Is Intel ever going to consider a way to make guests get their > own set of keys that are different from the host's keys and other > guests' keys?
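A sketch of the /dev/sgx/enclave fault semantics described above, where a faulting EPC address with no associated enclave page is treated as illegal; all sgx_encl_* names are assumptions, not the posted patches:

    /* All sgx_encl_* names are assumed for illustration. */
    static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
    {
            struct sgx_encl *encl = vmf->vma->vm_private_data;

            /* A faulting EPC address with no associated enclave page
             * is illegal, hence SIGBUS, as described above. */
            if (!encl || !sgx_encl_lookup_page(encl, vmf->address))
                    return VM_FAULT_SIGBUS;

            /* Otherwise load/map the EPC page; the eviction and
             * reload machinery is elided. */
            return sgx_encl_load_page(encl, vmf);
    }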
On Tue, Jan 08, 2019 at 09:24:38PM -0800, Huang, Kai wrote: > > On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote: > > > > > > > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > > > opening a new instance of /dev/sgx for each encalve? > > > > > > > > Directly associating /dev/sgx with an enclave means /dev/sgx can't > > > > be used to provide ioctl()'s for other SGX-related needs, e.g. to > > > > mmap() raw EPC and expose it a VM. Proposed layout in the link > > > > below. I'll also respond to Jarkko's question about exposing EPC > > > > through /dev/sgx instead of having KVM allocate it on behalf of the VM. > > > > > > > > https://lkml.kernel.org/r/20181218185349.GC30082@linux.intel.com > > > > > > Hi Sean, > > > > > > Sorry for replying to old email. But IMHO it is not a must that Qemu > > > needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual > > > EPC slot, instead, KVM could create private slot, which is not visible > > > to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation > > > API directly. > > > > That's possible, but it has several downsides. > > > > - Duplicates a lot of code in KVM for managing memory regions. > > I don't see why there will be duplicated code. you can simply call > __x86_set_memory_region to create private slot. It is KVM x86 equivalent > to KVM_SET_USER_MEMORY_REGION from userspace. The only difference is Qemu > is not aware of the private slot. What about when we allow removing an EPC region? At that point you'd be fully duplicating KVM_SET_USER_MEMORY_REGION. And that's not a purely theoretical idea, it's something I'd like to support in the future, e.g. via the WIP virtio-mem interface. https://events.linuxfoundation.org/wp-content/uploads/2017/12/virtio-mem-Paravirtualized-Memory-David-Hildenbrand-Red-Hat-1.pdf > > > - Artificially restricts userspace to a single EPC region, unless > > even more code is duplicated to handle multiple private regions. > > You can have multiple private slots, by calling __x86_set_memory_region > for each EPC section. KVM receives EPC section/sections info from Qemu, > via CPUID, or dedicated IOCTL (is this you are going to add?), and simply > creates private EPC slot/slots. This would require a dynamic number of private memslots, which breaks (or at least changes) the semantics of KVM_CAP_NR_MEMSLOTS. > > > - Requires additional ioctls() or capabilities to probe EPC support > > No. EPC info is from Qemu at the beginning (size is given by parameter, > base is calculated by Qemu), and actually it is Qemu notifies KVM EPC info, > so I don't think we require additional ioctls or capabilities here. How does Qemu know KVM supports virtual EPC? Probing /dev/sgx doesn't convey any information about KVM support. Maybe you could report it via KVM_GET_SUPPORTED_CPUID, but that would be problematic for Qemu since it would have to create vCPUs before initializing the machine. > > > - Does not fit with Qemu/KVM's memory model, e.g. all other types of > > memory are exposed to a guest through > > KVM_SET_USER_MEMORY_REGION. > > EPC is different. I am not sure whether EPC needs to fit such model. There > are already examples in KVM which uses private slot w/o using > KVM_SET_USER_MEMORY_REGION, for example, APIC access page. EPC has unique access and lifecycle semantics, but that doesn't make it a special snowflake, e.g. NVDIMM has special properties, as does memory that is encrypted via SME or MKTME, and so on and so forth.
The private memslots are private for a reason, e.g. the guest is completely unaware that they exist and Qemu is only aware of their existence because KVM needs userspace to tell it what GPA range won't conflict with its memory model. And in the APIC access page case, Qemu isn't aware at all since KVM doesn't allow relocating the guest's APIC. The other aspect of private memslots is that they are not exposed to L2, because again, from the guest's perspective, they do not exist. We can obviously hack around that restriction, but it's yet another hint that shoving EPC into a private memslot is the wrong approach. > > > - Prevents userspace from debugging a guest's enclave. I'm not saying > > this is a likely scenario, but I also don't think we should preclude > > it without good reason. > > I am not sure how important it is, so don't know whether this can be a > justification. To me we don't need to consider this. Qemu normally doesn't > access guest memory unless it has to (ie, for device model). In terms of debug, "Qemu" accesses guest memory all the time, e.g. reading guest memory via Qemu's monitor interface is something I do on a regular basis. KVM even provides an ABI (KVM_SET_GUEST_DEBUG) to allow userspace to set breakpoints on guest memory. We'd be saying "you can break on accesses in a debug enclave, but you can't actually access its memory". > > > - KVM is now responsible for managing the lifecycle of EPC, e.g. what > > happens if an EPC cgroup limit is lowered on a running VM and > > KVM can't gracefully reclaim EPC? The userspace hypervisor should > > ultimately decide how to handle such an event. > > Even using KVM_SET_USER_MEMORY_REGION, KVM is also responsible for managing the lifecycle of EPC, no? And "managing the lifecycle of EPC" you mean doesn't need to be "managing EPC itself" I suppose, since EPC should always be managed by core SGX code. > > I don't see the difference between private slot and KVM_SET_USER_MEMORY_REGION, > in terms of how does KVM reclaim EPC, or how does KVM do when it fails to > reclaim EPC. Userspace can delete the EPC region via KVM_SET_USER_MEMORY_REGION. As mentioned above, on-demand deletion of a private memslot means duplicating basically all of the KVM_SET_USER_MEMORY_REGION flow. For example, consider a paravirtualized guest that allows EPC regions to be "unplugged" if the host runs out of EPC. > > > - SGX logic is split between SGX and KVM, e.g. VA page management for > > oversubscription will likely be common to SGX and KVM. From a long > > term maintenance perspective, this means that changes to the EPC > > management could potentially need to be Acked by KVM, and vice versa. > > I think most of the code should be in core SGX code under x86. KVM should > only have the code that is specifically related to virtualization, ie, ENCLV. > > VA page allocation should be under code SGX code. KVM might need to call > function such as alloc_va_page, etc, but this is not a problem. There are > many other cases now. > > And this is not related to private slot vs KVM_SET_USER_MEMORY_REGION. Yes, it is. Allocating EPC via KVM means exporting additional functions from SGX core, and using a private memslot effectively forces KVM to allocate guest EPC. E.g. using KVM_SET_USER_MEMORY_REGION and having SGX own EPC allocation means KVM can be blissfully unaware of VA pages, EPC swap backing, etc... > > > > > > I am not sure what's the good of allowing userspace to alloc/mmap a > > > raw EPC region?
Userspace is not allowed to touch EPC anyway, expect > > > enclave code. > > > > > > To me KVM creates private EPC slot is cleaner than exposing > > > /dev/sgx/epc and allowing userspace to map some raw EPC region. > > > > Cleaner in the sense that it's faster to get basic support up and running since > > there are fewer touchpoints, but there are long term ramifications to > > cramming EPC management in KVM. > > > > And at this point I'm not stating any absolutes, e.g. how EPC will be handled > > by KVM. What I'm pushing for is to not eliminate the possibility of having > > the SGX subsystem own all EPC management, e.g. don't tie /dev/sgx to a > > single enclave. > > I suppose "SGX subsystem" you mean here is core SGX code under x86. IMHO > EPC should always be managed by such SGX subsystem, and KVM and SGX driver > are just consumers (ie, calling EPC allocation function, etc). > > IMHO I think /dev/sgx (or whatever /dev/sgx/xxx) should be in SGX driver, > but not SGX core code. For example, if we don't consider KVM EPC > oversubscription here, theoretically we only need below code in core SGX > code to make KVM SGX work: > > 1) SGX feature detection. There should be some core structure (such as > boot_cpu_data, etc) where KVM can query SGX capabilities, ie, whether FLC > is available. > 2) EPC management. KVM simply calls EPC management APIs when it needs. For > example, when EPC slot is created, we should populate all EPC pages, and > fail to create VM if running out of EPC. > 3) Other code to deal with ENCLS, SGX data structure, etc, since we have > agreed even with EPC static allocation, we should trap EINIT. > > We probably even don't need code to deal with enclave, if only for KVM. Of > course, in order to support KVM EPC oversubscription, we should also include > enclave management code in core SGX code, but this doesn't necessary mean we > should include /dev/sgx/xxx in core SGX code. > > It seems there still are lots of design issues we haven't got consensus in > terms of how SGX driver should be (or how SGX stack is supposed to work), > but if we can focus on what core SGX code should really have (w/o involving > SGX driver logic), we can at least get KVM SGX code working. After all, we > also have window SGX support. I'm all for getting KVM support in ASAP, i.e. I'd love to avoid having to wait for "full" SGX support, but that doesn't obviate the need to hammer out the uapi. Taking a shortcut and shoving everything into KVM could bite us in the long run.
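On deleting an EPC region: with the KVM_SET_USER_MEMORY_REGION model this falls out of existing uapi, since a zero memory_size deletes a slot. A minimal fragment, assuming vm_fd and epc_slot from an earlier setup:

    /* Existing KVM uapi: a zero memory_size deletes the memslot. */
    struct kvm_userspace_memory_region del = {
            .slot = epc_slot,       /* slot previously backing EPC */
            .memory_size = 0,
    };

    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &del);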
> > > That's possible, but it has several downsides. > > > > > > - Duplicates a lot of code in KVM for managing memory regions. > > > > I don't see why there will be duplicated code. you can simply call > > __x86_set_memory_region to create private slot. It is KVM x86 > > equivalent to KVM_SET_USER_MEMORY_REGION from userspace. The only > > difference is Qemu is not aware of the private slot. > > What about when we allow removing an EPC region? At that point you'd be > fully duplicating KVM_SET_USER_MEMORY_REGION. And that's not a purely > theoretical idea, it's something I'd like to support in the future, e.g. > via the WIP virtio-mem interface. OK. Isn't virtio-balloon good enough for us? Removing EPC is not consistent with hardware behaviour, but if you really want to support it then it should also be fine, since we are not strictly following the HW spec anyway. > > https://events.linuxfoundation.org/wp-content/uploads/2017/12/virtio- > mem-Paravirtualized-Memory-David-Hildenbrand-Red-Hat-1.pdf > > > > > > - Artificially restricts userspace to a single EPC region, unless > > > even more code is duplicated to handle multiple private regions. > > > > You can have multiple private slots, by calling > > __x86_set_memory_region for each EPC section. KVM receives EPC > > section/sections info from Qemu, via CPUID, or dedicated IOCTL (is > > this you are going to add?), and simply creates private EPC slot/slots. > > This would require a dynamic number of private memslots, which breaks (or > at least changes) the semantics of KVM_CAP_NR_MEMSLOTS. You are right. I forgot this one. > > > > > > - Requires additional ioctls() or capabilities to probe EPC > > > support > > > > No. EPC info is from Qemu at the beginning (size is given by > > parameter, base is calculated by Qemu), and actually it is Qemu > > notifies KVM EPC info, so I don't think we require additional ioctls or > capabilities here. > > How does Qemu know KVM supports virtual EPC? Probing /dev/sgx doesn't > convey any information about KVM support. Maybe you could report it via > KVM_GET_SUPPORTED_CPUID, but that would be problematic for Qemu > since it would have to create vCPUs before initializing the machine. KVM_GET_SUPPORTED_CPUID is the one. I don't think KVM_GET_SUPPORTED_CPUID requires creating a vCPU first, since it is a global thing that the platform supports. No? > > > > > > - Does not fit with Qemu/KVM's memory model, e.g. all other types of > > > memory are exposed to a guest through > > > KVM_SET_USER_MEMORY_REGION. > > > > EPC is different. I am not sure whether EPC needs to fit such model. > > There are already examples in KVM which uses private slot w/o using > > KVM_SET_USER_MEMORY_REGION, for example, APIC access page. > > EPC has unique access and lifecycle semantics, but that doesnt make it a > special snowflake, e.g. NVDIMM has special properties, as does memory that > is encrypted via SME or MKTME, and so on and so forth. > > The private memslots are private for a reason, e.g. the guest is completely > unaware that they exist and Qemu is only aware of their existence because > KVM needs userspace to tell it what GPA range won't conflict with its > memory model. And in the APIC access page case, Qemu isn't aware at all > since KVM doesn't allow relocating the guest's APIC. > > The other aspect of private memslots is that they are not exposed to L2, > because again, from the guest's perspective, they do not exist.
We can > obviously hackaround that restriction, but it's yet another hint that shoving > EPC into a private memslot is the wrong approach. But the guest is aware of SGX and EPC, so I don't see why it cannot be exposed to L2 even with a private slot. But that doesn't matter. I agree with you that letting Qemu create EPC slots is probably better, since we can support multiple EPC sections better (and potential EPC removal if needed). And [snip] > > I'm all for getting KVM support in ASAP, i.e. I'd love to avoid having to wait > for "full" SGX support, but that doesn't obviate the need to hammer out the > uapi. Taking a shortcut and shoving everything into KVM could bite us in the > long run. I agree. Thanks, -Kai
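To make the KVM_GET_SUPPORTED_CPUID option concrete, a sketch of how a VMM could probe for SGX. KVM_GET_SUPPORTED_CPUID is existing uapi issued on the /dev/kvm fd, and CPUID.(EAX=7,ECX=0):EBX[2] is the architectural SGX feature bit; whether KVM would actually report SGX there is exactly the open question above:

    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Returns non-zero if KVM reports the SGX CPUID bit; kvm_fd is an
     * open fd to /dev/kvm. */
    static int kvm_supports_sgx(int kvm_fd)
    {
            struct kvm_cpuid2 *cpuid;
            int i, ret = 0;

            cpuid = calloc(1, sizeof(*cpuid) +
                              64 * sizeof(struct kvm_cpuid_entry2));
            if (!cpuid)
                    return 0;
            cpuid->nent = 64;

            if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid) == 0) {
                    for (i = 0; i < cpuid->nent; i++) {
                            struct kvm_cpuid_entry2 *e = &cpuid->entries[i];

                            /* CPUID.(EAX=7,ECX=0):EBX[2] == SGX */
                            if (e->function == 7 && e->index == 0)
                                    ret = !!(e->ebx & (1u << 2));
                    }
            }
            free(cpuid);
            return ret;
    }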
On Wed, Jan 09, 2019 at 04:21:19PM -0800, Huang, Kai wrote: > > > > That's possible, but it has several downsides. > > > > > > > > - Duplicates a lot of code in KVM for managing memory regions. > > > > > > I don't see why there will be duplicated code. you can simply call > > > __x86_set_memory_region to create private slot. It is KVM x86 > > > equivalent to KVM_SET_USER_MEMORY_REGION from userspace. The only > > > difference is Qemu is not aware of the private slot. > > > > What about when we allow removing an EPC region? At that point you'd be > > fully duplicating KVM_SET_USER_MEMORY_REGION. And that's not a purely > > theoretical idea, it's something I'd like to support in the future, e.g. > > via the WIP virtio-mem interface. > > OK. Isn't virtio-balloon good enough for us? > > Removing EPC is not consistent with hardware behaviour, but if you really > want to support then should also be fine since we are not strictly > following HW spec anyway. Even if virtio-balloon is sufficient from a performance perspective, the virtio-mem approach can provide unique capabilities, e.g. hotplugging EPC into a VM or "I'm done with SGX, have all of my EPC back". > > > > https://events.linuxfoundation.org/wp-content/uploads/2017/12/virtio- > > mem-Paravirtualized-Memory-David-Hildenbrand-Red-Hat-1.pdf > > > > > No. EPC info is from Qemu at the beginning (size is given by > > > parameter, base is calculated by Qemu), and actually it is Qemu > > > notifies KVM EPC info, so I don't think we require additional ioctls or > > capabilities here. > > > > How does Qemu know KVM supports virtual EPC? Probing /dev/sgx doesn't > > convey any information about KVM support. Maybe you could report it via > > KVM_GET_SUPPORTED_CPUID, but that would be problematic for Qemu > > since it would have to create vCPUs before initializing the machine. > > KVM_GET_SUPPORTED_CPUID is the one. I don't think KVM_GET_SUPPORTED_CPUID > require creating vcpu prior, since it is global thing that platform supports. No? It's not a KVM requirement, but rather Qemu's code flow. It sets up the "machine" and then creates the vCPUs. The CPUID info is a CPU capability and so isn't necessarily available when the "machine" is being created. > > > > The other aspect of private memslots is that they are not exposed to L2, > > because again, from the guest's perspective, they do not exist. We can > > obviously hackaround that restriction, but it's yet another hint that shoving > > EPC into a private memslot is the wrong approach. > > But guest is aware of SGX and EPC so I don't see why it cannot be exposed > to L2 even with private slot. It's not that it can't be done, but rather we'd have to explicitly tell KVM "this private slot isn't really private, expose it to L2". See commit 3a2936dedd20 ("kvm: mmu: Don't expose private memslots to L2").
On Tue, Jan 08, 2019 at 07:27:11PM +0000, Huang, Kai wrote: > > > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just > > > opening a new instance of /dev/sgx for each encalve? > > > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be used > > to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and > > expose it a VM. Proposed layout in the link below. I'll also respond to > > Jarkko's question about exposing EPC through /dev/sgx instead of having > > KVM allocate it on behalf of the VM. > > > > https://lkml.kernel.org/r/20181218185349.GC30082@linux.intel.com > > Hi Sean, > > Sorry for replying to old email. But IMHO it is not a must that Qemu needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual EPC slot, instead, KVM could create private slot, which is not visible to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation API directly. > > I am not sure what's the good of allowing userspace to alloc/mmap a raw EPC region? Userspace is not allowed to touch EPC anyway, expect enclave code. > > To me KVM creates private EPC slot is cleaner than exposing /dev/sgx/epc and allowing userspace to map some raw EPC region. > > Thanks, > -Kai Side-note: this particular use case does not necessarily need to be solved in the first upstream patch set, does it? I'm just trying to keep the patch set as small as possible (it will still be a huge one). /Jarkko
On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote: > I do think it makes sense to have QEMU delegate the various ENCLS > operations (especially EINIT) to the regular SGX interface, which will > mean that VM guests will have exactly the same access controls applied > as regular user programs, which is probably what we want. If so, > there will need to be a way to get INITTOKEN privilege for the purpose > of running non-Linux OSes in the VM, which isn't the end of the world. > We might still want the actual ioctl to do EINIT using an actual > explicit token to be somehow restricted in a way that strongly > discourages its use by anything other than a hypervisor. Or I suppose > we could just straight-up ignore the guest-provided init token. Does it even matter if we just leave the EINITTOKENKEY attribute unprivileged, given that Linux requires that the MSRs are writable? Maybe I'll just whitelist that attribute for any enclave? /Jarkko
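For reference, the check being weighed would look roughly like this. SGX_ATTR_EINITTOKENKEY is the architectural SECS attribute bit (bit 5); the helper name and the 'allowed' mask are assumptions about the driver:

    #include <linux/bits.h>
    #include <linux/errno.h>

    #define SGX_ATTR_EINITTOKENKEY  BIT_ULL(5)      /* architectural bit */

    /* Helper name and the 'allowed' mask are assumptions. */
    static int sgx_validate_attributes(u64 requested, u64 allowed)
    {
            /* Whitelisting EINITTOKENKEY for everyone would let any
             * enclave act as a launch enclave; deny it unless it was
             * explicitly allowed for this caller. */
            if ((requested & SGX_ATTR_EINITTOKENKEY) &&
                !(allowed & SGX_ATTR_EINITTOKENKEY))
                    return -EACCES;

            return 0;
    }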
>> On Jan 9, 2019, at 8:31 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >> >> On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote: >> On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson >> <sean.j.christopherson@intel.com> wrote: >>> >>> Cleaner in the sense that it's faster to get basic support up and running >>> since there are fewer touchpoints, but there are long term ramifications >>> to cramming EPC management in KVM. >>> >>> And at this point I'm not stating any absolutes, e.g. how EPC will be >>> handled by KVM. What I'm pushing for is to not eliminate the possibility >>> of having the SGX subsystem own all EPC management, e.g. don't tie >>> /dev/sgx to a single enclave. >> >> I haven't gone and re-read all the relevant SDM bits, so I'll just >> ask: what, if anything, are the actual semantics of mapping "raw EPC" >> like this? You can't actually do anything with the mapping from user >> mode unless you actually get an enclave created and initialized in it >> and have it mapped at the correct linear address, right? I still >> think you have the right idea, but it is a bit unusual. > > Correct, the EPC is inaccessible until a range is "mapped" with ECREATE. > But I'd argue that it's not unusual, just different. And really it's not > all that different than userspace mmap'ing /dev/sgx/enclave prior to > ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to) > access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens > to consider any faulting EPC address without an associated enclave as > illegal, e.g. signals SIGBUS. > > The /dev/sgx/epc case simply has different semantics for moving pages in > and out of the EPC, i.e. different fault and eviction semantics. Yes, > this allows the guest kernel to directly access the "raw" EPC, but that's > conceptually in line with hardware where priveleged software can directly > "access" the EPC (or rather, the abort page for all intents and purposes). > I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc, > but IMO it's not unusual. > > Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g. > /dev/sgx/virtualmachine might be more appropriate. > >> I do think it makes sense to have QEMU delegate the various ENCLS >> operations (especially EINIT) to the regular SGX interface, which will >> mean that VM guests will have exactly the same access controls applied >> as regular user programs, which is probably what we want. > > To what end? Except for EINIT, none of the ENCLS leafs are interesting > from a permissions perspective. Trapping and re-executing ENCLS leafs > is painful, e.g. most leafs have multiple virtual addresses that need to > be translated. And routing everything through the regular interface > would make SGX even slower than it already is, e.g. every ENCLS would > take an additional ~900 cycles just to handle the VM-Exit, and that's > not accounting for any additional overhead in the SGX code, e.g. using > the regular interface would mean superfluous locks, etc... Trapping EINIT is what I have in mind. > > Couldn't we require the same privilege/capability for VMs and and EINIT > tokens? I.e. /dev/sgx/virtualmachine can only be opened by a user that > can also generate tokens. Hmm, maybe. Or we can use Jarkko’s securityfs attribute thingy. 
Concretely, I think there are two things we care about: First, if the host enforces some policy as to which enclaves can launch, then it should apply the same policy to guests — otherwise KVM lets programs do an end run around the policy. So, in the initial incarnation of this, QEMU should probably have to open the provision attribute fd if it wants its guest to be able to EINIT a provisioning enclave. When someone inevitably adds an EINIT LSM hook, the KVM interface should also call it. Second, the normal enclave interface won't allow user code to supply an EINITTOKEN, so the KVM interface will presumably need to be different, unless we're going to emulate EINIT by ignoring the token. That seems like a very strange thing to do.
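A sketch of the trapped-EINIT flow implied by these two points: KVM intercepts ENCLS[EINIT] and routes it through the same host policy as native enclaves before executing it on the guest's behalf. The security_enclave_init() hook and the sgx_einit() helper are hypothetical:

    /* Every name below is hypothetical; this only sketches the order
     * of operations for a trapped ENCLS[EINIT]. */
    static int kvm_sgx_handle_einit(struct kvm_vcpu *vcpu,
                                    struct sgx_sigstruct *sigstruct,
                                    struct sgx_einittoken *token,
                                    struct sgx_secs *secs)
    {
            int ret;

            /* Apply the host's launch policy to the guest's enclave,
             * e.g. the eventual EINIT LSM hook mentioned above. */
            ret = security_enclave_init(sigstruct);
            if (ret)
                    return ret;

            /* Execute EINIT on the guest's behalf; whether the
             * guest-provided token is honored or ignored is the open
             * question raised above. */
            return sgx_einit(sigstruct, token, secs);
    }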
On Thu, Jan 10, 2019 at 9:46 AM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote: > > I do think it makes sense to have QEMU delegate the various ENCLS > > operations (especially EINIT) to the regular SGX interface, which will > > mean that VM guests will have exactly the same access controls applied > > as regular user programs, which is probably what we want. If so, > > there will need to be a way to get INITTOKEN privilege for the purpose > > of running non-Linux OSes in the VM, which isn't the end of the world. > > We might still want the actual ioctl to do EINIT using an actual > > explicit token to be somehow restricted in a way that strongly > > discourages its use by anything other than a hypervisor. Or I suppose > > we could just straight-up ignore the guest-provided init token. > > Does it even matter if just leave EINITTOKENKEY attribute unprivileged > given that Linux requires that MSRs are writable? Maybe I'll just > whitelist that attribute to any enclave? > I would at least make it work like the PROVISIONKEY bit (or whatever it's called). Or just deny it at first. It's easy to start allowing it if we need to down the road, but it's harder to start denying it. Also, I don't really want to see some SDK invoke a launch enclave, because that's how it works on Windows, and then do nothing with the resulting EINITTOKEN. If we don't allow it, then the SDKs will be forced to do a better job, which is probably a good thing.
On Thu, 2019-01-10 at 13:34 -0800, Andy Lutomirski wrote: > > > On Jan 9, 2019, at 8:31 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > > > > > On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote: > > > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson > > > <sean.j.christopherson@intel.com> wrote: > > > > > > > > Cleaner in the sense that it's faster to get basic support up and running > > > > since there are fewer touchpoints, but there are long term ramifications > > > > to cramming EPC management in KVM. > > > > > > > > And at this point I'm not stating any absolutes, e.g. how EPC will be > > > > handled by KVM. What I'm pushing for is to not eliminate the possibility > > > > of having the SGX subsystem own all EPC management, e.g. don't tie > > > > /dev/sgx to a single enclave. > > > > > > I haven't gone and re-read all the relevant SDM bits, so I'll just > > > ask: what, if anything, are the actual semantics of mapping "raw EPC" > > > like this? You can't actually do anything with the mapping from user > > > mode unless you actually get an enclave created and initialized in it > > > and have it mapped at the correct linear address, right? I still > > > think you have the right idea, but it is a bit unusual. > > > > Correct, the EPC is inaccessible until a range is "mapped" with ECREATE. > > But I'd argue that it's not unusual, just different. And really it's not > > all that different than userspace mmap'ing /dev/sgx/enclave prior to > > ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to) > > access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens > > to consider any faulting EPC address without an associated enclave as > > illegal, e.g. signals SIGBUS. > > > > The /dev/sgx/epc case simply has different semantics for moving pages in > > and out of the EPC, i.e. different fault and eviction semantics. Yes, > > this allows the guest kernel to directly access the "raw" EPC, but that's > > conceptually in line with hardware where priveleged software can directly > > "access" the EPC (or rather, the abort page for all intents and purposes). > > I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc, > > but IMO it's not unusual. > > > > Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g. > > /dev/sgx/virtualmachine might be more appropriate. > > > > > I do think it makes sense to have QEMU delegate the various ENCLS > > > operations (especially EINIT) to the regular SGX interface, which will > > > mean that VM guests will have exactly the same access controls applied > > > as regular user programs, which is probably what we want. > > > > To what end? Except for EINIT, none of the ENCLS leafs are interesting > > from a permissions perspective. Trapping and re-executing ENCLS leafs > > is painful, e.g. most leafs have multiple virtual addresses that need to > > be translated. And routing everything through the regular interface > > would make SGX even slower than it already is, e.g. every ENCLS would > > take an additional ~900 cycles just to handle the VM-Exit, and that's > > not accounting for any additional overhead in the SGX code, e.g. using > > the regular interface would mean superfluous locks, etc... > > Trapping EINIT is what I have in mind. > > > > > Couldn't we require the same privilege/capability for VMs and and EINIT > > tokens? I.e. /dev/sgx/virtualmachine can only be opened by a user that > > can also generate tokens. > > Hmm, maybe. 
Or we can use Jarkko’s securityfs attribute thingy. > > Concretely, I think there are two things we care about: > > First, if the host enforces some policy as to which enclaves can > launch, then it should apply the same policy to guests — otherwise KVM > lets programs do an end run around the policy. So, in the initial > incarnation of this, QEMU should probably have to open the provision > attribute fd if it wants its guest to be able to EINIT a provisioning > enclave. When someone inevitably adds an EINIT LSM hook, the KVM > interface should also call it. > > Second, the normal enclave interface won't allow user code to supply > an EINITTOKEN, so the KVM interface will presumably need to be > different, unless we're going to emulate EINIT by ignoring the token. > That seems like a very strange thing to do. Hi Andy, IMHO applying policy to an enclave in a VM should be different from applying policy to an enclave in the host. The SGX SW stack in the host should be able to run inside a VM without any modification, so for example, if the host sets a policy that we cannot run an LE (except the LE in the host), then basically we are disabling SGX in the VM. In general KVM SGX is supposed to run all guest OSes with SGX. And for the provisioning enclave, do you see any reason that we need to disallow running it inside a VM? Maybe some more general questions: What policy/policies should we have in the host? Should they be in core SGX code, or should they belong to the SGX driver's scope? Do we need to figure out all of them and how to control them before we can actually think about upstreaming virtualization support? Thanks, -Kai
On Thu, Jan 10, 2019 at 01:34:44PM -0800, Andy Lutomirski wrote: > >> On Jan 9, 2019, at 8:31 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > >> > >> On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote: > > > >> I do think it makes sense to have QEMU delegate the various ENCLS > >> operations (especially EINIT) to the regular SGX interface, which will > >> mean that VM guests will have exactly the same access controls applied > >> as regular user programs, which is probably what we want. > > > > To what end? Except for EINIT, none of the ENCLS leafs are interesting > > from a permissions perspective. Trapping and re-executing ENCLS leafs > > is painful, e.g. most leafs have multiple virtual addresses that need to > > be translated. And routing everything through the regular interface > > would make SGX even slower than it already is, e.g. every ENCLS would > > take an additional ~900 cycles just to handle the VM-Exit, and that's > > not accounting for any additional overhead in the SGX code, e.g. using > > the regular interface would mean superfluous locks, etc... > > Trapping EINIT is what I have in mind. Phew, had me worried :-) > > > > Couldn't we require the same privilege/capability for VMs and and EINIT > > tokens? I.e. /dev/sgx/virtualmachine can only be opened by a user that > > can also generate tokens. > > Hmm, maybe. Or we can use Jarkko’s securityfs attribute thingy. > > Concretely, I think there are two things we care about: > > First, if the host enforces some policy as to which enclaves can > launch, then it should apply the same policy to guests — otherwise KVM > lets programs do an end run around the policy. So, in the initial > incarnation of this, QEMU should probably have to open the provision > attribute fd if it wants its guest to be able to EINIT a provisioning > enclave. When someone inevitably adds an EINIT LSM hook, the KVM > interface should also call it. Sort of. A guest that is running under KVM (i.e. VMX) is much more contained than a random userspace program. A rogue enclave in a VMX guest can attack the guest kernel/OS, but barring a bug (or more likely, several major bugs) elsewhere in the virtualization stack the enclave can't do anything nasty to the host. An enclave would let someone hide code, but enclaves are even more restricted than cpl3, i.e. there's not a lot it can do without coordinating with unencrypted code in the guest. And if someone has sufficient permissions to run a KVM guest, they're much more likely to do something malicious in the guest kernel, not an enclave. All that aside, I don't see any justification for singling out SGX for extra scrutiny, there are other ways for a user with KVM permissions to hide malicious code in a guest (and at cpl0!), e.g. AMD's SEV{-ES}. > Second, the normal enclave interface won't allow user code to supply > an EINITTOKEN, so the KVM interface will presumably need to be > different, unless we're going to emulate EINIT by ignoring the token. > That seems like a very strange thing to do.
On Thu, Jan 10, 2019 at 3:54 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Thu, Jan 10, 2019 at 01:34:44PM -0800, Andy Lutomirski wrote:
> > >> On Jan 9, 2019, at 8:31 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > >>
> > >> On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > >
> > >> I do think it makes sense to have QEMU delegate the various ENCLS
> > >> operations (especially EINIT) to the regular SGX interface, which will
> > >> mean that VM guests will have exactly the same access controls applied
> > >> as regular user programs, which is probably what we want.
> > >
> > > To what end?  Except for EINIT, none of the ENCLS leafs are interesting
> > > from a permissions perspective.  Trapping and re-executing ENCLS leafs
> > > is painful, e.g. most leafs have multiple virtual addresses that need to
> > > be translated.  And routing everything through the regular interface
> > > would make SGX even slower than it already is, e.g. every ENCLS would
> > > take an additional ~900 cycles just to handle the VM-Exit, and that's
> > > not accounting for any additional overhead in the SGX code, e.g. using
> > > the regular interface would mean superfluous locks, etc...
> >
> > Trapping EINIT is what I have in mind.
>
> Phew, had me worried :-)
>
> > >
> > > Couldn't we require the same privilege/capability for VMs and EINIT
> > > tokens?  I.e. /dev/sgx/virtualmachine can only be opened by a user that
> > > can also generate tokens.
> >
> > Hmm, maybe.  Or we can use Jarkko's securityfs attribute thingy.
> >
> > Concretely, I think there are two things we care about:
> >
> > First, if the host enforces some policy as to which enclaves can
> > launch, then it should apply the same policy to guests — otherwise KVM
> > lets programs do an end run around the policy.  So, in the initial
> > incarnation of this, QEMU should probably have to open the provision
> > attribute fd if it wants its guest to be able to EINIT a provisioning
> > enclave.  When someone inevitably adds an EINIT LSM hook, the KVM
> > interface should also call it.
>
> Sort of.  A guest that is running under KVM (i.e. VMX) is much more
> contained than a random userspace program.  A rogue enclave in a VMX
> guest can attack the guest kernel/OS, but barring a bug (or more likely,
> several major bugs) elsewhere in the virtualization stack the enclave
> can't do anything nasty to the host.  An enclave would let someone hide
> code, but enclaves are even more restricted than cpl3, i.e. there's not
> a lot it can do without coordinating with unencrypted code in the guest.
>
> And if someone has sufficient permissions to run a KVM guest, they're
> much more likely to do something malicious in the guest kernel, not an
> enclave.

Are you sure?  On my laptop, /dev/kvm is 0666, and that's the distro
default.  I don't think this is at all unusual.  I'm not particularly
concerned about a guest attacking itself, but it's conceptually
straightforward to bypass whatever restrictions the host has by simply
opening /dev/kvm and sticking your enclave in a VM.

>
> All that aside, I don't see any justification for singling out SGX for
> extra scrutiny; there are other ways for a user with KVM permissions to
> hide malicious code in a guest (and at cpl0!), e.g. AMD's SEV{-ES}.

I'm not singling out SGX.  I'm just saying that KVM should not
magically bypass host policy.  If you want to assign a virtual
function on your NIC to a KVM guest, you need to give your QEMU
process that privilege.  Similarly, if someone has a MAC policy that
controls which processes can launch which enclaves and they want to
run Windows with full SGX support in a VM guest, then they should
authorize that in their MAC policy by giving QEMU unrestricted launch
privileges.

Similarly, if access to a persistent provisioning identifier is
restricted, access to /dev/kvm shouldn't magically bypass it.  Just
give the QEMU process the relevant privileges.
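[For reference, a guess at the shape the "EINIT LSM hook" mentioned
above might take. No such hook existed at the time of this thread, so
the name, signature, and struct below are purely hypothetical:]

    #include <stdint.h>

    /*
     * Hypothetical prototype for an EINIT LSM hook.  The point made
     * in the thread is that it would be called from both the native
     * driver's EINIT path and KVM's trap-and-forward path, so host
     * policy applies equally to guest enclaves.
     */
    struct sgx_sigstruct;	/* the enclave's signed identity */

    int security_enclave_init(const struct sgx_sigstruct *sigstruct,
    			  uint64_t requested_attributes);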
On Thu, Jan 10, 2019 at 04:30:06PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 10, 2019 at 3:54 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Sort of.  A guest that is running under KVM (i.e. VMX) is much more
> > contained than a random userspace program.  A rogue enclave in a VMX
> > guest can attack the guest kernel/OS, but barring a bug (or more likely,
> > several major bugs) elsewhere in the virtualization stack the enclave
> > can't do anything nasty to the host.  An enclave would let someone hide
> > code, but enclaves are even more restricted than cpl3, i.e. there's not
> > a lot it can do without coordinating with unencrypted code in the guest.
> >
> > And if someone has sufficient permissions to run a KVM guest, they're
> > much more likely to do something malicious in the guest kernel, not an
> > enclave.
>
> Are you sure?  On my laptop, /dev/kvm is 0666, and that's the distro
> default.  I don't think this is at all unusual.

Wow, that's surprising.  A quick search suggests that this may be Debian
specific[1], e.g. my Ubuntu systems have:

crw-rw---- 1 root kvm 10, 232 Jan  9 09:30 /dev/kvm

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1431876

> I'm not particularly
> concerned about a guest attacking itself, but it's conceptually
> straightforward to bypass whatever restrictions the host has by simply
> opening /dev/kvm and sticking your enclave in a VM.

VMs by nature allow a user to bypass all sorts of restrictions, e.g. the
kernel doesn't let userspace run arbitrary cpl0 code, but launch a VM
and voila.  It's what you can do with the new privileges that matters.

> > All that aside, I don't see any justification for singling out SGX for
> > extra scrutiny; there are other ways for a user with KVM permissions to
> > hide malicious code in a guest (and at cpl0!), e.g. AMD's SEV{-ES}.
>
> I'm not singling out SGX.  I'm just saying that KVM should not
> magically bypass host policy.  If you want to assign a virtual
> function on your NIC to a KVM guest, you need to give your QEMU
> process that privilege.  Similarly, if someone has a MAC policy that
> controls which processes can launch which enclaves and they want to
> run Windows with full SGX support in a VM guest, then they should
> authorize that in their MAC policy by giving QEMU unrestricted launch
> privileges.

MAC systems exist to protect assets, and IMO an enclave isn't an asset.
E.g. AppArmor (via LSM) isn't protecting files, it's protecting the
contents of the file or what can be done with the file.  And the MAC is
only part of the overall protection scheme, e.g. userspace is also
relying on the kernel to not screw up the page tables.

In SGX terms, an LSM hook might use enclave signatures to protect some
asset 'X', e.g. access to a persistent identifier.  But that doesn't
mean that whitelisting enclave signatures is the only way to protect
'X'.

> Similarly, if access to a persistent provisioning identifier is
> restricted, access to /dev/kvm shouldn't magically bypass it.  Just
> give the QEMU process the relevant privileges.

Agreed, but that's not the same as applying a host's whitelist against
a guest's enclaves.
On Wed, Jan 09, 2019 at 08:31:37AM -0800, Sean Christopherson wrote:
> On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > Cleaner in the sense that it's faster to get basic support up and running
> > > since there are fewer touchpoints, but there are long term ramifications
> > > to cramming EPC management in KVM.
> > >
> > > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > > handled by KVM.  What I'm pushing for is to not eliminate the possibility
> > > of having the SGX subsystem own all EPC management, e.g. don't tie
> > > /dev/sgx to a single enclave.
> >
> > I haven't gone and re-read all the relevant SDM bits, so I'll just
> > ask: what, if anything, are the actual semantics of mapping "raw EPC"
> > like this?  You can't actually do anything with the mapping from user
> > mode unless you actually get an enclave created and initialized in it
> > and have it mapped at the correct linear address, right?  I still
> > think you have the right idea, but it is a bit unusual.
>
> Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> But I'd argue that it's not unusual, just different.  And really it's not
> all that different than userspace mmap'ing /dev/sgx/enclave prior to
> ioctl(ENCLAVE_CREATE).  In that case, userspace can still (attempt to)
> access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> to consider any faulting EPC address without an associated enclave as
> illegal, e.g. signals SIGBUS.
>
> The /dev/sgx/epc case simply has different semantics for moving pages in
> and out of the EPC, i.e. different fault and eviction semantics.  Yes,
> this allows the guest kernel to directly access the "raw" EPC, but that's
> conceptually in line with hardware where privileged software can directly
> "access" the EPC (or rather, the abort page for all intents and purposes).
> I.e. it's an argument for requiring certain privileges to open
> /dev/sgx/epc, but IMO it's not unusual.
>
> Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> /dev/sgx/virtualmachine might be more appropriate.

What do you mean by saying "requiring certain privileges"?  Are you
saying that the "raw EPC" device (let's say /dev/vmsgx, which is
probably the name I will use for the device *if* it is required) would
require different privileges than /dev/sgx?

/Jarkko
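[As an aside, the mmap-before-ECREATE semantics described above, where
any fault on the mapping prior to enclave creation is treated as
illegal and signals SIGBUS, correspond roughly to the userspace flow
sketched below. The device path, SGX_IOC_ENCLAVE_CREATE, and struct
sgx_enclave_create are assumptions drawn from this thread, not a
finalized ABI:]

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Assumed request layout; not a settled interface. */
    struct sgx_enclave_create {
    	uint64_t src;	/* userspace address of the SECS page */
    };

    #define SGX_IOC_ENCLAVE_CREATE \
    	_IOW('s', 0x00, struct sgx_enclave_create)

    int main(void)
    {
    	int fd = open("/dev/sgx/enclave", O_RDWR);
    	if (fd < 0)
    		return 1;

    	/*
    	 * Reserve the enclave's address range.  Touching this mapping
    	 * now faults: no enclave backs it yet, so the driver treats
    	 * the access as illegal and delivers SIGBUS.
    	 */
    	void *base = mmap(NULL, 0x1000000, PROT_READ | PROT_WRITE,
    			  MAP_SHARED, fd, 0);
    	if (base == MAP_FAILED)
    		return 1;

    	/* A real SECS would be filled in here (omitted). */
    	static uint8_t secs[4096] __attribute__((aligned(4096)));

    	struct sgx_enclave_create create = {
    		.src = (uint64_t)(uintptr_t)secs,
    	};

    	/* ECREATE: only now does the mapping mean anything. */
    	if (ioctl(fd, SGX_IOC_ENCLAVE_CREATE, &create))
    		perror("ENCLAVE_CREATE");

    	close(fd);
    	return 0;
    }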
On Fri, Jan 11, 2019 at 02:58:26PM +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 09, 2019 at 08:31:37AM -0800, Sean Christopherson wrote:
> > On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > > >
> > > > Cleaner in the sense that it's faster to get basic support up and running
> > > > since there are fewer touchpoints, but there are long term ramifications
> > > > to cramming EPC management in KVM.
> > > >
> > > > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > > > handled by KVM.  What I'm pushing for is to not eliminate the possibility
> > > > of having the SGX subsystem own all EPC management, e.g. don't tie
> > > > /dev/sgx to a single enclave.
> > >
> > > I haven't gone and re-read all the relevant SDM bits, so I'll just
> > > ask: what, if anything, are the actual semantics of mapping "raw EPC"
> > > like this?  You can't actually do anything with the mapping from user
> > > mode unless you actually get an enclave created and initialized in it
> > > and have it mapped at the correct linear address, right?  I still
> > > think you have the right idea, but it is a bit unusual.
> >
> > Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> > But I'd argue that it's not unusual, just different.  And really it's not
> > all that different than userspace mmap'ing /dev/sgx/enclave prior to
> > ioctl(ENCLAVE_CREATE).  In that case, userspace can still (attempt to)
> > access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> > to consider any faulting EPC address without an associated enclave as
> > illegal, e.g. signals SIGBUS.
> >
> > The /dev/sgx/epc case simply has different semantics for moving pages in
> > and out of the EPC, i.e. different fault and eviction semantics.  Yes,
> > this allows the guest kernel to directly access the "raw" EPC, but that's
> > conceptually in line with hardware where privileged software can directly
> > "access" the EPC (or rather, the abort page for all intents and purposes).
> > I.e. it's an argument for requiring certain privileges to open
> > /dev/sgx/epc, but IMO it's not unusual.
> >
> > Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> > /dev/sgx/virtualmachine might be more appropriate.
>
> What do you mean by saying "requiring certain privileges"?  Are you
> saying that the "raw EPC" device (let's say /dev/vmsgx, which is
> probably the name I will use for the device *if* it is required) would
> require different privileges than /dev/sgx?

I'd say that it would be a good goal to be able to create a VM as a
normal user and use SGX.

/Jarkko
On Thu, Jan 10, 2019 at 01:36:15PM -0800, Andy Lutomirski wrote:
> > Does it even matter if we just leave the EINITTOKENKEY attribute
> > unprivileged, given that Linux requires that the MSRs are writable?
> > Maybe I'll just whitelist that attribute for any enclave?
> >
> I would at least make it work like the PROVISIONKEY bit (or whatever
> it's called).  Or just deny it at first.  It's easy to start allowing
> it if we need to down the road, but it's harder to start denying it.

I think it would be a great idea to add another file to securityfs for
this.  It would fit perfectly with your "systemd privilege sharing"
daemon example.  Consistency here would be really nice.

/Jarkko
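[The securityfs attribute scheme being referred to would, roughly,
look like the sketch below from userspace: a process proves it may use
a privileged attribute (PROVISIONKEY, EINITTOKENKEY) by opening a
securityfs file and handing the fd to the SGX driver. The file path,
ioctl name, and struct are assumptions drawn from the discussion, not
a settled interface:]

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Assumed request layout; not a settled interface. */
    struct sgx_enclave_set_attribute {
    	uint64_t attribute_fd;	/* fd proving access to the attribute */
    };

    #define SGX_IOC_ENCLAVE_SET_ATTRIBUTE \
    	_IOW('s', 0x03, struct sgx_enclave_set_attribute)

    int grant_provision_key(int enclave_fd)
    {
    	/* Only processes that can open this file get the attribute. */
    	int attr_fd = open("/sys/kernel/security/sgx/provision",
    			   O_RDONLY);
    	if (attr_fd < 0)
    		return -1;

    	struct sgx_enclave_set_attribute arg = {
    		.attribute_fd = (uint64_t)attr_fd,
    	};

    	/* The driver validates the fd and whitelists the bit. */
    	int ret = ioctl(enclave_fd, SGX_IOC_ENCLAVE_SET_ATTRIBUTE, &arg);
    	close(attr_fd);
    	return ret;
    }

[A privilege-brokering daemon, e.g. the "systemd privilege sharing"
example mentioned above, could then hold the securityfs file open and
pass attr_fd over a unix socket to sandboxed enclave hosts.]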
On Fri, Jan 11, 2019 at 02:58:26PM +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 09, 2019 at 08:31:37AM -0800, Sean Christopherson wrote:
> > On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > > >
> > > > Cleaner in the sense that it's faster to get basic support up and running
> > > > since there are fewer touchpoints, but there are long term ramifications
> > > > to cramming EPC management in KVM.
> > > >
> > > > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > > > handled by KVM.  What I'm pushing for is to not eliminate the possibility
> > > > of having the SGX subsystem own all EPC management, e.g. don't tie
> > > > /dev/sgx to a single enclave.
> > >
> > > I haven't gone and re-read all the relevant SDM bits, so I'll just
> > > ask: what, if anything, are the actual semantics of mapping "raw EPC"
> > > like this?  You can't actually do anything with the mapping from user
> > > mode unless you actually get an enclave created and initialized in it
> > > and have it mapped at the correct linear address, right?  I still
> > > think you have the right idea, but it is a bit unusual.
> >
> > Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> > But I'd argue that it's not unusual, just different.  And really it's not
> > all that different than userspace mmap'ing /dev/sgx/enclave prior to
> > ioctl(ENCLAVE_CREATE).  In that case, userspace can still (attempt to)
> > access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> > to consider any faulting EPC address without an associated enclave as
> > illegal, e.g. signals SIGBUS.
> >
> > The /dev/sgx/epc case simply has different semantics for moving pages in
> > and out of the EPC, i.e. different fault and eviction semantics.  Yes,
> > this allows the guest kernel to directly access the "raw" EPC, but that's
> > conceptually in line with hardware where privileged software can directly
> > "access" the EPC (or rather, the abort page for all intents and purposes).
> > I.e. it's an argument for requiring certain privileges to open
> > /dev/sgx/epc, but IMO it's not unusual.
> >
> > Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> > /dev/sgx/virtualmachine might be more appropriate.
>
> What do you mean by saying "requiring certain privileges"?  Are you
> saying that the "raw EPC" device (let's say /dev/vmsgx, which is
> probably the name I will use for the device *if* it is required) would
> require different privileges than /dev/sgx?

I don't think it would be mandatory, especially if the PROVISION and
EINITTOKEN attributes are routed through securityfs, but it might be
nice to have since the functionality provided by /dev/vmsgx would be
different from that of /dev/sgx.

Side topic, what's the reasoning for doing /dev/sgx and /dev/vmsgx
instead of /dev/sgx/{enclave,vm,etc...}?
On Fri, Jan 11, 2019 at 03:19:56PM -0800, Sean Christopherson wrote:
> On Fri, Jan 11, 2019 at 02:58:26PM +0200, Jarkko Sakkinen wrote:
> > On Wed, Jan 09, 2019 at 08:31:37AM -0800, Sean Christopherson wrote:
> > > On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > > > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> > > > <sean.j.christopherson@intel.com> wrote:
> > > > >
> > > > > Cleaner in the sense that it's faster to get basic support up and running
> > > > > since there are fewer touchpoints, but there are long term ramifications
> > > > > to cramming EPC management in KVM.
> > > > >
> > > > > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > > > > handled by KVM.  What I'm pushing for is to not eliminate the possibility
> > > > > of having the SGX subsystem own all EPC management, e.g. don't tie
> > > > > /dev/sgx to a single enclave.
> > > >
> > > > I haven't gone and re-read all the relevant SDM bits, so I'll just
> > > > ask: what, if anything, are the actual semantics of mapping "raw EPC"
> > > > like this?  You can't actually do anything with the mapping from user
> > > > mode unless you actually get an enclave created and initialized in it
> > > > and have it mapped at the correct linear address, right?  I still
> > > > think you have the right idea, but it is a bit unusual.
> > >
> > > Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> > > But I'd argue that it's not unusual, just different.  And really it's not
> > > all that different than userspace mmap'ing /dev/sgx/enclave prior to
> > > ioctl(ENCLAVE_CREATE).  In that case, userspace can still (attempt to)
> > > access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> > > to consider any faulting EPC address without an associated enclave as
> > > illegal, e.g. signals SIGBUS.
> > >
> > > The /dev/sgx/epc case simply has different semantics for moving pages in
> > > and out of the EPC, i.e. different fault and eviction semantics.  Yes,
> > > this allows the guest kernel to directly access the "raw" EPC, but that's
> > > conceptually in line with hardware where privileged software can directly
> > > "access" the EPC (or rather, the abort page for all intents and purposes).
> > > I.e. it's an argument for requiring certain privileges to open
> > > /dev/sgx/epc, but IMO it's not unusual.
> > >
> > > Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> > > /dev/sgx/virtualmachine might be more appropriate.
> >
> > What do you mean by saying "requiring certain privileges"?  Are you
> > saying that the "raw EPC" device (let's say /dev/vmsgx, which is
> > probably the name I will use for the device *if* it is required) would
> > require different privileges than /dev/sgx?
>
> I don't think it would be mandatory, especially if the PROVISION and
> EINITTOKEN attributes are routed through securityfs, but it might be
> nice to have since the functionality provided by /dev/vmsgx would be
> different from that of /dev/sgx.
>
> Side topic, what's the reasoning for doing /dev/sgx and /dev/vmsgx
> instead of /dev/sgx/{enclave,vm,etc...}?

I don't see us having more than two devices.  Directory hierarchies
would make sense if there were a variable number of things to
initialize.

/Jarkko