Message ID | 20181116010412.23967-23-jarkko.sakkinen@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v17,01/23] x86/sgx: Update MAINTAINERS | expand |
Hi, I have more editing comments below. On 11/15/18 5:01 PM, Jarkko Sakkinen wrote: > Documentation of the features of the Software Guard eXtensions used > by the Linux kernel and basic design choices for the core and driver > and functionality. > > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> > Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > --- > Documentation/index.rst | 1 + > Documentation/x86/index.rst | 8 ++ > Documentation/x86/intel_sgx.rst | 233 ++++++++++++++++++++++++++++++++ > 3 files changed, 242 insertions(+) > create mode 100644 Documentation/x86/index.rst > create mode 100644 Documentation/x86/intel_sgx.rst > diff --git a/Documentation/x86/intel_sgx.rst b/Documentation/x86/intel_sgx.rst > new file mode 100644 > index 000000000000..f51b43f9e125 > --- /dev/null > +++ b/Documentation/x86/intel_sgx.rst > @@ -0,0 +1,233 @@ > +=================== > +Intel(R) SGX driver > +=================== > + > +Introduction > +============ > + > +Intel(R) SGX is a set of CPU instructions that can be used by applications to > +set aside private regions of code and data. The code outside the enclave is > +disallowed to access the memory inside the enclave by the CPU access control. > +In a way you can think that SGX provides inverted sandbox. It protects the provides an inverted sandbox. > +application from a malicious host. > + > +You can tell if your CPU supports SGX by looking into ``/proc/cpuinfo``: > + > + ``cat /proc/cpuinfo | grep sgx`` > + > +Overview of SGX > +=============== > + > +SGX has a set of data structures to maintain information about the enclaves and > +their security properties. BIOS reserves a fixed size region of physical memory > +for these structures by setting Processor Reserved Memory Range Registers > +(PRMRR). > + > +This memory range is protected from outside access by the CPU and all the data > +coming in and out of the CPU package is encrypted by a key that is generated for > +each boot cycle. > + > +Enclaves execute in ring-3 in a special enclave submode using pages from the ring 3 > +reserved memory range. A fixed logical address range for the enclave is reserved > +by ENCLS(ECREATE), a leaf instruction used to create enclaves. It is referred in referred to in > +the documentation commonly as the ELRANGE. > + > +Every memory access to the ELRANGE is asserted by the CPU. If the CPU is not > +executing in the enclave mode inside the enclave, #GP is raised. On the other > +hand, enclave code can make memory accesses both inside and outside of the > +ELRANGE. > + > +Enclave can only execute code inside the ELRANGE. Instructions that may cause An enclave can only > +VMEXIT, IO instructions and instructions that require a privilege change are > +prohibited inside the enclave. Interrupts and exceptions always cause enclave always cause an enclave > +to exit and jump to an address outside the enclave given when the enclave is > +entered by using the leaf instruction ENCLS(EENTER). > + > +Protected memory > +---------------- > + > +Enclave Page Cache (EPC) > + Physical pages used with enclaves that are protected by the CPU from > + unauthorized access. > + > +Enclave Page Cache Map (EPCM) > + A database that describes the properties and state of the pages e.g. their > + permissions or to which enclave they belong to. Drop one of those "to" words (either one). > + > +Memory Encryption Engine (MEE) integrity tree > + Autonomously updated integrity tree. The root of the tree located in on-die > + SRAM. > + > +EPC data types > +-------------- > + > +SGX Enclave Control Structure (SECS) > + Describes the global properties of an enclave. Will not be mapped to the > + ELRANGE. > + > +Regular (REG) > + These pages contain code and data. > + > +Thread Control Structure (TCS) > + The pages that define the entry points inside an enclave. An enclave can > + only be entered through these entry points and each can host a single > + hardware thread at a time. > + > +Version Array (VA) > + The pages contain 64-bit version numbers for pages that have been swapped > + outside the enclave. Each page has the capacity of 512 version numbers. > + > +Launch control > +-------------- > + > +To launch an enclave, two structures must be provided for ENCLS(EINIT): > + > +1. **SIGSTRUCT:** signed measurement of the enclave binary. > +2. **EINITTOKEN:** a cryptographic token CMAC-signed with a AES256-key called with an > + *launch key*, which is re-generated for each boot cycle. (prefer) regenerated > + > +The CPU holds a SHA256 hash of a 3072-bit RSA public key inside > +IA32_SGXLEPUBKEYHASHn MSRs. Enclaves with a SIGSTRUCT that is signed with this > +key do not require a valid EINITTOKEN and can be authorized with special > +privileges. One of those privileges is ability to acquire the launch key with > +ENCLS(EGETKEY). > + > +**IA32_FEATURE_CONTROL[17]** is used by the BIOS configure whether by the BIOS to configure whether > +IA32_SGXLEPUBKEYHASH MSRs are read-only or read-write before locking the > +feature control register and handing over control to the operating system. > + > +Enclave construction > +-------------------- > + > +The construction is started by filling out the SECS that contains enclave > +address range, privileged attributes and measurement of TCS and REG pages (pages > +that will be mapped to the address range) among the other things. This structure > +is passed out to the ENCLS(ECREATE) together with a physical address of a page This would make more sense to me: is passed to the ENCLS(ECREATE) instruction together with ... > +in EPC that will hold the SECS. > + > +The pages are added with ENCLS(EADD) and measured with ENCLS(EEXTEND) i.e. with ENCLS(EEXTEND), i.e. > +SHA256 hash MRENCLAVE residing in the SECS is extended with the page data. > + > +After all of the pages have been added, the enclave is initialized with > +ENCLS(EINIT). It will check that the SIGSTRUCT is signed with the contained > +public key. If the given EINITTOKEN has the valid bit set, the CPU checks that > +the token is valid (CMAC'd with the launch key). If the token is not valid, > +the CPU will check whether the enclave is signed with a key matching to the > +IA32_SGXLEPUBKEYHASHn MSRs. > + > +Swapping pages > +-------------- > + > +Enclave pages can be swapped out with ENCLS(EWB) to the unprotected memory. In > +addition to the EPC page, ENCLS(EWB) takes in a VA page and address for PCMD > +structure (Page Crypto MetaData) as input. The VA page will seal a version > +number for the page. PCMD is 128 byte structure that contains tracking 128-byte > +information for the page, most importantly its MAC. With these structures the > +enclave is sealed and rollback protected while it resides in the unprotected > +memory. > + > +Before the page can be swapped out it must not have any active TLB references. > +ENCLS(EBLOCK) instruction moves a page to the *blocked* state, which means The ENCLS(EBLOCK) instruction > +that no new TLB entries can be created to it by the hardware threads. > + > +After this a shootdown sequence is started with ENCLS(ETRACK), which sets an > +increased counter value to the entering hardware threads. ENCLS(EWB) will > +return SGX_NOT_TRACKED error while there are still threads with the earlier > +couner value because that means that there might be hardware thread inside counter threads > +the enclave with TLB entries to pages that are to be swapped. > + > +Kernel internals > +================ > + > +Requirements > +------------ > + > +Because SGX has an ever evolving and expanding feature set, it's possible for > +a BIOS or VMM to configure a system in such a way that not all CPUs are equal, > +e.g. where Launch Control is only enabled on a subset of CPUs. Linux does > +*not* support such a heterogeneous system configuration, nor does it even > +attempt to play nice in the face of a misconfigured system. With the exception > +of Launch Control's hash MSRs, which can vary per CPU, Linux assumes that all > +CPUs have a configuration that is identical to the boot CPU. > + > + > +Roles and responsibilities > +-------------------------- > + > +SGX introduces system resources, e.g. EPC memory, that must be accessible to > +multiple entities, e.g. the native kernel driver (to expose SGX to userspace) > +and KVM (to expose SGX to VMs), ideally without introducing any dependencies > +between each SGX entity. To that end, the kernel owns and manages the shared > +system resources, i.e. the EPC and Launch Control MSRs, and defines functions > +that provide appropriate access to the shared resources. SGX support for > +user space and VMs is left to the SGX platform driver and KVM respectively. > + > +Launching enclaves > +------------------ > + > +The current kernel implementation supports only unlocked MSRs i.e. MSRs, i.e. > +FEATURE_CONTROL_SGX_LE_WR must be set. The launch is performed by setting the > +MSRs to the hash of the public key modulus of the enclave signer, which is one > +of the fields in the SIGSTRUCT. > + > +EPC management > +-------------- > + > +Due to the unique requirements for swapping EPC pages, and because EPC pages > +(currently) do not have associated page structures, management of the EPC is > +not handled by the standard Linux swapper. SGX directly handles swapping > +of EPC pages, including a kthread to initiate reclaim and a rudimentary LRU > +mechanism. The consumers of EPC pages, e.g. the SGX driver, are required to > +implement function callbacks that can be invoked by the kernel to age, > +swap, and/or forcefully reclaim a target EPC page. In effect, the kernel > +controls what happens and when, while the consumers (driver, KVM, etc..) do > +the actual work. > + > +Exception handling > +------------------ > + > +The PF_SGX bit is set if and only if the #PF is detected by the SGX Enclave Page > +Cache Map (EPCM). The EPCM is a hardware-managed table that enforces accesses to > +an enclave's EPC pages in addition to the software-managed kernel page tables, > +i.e. the effective permissions for an EPC page are a logical AND of the kernel's > +page tables and the corresponding EPCM entry. > + > +The EPCM is consulted only after an access walks the kernel's page tables, i.e.: > + > +1. the access was allowed by the kernel > +2. the kernel's tables have become less restrictive than the EPCM > +3. the kernel cannot fixup the cause of the fault > + > +Noteably, (2) implies that either the kernel has botched the EPC mappings or the Notably, > +EPCM has been invalidated (see below). Regardless of why the fault occurred, > +userspace needs to be alerted so that it can take appropriate action, e.g. > +restart the enclave. This is reinforced by (3) as the kernel doesn't really > +have any other reasonable option, i.e. signalling SIGSEGV is actually the least > +severe action possible. > + > +Although the primary purpose of the EPCM is to prevent a malicious or > +compromised kernel from attacking an enclave, e.g. by modifying the enclave's > +page tables, do not WARN on a #PF w/ PF_SGX set. The SGX architecture with > +effectively allows the CPU to invalidate all EPCM entries at will and requires > +that software be prepared to handle an EPCM fault at any time. The architecture > +defines this behavior because the EPCM is encrypted with an ephemeral key that > +isn't exposed to software. As such, the EPCM entries cannot be preserved across > +transitions that result in a new key being used, e.g. CPU power down as part of > +an S3 transition or when a VM is live migrated to a new physical system. > + > +SGX uapi UAPI > +======== > + > +.. kernel-doc:: drivers/platform/x86/intel_sgx/sgx_ioctl.c > + :functions: sgx_ioc_enclave_create > + sgx_ioc_enclave_add_page > + sgx_ioc_enclave_init > + > +.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h > + > +References > +========== > + > +* A Memory Encryption Engine Suitable for General Purpose Processors > + <https://eprint.iacr.org/2016/204.pdf> > +* System Programming Manual: 39.1.4 Intel® SGX Launch Control Configuration ciao.
On Sun, Dec 02, 2018 at 07:28:55PM -0800, Randy Dunlap wrote: > Hi, > I have more editing comments below. > > > On 11/15/18 5:01 PM, Jarkko Sakkinen wrote: > > Documentation of the features of the Software Guard eXtensions used > > by the Linux kernel and basic design choices for the core and driver > > and functionality. > > > > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> > > Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > > --- > > Documentation/index.rst | 1 + > > Documentation/x86/index.rst | 8 ++ > > Documentation/x86/intel_sgx.rst | 233 ++++++++++++++++++++++++++++++++ > > 3 files changed, 242 insertions(+) > > create mode 100644 Documentation/x86/index.rst > > create mode 100644 Documentation/x86/intel_sgx.rst > > > diff --git a/Documentation/x86/intel_sgx.rst b/Documentation/x86/intel_sgx.rst > > new file mode 100644 > > index 000000000000..f51b43f9e125 > > --- /dev/null > > +++ b/Documentation/x86/intel_sgx.rst > > @@ -0,0 +1,233 @@ > > +=================== > > +Intel(R) SGX driver > > +=================== > > + > > +Introduction > > +============ > > + > > +Intel(R) SGX is a set of CPU instructions that can be used by applications to > > +set aside private regions of code and data. The code outside the enclave is > > +disallowed to access the memory inside the enclave by the CPU access control. > > +In a way you can think that SGX provides inverted sandbox. It protects the > > provides an inverted sandbox. > > > +application from a malicious host. > > + > > +You can tell if your CPU supports SGX by looking into ``/proc/cpuinfo``: > > + > > + ``cat /proc/cpuinfo | grep sgx`` > > + > > +Overview of SGX > > +=============== > > + > > +SGX has a set of data structures to maintain information about the enclaves and > > +their security properties. BIOS reserves a fixed size region of physical memory > > +for these structures by setting Processor Reserved Memory Range Registers > > +(PRMRR). > > + > > +This memory range is protected from outside access by the CPU and all the data > > +coming in and out of the CPU package is encrypted by a key that is generated for > > +each boot cycle. > > + > > +Enclaves execute in ring-3 in a special enclave submode using pages from the > > ring 3 > > > +reserved memory range. A fixed logical address range for the enclave is reserved > > +by ENCLS(ECREATE), a leaf instruction used to create enclaves. It is referred in > > referred to in > > > +the documentation commonly as the ELRANGE. > > + > > +Every memory access to the ELRANGE is asserted by the CPU. If the CPU is not > > +executing in the enclave mode inside the enclave, #GP is raised. On the other > > +hand, enclave code can make memory accesses both inside and outside of the > > +ELRANGE. > > + > > +Enclave can only execute code inside the ELRANGE. Instructions that may cause > > An enclave can only > > > +VMEXIT, IO instructions and instructions that require a privilege change are > > +prohibited inside the enclave. Interrupts and exceptions always cause enclave > > always cause an enclave > > > +to exit and jump to an address outside the enclave given when the enclave is > > +entered by using the leaf instruction ENCLS(EENTER). > > + > > +Protected memory > > +---------------- > > + > > +Enclave Page Cache (EPC) > > + Physical pages used with enclaves that are protected by the CPU from > > + unauthorized access. > > + > > +Enclave Page Cache Map (EPCM) > > + A database that describes the properties and state of the pages e.g. their > > + permissions or to which enclave they belong to. > > Drop one of those "to" words (either one). > > > + > > +Memory Encryption Engine (MEE) integrity tree > > + Autonomously updated integrity tree. The root of the tree located in on-die > > + SRAM. > > + > > +EPC data types > > +-------------- > > + > > +SGX Enclave Control Structure (SECS) > > + Describes the global properties of an enclave. Will not be mapped to the > > + ELRANGE. > > + > > +Regular (REG) > > + These pages contain code and data. > > + > > +Thread Control Structure (TCS) > > + The pages that define the entry points inside an enclave. An enclave can > > + only be entered through these entry points and each can host a single > > + hardware thread at a time. > > + > > +Version Array (VA) > > + The pages contain 64-bit version numbers for pages that have been swapped > > + outside the enclave. Each page has the capacity of 512 version numbers. > > + > > +Launch control > > +-------------- > > + > > +To launch an enclave, two structures must be provided for ENCLS(EINIT): > > + > > +1. **SIGSTRUCT:** signed measurement of the enclave binary. > > +2. **EINITTOKEN:** a cryptographic token CMAC-signed with a AES256-key called > > with an > > > + *launch key*, which is re-generated for each boot cycle. > > (prefer) regenerated > > > + > > +The CPU holds a SHA256 hash of a 3072-bit RSA public key inside > > +IA32_SGXLEPUBKEYHASHn MSRs. Enclaves with a SIGSTRUCT that is signed with this > > +key do not require a valid EINITTOKEN and can be authorized with special > > +privileges. One of those privileges is ability to acquire the launch key with > > +ENCLS(EGETKEY). > > + > > +**IA32_FEATURE_CONTROL[17]** is used by the BIOS configure whether > > by the BIOS to configure whether > > > +IA32_SGXLEPUBKEYHASH MSRs are read-only or read-write before locking the > > +feature control register and handing over control to the operating system. > > + > > +Enclave construction > > +-------------------- > > + > > +The construction is started by filling out the SECS that contains enclave > > +address range, privileged attributes and measurement of TCS and REG pages (pages > > +that will be mapped to the address range) among the other things. This structure > > +is passed out to the ENCLS(ECREATE) together with a physical address of a page > > This would make more sense to me: > > is passed to the ENCLS(ECREATE) instruction together with ... > > > +in EPC that will hold the SECS. > > + > > +The pages are added with ENCLS(EADD) and measured with ENCLS(EEXTEND) i.e. > > with ENCLS(EEXTEND), i.e. > > > +SHA256 hash MRENCLAVE residing in the SECS is extended with the page data. > > + > > +After all of the pages have been added, the enclave is initialized with > > +ENCLS(EINIT). It will check that the SIGSTRUCT is signed with the contained > > +public key. If the given EINITTOKEN has the valid bit set, the CPU checks that > > +the token is valid (CMAC'd with the launch key). If the token is not valid, > > +the CPU will check whether the enclave is signed with a key matching to the > > +IA32_SGXLEPUBKEYHASHn MSRs. > > + > > +Swapping pages > > +-------------- > > + > > +Enclave pages can be swapped out with ENCLS(EWB) to the unprotected memory. In > > +addition to the EPC page, ENCLS(EWB) takes in a VA page and address for PCMD > > +structure (Page Crypto MetaData) as input. The VA page will seal a version > > +number for the page. PCMD is 128 byte structure that contains tracking > > 128-byte Is having a space instead of dash always grammatically wrong or is this just to have a coherent style? Just asking for plain curiosity... > > > +information for the page, most importantly its MAC. With these structures the > > +enclave is sealed and rollback protected while it resides in the unprotected > > +memory. > > + > > +Before the page can be swapped out it must not have any active TLB references. > > +ENCLS(EBLOCK) instruction moves a page to the *blocked* state, which means > > The ENCLS(EBLOCK) instruction > > > +that no new TLB entries can be created to it by the hardware threads. > > + > > +After this a shootdown sequence is started with ENCLS(ETRACK), which sets an > > +increased counter value to the entering hardware threads. ENCLS(EWB) will > > +return SGX_NOT_TRACKED error while there are still threads with the earlier > > +couner value because that means that there might be hardware thread inside > > counter threads > > > > +the enclave with TLB entries to pages that are to be swapped. > > + > > +Kernel internals > > +================ > > + > > +Requirements > > +------------ > > + > > +Because SGX has an ever evolving and expanding feature set, it's possible for > > +a BIOS or VMM to configure a system in such a way that not all CPUs are equal, > > +e.g. where Launch Control is only enabled on a subset of CPUs. Linux does > > +*not* support such a heterogeneous system configuration, nor does it even > > +attempt to play nice in the face of a misconfigured system. With the exception > > +of Launch Control's hash MSRs, which can vary per CPU, Linux assumes that all > > +CPUs have a configuration that is identical to the boot CPU. > > + > > + > > +Roles and responsibilities > > +-------------------------- > > + > > +SGX introduces system resources, e.g. EPC memory, that must be accessible to > > +multiple entities, e.g. the native kernel driver (to expose SGX to userspace) > > +and KVM (to expose SGX to VMs), ideally without introducing any dependencies > > +between each SGX entity. To that end, the kernel owns and manages the shared > > +system resources, i.e. the EPC and Launch Control MSRs, and defines functions > > +that provide appropriate access to the shared resources. SGX support for > > +user space and VMs is left to the SGX platform driver and KVM respectively. > > + > > +Launching enclaves > > +------------------ > > + > > +The current kernel implementation supports only unlocked MSRs i.e. > > MSRs, i.e. > > > +FEATURE_CONTROL_SGX_LE_WR must be set. The launch is performed by setting the > > +MSRs to the hash of the public key modulus of the enclave signer, which is one > > +of the fields in the SIGSTRUCT. > > + > > +EPC management > > +-------------- > > + > > +Due to the unique requirements for swapping EPC pages, and because EPC pages > > +(currently) do not have associated page structures, management of the EPC is > > +not handled by the standard Linux swapper. SGX directly handles swapping > > +of EPC pages, including a kthread to initiate reclaim and a rudimentary LRU > > +mechanism. The consumers of EPC pages, e.g. the SGX driver, are required to > > +implement function callbacks that can be invoked by the kernel to age, > > +swap, and/or forcefully reclaim a target EPC page. In effect, the kernel > > +controls what happens and when, while the consumers (driver, KVM, etc..) do > > +the actual work. > > + > > +Exception handling > > +------------------ > > + > > +The PF_SGX bit is set if and only if the #PF is detected by the SGX Enclave Page > > +Cache Map (EPCM). The EPCM is a hardware-managed table that enforces accesses to > > +an enclave's EPC pages in addition to the software-managed kernel page tables, > > +i.e. the effective permissions for an EPC page are a logical AND of the kernel's > > +page tables and the corresponding EPCM entry. > > + > > +The EPCM is consulted only after an access walks the kernel's page tables, i.e.: > > + > > +1. the access was allowed by the kernel > > +2. the kernel's tables have become less restrictive than the EPCM > > +3. the kernel cannot fixup the cause of the fault > > + > > +Noteably, (2) implies that either the kernel has botched the EPC mappings or the > > Notably, > > > +EPCM has been invalidated (see below). Regardless of why the fault occurred, > > +userspace needs to be alerted so that it can take appropriate action, e.g. > > +restart the enclave. This is reinforced by (3) as the kernel doesn't really > > +have any other reasonable option, i.e. signalling SIGSEGV is actually the least > > +severe action possible. > > + > > +Although the primary purpose of the EPCM is to prevent a malicious or > > +compromised kernel from attacking an enclave, e.g. by modifying the enclave's > > +page tables, do not WARN on a #PF w/ PF_SGX set. The SGX architecture > > with > > > +effectively allows the CPU to invalidate all EPCM entries at will and requires > > +that software be prepared to handle an EPCM fault at any time. The architecture > > +defines this behavior because the EPCM is encrypted with an ephemeral key that > > +isn't exposed to software. As such, the EPCM entries cannot be preserved across > > +transitions that result in a new key being used, e.g. CPU power down as part of > > +an S3 transition or when a VM is live migrated to a new physical system. > > + > > +SGX uapi > > UAPI > > > +======== > > + > > +.. kernel-doc:: drivers/platform/x86/intel_sgx/sgx_ioctl.c > > + :functions: sgx_ioc_enclave_create > > + sgx_ioc_enclave_add_page > > + sgx_ioc_enclave_init > > + > > +.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h > > + > > +References > > +========== > > + > > +* A Memory Encryption Engine Suitable for General Purpose Processors > > + <https://eprint.iacr.org/2016/204.pdf> > > +* System Programming Manual: 39.1.4 Intel® SGX Launch Control Configuration > > > ciao. > -- > ~Randy Great, thanks Randy, highly appreciated! /Jarkko
diff --git a/Documentation/index.rst b/Documentation/index.rst index c858c2e66e36..63864826dcd6 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -101,6 +101,7 @@ implementation. :maxdepth: 2 sh/index + x86/index Filesystem Documentation ------------------------ diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst new file mode 100644 index 000000000000..11d5b18d9537 --- /dev/null +++ b/Documentation/x86/index.rst @@ -0,0 +1,8 @@ +====================== +x86 Architecture Guide +====================== + +.. toctree:: + :maxdepth: 2 + + intel_sgx diff --git a/Documentation/x86/intel_sgx.rst b/Documentation/x86/intel_sgx.rst new file mode 100644 index 000000000000..f51b43f9e125 --- /dev/null +++ b/Documentation/x86/intel_sgx.rst @@ -0,0 +1,233 @@ +=================== +Intel(R) SGX driver +=================== + +Introduction +============ + +Intel(R) SGX is a set of CPU instructions that can be used by applications to +set aside private regions of code and data. The code outside the enclave is +disallowed to access the memory inside the enclave by the CPU access control. +In a way you can think that SGX provides inverted sandbox. It protects the +application from a malicious host. + +You can tell if your CPU supports SGX by looking into ``/proc/cpuinfo``: + + ``cat /proc/cpuinfo | grep sgx`` + +Overview of SGX +=============== + +SGX has a set of data structures to maintain information about the enclaves and +their security properties. BIOS reserves a fixed size region of physical memory +for these structures by setting Processor Reserved Memory Range Registers +(PRMRR). + +This memory range is protected from outside access by the CPU and all the data +coming in and out of the CPU package is encrypted by a key that is generated for +each boot cycle. + +Enclaves execute in ring-3 in a special enclave submode using pages from the +reserved memory range. A fixed logical address range for the enclave is reserved +by ENCLS(ECREATE), a leaf instruction used to create enclaves. It is referred in +the documentation commonly as the ELRANGE. + +Every memory access to the ELRANGE is asserted by the CPU. If the CPU is not +executing in the enclave mode inside the enclave, #GP is raised. On the other +hand, enclave code can make memory accesses both inside and outside of the +ELRANGE. + +Enclave can only execute code inside the ELRANGE. Instructions that may cause +VMEXIT, IO instructions and instructions that require a privilege change are +prohibited inside the enclave. Interrupts and exceptions always cause enclave +to exit and jump to an address outside the enclave given when the enclave is +entered by using the leaf instruction ENCLS(EENTER). + +Protected memory +---------------- + +Enclave Page Cache (EPC) + Physical pages used with enclaves that are protected by the CPU from + unauthorized access. + +Enclave Page Cache Map (EPCM) + A database that describes the properties and state of the pages e.g. their + permissions or to which enclave they belong to. + +Memory Encryption Engine (MEE) integrity tree + Autonomously updated integrity tree. The root of the tree located in on-die + SRAM. + +EPC data types +-------------- + +SGX Enclave Control Structure (SECS) + Describes the global properties of an enclave. Will not be mapped to the + ELRANGE. + +Regular (REG) + These pages contain code and data. + +Thread Control Structure (TCS) + The pages that define the entry points inside an enclave. An enclave can + only be entered through these entry points and each can host a single + hardware thread at a time. + +Version Array (VA) + The pages contain 64-bit version numbers for pages that have been swapped + outside the enclave. Each page has the capacity of 512 version numbers. + +Launch control +-------------- + +To launch an enclave, two structures must be provided for ENCLS(EINIT): + +1. **SIGSTRUCT:** signed measurement of the enclave binary. +2. **EINITTOKEN:** a cryptographic token CMAC-signed with a AES256-key called + *launch key*, which is re-generated for each boot cycle. + +The CPU holds a SHA256 hash of a 3072-bit RSA public key inside +IA32_SGXLEPUBKEYHASHn MSRs. Enclaves with a SIGSTRUCT that is signed with this +key do not require a valid EINITTOKEN and can be authorized with special +privileges. One of those privileges is ability to acquire the launch key with +ENCLS(EGETKEY). + +**IA32_FEATURE_CONTROL[17]** is used by the BIOS configure whether +IA32_SGXLEPUBKEYHASH MSRs are read-only or read-write before locking the +feature control register and handing over control to the operating system. + +Enclave construction +-------------------- + +The construction is started by filling out the SECS that contains enclave +address range, privileged attributes and measurement of TCS and REG pages (pages +that will be mapped to the address range) among the other things. This structure +is passed out to the ENCLS(ECREATE) together with a physical address of a page +in EPC that will hold the SECS. + +The pages are added with ENCLS(EADD) and measured with ENCLS(EEXTEND) i.e. +SHA256 hash MRENCLAVE residing in the SECS is extended with the page data. + +After all of the pages have been added, the enclave is initialized with +ENCLS(EINIT). It will check that the SIGSTRUCT is signed with the contained +public key. If the given EINITTOKEN has the valid bit set, the CPU checks that +the token is valid (CMAC'd with the launch key). If the token is not valid, +the CPU will check whether the enclave is signed with a key matching to the +IA32_SGXLEPUBKEYHASHn MSRs. + +Swapping pages +-------------- + +Enclave pages can be swapped out with ENCLS(EWB) to the unprotected memory. In +addition to the EPC page, ENCLS(EWB) takes in a VA page and address for PCMD +structure (Page Crypto MetaData) as input. The VA page will seal a version +number for the page. PCMD is 128 byte structure that contains tracking +information for the page, most importantly its MAC. With these structures the +enclave is sealed and rollback protected while it resides in the unprotected +memory. + +Before the page can be swapped out it must not have any active TLB references. +ENCLS(EBLOCK) instruction moves a page to the *blocked* state, which means +that no new TLB entries can be created to it by the hardware threads. + +After this a shootdown sequence is started with ENCLS(ETRACK), which sets an +increased counter value to the entering hardware threads. ENCLS(EWB) will +return SGX_NOT_TRACKED error while there are still threads with the earlier +couner value because that means that there might be hardware thread inside +the enclave with TLB entries to pages that are to be swapped. + +Kernel internals +================ + +Requirements +------------ + +Because SGX has an ever evolving and expanding feature set, it's possible for +a BIOS or VMM to configure a system in such a way that not all CPUs are equal, +e.g. where Launch Control is only enabled on a subset of CPUs. Linux does +*not* support such a heterogeneous system configuration, nor does it even +attempt to play nice in the face of a misconfigured system. With the exception +of Launch Control's hash MSRs, which can vary per CPU, Linux assumes that all +CPUs have a configuration that is identical to the boot CPU. + + +Roles and responsibilities +-------------------------- + +SGX introduces system resources, e.g. EPC memory, that must be accessible to +multiple entities, e.g. the native kernel driver (to expose SGX to userspace) +and KVM (to expose SGX to VMs), ideally without introducing any dependencies +between each SGX entity. To that end, the kernel owns and manages the shared +system resources, i.e. the EPC and Launch Control MSRs, and defines functions +that provide appropriate access to the shared resources. SGX support for +user space and VMs is left to the SGX platform driver and KVM respectively. + +Launching enclaves +------------------ + +The current kernel implementation supports only unlocked MSRs i.e. +FEATURE_CONTROL_SGX_LE_WR must be set. The launch is performed by setting the +MSRs to the hash of the public key modulus of the enclave signer, which is one +of the fields in the SIGSTRUCT. + +EPC management +-------------- + +Due to the unique requirements for swapping EPC pages, and because EPC pages +(currently) do not have associated page structures, management of the EPC is +not handled by the standard Linux swapper. SGX directly handles swapping +of EPC pages, including a kthread to initiate reclaim and a rudimentary LRU +mechanism. The consumers of EPC pages, e.g. the SGX driver, are required to +implement function callbacks that can be invoked by the kernel to age, +swap, and/or forcefully reclaim a target EPC page. In effect, the kernel +controls what happens and when, while the consumers (driver, KVM, etc..) do +the actual work. + +Exception handling +------------------ + +The PF_SGX bit is set if and only if the #PF is detected by the SGX Enclave Page +Cache Map (EPCM). The EPCM is a hardware-managed table that enforces accesses to +an enclave's EPC pages in addition to the software-managed kernel page tables, +i.e. the effective permissions for an EPC page are a logical AND of the kernel's +page tables and the corresponding EPCM entry. + +The EPCM is consulted only after an access walks the kernel's page tables, i.e.: + +1. the access was allowed by the kernel +2. the kernel's tables have become less restrictive than the EPCM +3. the kernel cannot fixup the cause of the fault + +Noteably, (2) implies that either the kernel has botched the EPC mappings or the +EPCM has been invalidated (see below). Regardless of why the fault occurred, +userspace needs to be alerted so that it can take appropriate action, e.g. +restart the enclave. This is reinforced by (3) as the kernel doesn't really +have any other reasonable option, i.e. signalling SIGSEGV is actually the least +severe action possible. + +Although the primary purpose of the EPCM is to prevent a malicious or +compromised kernel from attacking an enclave, e.g. by modifying the enclave's +page tables, do not WARN on a #PF w/ PF_SGX set. The SGX architecture +effectively allows the CPU to invalidate all EPCM entries at will and requires +that software be prepared to handle an EPCM fault at any time. The architecture +defines this behavior because the EPCM is encrypted with an ephemeral key that +isn't exposed to software. As such, the EPCM entries cannot be preserved across +transitions that result in a new key being used, e.g. CPU power down as part of +an S3 transition or when a VM is live migrated to a new physical system. + +SGX uapi +======== + +.. kernel-doc:: drivers/platform/x86/intel_sgx/sgx_ioctl.c + :functions: sgx_ioc_enclave_create + sgx_ioc_enclave_add_page + sgx_ioc_enclave_init + +.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h + +References +========== + +* A Memory Encryption Engine Suitable for General Purpose Processors + <https://eprint.iacr.org/2016/204.pdf> +* System Programming Manual: 39.1.4 Intel® SGX Launch Control Configuration