[2/2] x86/sgx: Resolve EREMOVE page vs EAUG page data race

Message ID	20240429104330.3636113-3-dmitrii.kuvaiskii@intel.com (mailing list archive)
State	New, archived
Series	x86/sgx: Fix two data races in EAUG/EREMOVE flows
	[0/2] x86/sgx: Fix two data races in EAUG/EREMOVE flows
	[1/2] x86/sgx: Resolve EAUG race where losing thread returns SIGBUS
	[2/2] x86/sgx: Resolve EREMOVE page vs EAUG page data race

Headers

From: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
To: dave.hansen@linux.intel.com,
	jarkko@kernel.org,
	kai.huang@intel.com,
	haitao.huang@linux.intel.com,
	reinette.chatre@intel.com,
	linux-sgx@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: mona.vij@intel.com,
	kailun.qin@intel.com,
	stable@vger.kernel.org
Subject: [PATCH 2/2] x86/sgx: Resolve EREMOVE page vs EAUG page data race
Date: Mon, 29 Apr 2024 03:43:30 -0700
Message-Id: <20240429104330.3636113-3-dmitrii.kuvaiskii@intel.com>
In-Reply-To: <20240429104330.3636113-1-dmitrii.kuvaiskii@intel.com>
References: <20240429104330.3636113-1-dmitrii.kuvaiskii@intel.com>
Precedence: bulk
MIME-Version: 1.0
Organization: Intel Deutschland GmbH - Registered Address: Am Campeon 10,
 85579 Neubiberg, Germany
Content-Transfer-Encoding: 8bit

Commit Message

Dmitrii Kuvaiskii April 29, 2024, 10:43 a.m. UTC

Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and `MADV_DONTNEED` semantics). Consider this race:

1. T1 performs page removal in sgx_encl_remove_pages() and stops right
   after removing the page table entry and right before re-acquiring the
   enclave lock to EREMOVE and xa_erase(&encl->page_array) the page.
2. T2 tries to access the page, and #PF[not_present] is raised. The
   condition for EAUG in sgx_vma_fault() is not satisfied because the
   page is still present in encl->page_array, so the SGX driver
   assumes that the fault happened because the page was swapped out. The
   driver continues on a code path that installs a page table entry
   *without* performing EAUG.
3. The enclave page metadata is now in an inconsistent state: the PTE is
   installed but there was no EAUG. Thus, T2 in user space endlessly
   receives SIGSEGV on this page (and EACCEPT always fails).

Fix this by making sure that T1 (the page-removing thread) always wins
this data race. In particular, the page-being-removed is marked as such,
and T2 retries until the page is fully removed.

Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  | 3 ++-
 arch/x86/kernel/cpu/sgx/encl.h  | 3 +++
 arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

Comments

Jarkko Sakkinen April 29, 2024, 1:11 p.m. UTC | #1

On Mon Apr 29, 2024 at 1:43 PM EEST, Dmitrii Kuvaiskii wrote:
> Two enclave threads may try to add and remove the same enclave page
> simultaneously (e.g., if the SGX runtime supports both lazy allocation
> and `MADV_DONTNEED` semantics). Consider this race:
>
> 1. T1 performs page removal in sgx_encl_remove_pages() and stops right
>    after removing the page table entry and right before re-acquiring the
>    enclave lock to EREMOVE and xa_erase(&encl->page_array) the page.
> 2. T2 tries to access the page, and #PF[not_present] is raised. The
>    condition for EAUG in sgx_vma_fault() is not satisfied because the
>    page is still present in encl->page_array, so the SGX driver
>    assumes that the fault happened because the page was swapped out. The
>    driver continues on a code path that installs a page table entry
>    *without* performing EAUG.
> 3. The enclave page metadata is now in an inconsistent state: the PTE is
>    installed but there was no EAUG. Thus, T2 in user space endlessly
>    receives SIGSEGV on this page (and EACCEPT always fails).
>
> Fix this by making sure that T1 (the page-removing thread) always wins
> this data race. In particular, the page-being-removed is marked as such,
> and T2 retries until the page is fully removed.
>
> Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c  | 3 ++-
>  arch/x86/kernel/cpu/sgx/encl.h  | 3 +++
>  arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 41f14b1a3025..7ccd8b2fce5f 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
>  
>  	/* Entry successfully located. */
>  	if (entry->epc_page) {
> -		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
> +		if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
> +				   SGX_ENCL_PAGE_BEING_REMOVED))
>  			return ERR_PTR(-EBUSY);
>  
>  		return entry;
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index f94ff14c9486..fff5f2293ae7 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -25,6 +25,9 @@
>  /* 'desc' bit marking that the page is being reclaimed. */
>  #define SGX_ENCL_PAGE_BEING_RECLAIMED	BIT(3)
>  
> +/* 'desc' bit marking that the page is being removed. */
> +#define SGX_ENCL_PAGE_BEING_REMOVED	BIT(2)
> +
>  struct sgx_encl_page {
>  	unsigned long desc;
>  	unsigned long vm_max_prot_bits:8;
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index b65ab214bdf5..c542d4dd3e64 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
>  		 * Do not keep encl->lock because of dependency on
>  		 * mmap_lock acquired in sgx_zap_enclave_ptes().
>  		 */
> +		entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
>  		mutex_unlock(&encl->lock);
>  
>  		sgx_zap_enclave_ptes(encl, addr);

It is somewhat trivial to NAK this as the commit message does
not make any effort to describe the new flag. By default, at
least, I have strong opposition to any new flags related to
reclaiming, even if that means a bit of extra synchronization
work in user space.

One way to describe concurrency scenarios would be to take
example from https://www.kernel.org/doc/Documentation/memory-barriers.txt

I.e. see the examples with CPU 1 and CPU 2.

BR, Jarkko

Dmitrii Kuvaiskii April 30, 2024, 2:38 p.m. UTC | #2

On Mon, Apr 29, 2024 at 04:11:03PM +0300, Jarkko Sakkinen wrote:
> It is somewhat trivial to NAK this as the commit message does
> not make any effort to describe the new flag. By default, at
> least, I have strong opposition to any new flags related to
> reclaiming, even if that means a bit of extra synchronization
> work in user space.
> 
> One way to describe concurrency scenarios would be to take
> example from https://www.kernel.org/doc/Documentation/memory-barriers.txt
> 
> I.e. see the examples with CPU 1 and CPU 2.

Thank you for the suggestion. Here is my new attempt at describing the racy
scenario:

Consider a page that has already been added to the enclave. User space
decides to temporarily remove this page (e.g., emulating MADV_DONTNEED
semantics) on CPU1. At the same time, user space accesses the same page
on CPU2, which results in a #PF and ultimately in a call to
sgx_vma_fault(). The scenario proceeds as follows:

/*
 * CPU1: User space performs
 * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
 * on a single enclave page
 */
sgx_encl_remove_pages() {

  mutex_lock(&encl->lock);

  entry = sgx_encl_load_page(encl);
  /*
   * verify that page is
   * trimmed and accepted
   */

  mutex_unlock(&encl->lock);

  /*
   * remove PTE entry; cannot
   * be performed under lock
   */
  sgx_zap_enclave_ptes(encl);
                                   /*
                                    * Fault on CPU2
                                    */
                                   sgx_vma_fault() {
                                     /*
                                      * PTE entry was removed, but the
                                      * page is still in enclave's xarray
                                      */
                                     xa_load(&encl->page_array) != NULL ->
                                     /*
                                      * SGX driver thinks that this page
                                      * was swapped out and loads it
                                      */
                                     mutex_lock(&encl->lock);
                                     /*
                                      * this is effectively a no-op
                                      */
                                     entry = sgx_encl_load_page_in_vma();
                                     /*
                                      * add PTE entry
                                      */
                                     vmf_insert_pfn(...);

                                     mutex_unlock(&encl->lock);
                                     return VM_FAULT_NOPAGE;
                                   }
  /*
   * continue with page removal
   */
  mutex_lock(&encl->lock);

  sgx_encl_free_epc_page(epc_page) {
    /*
     * remove page via EREMOVE
     */
    /*
     * free EPC page
     */
    sgx_free_epc_page(epc_page);
  }

  xa_erase(&encl->page_array);

  mutex_unlock(&encl->lock);
}

CPU1 removed the page. However, CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed. Because of this
combination, any subsequent access to this page raises a fault, and the #PF
handler sees the SGX bit set in the #PF error code and does not call
sgx_vma_fault() but instead raises a SIGSEGV. The userspace SIGSEGV handler
cannot perform EACCEPT because the page was not EAUGed. Thus, user
space is stuck with an inaccessible page.

This race can be fixed by forcing the fault handler on CPU2 to back off if
the page is currently being removed (on CPU1). A simple way to do this is to
introduce a new flag SGX_ENCL_PAGE_BEING_REMOVED, which is unset by default
and set only right before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is being removed, and if so, CPU2 backs off and waits until the
page is completely removed. After that, any memory access to this page
results in a normal "allocate and EAUG a page on #PF" flow.

--
Dmitrii Kuvaiskii

Reinette Chatre May 10, 2024, 11:47 p.m. UTC | #3

Hi Dmitrii,

Thank you very much for uncovering and fixing this issue.

On 4/30/2024 7:38 AM, Dmitrii Kuvaiskii wrote:
> Thank you for the suggestion. Here is my new attempt at describing the racy
> scenario:
> 
> Consider a page that has already been added to the enclave. User space
> decides to temporarily remove this page (e.g., emulating MADV_DONTNEED
> semantics) on CPU1. At the same time, user space accesses the same page
> on CPU2, which results in a #PF and ultimately in a call to
> sgx_vma_fault(). The scenario proceeds as follows:
> 
> /*
>  * CPU1: User space performs
>  * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
>  * on a single enclave page
>  */
> sgx_encl_remove_pages() {
> 
>   mutex_lock(&encl->lock);
> 
>   entry = sgx_encl_load_page(encl);
>   /*
>    * verify that page is
>    * trimmed and accepted
>    */
> 
>   mutex_unlock(&encl->lock);
> 
>   /*
>    * remove PTE entry; cannot
>    * be performed under lock
>    */
>   sgx_zap_enclave_ptes(encl);
>                                    /*
>                                     * Fault on CPU2
>                                     */

Please highlight that this fault is related to the page that
is in process of being removed on CPU1.

>                                    sgx_vma_fault() {
>                                      /*
>                                       * PTE entry was removed, but the
>                                       * page is still in enclave's xarray
>                                       */
>                                      xa_load(&encl->page_array) != NULL ->
>                                      /*
>                                       * SGX driver thinks that this page
>                                       * was swapped out and loads it
>                                       */
>                                      mutex_lock(&encl->lock);
>                                      /*
>                                       * this is effectively a no-op
>                                       */
>                                      entry = sgx_encl_load_page_in_vma();
>                                      /*
>                                       * add PTE entry
>                                       */

It may be helpful to highlight that this is a problem: "BUG: A PTE
is installed for a page in process of being removed." (please feel free
to expand)

>                                      vmf_insert_pfn(...);
> 
>                                      mutex_unlock(&encl->lock);
>                                      return VM_FAULT_NOPAGE;
>                                    }
>   /*
>    * continue with page removal
>    */
>   mutex_lock(&encl->lock);
> 
>   sgx_encl_free_epc_page(epc_page) {
>     /*
>      * remove page via EREMOVE
>      */
>     /*
>      * free EPC page
>      */
>     sgx_free_epc_page(epc_page);
>   }
> 
>   xa_erase(&encl->page_array);
> 
>   mutex_unlock(&encl->lock);
> }
> 
> CPU1 removed the page. However, CPU2 installed the PTE entry on the
> same page. This enclave page becomes perpetually inaccessible (until
> another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
> marked accessible in the PTE entry but is not EAUGed. Because of this
> combination, any subsequent access to this page raises a fault, and the #PF
> handler sees the SGX bit set in the #PF error code and does not call

Which #PF handler?

> sgx_vma_fault() but instead raises a SIGSEGV. The userspace SIGSEGV handler
> cannot perform EACCEPT because the page was not EAUGed. Thus, user
> space is stuck with an inaccessible page.
> 
> This race can be fixed by forcing the fault handler on CPU2 to back off if
> the page is currently being removed (on CPU1). A simple way to do this is to
> introduce a new flag SGX_ENCL_PAGE_BEING_REMOVED, which is unset by default
> and set only right before the first mutex_unlock() in
> sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
> page is being removed, and if so, CPU2 backs off and waits until the
> page is completely removed. After that, any memory access to this page
> results in a normal "allocate and EAUG a page on #PF" flow.

I have been tripped by these page flags before so would appreciate
another opinion. From my side this looks like an appropriate fix.

Reinette

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 41f14b1a3025..7ccd8b2fce5f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 
 	/* Entry successfully located. */
 	if (entry->epc_page) {
-		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+		if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
+				   SGX_ENCL_PAGE_BEING_REMOVED))
 			return ERR_PTR(-EBUSY);
 
 		return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..fff5f2293ae7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -25,6 +25,9 @@ 
 /* 'desc' bit marking that the page is being reclaimed. */
 #define SGX_ENCL_PAGE_BEING_RECLAIMED	BIT(3)
 
+/* 'desc' bit marking that the page is being removed. */
+#define SGX_ENCL_PAGE_BEING_REMOVED	BIT(2)
+
 struct sgx_encl_page {
 	unsigned long desc;
 	unsigned long vm_max_prot_bits:8;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..c542d4dd3e64 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
 		 * Do not keep encl->lock because of dependency on
 		 * mmap_lock acquired in sgx_zap_enclave_ptes().
 		 */
+		entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
 		mutex_unlock(&encl->lock);
 
 		sgx_zap_enclave_ptes(encl, addr);
