diff mbox series

x86/sgx: Add poison handling to reclaimer

Message ID ef74bd9548df61f77e802e7505affcfb5159c48c.1642545829.git.reinette.chatre@intel.com (mailing list archive)
State New, archived
Headers show
Series x86/sgx: Add poison handling to reclaimer | expand

Commit Message

Reinette Chatre Jan. 18, 2022, 11:05 p.m. UTC
The machine check recovery handling in SGX added the changes
listed below to the freeing of pages in sgx_free_epc_page().
The SGX reclaimer contains an open coded version of
sgx_free_epc_page() and thus did not obtain the changes in
support of poison handling.

The changes made to EPC page freeing in support of poison handling
are:
1) A new SGX_EPC_PAGE_IS_FREE flag is set when the EPC page is
   freed. Introduced in commit d6d261bded8a ("x86/sgx: Add new
   sgx_epc_page flag bit to mark free pages").
2) A new "poison" field in struct sgx_epc_page is used to
   determine whether a newly freed EPC page should be placed
   on the list of poisoned or list of free pages. Introduced
   in commit 992801ae9243 ("x86/sgx: Initial poison handling
   for dirty and free pages").
3) The owner field in struct sgx_epc_page is cleared when the EPC
   page is freed.  Introduced in commit 992801ae9243 ("x86/sgx:
   Initial poison handling for dirty and free pages").

Replace the open coded enclave page freeing code in the reclaimer
with sgx_free_epc_page() to obtain support for poison page handling.

Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark free pages")
Fixes: 992801ae9243 ("x86/sgx: Initial poison handling for dirty and free pages")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

Comments

Dave Hansen Jan. 19, 2022, 7:51 p.m. UTC | #1
On 1/18/22 3:05 PM, Reinette Chatre wrote:
> The machine check recovery handling in SGX added the changes
> listed below to the freeing of pages in sgx_free_epc_page().
> The SGX reclaimer contains an open coded version of
> sgx_free_epc_page() and thus did not obtain the changes in
> support of poison handling.

I was trying to decide if this is an urgent fix or not.  A more crisp
problem statement might have helped in the changelog.

But, from what I can tell, the most probable troublesome scenario here
would be something like:

 1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
 2. arch_memory_failure() called is eventually
 3. (SGX) page->poison set to 1
 4. Page is reclaimed
 5. Page added to normal free lists by sgx_reclaim_pages()
    ^ This is the bug
 6. Page is reallocated by some innocent enclave, a second (synchronous)
    in-kernel #MC is induced, probably during EADD instruction.
    ^ This is the fallout from the bug

#6 is unfortunate and can be avoided if this patch is applied.

Basically, this patch ensures that a bad enclave page is isolated
quickly and causes a minimal amount of collateral damage.  Is this a
valid summary?

	The SGX reclaimer code lacks page poison handling in its free
	path.  This can lead to completely avoidable machine checks if a
	poisoned page is freed and reallocated instead of being
	isolated.
Reinette Chatre Jan. 19, 2022, 8:47 p.m. UTC | #2
Hi Dave,

On 1/19/2022 11:51 AM, Dave Hansen wrote:
> On 1/18/22 3:05 PM, Reinette Chatre wrote:
>> The machine check recovery handling in SGX added the changes
>> listed below to the freeing of pages in sgx_free_epc_page().
>> The SGX reclaimer contains an open coded version of
>> sgx_free_epc_page() and thus did not obtain the changes in
>> support of poison handling.
> 
> I was trying to decide if this is an urgent fix or not.  A more crisp
> problem statement might have helped in the changelog.
> 
> But, from what I can tell, the most probable troublesome scenario here
> would be something like:
> 
>  1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
>  2. arch_memory_failure() called is eventually
>  3. (SGX) page->poison set to 1
>  4. Page is reclaimed
>  5. Page added to normal free lists by sgx_reclaim_pages()
>     ^ This is the bug
>  6. Page is reallocated by some innocent enclave, a second (synchronous)
>     in-kernel #MC is induced, probably during EADD instruction.
>     ^ This is the fallout from the bug
> 
> #6 is unfortunate and can be avoided if this patch is applied.
> 
> Basically, this patch ensures that a bad enclave page is isolated
> quickly and causes a minimal amount of collateral damage.  Is this a
> valid summary?
> 
> 	The SGX reclaimer code lacks page poison handling in its free
> 	path.  This can lead to completely avoidable machine checks if a
> 	poisoned page is freed and reallocated instead of being
> 	isolated.

As I understand this code it does look like a valid summary to me. One
detail is that the poison page handling is currently done for SECS pages
when they are freed by the reclaimer (via sgx_reclaimer_write()).

Thank you very much for the detailed analysis. Should I resend with
an improved commit message that contains your scenario description
and summary?

Reinette
Dave Hansen Jan. 19, 2022, 8:56 p.m. UTC | #3
On 1/19/22 12:47 PM, Reinette Chatre wrote:
> Thank you very much for the detailed analysis. Should I resend with
> an improved commit message that contains your scenario description
> and summary?

Sure, that would be great.
Jarkko Sakkinen Jan. 20, 2022, 12:57 p.m. UTC | #4
On Tue, 2022-01-18 at 15:05 -0800, Reinette Chatre wrote:
> The machine check recovery handling in SGX added the changes
> listed below to the freeing of pages in sgx_free_epc_page().
> The SGX reclaimer contains an open coded version of
> sgx_free_epc_page() and thus did not obtain the changes in
> support of poison handling.
> 
> The changes made to EPC page freeing in support of poison handling
> are:
> 1) A new SGX_EPC_PAGE_IS_FREE flag is set when the EPC page is
>    freed. Introduced in commit d6d261bded8a ("x86/sgx: Add new
>    sgx_epc_page flag bit to mark free pages").
> 2) A new "poison" field in struct sgx_epc_page is used to
>    determine whether a newly freed EPC page should be placed
>    on the list of poisoned or list of free pages. Introduced
>    in commit 992801ae9243 ("x86/sgx: Initial poison handling
>    for dirty and free pages").
> 3) The owner field in struct sgx_epc_page is cleared when the EPC
>    page is freed.  Introduced in commit 992801ae9243 ("x86/sgx:
>    Initial poison handling for dirty and free pages").
> 
> Replace the open coded enclave page freeing code in the reclaimer
> with sgx_free_epc_page() to obtain support for poison page handling.
> 
> Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark
> free pages")

AFAIK, this patch does not semantically break anything so it is not
a legit fixes tag.

BR, Jarkko
Reinette Chatre Jan. 20, 2022, 3:28 p.m. UTC | #5
Hi Jarkko,

On 1/20/2022 4:57 AM, Jarkko Sakkinen wrote:
> On Tue, 2022-01-18 at 15:05 -0800, Reinette Chatre wrote:
>> The machine check recovery handling in SGX added the changes
>> listed below to the freeing of pages in sgx_free_epc_page().
>> The SGX reclaimer contains an open coded version of
>> sgx_free_epc_page() and thus did not obtain the changes in
>> support of poison handling.
>>
>> The changes made to EPC page freeing in support of poison handling
>> are:
>> 1) A new SGX_EPC_PAGE_IS_FREE flag is set when the EPC page is
>>    freed. Introduced in commit d6d261bded8a ("x86/sgx: Add new
>>    sgx_epc_page flag bit to mark free pages").
>> 2) A new "poison" field in struct sgx_epc_page is used to
>>    determine whether a newly freed EPC page should be placed
>>    on the list of poisoned or list of free pages. Introduced
>>    in commit 992801ae9243 ("x86/sgx: Initial poison handling
>>    for dirty and free pages").
>> 3) The owner field in struct sgx_epc_page is cleared when the EPC
>>    page is freed.  Introduced in commit 992801ae9243 ("x86/sgx:
>>    Initial poison handling for dirty and free pages").
>>
>> Replace the open coded enclave page freeing code in the reclaimer
>> with sgx_free_epc_page() to obtain support for poison page handling.
>>
>> Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark
>> free pages")
> 
> AFAIK, this patch does not semantically break anything so it is not
> a legit fixes tag.
> 

The commit you refer to, commit d6d261bded8a ("x86/sgx: Add new
sgx_epc_page flag bit to mark free pages", introduced a new page flag bit
(SGX_EPC_PAGE_IS_FREE) that should be set when an EPC page is freed. The
commit also sets the bit in sgx_free_epc_page() when an EPC page is freed.
The commit should also have set that bit when the EPC page is freed in the
reclaimer, which contains an open coded version of sgx_free_epc_page(),
but it did not. This fix adds the snippet that was omitted from that
commit.

Reinette
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4b41efc9e367..997a5d0bc488 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -418,13 +418,7 @@  static void sgx_reclaim_pages(void)
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 
-		section = &sgx_epc_sections[epc_page->section];
-		node = section->node;
-
-		spin_lock(&node->lock);
-		list_add_tail(&epc_page->list, &node->free_page_list);
-		spin_unlock(&node->lock);
-		atomic_long_inc(&sgx_nr_free_pages);
+		sgx_free_epc_page(epc_page);
 	}
 }