diff mbox series

[04/25] x86/sgx: Add pfn_mkwrite() handler for present PTEs

Message ID 9edf8cd20628cbf400886d88e359fb24265fdef0.1638381245.git.reinette.chatre@intel.com (mailing list archive)
State New, archived
Headers show
Series x86/sgx and selftests/sgx: Support SGX2 | expand

Commit Message

Reinette Chatre Dec. 1, 2021, 7:23 p.m. UTC
By default a write page fault on a present PTE inherits the permissions
of the VMA. Enclave page permissions maintained in the hardware's
Enclave Page Cache Map (EPCM) may change after a VMA accessing the page
is created. A VMA's permissions may thus exceed the enclave page
permissions even though the VMA was originally created not to exceed
the enclave page permissions. Following the default behavior during
a page fault on a present PTE while the VMA permissions exceed the
enclave page permissions would result in the PTE for an enclave page
to be writable even though the page is not writable according to the
enclave's permissions.

Consider the following scenario:
* An enclave page exists with RW EPCM permissions.
* A RW VMA maps the range spanning the enclave page.
* The enclave page's EPCM permissions are changed to read-only.
* There is no page table entry for the enclave page.

Q.
 What will user space observe when an attempt is made to write to the
 enclave page from within the enclave?

A.
 Initially the page table entry is not present so the following is
 observed:
 1) Instruction writing to enclave page is run from within the enclave.
 2) A page fault with second and third bits set (0x6) is encountered
    and handled by the SGX handler sgx_vma_fault() that installs a
    read-only page table entry following previous patch that installs
    page table entry with permissions that VMA and enclave agree on
    (read-only in this case).
 3) Instruction writing to enclave page is re-attempted.
 4) A page fault with first three bits set (0x7) is encountered and
    transparently (from SGX and user space perspective) handled by the
    OS with the page table entry made writable because the VMA is
    writable.
 5) Instruction writing to enclave page is re-attempted.
 6) Since the EPCM permissions prevents writing to the page a new page
    fault is encountered, this time with the SGX flag set in the error
    code (0x8007). No action is taken by OS for this page fault and
    execution returns to user space.
 7) Typically such a fault will be passed on to an application with a
    signal but if the enclave is entered with the vDSO function provided
    by the kernel then user space does not receive a signal but instead
    the vDSO function returns successfully with exception information
    (vector=14, error code=0x8007, and address) within the exception
    fields within the vDSO function's struct sgx_enclave_run.

As can be observed it is not possible for user space to write to an
enclave page if that page's enclave page permissions do not allow so,
no matter what the VMA or PTE allows.

Even so, the OS should not allow writing to a page if that page is not
writable. Thus the page table entry should accurately reflect the
enclave page permissions.

Do not blindly accept VMA permissions on a page fault due to a write
attempt to a present PTE. Install a pfn_mkwrite() handler that ensures
that the VMA permissions agree with the enclave permissions in this
regard.

Considering the same scenario as above after this change results in
the following behavior change:

Q.
 What will user space observe when an attempt is made to write to the
 enclave page from within the enclave?

A.
 Initially the page table entry is not present so the following is
 observed:
 1) Instruction writing to enclave page is run from within the enclave.
 2) A page fault with second and third bits set (0x6) is encountered
    and handled by the SGX handler sgx_vma_fault() that installs a
    read-only page table entry following previous patch that installs
    page table entry with permissions that VMA and enclave agree on
    (read-only in this case).
 3) Instruction writing to enclave page is re-attempted.
 4) A page fault with first three bits set (0x7) is encountered and
    passed to the pfn_mkwrite() handler for consideration. The handler
    determines that the page should not be writable and returns SIGBUS.
 5) Typically such a fault will be passed on to an application with a
    signal but if the enclave is entered with the vDSO function provided
    by the kernel then user space does not receive a signal but instead
    the vDSO function returns successfully with exception information
    (vector=14, error code=0x7, and address) within the exception fields
    within the vDSO function's struct sgx_enclave_run.

The accurate exception information supports the SGX runtime, which is
virtually always implemented inside a shared library, by providing
accurate information in support of its management of the SGX enclave.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 42 ++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

Comments

Jarkko Sakkinen Dec. 4, 2021, 10:43 p.m. UTC | #1
On Wed, Dec 01, 2021 at 11:23:02AM -0800, Reinette Chatre wrote:
> By default a write page fault on a present PTE inherits the permissions
> of the VMA. Enclave page permissions maintained in the hardware's
> Enclave Page Cache Map (EPCM) may change after a VMA accessing the page
> is created. A VMA's permissions may thus exceed the enclave page
> permissions even though the VMA was originally created not to exceed
> the enclave page permissions. Following the default behavior during
> a page fault on a present PTE while the VMA permissions exceed the
> enclave page permissions would result in the PTE for an enclave page
> to be writable even though the page is not writable according to the
> enclave's permissions.
> 
> Consider the following scenario:
> * An enclave page exists with RW EPCM permissions.
> * A RW VMA maps the range spanning the enclave page.
> * The enclave page's EPCM permissions are changed to read-only.

How could this happen in the existing mainline code?

> * There is no page table entry for the enclave page.
> 
> Q.
>  What will user space observe when an attempt is made to write to the
>  enclave page from within the enclave?
> 
> A.
>  Initially the page table entry is not present so the following is
>  observed:
>  1) Instruction writing to enclave page is run from within the enclave.
>  2) A page fault with second and third bits set (0x6) is encountered
>     and handled by the SGX handler sgx_vma_fault() that installs a
>     read-only page table entry following previous patch that installs
>     page table entry with permissions that VMA and enclave agree on
>     (read-only in this case).
>  3) Instruction writing to enclave page is re-attempted.
>  4) A page fault with first three bits set (0x7) is encountered and
>     transparently (from SGX and user space perspective) handled by the
>     OS with the page table entry made writable because the VMA is
>     writable.
>  5) Instruction writing to enclave page is re-attempted.
>  6) Since the EPCM permissions prevents writing to the page a new page
>     fault is encountered, this time with the SGX flag set in the error
>     code (0x8007). No action is taken by OS for this page fault and
>     execution returns to user space.
>  7) Typically such a fault will be passed on to an application with a
>     signal but if the enclave is entered with the vDSO function provided
>     by the kernel then user space does not receive a signal but instead
>     the vDSO function returns successfully with exception information
>     (vector=14, error code=0x8007, and address) within the exception
>     fields within the vDSO function's struct sgx_enclave_run.
> 
> As can be observed it is not possible for user space to write to an
> enclave page if that page's enclave page permissions do not allow so,
> no matter what the VMA or PTE allows.
> 
> Even so, the OS should not allow writing to a page if that page is not
> writable. Thus the page table entry should accurately reflect the
> enclave page permissions.
> 
> Do not blindly accept VMA permissions on a page fault due to a write
> attempt to a present PTE. Install a pfn_mkwrite() handler that ensures
> that the VMA permissions agree with the enclave permissions in this
> regard.
> 
> Considering the same scenario as above after this change results in
> the following behavior change:
> 
> Q.
>  What will user space observe when an attempt is made to write to the
>  enclave page from within the enclave?
> 
> A.
>  Initially the page table entry is not present so the following is
>  observed:
>  1) Instruction writing to enclave page is run from within the enclave.
>  2) A page fault with second and third bits set (0x6) is encountered
>     and handled by the SGX handler sgx_vma_fault() that installs a
>     read-only page table entry following previous patch that installs
>     page table entry with permissions that VMA and enclave agree on
>     (read-only in this case).
>  3) Instruction writing to enclave page is re-attempted.
>  4) A page fault with first three bits set (0x7) is encountered and
>     passed to the pfn_mkwrite() handler for consideration. The handler
>     determines that the page should not be writable and returns SIGBUS.
>  5) Typically such a fault will be passed on to an application with a
>     signal but if the enclave is entered with the vDSO function provided
>     by the kernel then user space does not receive a signal but instead
>     the vDSO function returns successfully with exception information
>     (vector=14, error code=0x7, and address) within the exception fields
>     within the vDSO function's struct sgx_enclave_run.
> 
> The accurate exception information supports the SGX runtime, which is
> virtually always implemented inside a shared library, by providing
> accurate information in support of its management of the SGX enclave.

This QA-format is not a great idea, as it kind of tells what are the legit
questions to ask. You should describe what the patch does and what are the
legit reasons for doing that. Unfortunately, in the current form it is very
hard to get grip of this patch.

/Jarkko
Reinette Chatre Dec. 6, 2021, 9:18 p.m. UTC | #2
Hi Jarkko,

On 12/4/2021 2:43 PM, Jarkko Sakkinen wrote:
> On Wed, Dec 01, 2021 at 11:23:02AM -0800, Reinette Chatre wrote:
>> By default a write page fault on a present PTE inherits the permissions
>> of the VMA. Enclave page permissions maintained in the hardware's
>> Enclave Page Cache Map (EPCM) may change after a VMA accessing the page
>> is created. A VMA's permissions may thus exceed the enclave page
>> permissions even though the VMA was originally created not to exceed
>> the enclave page permissions. Following the default behavior during
>> a page fault on a present PTE while the VMA permissions exceed the
>> enclave page permissions would result in the PTE for an enclave page
>> to be writable even though the page is not writable according to the
>> enclave's permissions.
>>
>> Consider the following scenario:
>> * An enclave page exists with RW EPCM permissions.
>> * A RW VMA maps the range spanning the enclave page.
>> * The enclave page's EPCM permissions are changed to read-only.
> 
> How could this happen in the existing mainline code?

This is a preparatory patch for SGX2 support. Restricting the 
permissions of an enclave page would require OS support that is added in 
a later patch.

> 
>> * There is no page table entry for the enclave page.
>>
>> Q.
>>   What will user space observe when an attempt is made to write to the
>>   enclave page from within the enclave?
>>
>> A.
>>   Initially the page table entry is not present so the following is
>>   observed:
>>   1) Instruction writing to enclave page is run from within the enclave.
>>   2) A page fault with second and third bits set (0x6) is encountered
>>      and handled by the SGX handler sgx_vma_fault() that installs a
>>      read-only page table entry following previous patch that installs
>>      page table entry with permissions that VMA and enclave agree on
>>      (read-only in this case).
>>   3) Instruction writing to enclave page is re-attempted.
>>   4) A page fault with first three bits set (0x7) is encountered and
>>      transparently (from SGX and user space perspective) handled by the
>>      OS with the page table entry made writable because the VMA is
>>      writable.
>>   5) Instruction writing to enclave page is re-attempted.
>>   6) Since the EPCM permissions prevents writing to the page a new page
>>      fault is encountered, this time with the SGX flag set in the error
>>      code (0x8007). No action is taken by OS for this page fault and
>>      execution returns to user space.
>>   7) Typically such a fault will be passed on to an application with a
>>      signal but if the enclave is entered with the vDSO function provided
>>      by the kernel then user space does not receive a signal but instead
>>      the vDSO function returns successfully with exception information
>>      (vector=14, error code=0x8007, and address) within the exception
>>      fields within the vDSO function's struct sgx_enclave_run.
>>
>> As can be observed it is not possible for user space to write to an
>> enclave page if that page's enclave page permissions do not allow so,
>> no matter what the VMA or PTE allows.
>>
>> Even so, the OS should not allow writing to a page if that page is not
>> writable. Thus the page table entry should accurately reflect the
>> enclave page permissions.
>>
>> Do not blindly accept VMA permissions on a page fault due to a write
>> attempt to a present PTE. Install a pfn_mkwrite() handler that ensures
>> that the VMA permissions agree with the enclave permissions in this
>> regard.
>>
>> Considering the same scenario as above after this change results in
>> the following behavior change:
>>
>> Q.
>>   What will user space observe when an attempt is made to write to the
>>   enclave page from within the enclave?
>>
>> A.
>>   Initially the page table entry is not present so the following is
>>   observed:
>>   1) Instruction writing to enclave page is run from within the enclave.
>>   2) A page fault with second and third bits set (0x6) is encountered
>>      and handled by the SGX handler sgx_vma_fault() that installs a
>>      read-only page table entry following previous patch that installs
>>      page table entry with permissions that VMA and enclave agree on
>>      (read-only in this case).
>>   3) Instruction writing to enclave page is re-attempted.
>>   4) A page fault with first three bits set (0x7) is encountered and
>>      passed to the pfn_mkwrite() handler for consideration. The handler
>>      determines that the page should not be writable and returns SIGBUS.
>>   5) Typically such a fault will be passed on to an application with a
>>      signal but if the enclave is entered with the vDSO function provided
>>      by the kernel then user space does not receive a signal but instead
>>      the vDSO function returns successfully with exception information
>>      (vector=14, error code=0x7, and address) within the exception fields
>>      within the vDSO function's struct sgx_enclave_run.
>>
>> The accurate exception information supports the SGX runtime, which is
>> virtually always implemented inside a shared library, by providing
>> accurate information in support of its management of the SGX enclave.
> 
> This QA-format is not a great idea, as it kind of tells what are the legit
> questions to ask.

I will remove the QA-format and just describe the two (before/after) 
scenarios.

> You should describe what the patch does and what are the
> legit reasons for doing that. Unfortunately, in the current form it is very
> hard to get grip of this patch.

That was the goal of the summary (the first paragraph) at the start of 
the changelog. Could you please elaborate how you would like me to 
improve it?

Reinette
Jarkko Sakkinen Dec. 11, 2021, 7:37 a.m. UTC | #3
On Mon, 2021-12-06 at 13:18 -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 12/4/2021 2:43 PM, Jarkko Sakkinen wrote:
> > On Wed, Dec 01, 2021 at 11:23:02AM -0800, Reinette Chatre wrote:
> > > By default a write page fault on a present PTE inherits the permissions
> > > of the VMA. Enclave page permissions maintained in the hardware's
> > > Enclave Page Cache Map (EPCM) may change after a VMA accessing the page
> > > is created. A VMA's permissions may thus exceed the enclave page
> > > permissions even though the VMA was originally created not to exceed
> > > the enclave page permissions. Following the default behavior during
> > > a page fault on a present PTE while the VMA permissions exceed the
> > > enclave page permissions would result in the PTE for an enclave page
> > > to be writable even though the page is not writable according to the
> > > enclave's permissions.
> > > 
> > > Consider the following scenario:
> > > * An enclave page exists with RW EPCM permissions.
> > > * A RW VMA maps the range spanning the enclave page.
> > > * The enclave page's EPCM permissions are changed to read-only.
> > 
> > How could this happen in the existing mainline code?
> 
> This is a preparatory patch for SGX2 support. Restricting the 
> permissions of an enclave page would require OS support that is added in 
> a later patch.
> 
> > 
> > > * There is no page table entry for the enclave page.
> > > 
> > > Q.
> > >   What will user space observe when an attempt is made to write to the
> > >   enclave page from within the enclave?
> > > 
> > > A.
> > >   Initially the page table entry is not present so the following is
> > >   observed:
> > >   1) Instruction writing to enclave page is run from within the enclave.
> > >   2) A page fault with second and third bits set (0x6) is encountered
> > >      and handled by the SGX handler sgx_vma_fault() that installs a
> > >      read-only page table entry following previous patch that installs
> > >      page table entry with permissions that VMA and enclave agree on
> > >      (read-only in this case).
> > >   3) Instruction writing to enclave page is re-attempted.
> > >   4) A page fault with first three bits set (0x7) is encountered and
> > >      transparently (from SGX and user space perspective) handled by the
> > >      OS with the page table entry made writable because the VMA is
> > >      writable.
> > >   5) Instruction writing to enclave page is re-attempted.
> > >   6) Since the EPCM permissions prevents writing to the page a new page
> > >      fault is encountered, this time with the SGX flag set in the error
> > >      code (0x8007). No action is taken by OS for this page fault and
> > >      execution returns to user space.
> > >   7) Typically such a fault will be passed on to an application with a
> > >      signal but if the enclave is entered with the vDSO function provided
> > >      by the kernel then user space does not receive a signal but instead
> > >      the vDSO function returns successfully with exception information
> > >      (vector=14, error code=0x8007, and address) within the exception
> > >      fields within the vDSO function's struct sgx_enclave_run.
> > > 
> > > As can be observed it is not possible for user space to write to an
> > > enclave page if that page's enclave page permissions do not allow so,
> > > no matter what the VMA or PTE allows.
> > > 
> > > Even so, the OS should not allow writing to a page if that page is not
> > > writable. Thus the page table entry should accurately reflect the
> > > enclave page permissions.
> > > 
> > > Do not blindly accept VMA permissions on a page fault due to a write
> > > attempt to a present PTE. Install a pfn_mkwrite() handler that ensures
> > > that the VMA permissions agree with the enclave permissions in this
> > > regard.
> > > 
> > > Considering the same scenario as above after this change results in
> > > the following behavior change:
> > > 
> > > Q.
> > >   What will user space observe when an attempt is made to write to the
> > >   enclave page from within the enclave?
> > > 
> > > A.
> > >   Initially the page table entry is not present so the following is
> > >   observed:
> > >   1) Instruction writing to enclave page is run from within the enclave.
> > >   2) A page fault with second and third bits set (0x6) is encountered
> > >      and handled by the SGX handler sgx_vma_fault() that installs a
> > >      read-only page table entry following previous patch that installs
> > >      page table entry with permissions that VMA and enclave agree on
> > >      (read-only in this case).
> > >   3) Instruction writing to enclave page is re-attempted.
> > >   4) A page fault with first three bits set (0x7) is encountered and
> > >      passed to the pfn_mkwrite() handler for consideration. The handler
> > >      determines that the page should not be writable and returns SIGBUS.
> > >   5) Typically such a fault will be passed on to an application with a
> > >      signal but if the enclave is entered with the vDSO function provided
> > >      by the kernel then user space does not receive a signal but instead
> > >      the vDSO function returns successfully with exception information
> > >      (vector=14, error code=0x7, and address) within the exception fields
> > >      within the vDSO function's struct sgx_enclave_run.
> > > 
> > > The accurate exception information supports the SGX runtime, which is
> > > virtually always implemented inside a shared library, by providing
> > > accurate information in support of its management of the SGX enclave.
> > 
> > This QA-format is not a great idea, as it kind of tells what are the legit
> > questions to ask.
> 
> I will remove the QA-format and just describe the two (before/after) 
> scenarios.
> 
> > You should describe what the patch does and what are the
> > legit reasons for doing that. Unfortunately, in the current form it is very
> > hard to get grip of this patch.
> 
> That was the goal of the summary (the first paragraph) at the start of 
> the changelog. Could you please elaborate how you would like me to 
> improve it?

If I do a search "mktme" through the commit message, it gives
me zero results.

/Jarkko
Reinette Chatre Dec. 13, 2021, 10:09 p.m. UTC | #4
Hi Jarkko,

On 12/10/2021 11:37 PM, Jarkko Sakkinen wrote:
> On Mon, 2021-12-06 at 13:18 -0800, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 12/4/2021 2:43 PM, Jarkko Sakkinen wrote:
>>> On Wed, Dec 01, 2021 at 11:23:02AM -0800, Reinette Chatre wrote:
>>>> By default a write page fault on a present PTE inherits the permissions
>>>> of the VMA. Enclave page permissions maintained in the hardware's
>>>> Enclave Page Cache Map (EPCM) may change after a VMA accessing the page
>>>> is created. A VMA's permissions may thus exceed the enclave page
>>>> permissions even though the VMA was originally created not to exceed
>>>> the enclave page permissions. Following the default behavior during
>>>> a page fault on a present PTE while the VMA permissions exceed the
>>>> enclave page permissions would result in the PTE for an enclave page
>>>> to be writable even though the page is not writable according to the
>>>> enclave's permissions.
>>>>
>>>> Consider the following scenario:
>>>> * An enclave page exists with RW EPCM permissions.
>>>> * A RW VMA maps the range spanning the enclave page.
>>>> * The enclave page's EPCM permissions are changed to read-only.
>>>
>>> How could this happen in the existing mainline code?
>>
>> This is a preparatory patch for SGX2 support. Restricting the
>> permissions of an enclave page would require OS support that is added in
>> a later patch.
>>
>>>
>>>> * There is no page table entry for the enclave page.
>>>>
>>>> Q.
>>>>    What will user space observe when an attempt is made to write to the
>>>>    enclave page from within the enclave?
>>>>
>>>> A.
>>>>    Initially the page table entry is not present so the following is
>>>>    observed:
>>>>    1) Instruction writing to enclave page is run from within the enclave.
>>>>    2) A page fault with second and third bits set (0x6) is encountered
>>>>       and handled by the SGX handler sgx_vma_fault() that installs a
>>>>       read-only page table entry following previous patch that installs
>>>>       page table entry with permissions that VMA and enclave agree on
>>>>       (read-only in this case).
>>>>    3) Instruction writing to enclave page is re-attempted.
>>>>    4) A page fault with first three bits set (0x7) is encountered and
>>>>       transparently (from SGX and user space perspective) handled by the
>>>>       OS with the page table entry made writable because the VMA is
>>>>       writable.
>>>>    5) Instruction writing to enclave page is re-attempted.
>>>>    6) Since the EPCM permissions prevents writing to the page a new page
>>>>       fault is encountered, this time with the SGX flag set in the error
>>>>       code (0x8007). No action is taken by OS for this page fault and
>>>>       execution returns to user space.
>>>>    7) Typically such a fault will be passed on to an application with a
>>>>       signal but if the enclave is entered with the vDSO function provided
>>>>       by the kernel then user space does not receive a signal but instead
>>>>       the vDSO function returns successfully with exception information
>>>>       (vector=14, error code=0x8007, and address) within the exception
>>>>       fields within the vDSO function's struct sgx_enclave_run.
>>>>
>>>> As can be observed it is not possible for user space to write to an
>>>> enclave page if that page's enclave page permissions do not allow so,
>>>> no matter what the VMA or PTE allows.
>>>>
>>>> Even so, the OS should not allow writing to a page if that page is not
>>>> writable. Thus the page table entry should accurately reflect the
>>>> enclave page permissions.
>>>>
>>>> Do not blindly accept VMA permissions on a page fault due to a write
>>>> attempt to a present PTE. Install a pfn_mkwrite() handler that ensures
>>>> that the VMA permissions agree with the enclave permissions in this
>>>> regard.
>>>>
>>>> Considering the same scenario as above after this change results in
>>>> the following behavior change:
>>>>
>>>> Q.
>>>>    What will user space observe when an attempt is made to write to the
>>>>    enclave page from within the enclave?
>>>>
>>>> A.
>>>>    Initially the page table entry is not present so the following is
>>>>    observed:
>>>>    1) Instruction writing to enclave page is run from within the enclave.
>>>>    2) A page fault with second and third bits set (0x6) is encountered
>>>>       and handled by the SGX handler sgx_vma_fault() that installs a
>>>>       read-only page table entry following previous patch that installs
>>>>       page table entry with permissions that VMA and enclave agree on
>>>>       (read-only in this case).
>>>>    3) Instruction writing to enclave page is re-attempted.
>>>>    4) A page fault with first three bits set (0x7) is encountered and
>>>>       passed to the pfn_mkwrite() handler for consideration. The handler
>>>>       determines that the page should not be writable and returns SIGBUS.
>>>>    5) Typically such a fault will be passed on to an application with a
>>>>       signal but if the enclave is entered with the vDSO function provided
>>>>       by the kernel then user space does not receive a signal but instead
>>>>       the vDSO function returns successfully with exception information
>>>>       (vector=14, error code=0x7, and address) within the exception fields
>>>>       within the vDSO function's struct sgx_enclave_run.
>>>>
>>>> The accurate exception information supports the SGX runtime, which is
>>>> virtually always implemented inside a shared library, by providing
>>>> accurate information in support of its management of the SGX enclave.
>>>
>>> This QA-format is not a great idea, as it kind of tells what are the legit
>>> questions to ask.
>>
>> I will remove the QA-format and just describe the two (before/after)
>> scenarios.
>>
>>> You should describe what the patch does and what are the
>>> legit reasons for doing that. Unfortunately, in the current form it is very
>>> hard to get grip of this patch.
>>
>> That was the goal of the summary (the first paragraph) at the start of
>> the changelog. Could you please elaborate how you would like me to
>> improve it?
> 
> If I do a search "mktme" through the commit message, it gives
> me zero results.

Could you please elaborate why you expect "mktme" to show up in the 
commit message?

Reinette
Jarkko Sakkinen Dec. 28, 2021, 2:51 p.m. UTC | #5
On Mon, Dec 13, 2021 at 02:09:30PM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 12/10/2021 11:37 PM, Jarkko Sakkinen wrote:
> > On Mon, 2021-12-06 at 13:18 -0800, Reinette Chatre wrote:
> > > Hi Jarkko,
> > > 
> > > On 12/4/2021 2:43 PM, Jarkko Sakkinen wrote:
> > > > On Wed, Dec 01, 2021 at 11:23:02AM -0800, Reinette Chatre wrote:
> > > > > By default a write page fault on a present PTE inherits the permissions
> > > > > of the VMA. Enclave page permissions maintained in the hardware's
> > > > > Enclave Page Cache Map (EPCM) may change after a VMA accessing the page
> > > > > is created. A VMA's permissions may thus exceed the enclave page
> > > > > permissions even though the VMA was originally created not to exceed
> > > > > the enclave page permissions. Following the default behavior during
> > > > > a page fault on a present PTE while the VMA permissions exceed the
> > > > > enclave page permissions would result in the PTE for an enclave page
> > > > > to be writable even though the page is not writable according to the
> > > > > enclave's permissions.
> > > > > 
> > > > > Consider the following scenario:
> > > > > * An enclave page exists with RW EPCM permissions.
> > > > > * A RW VMA maps the range spanning the enclave page.
> > > > > * The enclave page's EPCM permissions are changed to read-only.
> > > > 
> > > > How could this happen in the existing mainline code?
> > > 
> > > This is a preparatory patch for SGX2 support. Restricting the
> > > permissions of an enclave page would require OS support that is added in
> > > a later patch.
> > > 
> > > > 
> > > > > * There is no page table entry for the enclave page.
> > > > > 
> > > > > Q.
> > > > >    What will user space observe when an attempt is made to write to the
> > > > >    enclave page from within the enclave?
> > > > > 
> > > > > A.
> > > > >    Initially the page table entry is not present so the following is
> > > > >    observed:
> > > > >    1) Instruction writing to enclave page is run from within the enclave.
> > > > >    2) A page fault with second and third bits set (0x6) is encountered
> > > > >       and handled by the SGX handler sgx_vma_fault() that installs a
> > > > >       read-only page table entry following previous patch that installs
> > > > >       page table entry with permissions that VMA and enclave agree on
> > > > >       (read-only in this case).
> > > > >    3) Instruction writing to enclave page is re-attempted.
> > > > >    4) A page fault with first three bits set (0x7) is encountered and
> > > > >       transparently (from SGX and user space perspective) handled by the
> > > > >       OS with the page table entry made writable because the VMA is
> > > > >       writable.
> > > > >    5) Instruction writing to enclave page is re-attempted.
> > > > >    6) Since the EPCM permissions prevents writing to the page a new page
> > > > >       fault is encountered, this time with the SGX flag set in the error
> > > > >       code (0x8007). No action is taken by OS for this page fault and
> > > > >       execution returns to user space.
> > > > >    7) Typically such a fault will be passed on to an application with a
> > > > >       signal but if the enclave is entered with the vDSO function provided
> > > > >       by the kernel then user space does not receive a signal but instead
> > > > >       the vDSO function returns successfully with exception information
> > > > >       (vector=14, error code=0x8007, and address) within the exception
> > > > >       fields within the vDSO function's struct sgx_enclave_run.
> > > > > 
> > > > > As can be observed it is not possible for user space to write to an
> > > > > enclave page if that page's enclave page permissions do not allow so,
> > > > > no matter what the VMA or PTE allows.
> > > > > 
> > > > > Even so, the OS should not allow writing to a page if that page is not
> > > > > writable. Thus the page table entry should accurately reflect the
> > > > > enclave page permissions.
> > > > > 
> > > > > Do not blindly accept VMA permissions on a page fault due to a write
> > > > > attempt to a present PTE. Install a pfn_mkwrite() handler that ensures
> > > > > that the VMA permissions agree with the enclave permissions in this
> > > > > regard.
> > > > > 
> > > > > Considering the same scenario as above after this change results in
> > > > > the following behavior change:
> > > > > 
> > > > > Q.
> > > > >    What will user space observe when an attempt is made to write to the
> > > > >    enclave page from within the enclave?
> > > > > 
> > > > > A.
> > > > >    Initially the page table entry is not present so the following is
> > > > >    observed:
> > > > >    1) Instruction writing to enclave page is run from within the enclave.
> > > > >    2) A page fault with second and third bits set (0x6) is encountered
> > > > >       and handled by the SGX handler sgx_vma_fault() that installs a
> > > > >       read-only page table entry following previous patch that installs
> > > > >       page table entry with permissions that VMA and enclave agree on
> > > > >       (read-only in this case).
> > > > >    3) Instruction writing to enclave page is re-attempted.
> > > > >    4) A page fault with first three bits set (0x7) is encountered and
> > > > >       passed to the pfn_mkwrite() handler for consideration. The handler
> > > > >       determines that the page should not be writable and returns SIGBUS.
> > > > >    5) Typically such a fault will be passed on to an application with a
> > > > >       signal but if the enclave is entered with the vDSO function provided
> > > > >       by the kernel then user space does not receive a signal but instead
> > > > >       the vDSO function returns successfully with exception information
> > > > >       (vector=14, error code=0x7, and address) within the exception fields
> > > > >       within the vDSO function's struct sgx_enclave_run.
> > > > > 
> > > > > The accurate exception information supports the SGX runtime, which is
> > > > > virtually always implemented inside a shared library, by providing
> > > > > accurate information in support of its management of the SGX enclave.
> > > > 
> > > > This QA-format is not a great idea, as it kind of tells what are the legit
> > > > questions to ask.
> > > 
> > > I will remove the QA-format and just describe the two (before/after)
> > > scenarios.
> > > 
> > > > You should describe what the patch does and what are the
> > > > legit reasons for doing that. Unfortunately, in the current form it is very
> > > > hard to get grip of this patch.
> > > 
> > > That was the goal of the summary (the first paragraph) at the start of
> > > the changelog. Could you please elaborate how you would like me to
> > > improve it?
> > 
> > If I do a search "mktme" through the commit message, it gives
> > me zero results.
> 
> Could you please elaborate why you expect "mktme" to show up in the commit
> message?

I'm sorry, my mistake doubled: I ment to write mkwrite, and yes its use was well
described in the commit message.

/Jarkko
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 20e97d3abdce..60afa8eaf979 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -184,6 +184,47 @@  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 	return VM_FAULT_NOPAGE;
 }
 
+/*
+ * A fault occurred while writing to a present enclave PTE. Since PTE is
+ * present this will not be handled by sgx_vma_fault(). VMA may allow
+ * writing to the page while enclave does not. Do not follow the default
+ * of inheriting VMA permissions in this regard, ensure enclave also allows
+ * writing to the page.
+ */
+static vm_fault_t sgx_vma_pfn_mkwrite(struct vm_fault *vmf)
+{
+	unsigned long addr = (unsigned long)vmf->address;
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_encl_page *entry;
+	struct sgx_encl *encl;
+	vm_fault_t ret = 0;
+
+	encl = vma->vm_private_data;
+
+	/*
+	 * It's very unlikely but possible that allocating memory for the
+	 * mm_list entry of a forked process failed in sgx_vma_open(). When
+	 * this happens, vm_private_data is set to NULL.
+	 */
+	if (unlikely(!encl))
+		return VM_FAULT_SIGBUS;
+
+	mutex_lock(&encl->lock);
+
+	entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+	if (!entry) {
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+	if (!(entry->vm_max_prot_bits & VM_WRITE))
+		ret = VM_FAULT_SIGBUS;
+
+out:
+	mutex_unlock(&encl->lock);
+	return ret;
+}
+
 static void sgx_vma_open(struct vm_area_struct *vma)
 {
 	struct sgx_encl *encl = vma->vm_private_data;
@@ -381,6 +422,7 @@  const struct vm_operations_struct sgx_vm_ops = {
 	.mprotect = sgx_vma_mprotect,
 	.open = sgx_vma_open,
 	.access = sgx_vma_access,
+	.pfn_mkwrite = sgx_vma_pfn_mkwrite,
 };
 
 /**