[v3,0/3] x86/sgx: fine grained SGX MCA behavior

Message ID: 20220525100604.760576-1-zhiquan1.li@intel.com

Zhiquan Li May 25, 2022, 10:06 a.m. UTC
V2: https://lore.kernel.org/linux-sgx/694234d7-6a0d-e85f-f2f9-e52b4a61e1ec@intel.com/T/#t

Changes since V2:
- Repurpose the owner field as the virtual address of the virtual EPC page.
- Remove struct sgx_vepc_page and the related code.
- Remove patch 01, as its changes are not necessary in the new design.
- Rework patch 02 as suggested by Jarkko.
- Adapt patches 03 and 04, since struct sgx_vepc_page was discarded.
- Replace the EPC page flag SGX_EPC_PAGE_IS_VEPC with
  SGX_EPC_PAGE_KVM_GUEST, as they are duplicates.
  Link: https://lore.kernel.org/linux-sgx/eb95b32ecf3d44a695610cf7f2816785@intel.com/T/#u

V1: https://lore.kernel.org/linux-sgx/443cb425-009c-2784-56f4-5e707122de76@intel.com/T/#t

Changes since V1:
- Updated the cover letter and commit messages, adding valuable
  information from Jarkko's, Tony's and Kai's comments.
- Added documentation for struct sgx_vepc and
  struct sgx_vepc_page.

Hi everyone,

This series contains a few patches that make SGX MCA behavior more
fine-grained.

When a VM guest accesses an SGX EPC page with a memory failure, the
current behavior is to kill the whole guest, whereas the expected
behavior is to kill only the SGX application inside it.

To fix this, we send SIGBUS with code BUS_MCEERR_AR and some extra
information so that the hypervisor can inject #MC information into the
guest, which is helpful in the SGX virtualization case.
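
For illustration only (not part of this series): a minimal userspace
sketch of how a SIGBUS recipient might decode such a signal. It relies
only on the standard siginfo_t fields si_code, si_addr and si_addr_lsb;
the names mce_kind, mce_report and decode_sigbus() are hypothetical
helpers invented for this example, and the numeric fallbacks are
assumed to match the Linux UAPI values.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Fallbacks for C libraries that hide the Linux-specific codes;
 * the values are assumed to match the Linux UAPI headers. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical result type for this sketch. */
enum mce_kind { MCE_NONE, MCE_ACTION_REQUIRED, MCE_ACTION_OPTIONAL };

struct mce_report {
    enum mce_kind kind;
    void *addr;         /* faulting virtual address (si_addr) */
    size_t granularity; /* bytes affected, derived from si_addr_lsb */
};

/* Decode a SIGBUS siginfo into a report; anything that is not a
 * machine-check SIGBUS yields MCE_NONE. */
static struct mce_report decode_sigbus(const siginfo_t *si)
{
    struct mce_report r = { MCE_NONE, NULL, 0 };

    assert(si != NULL);
    if (si->si_signo != SIGBUS)
        return r;

    if (si->si_code == BUS_MCEERR_AR)
        r.kind = MCE_ACTION_REQUIRED;
    else if (si->si_code == BUS_MCEERR_AO)
        r.kind = MCE_ACTION_OPTIONAL;
    else
        return r;

    r.addr = si->si_addr;
    r.granularity = (size_t)1 << si->si_addr_lsb;
    return r;
}
```

A handler installed with sigaction() and SA_SIGINFO would pass its
siginfo_t pointer to decode_sigbus() and act on the reported address.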

The rest is handled on the guest side. Hypervisors such as QEMU
already have a mature facility to convert an HVA to a GPA and inject #MC
into the guest OS.

We then extend the solution to the normal SGX case, so that the task
has the opportunity to make further decisions when an EPC page has a
memory failure.
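
For illustration only: the HVA-to-GPA step can be pictured as a lookup
in a table of memory slots. This is a simplified sketch, not QEMU's
actual code; struct mem_slot and hva_to_gpa() are hypothetical names,
and the sketch assumes each slot maps one contiguous host-virtual range
to one contiguous guest-physical range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory slot: a contiguous guest-physical range backed
 * by a contiguous host-virtual range of the same size. */
struct mem_slot {
    uint64_t hva_base;
    uint64_t gpa_base;
    uint64_t size;
};

/* Translate a host virtual address into a guest physical address.
 * Returns false if no slot covers the address. */
static bool hva_to_gpa(const struct mem_slot *slots, size_t n,
                       uint64_t hva, uint64_t *gpa)
{
    assert(slots != NULL && gpa != NULL);

    for (size_t i = 0; i < n; i++) {
        if (hva >= slots[i].hva_base &&
            hva - slots[i].hva_base < slots[i].size) {
            *gpa = slots[i].gpa_base + (hva - slots[i].hva_base);
            return true;
        }
    }
    return false;
}
```

The address taken from the SIGBUS siginfo would be the hva argument
here, and the resulting GPA is what the guest sees in the injected #MC.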

However, when a page triggers a machine check, only the PFN is reported.
But in order to pass the #MC information to the hypervisor, the virtual
address is required. We therefore repurpose the "owner" field as the
virtual address of the virtual EPC page so that arch_memory_failure()
can easily retrieve it.

The EPC page flag SGX_EPC_PAGE_KVM_GUEST is used to interpret the
meaning of the field.

Suppose an enclave is shared by multiple processes. When an enclave
page triggers a machine check, the enclave will be disabled so that
it cannot be entered again. Killing other processes with the same
enclave mapped would perhaps be overkill, but they are going to find
that the enclave is "dead" the next time they try to use it. Thanks to
Jarkko for the heads-up and to Tony for the clarification on this point.

Our intention is to provide additional info so that the application has
more choices. The current behavior looks gentle, and we don't want to
change it.

If you expect the other processes to be informed in such a case, then
you're looking for an MCA "early kill" feature, which is worth a
separate patch set.

Unlike host enclaves, a virtual EPC instance cannot be shared by multiple
VMs, because how enclaves are created is entirely up to the guest.
Sharing a virtual EPC instance would very likely break enclaves in all
VMs unexpectedly.

The SGX virtual EPC driver doesn't explicitly prevent a virtual EPC
instance from being shared by multiple VMs via fork(). However, KVM
doesn't support running a VM across multiple mm structures, and the
de facto userspace hypervisor (QEMU) doesn't use fork() to create a new
VM, so in practice this should not happen.

This series is based on tip/x86/sgx with the following additionally
applied:

"x86/sgx: Keep record for SGX VA and Guest page type"
https://lore.kernel.org/linux-sgx/eb95b32ecf3d44a695610cf7f2816785@intel.com/T/#mbd2ca61983f1f9514a8baf07fdb17d33495eeada

Tests:
1. MCE injection test for SGX in VM.
   As expected, the application was killed and the VM stayed alive.
2. MCE injection test for SGX on host.
   As expected, the application received SIGBUS with extra info.
3. Kernel selftest/sgx: PASS
4. Internal SGX stress test: PASS
5. kmemleak test: No memory leakage detected.

Your feedback is much appreciated.

Best Regards,
Zhiquan

Zhiquan Li (3):
  x86/sgx: Repurpose the owner field as the virtual address of virtual
    EPC page
  x86/sgx: Fine grained SGX MCA behavior for virtualization
  x86/sgx: Fine grained SGX MCA behavior for normal case

 arch/x86/kernel/cpu/sgx/main.c | 27 +++++++++++++++++++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h  |  2 ++
 arch/x86/kernel/cpu/sgx/virt.c |  4 +++-
 3 files changed, 30 insertions(+), 3 deletions(-)

Comments

Jarkko Sakkinen May 26, 2022, 12:35 a.m. UTC | #1
On Wed, 2022-05-25 at 18:06 +0800, Zhiquan Li wrote:
> [...]

This applies on top of Cathy's series, right? Why not send one
series with all 12 patches included?

It makes reviewing easier, and we are well beyond the 5.19 timeline
for these features.

BR, Jarkko
Du, Fan June 3, 2022, 1:15 a.m. UTC | #2
>-----Original Message-----
>From: Jarkko Sakkinen <jarkko@kernel.org>
>Sent: Thursday, May 26, 2022 8:35 AM
>To: Li, Zhiquan1 <zhiquan1.li@intel.com>; linux-sgx@vger.kernel.org; Luck,
>Tony <tony.luck@intel.com>
>Cc: dave.hansen@linux.intel.com; Christopherson,, Sean
><seanjc@google.com>; Huang, Kai <kai.huang@intel.com>; Du, Fan
><fan.du@intel.com>
>Subject: Re: [PATCH v3 0/3] x86/sgx: fine grained SGX MCA behavior
>
>On Wed, 2022-05-25 at 18:06 +0800, Zhiquan Li wrote:
>> [...]
>
>This applies on top of Cathy's series, right? Why not send one
>series with all 12 patches included?
>
>It makes reviewing easier, and we are well beyond the 5.19 timeline
>for these features.

Zhiquan's patches try to improve SGX MCA handling. This is actually a bug
that is already being discussed with a customer who has SGX deployed: the
SGX VM instance gets killed when an SGX application inside the VM consumes
poisoned EPC pages. The expected behavior is that only the SGX application
gets killed in such a scenario.

Cathy's seamless patchset is another, standalone feature, and its design
still seems to be under discussion. Combining those two distinct,
purpose-built patchsets looks weird.

Zhiquan's patchset introduced the SGX_EPC_PAGE_IS_VEPC [1] macro to mark
guest EPC pages. Luckily, Cathy's patchset already has a macro,
SGX_EPC_PAGE_KVM_GUEST, with the same semantics, as a simple standalone
patch [2].

For completeness, how about incorporating Cathy's patch [2] (keeping the
original authorship) into Zhiquan's upcoming next version (3 patches from
Zhiquan, 1 patch from Cathy)? I synced with Cathy offline, and she is
personally OK with it. Let's see what others think about how to prioritize
those two things: the bugfix and the feature enhancement.

So, Tony, Dave, any suggestions on how we could move on?

Thanks!

[1] https://patchwork.kernel.org/project/intel-sgx/patch/20220519031137.245767-1-zhiquan1.li@intel.com/

[2] https://patchwork.kernel.org/project/intel-sgx/patch/20220520103904.1216-4-cathy.zhang@intel.com/

>BR, Jarkko
Huang, Kai June 6, 2022, 10:40 a.m. UTC | #3
On Fri, 2022-06-03 at 13:15 +1200, Du, Fan wrote:
> > 
> > This applies on top of Cathy's series, right? Why not send one
> > series with all 12 patches included?
> > 
> > It makes reviewing easier, and we are well beyond the 5.19 timeline
> > for these features.
> 
> Zhiquan's patches try to improve SGX MCA handling. This is actually a bug
> that is already being discussed with a customer who has SGX deployed: the
> SGX VM instance gets killed when an SGX application inside the VM consumes
> poisoned EPC pages. The expected behavior is that only the SGX application
> gets killed in such a scenario.
> 
> Cathy's seamless patchset is another, standalone feature, and its design
> still seems to be under discussion. Combining those two distinct,
> purpose-built patchsets looks weird.

Right.  Those are two different features and I don't see why they should be sent
out together.

Btw, please also note Cathy's SGX rebootless recovery may never get accepted:

https://lore.kernel.org/linux-sgx/Yo0xSNt0JKGgOG59@zn.tnic/T/#m4d1a56fc3ed547d200443dab50bed6484e6d2e1d
https://lore.kernel.org/all/20220524185324.28395-1-bp@alien8.de/


> 
> Zhiquan's patchset introduced the SGX_EPC_PAGE_IS_VEPC [1] macro to mark
> guest EPC pages. Luckily, Cathy's patchset already has a macro,
> SGX_EPC_PAGE_KVM_GUEST, with the same semantics, as a simple standalone
> patch [2].
> 
> For completeness, how about incorporating Cathy's patch [2] (keeping the
> original authorship) into Zhiquan's upcoming next version (3 patches from
> Zhiquan, 1 patch from Cathy)? I synced with Cathy offline, and she is
> personally OK with it. Let's see what others think about how to prioritize
> those two things: the bugfix and the feature enhancement.

Zhiquan's series only needs one flag: SGX_EPC_PAGE_KVM_GUEST.  It doesn't need
SGX_EPC_PAGE_VA.  I would suggest just using SGX_EPC_PAGE_KVM_GUEST (if it is
preferred over SGX_EPC_PAGE_IS_VEPC) in Zhiquan's series.
Jarkko Sakkinen June 7, 2022, 6:35 a.m. UTC | #4
On Mon, 2022-06-06 at 22:40 +1200, Kai Huang wrote:
> On Fri, 2022-06-03 at 13:15 +1200, Du, Fan wrote:
> > > 
> > > This applies on top of Cathy's series, right? Why not send one
> > > series with all 12 patches included?
> > > 
> > > It makes reviewing easier, and we are well beyond the 5.19 timeline
> > > for these features.
> > 
> > Zhiquan's patches try to improve SGX MCA handling. This is actually a bug
> > that is already being discussed with a customer who has SGX deployed: the
> > SGX VM instance gets killed when an SGX application inside the VM consumes
> > poisoned EPC pages. The expected behavior is that only the SGX application
> > gets killed in such a scenario.
> > 
> > Cathy's seamless patchset is another, standalone feature, and its design
> > still seems to be under discussion. Combining those two distinct,
> > purpose-built patchsets looks weird.
> 
> Right.  Those are two different features and I don't see why they should be sent
> out together.
> 
> Btw, please also note Cathy's SGX rebootless recovery may never get accepted:
> 
> https://lore.kernel.org/linux-sgx/Yo0xSNt0JKGgOG59@zn.tnic/T/#m4d1a56fc3ed547d200443dab50bed6484e6d2e1d
> https://lore.kernel.org/all/20220524185324.28395-1-bp@alien8.de/

Regarding Zhiquan's series: the patches all have different message IDs,
which puts them into different threads and makes the series hard to apply.

E.g.

https://lore.kernel.org/linux-sgx/20220525100625.760633-1-zhiquan1.li@intel.com/T/#u

https://lore.kernel.org/linux-sgx/20220525100730.760815-1-zhiquan1.li@intel.com/T/#u

Zhiquan: can you send your patch series as a single series? The easiest
way to do it is to use git send-email. Right now the series is broken.

BR, Jarkko
Zhiquan Li June 7, 2022, 7:02 a.m. UTC | #5
On 2022/6/7 14:35, Jarkko Sakkinen wrote:
> Regarding Zhiquan's series: the patches all have different message IDs,
> which puts them into different threads and makes the series hard to apply.
> 
> E.g.
> 
> https://lore.kernel.org/linux-sgx/20220525100625.760633-1-zhiquan1.li@intel.com/T/#u
> 
> https://lore.kernel.org/linux-sgx/20220525100730.760815-1-zhiquan1.li@intel.com/T/#u
> 
> Zhiquan: can you send your patch series as a single series? The easiest
> way to do it is to use git send-email. Right now the series is broken.

Thanks for the heads-up.

I was not aware that the patch series is broken on https://lore.kernel.org/.
I do use git send-email, so some setting must be wrong;
let me check my git config.
Sorry for the inconvenience.

Best Regards,
Zhiquan

> 
> BR, Jarkko
Jarkko Sakkinen June 7, 2022, 8:26 a.m. UTC | #6
On Tue, 2022-06-07 at 15:02 +0800, Zhiquan Li wrote:
> On 2022/6/7 14:35, Jarkko Sakkinen wrote:
> > Regarding Zhiquan's series: the patches all have different message IDs,
> > which puts them into different threads and makes the series hard to apply.
> > 
> > E.g.
> > 
> > https://lore.kernel.org/linux-sgx/20220525100625.760633-1-zhiquan1.li@intel.com/T/#u
> > 
> > https://lore.kernel.org/linux-sgx/20220525100730.760815-1-zhiquan1.li@intel.com/T/#u
> > 
> > Zhiquan: can you send your patch series as a single series? The easiest
> > way to do it is to use git send-email. Right now the series is broken.
> 
> Thanks for the heads-up.
> 
> I was not aware that the patch series is broken on https://lore.kernel.org/.
> I do use git send-email, so some setting must be wrong;
> let me check my git config.
> Sorry for the inconvenience.

No worries, just look at the main page at lore for sgx, and you
will immediately see the issue :-)

One more thing.

I think Tony hinted to me about a mechanism to test MCA behaviour in
the kernel, but I cannot recall it anymore. Was there a way to
simulate a memory failure, and use that to test the patch
set?


> Best Regards,
> Zhiquan

BR, Jarkko
Luck, Tony June 7, 2022, 3:43 p.m. UTC | #7
> I think Tony hinted to me about a mechanism to test MCA behaviour in
> the kernel, but I cannot recall it anymore. Was there a way to
> simulate a memory failure, and use that to test the patch
> set?

It is possible to inject real errors into enclave memory using ACPI/EINJ.
See the final part of:

  Documentation/firmware-guide/acpi/apei/einj.rst

You can do a faked test by just asking to offline a page:

# echo {pfn} > /sys/devices/system/memory/hard_offline_page
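
For illustration only: the same request issued from a small C helper.
The kernel's ABI documentation, as far as I recall, describes the input
to this file as a physical address rather than a raw PFN, so the sketch
shifts the PFN by an assumed 4 KiB page size first; please verify both
points on your kernel. request_hard_offline() and SKETCH_PAGE_SHIFT are
hypothetical names for this example, and the path is a parameter so the
helper can be pointed at a scratch file instead of the real sysfs node.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Write the physical address of the given PFN, in hex, to path.
 * Returns 0 on success and -1 on failure. */
static int request_hard_offline(const char *path, uint64_t pfn)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    unsigned long long paddr =
        (unsigned long long)(pfn << SKETCH_PAGE_SHIFT);
    int ok = fprintf(f, "0x%llx\n", paddr) > 0;

    /* fclose() flushes the buffer, so a write error may surface here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

On a test machine, running as root, the call would be
request_hard_offline("/sys/devices/system/memory/hard_offline_page", pfn).
The page is really taken offline, so this belongs on a disposable system.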


-Tony
Jarkko Sakkinen June 8, 2022, 12:19 a.m. UTC | #8
On Tue, 2022-06-07 at 15:43 +0000, Luck, Tony wrote:
> > I think Tony hinted to me about a mechanism to test MCA behaviour in
> > the kernel, but I cannot recall it anymore. Was there a way to
> > simulate a memory failure, and use that to test the patch
> > set?
> 
> It is possible to inject real errors into enclave memory using ACPI/EINJ.
> See the final part of:
> 
>   Documentation/firmware-guide/acpi/apei/einj.rst
> 
> You can do a faked test by just asking to offline a page:
> 
> # echo {pfn} > /sys/devices/system/memory/hard_offline_page

OK, I'll try this out in a VM. Thank you.

BR, Jarkko