mbox series

[v8,0/7] Basic recovery for machine checks inside SGX

Message ID 20211001164724.220532-1-tony.luck@intel.com (mailing list archive)
Headers show
Series Basic recovery for machine checks inside SGX | expand

Message

Luck, Tony Oct. 1, 2021, 4:47 p.m. UTC
Now version 8

Note that linux-kernel@vger.kernel.org and x86@kernel.org are still
dropped from the distribution. Time to get some internal agreement on these
changes before bothering the x86 maintainers with yet another version.

So I'm still looking for Acked-by: or Reviewed-by: on any bits of this
series that are worthy, and comments on the problems I need to fix
in the not-worthy parts.

Changes since v7

Parts 1 & 2: Added "Reviewed-by" tag from Jarkko (Thanks!!!)

Part 3: Jarkko had many good questions about the debugfs interface
	that was included to display addresses of pages on the SGX
	poison list. I don't have good answers to them all. While
	this was a useful hook while I was testing these patches
	(check that all the thousands of SGX pages that into which
	I had injected errors showed up on the list) it isn't needed
	for basic recovery. So I dropped the debugfs bits from the
	patch. We can revisit later when there is a clear use case
	for what should be provided.

Parts 4-7: Unchanged.

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 ++++
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 104 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 +++
 mm/memory-failure.c                           |  19 +++-
 9 files changed, 168 insertions(+), 11 deletions(-)


base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29

Comments

Reinette Chatre Oct. 4, 2021, 9:56 p.m. UTC | #1
Hi Tony,

On 10/1/2021 9:47 AM, Tony Luck wrote:
> Now version 8
> 
> Note that linux-kernel@vger.kernel.org and x86@kernel.org are still
> dropped from the distribution. Time to get some internal agreement on these
> changes before bothering the x86 maintainers with yet another version.
> 
> So I'm still looking for Acked-by: or Reviewed-by: on any bits of this
> series that are worthy, and comments on the problems I need to fix
> in the not-worthy parts.

Tested-by: Reinette Chatre <reinette.chatre@intel.com>

Details of testing:
I was curious how the signaling worked after reading some snippets that 
a vDSO does not generate a signal. To help my understanding I created 
the test below in the SGX selftests that implements the test sequence 
you document in patch 6/7 and with it I can see how the SIGBUS is delivered:

[BEGIN TEST OUTPUT]
#  RUN           enclave.mce ...
MCE test from pid 2746 on addr 0x7fc879644000
Set up error injection on virt 0x7fc879644000 and press any key to 
continue test ...

# mce: Test terminated unexpectedly by signal 7
[END TEST OUTPUT]

Below is the test I ran. It only implements steps 3 to 7 from the test 
sequence you document in patch 6/7. It does still require manual 
intervention to determine the physical address and trigger the error 
injection on the physical address. It also currently treats the SIGBUS 
as a test failure, which I did to help clearly see the signal, but it 
could be changed to TEST_F_SIGNAL to have a SIGBUS mean success. The 
test is thus not appropriate for the SGX selftests in its current form 
but is provided as informational to describe the testing I did. It 
applies on top of the recent SGX selftest changes from: 
https://lore.kernel.org/lkml/cover.1631731214.git.reinette.chatre@intel.com/

---8<---
  tools/testing/selftests/sgx/defines.h   |  7 +++++
  tools/testing/selftests/sgx/main.c      | 35 +++++++++++++++++++++
  tools/testing/selftests/sgx/test_encl.c | 41 +++++++++++++++++++++++++
  3 files changed, 83 insertions(+)

diff --git a/tools/testing/selftests/sgx/defines.h 
b/tools/testing/selftests/sgx/defines.h
index 02d775789ea7..2b471ba68e91 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -24,6 +24,7 @@ enum encl_op_type {
  	ENCL_OP_PUT_TO_ADDRESS,
  	ENCL_OP_GET_FROM_ADDRESS,
  	ENCL_OP_NOP,
+	ENCL_OP_MCE,
  	ENCL_OP_MAX,
  };

@@ -53,4 +54,10 @@ struct encl_op_get_from_addr {
  	uint64_t addr;
  };

+struct encl_op_mce {
+	struct encl_op_header header;
+	uint64_t addr;
+	uint64_t value;
+	uint64_t delay_cycles;
+};
  #endif /* DEFINES_H */
diff --git a/tools/testing/selftests/sgx/main.c 
b/tools/testing/selftests/sgx/main.c
index 79669c245f94..2979306a687a 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -594,4 +594,39 @@ TEST_F(enclave, pte_permissions)
  	EXPECT_EQ(self->run.exception_addr, 0);
  }

+TEST_F_TIMEOUT(enclave, mce, 600)
+{
+	struct encl_op_mce mce_op;
+	unsigned long data_start;
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, 
_metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	data_start = self->encl.encl_base +
+		     encl_get_data_offset(&self->encl) +
+		     PAGE_SIZE;
+
+	printf("MCE test from pid %d on addr 0x%lx\n", getpid(), data_start);
+	/*
+	 * Sanity check to ensure it is possible to write to page that will
+	 * have its permissions manipulated.
+	 */
+
+	printf("Set up error injection on virt 0x%lx and press any key to 
continue test ...\n", data_start);
+	getchar();
+	mce_op.value = MAGIC;
+	mce_op.addr = data_start;
+	mce_op.delay_cycles = 600000000;
+	mce_op.header.type = ENCL_OP_MCE;
+
+	EXPECT_EQ(ENCL_CALL(&mce_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+}
+
  TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/sgx/test_encl.c 
b/tools/testing/selftests/sgx/test_encl.c
index 4fca01cfd898..223a80529ba6 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -11,6 +11,11 @@
   */
  static uint8_t encl_buffer[8192] = { 1 };

+static inline void clflush(volatile void *__p)
+{
+	asm volatile("clflush %0" : "+m" (*(volatile char __force *)__p));
+}
+
  static void *memcpy(void *dest, const void *src, size_t n)
  {
  	size_t i;
@@ -35,6 +40,30 @@ static void do_encl_op_get_from_buf(void *op)
  	memcpy(&op2->value, &encl_buffer[0], 8);
  }

+static __always_inline unsigned long long read_tsc(void)
+{
+	unsigned long low, high;
+	asm volatile("rdtsc" : "=a" (low), "=d" (high) :: "ecx");
+	return (low | high << 32);
+}
+
+static __always_inline void rep_nop(void)
+{
+	asm volatile("rep;nop": : :"memory");
+}
+
+void delay_mce(unsigned cycles)
+{
+	unsigned long long start, now;
+	start = read_tsc();
+	for (;;) {
+		now = read_tsc();
+		if (now - start >= cycles)
+			break;
+		rep_nop();
+	}
+}
+
  static void do_encl_op_put_to_addr(void *_op)
  {
  	struct encl_op_put_to_addr *op = _op;
@@ -49,6 +78,17 @@ static void do_encl_op_get_from_addr(void *_op)
  	memcpy(&op->value, (void *)op->addr, 8);
  }

+static void do_encl_op_mce(void *_op)
+{
+	struct encl_op_mce *op = _op;
+
+	memcpy((void *)op->addr, &op->value, 8);
+	clflush((void *)op->addr);
+	delay_mce(op->delay_cycles);
+	memcpy(&op->value, (void *)op->addr, 8);
+}
+
+
  static void do_encl_op_nop(void *_op)
  {

@@ -62,6 +102,7 @@ void encl_body(void *rdi,  void *rsi)
  		do_encl_op_put_to_addr,
  		do_encl_op_get_from_addr,
  		do_encl_op_nop,
+		do_encl_op_mce,
  	};

  	struct encl_op_header *op = (struct encl_op_header *)rdi;
Luck, Tony Oct. 11, 2021, 6:59 p.m. UTC | #2
Posting latest version to a slightly wider audience.

The big picture is that SGX uses some memory pages that are walled off
from access by the OS. This means they:
1) Don't have "struct page" describing them
2) Don't appear in the kernel 1:1 map

But they are still backed by normal DDR memory, so errors can occur.

Parts 1-4 of this series handle the internal SGX bits to keep track of
these pages in an error context. They've had a fair amount of review
on the linux-sgx list (but if any of the 37 subscribers to that list
not named Jarkko or Reinette want to chime in with extra comments and
{Acked,Reviewed,Tested}-by that would be great).

Linux-mm reviewers can (if they like) skip to part 5 where two changes are
made: 1) Hook into memory_failure() in the same spot as device mapping 2)
Skip trying to change 1:1 map (since SGX pages aren't there).

The hooks have generic looking names rather than specifically saying
"sgx" at the suggestion of Dave Hansen. I'm not wedded to the names,
so better suggestions welcome.  I could also change to using some
"ARCH_HAS_PLATFORM_PAGES" config bits if that's the current fashion.

Rafael (and other ACPI list readers) can skip to parts 6 & 7 where there
are hooks into error injection and reporting to simply say "these odd
looking physical addresses are actually ok to use). I added some extra
notes to the einj.rst documentation on how to inject into SGX memory.

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 ++++
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 104 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 +++
 mm/memory-failure.c                           |  19 +++-
 9 files changed, 168 insertions(+), 11 deletions(-)


base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc
Jarkko Sakkinen Oct. 12, 2021, 4:48 p.m. UTC | #3
On Mon, 2021-10-11 at 11:59 -0700, Tony Luck wrote:
> Posting latest version to a slightly wider audience.
> 
> The big picture is that SGX uses some memory pages that are walled off
> from access by the OS. This means they:
> 1) Don't have "struct page" describing them
> 2) Don't appear in the kernel 1:1 map
> 
> But they are still backed by normal DDR memory, so errors can occur.
> 
> Parts 1-4 of this series handle the internal SGX bits to keep track of
> these pages in an error context. They've had a fair amount of review
> on the linux-sgx list (but if any of the 37 subscribers to that list
> not named Jarkko or Reinette want to chime in with extra comments and
> {Acked,Reviewed,Tested}-by that would be great).
> 
> Linux-mm reviewers can (if they like) skip to part 5 where two changes are
> made: 1) Hook into memory_failure() in the same spot as device mapping 2)
> Skip trying to change 1:1 map (since SGX pages aren't there).
> 
> The hooks have generic looking names rather than specifically saying
> "sgx" at the suggestion of Dave Hansen. I'm not wedded to the names,
> so better suggestions welcome.  I could also change to using some
> "ARCH_HAS_PLATFORM_PAGES" config bits if that's the current fashion.
> 
> Rafael (and other ACPI list readers) can skip to parts 6 & 7 where there
> are hooks into error injection and reporting to simply say "these odd
> looking physical addresses are actually ok to use). I added some extra
> notes to the einj.rst documentation on how to inject into SGX memory.
> 
> Tony Luck (7):
>   x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
>   x86/sgx: Add infrastructure to identify SGX EPC pages
>   x86/sgx: Initial poison handling for dirty and free pages
>   x86/sgx: Add SGX infrastructure to recover from poison
>   x86/sgx: Hook arch_memory_failure() into mainline code
>   x86/sgx: Add hook to error injection address validation
>   x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
> 
>  .../firmware-guide/acpi/apei/einj.rst         |  19 ++++
>  arch/x86/include/asm/processor.h              |   8 ++
>  arch/x86/include/asm/set_memory.h             |   4 +
>  arch/x86/kernel/cpu/sgx/main.c                | 104 +++++++++++++++++-
>  arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
>  drivers/acpi/apei/einj.c                      |   3 +-
>  drivers/acpi/apei/ghes.c                      |   2 +-
>  include/linux/mm.h                            |  14 +++
>  mm/memory-failure.c                           |  19 +++-
>  9 files changed, 168 insertions(+), 11 deletions(-)
> 
> 
> base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc

I think you instructed me on this before but I've forgot it:
how do I simulate this and test how it works?

/Jarkko
Luck, Tony Oct. 12, 2021, 5:57 p.m. UTC | #4
> I think you instructed me on this before but I've forgot it:
> how do I simulate this and test how it works?

Jarkko,

You can test the non-execution paths (e.g. where the memory error is
reported by a patrol scrubber in the memory controller) by:

# echo 0x{some_SGX_EPC_ADDRESS} > /sys/devices/system/memory/hard_offline_page

The execution paths are more difficult. You need a system that can inject
errors into EPC memory. There are some hints in the Documenation changes
in part 0006.

Reinette posted some changes to sgx tests that she used to validate.

-Tony
Luck, Tony Oct. 18, 2021, 8:25 p.m. UTC | #5
v10 (based on v5.15-rc6)

Changes since v9:

ACPI reviewers (Rafael): No changes to parts 6 & 7.

MM reviewers (Horiguchi-san): No changes to part 5.

Jarkko:
	Added Reviewed-by tags to remaining patches.
	N.B. I kept the tags on parts 1, 3, 4 because
	changes based on Sean feedback didn't seem
	consequential. Please let me know if you disagree
	and see new problems introduced by me trying to
	follow Sean's feedback.

Sean:
	1) Reverse the polarity of the neutron flow (sorry,
	Dr Who fan will always autocomplete a sentence that
	begins "reverse the polarity" that way.) Actual change
	is for the new flag bit. Instead of marking in-use
	pages with the new bit, mark free pages instead. This
	avoids the weirdness where I marked the pages on the
	dirty list as "in-use", when clearly they are not.

	2) Race conditions adding poisoned pages to the global
	list of poisoned pages.
	Fixed this by changing from a global list to a per-node
	list. Additions are protected by the node->lock.

	3) Use list_move() instead of list_del(); list_add()
	Fixed both places I used this idiom.

	4) Race looking at page->poison when cleaning dirty pages.
	Added a comment documenting why losing this race isn't
	overly harmful.

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 113 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   7 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 +++
 mm/memory-failure.c                           |  19 ++-
 9 files changed, 179 insertions(+), 10 deletions(-)


base-commit: 519d81956ee277b4419c723adfb154603c2565ba
Luck, Tony Oct. 26, 2021, 10 p.m. UTC | #6
Boris,

I took this series out of lkml/x86 for a few revisions, I think
the last one posted to lkml was v5. So much has changed since then
that it might be easier to just look at this as if it were v1 and
ignore the earlier history.

First four patches add infrastructure within the SGX code to
track enclave pages (because these pages don't have a "struct
page" as they aren't directly accessible by Linux). All have
"Reviewed-by" tags from Jarkko (SGX maintainer).

Patch 5 hooks into memory_failure() to invoke recovery if
the physical address is in enclave space. This has a
"Reviewed-by" tag from Naoya Horiguchi the maintainer for
mm/memory-failure.c

Patch 6 is a hook into the error injection code and addition
to the error injection documentation explaining extra steps
needed to inject into SGX enclave memory.

Patch 7 is a hook into GHES error reporting path to recognize
that SGX enclave addresses are valid and need processing.

-Tony

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/Kconfig                              |   1 +
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 113 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   7 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  13 ++
 mm/memory-failure.c                           |  19 ++-
 10 files changed, 179 insertions(+), 10 deletions(-)


base-commit: 3906fe9bb7f1a2c8667ae54e967dc8690824f4ea