[v5,03/16] x86/tdx: Exclude Shared bit from physical_mask

Message ID	20211009003711.1390019-4-sathyanarayanan.kuppuswamy@linux.intel.com (mailing list archive)
State	Not Applicable
Delegated to:	Bjorn Helgaas
Headers	show Return-Path: <linux-pci-owner@kernel.org> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E28EC433F5 for <linux-pci@archiver.kernel.org>; Sat, 9 Oct 2021 00:37:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4847860FC3 for <linux-pci@archiver.kernel.org>; Sat, 9 Oct 2021 00:37:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244122AbhJIAjj (ORCPT <rfc822;linux-pci@archiver.kernel.org>); Fri, 8 Oct 2021 20:39:39 -0400 Received: from mga02.intel.com ([134.134.136.20]:5251 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244049AbhJIAjc (ORCPT <rfc822;linux-pci@vger.kernel.org>); Fri, 8 Oct 2021 20:39:32 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10131"; a="213756492" X-IronPort-AV: E=Sophos;i="5.85,358,1624345200"; d="scan'208";a="213756492" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Oct 2021 17:37:35 -0700 X-IronPort-AV: E=Sophos;i="5.85,358,1624345200"; d="scan'208";a="624905347" Received: from dmsojoza-mobl3.amr.corp.intel.com (HELO skuppusw-desk1.amr.corp.intel.com) ([10.251.135.62]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Oct 2021 17:37:34 -0700 From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Peter Zijlstra <peterz@infradead.org>, Andy Lutomirski <luto@kernel.org>, Bjorn Helgaas <bhelgaas@google.com>, Richard Henderson <rth@twiddle.net>, Thomas Bogendoerfer <tsbogend@alpha.franken.de>, James E J Bottomley <James.Bottomley@HansenPartnership.com>, Helge Deller <deller@gmx.de>, "David S . Miller" <davem@davemloft.net>, Arnd Bergmann <arnd@arndb.de>, Jonathan Corbet <corbet@lwn.net>, "Michael S . Tsirkin" <mst@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, David Hildenbrand <david@redhat.com>, Andrea Arcangeli <aarcange@redhat.com>, Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter H Anvin <hpa@zytor.com>, Dave Hansen <dave.hansen@intel.com>, Tony Luck <tony.luck@intel.com>, Dan Williams <dan.j.williams@intel.com>, Andi Kleen <ak@linux.intel.com>, Kirill Shutemov <kirill.shutemov@linux.intel.com>, Sean Christopherson <seanjc@google.com>, Kuppuswamy Sathyanarayanan <knsathya@kernel.org>, Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>, x86@kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, sparclinux@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux-foundation.org Subject: [PATCH v5 03/16] x86/tdx: Exclude Shared bit from physical_mask Date: Fri, 8 Oct 2021 17:36:58 -0700 Message-Id: <20211009003711.1390019-4-sathyanarayanan.kuppuswamy@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211009003711.1390019-1-sathyanarayanan.kuppuswamy@linux.intel.com> References: <20211009003711.1390019-1-sathyanarayanan.kuppuswamy@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-pci.vger.kernel.org> X-Mailing-List: linux-pci@vger.kernel.org
Series	Add TDX Guest Support (shared-mm support) \| expand [v5,00/16] Add TDX Guest Support (shared-mm support) [v5,01/16] x86/mm: Move force_dma_unencrypted() to common code [v5,02/16] x86/tdx: Get TD execution environment information via TDINFO [v5,03/16] x86/tdx: Exclude Shared bit from physical_mask [v5,04/16] x86/tdx: Make pages shared in ioremap() [v5,05/16] x86/tdx: Add helper to do MapGPA hypercall [v5,06/16] x86/tdx: Make DMA pages shared [v5,07/16] x86/kvm: Use bounce buffers for TD guest [v5,08/16] x86/tdx: ioapic: Add shared bit for IOAPIC base address [v5,09/16] x86/tdx: Enable shared memory confidential guest flags for TDX guest [v5,10/16] PCI: Consolidate pci_iomap_range(), pci_iomap_wc_range() [v5,11/16] asm/io.h: Add ioremap_host_shared fallback [v5,12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range() [v5,13/16] PCI: Mark MSI data shared [v5,14/16] virtio: Use shared mappings for virtio PCI devices [v5,15/16] x86/tdx: Implement ioremap_host_shared for x86 [v5,16/16] x86/tdx: Add cmdline option to force use of ioremap_host_shared

Message ID

20211009003711.1390019-4-sathyanarayanan.kuppuswamy@linux.intel.com (mailing list archive)

State

Not Applicable

Delegated to:

Bjorn Helgaas

Headers

From: Kuppuswamy Sathyanarayanan
        <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>,
        Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
        Peter Zijlstra <peterz@infradead.org>,
        Andy Lutomirski <luto@kernel.org>,
        Bjorn Helgaas <bhelgaas@google.com>,
        Richard Henderson <rth@twiddle.net>,
        Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
        James E J Bottomley <James.Bottomley@HansenPartnership.com>,
        Helge Deller <deller@gmx.de>,
        "David S . Miller" <davem@davemloft.net>,
        Arnd Bergmann <arnd@arndb.de>,
        Jonathan Corbet <corbet@lwn.net>,
        "Michael S . Tsirkin" <mst@redhat.com>,
        Paolo Bonzini <pbonzini@redhat.com>,
        David Hildenbrand <david@redhat.com>,
        Andrea Arcangeli <aarcange@redhat.com>,
        Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter H Anvin <hpa@zytor.com>, Dave Hansen <dave.hansen@intel.com>,
        Tony Luck <tony.luck@intel.com>,
        Dan Williams <dan.j.williams@intel.com>,
        Andi Kleen <ak@linux.intel.com>,
        Kirill Shutemov <kirill.shutemov@linux.intel.com>,
        Sean Christopherson <seanjc@google.com>,
        Kuppuswamy Sathyanarayanan <knsathya@kernel.org>,
        Kuppuswamy Sathyanarayanan
        <sathyanarayanan.kuppuswamy@linux.intel.com>, x86@kernel.org,
        linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
        linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org,
        linux-parisc@vger.kernel.org, sparclinux@vger.kernel.org,
        linux-arch@vger.kernel.org, linux-doc@vger.kernel.org,
        virtualization@lists.linux-foundation.org
Subject: [PATCH v5 03/16] x86/tdx: Exclude Shared bit from physical_mask
Date: Fri,  8 Oct 2021 17:36:58 -0700
Message-Id: 
 <20211009003711.1390019-4-sathyanarayanan.kuppuswamy@linux.intel.com>
In-Reply-To: 
 <20211009003711.1390019-1-sathyanarayanan.kuppuswamy@linux.intel.com>
References: 
 <20211009003711.1390019-1-sathyanarayanan.kuppuswamy@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

Add TDX Guest Support (shared-mm support) | expand

Commit Message

Kuppuswamy Sathyanarayanan Oct. 9, 2021, 12:36 a.m. UTC

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Just like MKTME, TDX reassigns bits of the physical address for
metadata.  MKTME used several bits for an encryption KeyID. TDX
uses a single bit in guests to communicate whether a physical page
should be protected by TDX as private memory (bit set to 0) or
unprotected and shared with the VMM (bit set to 1).

Add a helper, tdx_shared_mask() to generate the mask.  The processor
enumerates its physical address width to include the shared bit, which
means it gets included in __PHYSICAL_MASK by default.

Remove the shared mask from 'physical_mask' since any bits in
tdx_shared_mask() are not used for physical addresses in page table
entries.

Also, note that shared mapping configuration cannot be clubbed between
AMD SME and Intel TDX Guest platforms in common function. SME has
to do it very early in __startup_64() as it sets the bit on all
memory, except what is used for communication. TDX can postpone it,
as it don't need any shared mapping in very early boot.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v4:
 * Renamed tdg_shared_mask() to tdx_shared_mask().

Changes since v3:
 * None

Changes since v1:
 * Fixed format issues in commit log.

 arch/x86/Kconfig           | 1 +
 arch/x86/include/asm/tdx.h | 4 ++++
 arch/x86/kernel/tdx.c      | 9 +++++++++
 3 files changed, 14 insertions(+)

Comments

Sean Christopherson Nov. 5, 2021, 10:11 p.m. UTC | #1

On Fri, Oct 08, 2021, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> Just like MKTME, TDX reassigns bits of the physical address for
> metadata.  MKTME used several bits for an encryption KeyID. TDX
> uses a single bit in guests to communicate whether a physical page
> should be protected by TDX as private memory (bit set to 0) or
> unprotected and shared with the VMM (bit set to 1).
> 
> Add a helper, tdx_shared_mask() to generate the mask.  The processor
> enumerates its physical address width to include the shared bit, which
> means it gets included in __PHYSICAL_MASK by default.

This is incorrect.  The shared bit _may_ be a legal PA bit, but AIUI it's not a
hard requirement.

> Remove the shared mask from 'physical_mask' since any bits in
> tdx_shared_mask() are not used for physical addresses in page table
> entries.

...

> @@ -94,6 +100,9 @@ static void tdx_get_info(void)
>  
>  	td_info.gpa_width = out.rcx & GENMASK(5, 0);
>  	td_info.attributes = out.rdx;
> +
> +	/* Exclude Shared bit from the __PHYSICAL_MASK */
> +	physical_mask &= ~tdx_shared_mask();

This is insufficient, though it's not really the fault of this patch, the specs
themselves botch this whole thing.

The TDX Module spec explicitly states that GPAs above GPAW are considered reserved.

    10.11.1. GPAW-Relate EPT Violations
    GPA bits higher than the SHARED bit are considered reserved and must be 0.
    Address translation with any of the reserved bits set to 1 cause a #PF with
    PFEC (Page Fault Error Code) RSVD bit set.

But this is contradicted by the architectural extensions spec, which states that
a GPA that satisfies MAXPA >= GPA > GPAW "can" cause an EPT violation, not #PF.
Note, this section also appears to have a bug, as it states that GPA bit 47 is
both the SHARED bit and reserved.  I assume that blurb is intended to clarify
that bit 47 _would_ be reserved if it weren't the SHARED bit, but because it's
the shared bit it's ok to access.

    1.4.2
    Guest Physical Address Translation
    If the CPU's maximum physical-address width (MAXPA) is 52 and the guest physical
    address width is configured to be 48, accesses with GPA bits 51:48 not all being
    0 can cause an EPT-violation, where such EPT-violations are not mutated to #VE,
    even if the “EPT-violations #VE” execution control is 1.

    If the CPU's physical-address width (MAXPA) is less than 48 and the SHARED bit
    is configured to be in bit position 47, GPA bit 47 would be reserved, and GPA
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                    
    bits 46:MAXPA would be reserved. On such CPUs, setting bits 51:48 or bits
    46:MAXPA in any paging structure can cause a reserved bit page fault on access.

The Module spec also calls out that the effective GPA is not to be confused with
MAXPA, which combined with the above blurb about MAXPA < GPAW, suggests that MAXPA
is enumerated separately by design so that the guest doesn't incorrectly think
46:MAXPA are usable.  But that is problematic for the case where MAXPA > GPAW.

    The effective GPA width (in bits) for this TD (do not confuse with MAXPA).
    SHARED bit is at GPA bit GPAW-1.

I can't find the exact reference, but the TDX module always passes through host's
MAXPHYADDR.  As it pertains to this patch, just doing

	physical_mask &= ~tdx_shared_mask()

means that a guest running with GPAW=0 and MAXPHYADDR>48 will have a discontiguous
physical_mask, and could access "reserved" memory.  If the VMM defines legal memory
with bits [MAXPHYADDR:48]!=0, explosions may ensue.  That's arguably a VMM bug, but
given that the VMM is untrusted I think the guest should be paranoid when handling
the SHARED bit.  I also don't know that the kernel will play nice with a discontiguous
mask.

Specs aside, unless Intel makes a hardware change to treat GPAW as guest.MAXPHYADDR,
or the TDX Module emulates on EPT violations to inject #PF(RSVD) when appropriate,
this mess isn't going to be truly fixed from the guest perspective.

So, IMO all bits >= GPAW should be cleared, and the kernel should warn and/or
refuse to boot if the host has defined legal memory in that range.

FWIW, from a VMM perspective, I'm pretty sure the only sane approach is to force
GPAW=1, a.k.a. SHARED bit == 51, if host.MAXPHYADDR>=49.  But on the guest side,
I think we should be paranoid.

Kirill A. Shutemov Nov. 8, 2021, 2:45 p.m. UTC | #2

On Fri, Nov 05, 2021 at 10:11:48PM +0000, Sean Christopherson wrote:
> On Fri, Oct 08, 2021, Kuppuswamy Sathyanarayanan wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > 
> > Just like MKTME, TDX reassigns bits of the physical address for
> > metadata.  MKTME used several bits for an encryption KeyID. TDX
> > uses a single bit in guests to communicate whether a physical page
> > should be protected by TDX as private memory (bit set to 0) or
> > unprotected and shared with the VMM (bit set to 1).
> > 
> > Add a helper, tdx_shared_mask() to generate the mask.  The processor
> > enumerates its physical address width to include the shared bit, which
> > means it gets included in __PHYSICAL_MASK by default.
> 
> This is incorrect.  The shared bit _may_ be a legal PA bit, but AIUI it's not a
> hard requirement.

Good point, will fix.

> > Remove the shared mask from 'physical_mask' since any bits in
> > tdx_shared_mask() are not used for physical addresses in page table
> > entries.
> 
> ...
> 
> > @@ -94,6 +100,9 @@ static void tdx_get_info(void)
> >  
> >  	td_info.gpa_width = out.rcx & GENMASK(5, 0);
> >  	td_info.attributes = out.rdx;
> > +
> > +	/* Exclude Shared bit from the __PHYSICAL_MASK */
> > +	physical_mask &= ~tdx_shared_mask();
> 
> This is insufficient, though it's not really the fault of this patch, the specs
> themselves botch this whole thing.
> 
> The TDX Module spec explicitly states that GPAs above GPAW are considered reserved.
> 
>     10.11.1. GPAW-Relate EPT Violations
>     GPA bits higher than the SHARED bit are considered reserved and must be 0.
>     Address translation with any of the reserved bits set to 1 cause a #PF with
>     PFEC (Page Fault Error Code) RSVD bit set.
> 
> But this is contradicted by the architectural extensions spec, which states that
> a GPA that satisfies MAXPA >= GPA > GPAW "can" cause an EPT violation, not #PF.
> Note, this section also appears to have a bug, as it states that GPA bit 47 is
> both the SHARED bit and reserved.  I assume that blurb is intended to clarify
> that bit 47 _would_ be reserved if it weren't the SHARED bit, but because it's
> the shared bit it's ok to access.
> 
>     1.4.2
>     Guest Physical Address Translation
>     If the CPU's maximum physical-address width (MAXPA) is 52 and the guest physical
>     address width is configured to be 48, accesses with GPA bits 51:48 not all being
>     0 can cause an EPT-violation, where such EPT-violations are not mutated to #VE,
>     even if the “EPT-violations #VE” execution control is 1.
> 
>     If the CPU's physical-address width (MAXPA) is less than 48 and the SHARED bit
>     is configured to be in bit position 47, GPA bit 47 would be reserved, and GPA
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                    
>     bits 46:MAXPA would be reserved. On such CPUs, setting bits 51:48 or bits
>     46:MAXPA in any paging structure can cause a reserved bit page fault on access.
> 
> The Module spec also calls out that the effective GPA is not to be confused with
> MAXPA, which combined with the above blurb about MAXPA < GPAW, suggests that MAXPA
> is enumerated separately by design so that the guest doesn't incorrectly think
> 46:MAXPA are usable.  But that is problematic for the case where MAXPA > GPAW.
> 
>     The effective GPA width (in bits) for this TD (do not confuse with MAXPA).
>     SHARED bit is at GPA bit GPAW-1.
> 
> I can't find the exact reference, but the TDX module always passes through host's
> MAXPHYADDR.  As it pertains to this patch, just doing
> 
> 	physical_mask &= ~tdx_shared_mask()
> 
> means that a guest running with GPAW=0 and MAXPHYADDR>48 will have a discontiguous
> physical_mask, and could access "reserved" memory.  If the VMM defines legal memory
> with bits [MAXPHYADDR:48]!=0, explosions may ensue.  That's arguably a VMM bug, but
> given that the VMM is untrusted I think the guest should be paranoid when handling
> the SHARED bit.  I also don't know that the kernel will play nice with a discontiguous
> mask.

I expect it to be buggy.

> Specs aside, unless Intel makes a hardware change to treat GPAW as guest.MAXPHYADDR,
> or the TDX Module emulates on EPT violations to inject #PF(RSVD) when appropriate,
> this mess isn't going to be truly fixed from the guest perspective.
> 
> So, IMO all bits >= GPAW should be cleared, and the kernel should warn and/or
> refuse to boot if the host has defined legal memory in that range.

Right. But only >= GPAW-1 as shared bit is the MSB within GPAW:

	physical_mask &= GENMASK_ULL(td_info.gpa_width - 2, 0);

'2' here smells bad, but well...

Given that physical_mask is now contiguous we can truncate anything from
e820 that cannot be addressed with adjusted __PHYSICAL_MASK:

iff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index bc0657f0deed..16d57a8769e8 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -833,6 +833,9 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type
 	unsigned long last_pfn = 0;
 	unsigned long max_arch_pfn = MAX_ARCH_PFN;

+	if (max_arch_pfn > PHYS_PFN(__PHYSICAL_MASK + 1))
+		max_arch_pfn = PHYS_PFN(__PHYSICAL_MASK + 1);
+
 	for (i = 0; i < e820_table->nr_entries; i++) {
 		struct e820_entry *entry = &e820_table->entries[i];
 		unsigned long start_pfn;

Does it look reasonable?

> FWIW, from a VMM perspective, I'm pretty sure the only sane approach is to force
> GPAW=1, a.k.a. SHARED bit == 51, if host.MAXPHYADDR>=49.  But on the guest side,
> I think we should be paranoid.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 37b27412f52e..e99c669e633a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -871,6 +871,7 @@  config INTEL_TDX_GUEST
 	depends on SECURITY
 	depends on X86_X2APIC
 	select ARCH_HAS_CC_PLATFORM
+	select X86_MEM_ENCRYPT_COMMON
 	help
 	  Provide support for running in a trusted domain on Intel processors
 	  equipped with Trusted Domain Extensions. TDX is a Intel technology
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index eb5e9dbe1861..b8f758dbbea9 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -76,6 +76,8 @@  bool tdx_handle_virtualization_exception(struct pt_regs *regs,
 
 bool tdx_early_handle_ve(struct pt_regs *regs);
 
+extern phys_addr_t tdx_shared_mask(void);
+
 /*
  * To support I/O port access in decompressor or early kernel init
  * code, since #VE exception handler cannot be used, use paravirt
@@ -141,6 +143,8 @@  static inline void tdx_early_init(void) { };
 
 static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }
 
+static inline phys_addr_t tdx_shared_mask(void) { return 0; }
+
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
 #if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index bb237cf291e6..8a37ab0c6cbf 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -71,6 +71,12 @@  static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14,
 	return out->r10;
 }
 
+/* The highest bit of a guest physical address is the "sharing" bit */
+phys_addr_t tdx_shared_mask(void)
+{
+	return 1ULL << (td_info.gpa_width - 1);
+}
+
 static void tdx_get_info(void)
 {
 	struct tdx_module_output out;
@@ -94,6 +100,9 @@  static void tdx_get_info(void)
 
 	td_info.gpa_width = out.rcx & GENMASK(5, 0);
 	td_info.attributes = out.rdx;
+
+	/* Exclude Shared bit from the __PHYSICAL_MASK */
+	physical_mask &= ~tdx_shared_mask();
 }
 
 static __cpuidle void _tdx_halt(const bool irq_disabled, const bool do_sti)

[v5,03/16] x86/tdx: Exclude Shared bit from physical_mask

Commit Message

Comments

Patch