From patchwork Mon Sep 11 04:37:46 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haozhong Zhang
X-Patchwork-Id: 9946601
From: Haozhong Zhang
To: xen-devel@lists.xen.org
Date: Mon, 11 Sep 2017 12:37:46 +0800
Message-Id: <20170911043820.14617-6-haozhong.zhang@intel.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170911043820.14617-1-haozhong.zhang@intel.com>
References: <20170911043820.14617-1-haozhong.zhang@intel.com>
Cc: Haozhong Zhang, George Dunlap, Andrew Cooper, Jan Beulich, Chao Peng, Dan Williams
Subject: [Xen-devel] [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable

No specification forbids PMEM regions from appearing in the gaps
between RAM regions. If that happens, init_frametable() needs to
allocate RAM for the part of the frametable that covers those PMEM
regions. However, PMEM regions can be very large (several terabytes
or more), so init_frametable() may fail.

Because Xen does not use PMEM at boot time, we can defer the actual
resource allocation for the frametable of PMEM regions. At boot time,
all frametable pages of PMEM regions that appear between RAM regions
are mapped to a single RAM page filled with 0xff.

Any attempt to write to those frametable pages before their actual
resources are allocated implies a bug in Xen. Therefore, a read-only
mapping is used here to make such bugs explicit.

Signed-off-by: Haozhong Zhang
---
Cc: Andrew Cooper
Cc: George Dunlap
Cc: Jan Beulich
---
 xen/arch/x86/mm.c         | 117 +++++++++++++++++++++++++++++++++++++++++-----
 xen/arch/x86/setup.c      |   4 ++
 xen/drivers/acpi/Makefile |   2 +
 xen/drivers/acpi/nfit.c   | 116 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/acpi/actbl1.h |  43 +++++++++++++++++
 xen/include/xen/acpi.h    |   7 +++
 6 files changed, 278 insertions(+), 11 deletions(-)
 create mode 100644 xen/drivers/acpi/nfit.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e5a029c9be..2fdf609805 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -83,6 +83,9 @@
  * an application-supplied buffer).
  */

+#ifdef CONFIG_NVDIMM_PMEM
+#include
+#endif
 #include
 #include
 #include
@@ -196,31 +199,123 @@ static int __init parse_mmio_relax(const char *s)
 }
 custom_param("mmio-relax", parse_mmio_relax);

-static void __init init_frametable_chunk(void *start, void *end)
+static void __init init_frametable_ram_chunk(unsigned long s, unsigned long e)
 {
-    unsigned long s = (unsigned long)start;
-    unsigned long e = (unsigned long)end;
-    unsigned long step, mfn;
+    unsigned long cur, step, mfn;

-    ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
-    for ( ; s < e; s += step << PAGE_SHIFT )
+    for ( cur = s; cur < e; cur += step << PAGE_SHIFT )
     {
         step = 1UL << (cpu_has_page1gb &&
-                       !(s & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
+                       !(cur & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
                        L3_PAGETABLE_SHIFT - PAGE_SHIFT :
                        L2_PAGETABLE_SHIFT - PAGE_SHIFT);
         /*
          * The hardcoded 4 below is arbitrary - just pick whatever you think
          * is reasonable to waste as a trade-off for using a large page.
          */
-        while ( step && s + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
+        while ( step && cur + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
             step >>= PAGETABLE_ORDER;
         mfn = alloc_boot_pages(step, step);
-        map_pages_to_xen(s, mfn, step, PAGE_HYPERVISOR);
+        map_pages_to_xen(cur, mfn, step, PAGE_HYPERVISOR);
     }

-    memset(start, 0, end - start);
-    memset(end, -1, s - e);
+    memset((void *)s, 0, e - s);
+    memset((void *)e, -1, cur - e);
+}
+
+#ifdef CONFIG_NVDIMM_PMEM
+static void __init init_frametable_pmem_chunk(unsigned long s, unsigned long e)
+{
+    static unsigned long pmem_init_frametable_mfn;
+
+    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
+
+    if ( !pmem_init_frametable_mfn )
+    {
+        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
+        if ( !pmem_init_frametable_mfn )
+            panic("Not enough memory for pmem initial frame table page");
+        memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
+    }
+
+    while ( s < e )
+    {
+        /*
+         * The real frame table entries of a pmem region will be
+         * created when the pmem region is registered to hypervisor.
+         * Any write attempt to the initial entries of that pmem
+         * region implies potential hypervisor bugs. In order to make
+         * those bugs explicit, map those initial entries as read-only.
+         */
+        map_pages_to_xen(s, pmem_init_frametable_mfn, 1, PAGE_HYPERVISOR_RO);
+        s += PAGE_SIZE;
+    }
+}
+#endif /* CONFIG_NVDIMM_PMEM */
+
+static void __init init_frametable_chunk(void *start, void *end)
+{
+    unsigned long s = (unsigned long)start;
+    unsigned long e = (unsigned long)end;
+#ifdef CONFIG_NVDIMM_PMEM
+    unsigned long pmem_smfn, pmem_emfn;
+    unsigned long pmem_spage = s, pmem_epage = s;
+    unsigned long pmem_page_aligned;
+    bool found = false;
+#endif /* CONFIG_NVDIMM_PMEM */
+
+    ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
+
+#ifndef CONFIG_NVDIMM_PMEM
+    init_frametable_ram_chunk(s, e);
+#else
+    while ( s < e )
+    {
+        /* No previous found pmem region overlaps with s ~ e. */
+        if ( s >= (pmem_epage & PAGE_MASK) )
+        {
+            found = acpi_nfit_boot_search_pmem(
+                mfn_x(page_to_mfn((struct page_info *)s)),
+                mfn_x(page_to_mfn((struct page_info *)e)),
+                &pmem_smfn, &pmem_emfn);
+            if ( found )
+            {
+                pmem_spage = (unsigned long)mfn_to_page(_mfn(pmem_smfn));
+                pmem_epage = (unsigned long)mfn_to_page(_mfn(pmem_emfn));
+            }
+        }
+
+        /* No pmem region found in s ~ e. */
+        if ( s >= (pmem_epage & PAGE_MASK) )
+        {
+            init_frametable_ram_chunk(s, e);
+            break;
+        }
+
+        if ( s < pmem_spage )
+        {
+            init_frametable_ram_chunk(s, pmem_spage);
+            pmem_page_aligned = (pmem_spage + PAGE_SIZE - 1) & PAGE_MASK;
+            if ( pmem_page_aligned > pmem_epage )
+                memset((void *)pmem_epage, -1, pmem_page_aligned - pmem_epage);
+            s = pmem_page_aligned;
+        }
+        else
+        {
+            pmem_page_aligned = pmem_epage & PAGE_MASK;
+            if ( pmem_page_aligned > s )
+                init_frametable_pmem_chunk(s, pmem_page_aligned);
+            if ( pmem_page_aligned < pmem_epage )
+            {
+                init_frametable_ram_chunk(pmem_page_aligned,
+                                          min(pmem_page_aligned + PAGE_SIZE, e));
+                memset((void *)pmem_page_aligned, -1,
+                       pmem_epage - pmem_page_aligned);
+            }
+            s = (pmem_epage + PAGE_SIZE - 1) & PAGE_MASK;
+        }
+    }
+#endif
 }

 void __init init_frametable(void)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3cbe305202..b9ebda8f4e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1358,6 +1358,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     BUILD_BUG_ON(MACH2PHYS_VIRT_START != RO_MPT_VIRT_START);
     BUILD_BUG_ON(MACH2PHYS_VIRT_END != RO_MPT_VIRT_END);

+#ifdef CONFIG_NVDIMM_PMEM
+    acpi_nfit_boot_init();
+#endif
+
     init_frametable();

     if ( !acpi_boot_table_init_done )
diff --git a/xen/drivers/acpi/Makefile b/xen/drivers/acpi/Makefile
index 444b11d583..c8bb869cb8 100644
--- a/xen/drivers/acpi/Makefile
+++ b/xen/drivers/acpi/Makefile
@@ -9,3 +9,5 @@ obj-$(CONFIG_HAS_CPUFREQ) += pmstat.o

 obj-$(CONFIG_X86) += hwregs.o
 obj-$(CONFIG_X86) += reboot.o
+
+obj-$(CONFIG_NVDIMM_PMEM) += nfit.o
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
new file mode 100644
index 0000000000..e099378ee0
--- /dev/null
+++ b/xen/drivers/acpi/nfit.c
@@ -0,0 +1,116 @@
+/*
+ * xen/drivers/acpi/nfit.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see .
+ */
+
+#include
+#include
+#include
+#include
+
+/*
+ * GUID of a byte addressable persistent memory region
+ * (ref. ACPI 6.2, Section 5.2.25.2)
+ */
+static const uint8_t nfit_spa_pmem_guid[] =
+{
+    0x79, 0xd3, 0xf0, 0x66, 0xf3, 0xb4, 0x74, 0x40,
+    0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb,
+};
+
+struct acpi_nfit_desc {
+    struct acpi_table_nfit *acpi_table;
+};
+
+static struct acpi_nfit_desc nfit_desc;
+
+void __init acpi_nfit_boot_init(void)
+{
+    acpi_status status;
+    acpi_physical_address nfit_addr;
+    acpi_native_uint nfit_len;
+
+    status = acpi_get_table_phys(ACPI_SIG_NFIT, 0, &nfit_addr, &nfit_len);
+    if ( ACPI_FAILURE(status) )
+        return;
+
+    nfit_desc.acpi_table = (struct acpi_table_nfit *)__va(nfit_addr);
+    map_pages_to_xen((unsigned long)nfit_desc.acpi_table, PFN_DOWN(nfit_addr),
+                     PFN_UP(nfit_addr + nfit_len) - PFN_DOWN(nfit_addr),
+                     PAGE_HYPERVISOR);
+}
+
+/**
+ * Search pmem regions overlapped with the specified address range.
+ *
+ * Parameters:
+ *  @smfn, @emfn: the start and end MFN of address range to search
+ *  @ret_smfn, @ret_emfn: return the address range of the first pmem region
+ *                        in above range
+ *
+ * Return:
+ *  Return true if a pmem region is overlapped with @smfn - @emfn. The
+ *  start and end MFN of the lowest pmem region are returned via
+ *  @ret_smfn and @ret_emfn respectively.
+ *
+ *  Return false if no pmem region is overlapped with @smfn - @emfn.
+ */
+bool __init acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
+                                       unsigned long *ret_smfn,
+                                       unsigned long *ret_emfn)
+{
+    struct acpi_table_nfit *nfit_table = nfit_desc.acpi_table;
+    uint32_t hdr_offset = sizeof(*nfit_table);
+    unsigned long saddr = pfn_to_paddr(smfn), eaddr = pfn_to_paddr(emfn);
+    unsigned long ret_saddr = 0, ret_eaddr = 0;
+
+    if ( !nfit_table )
+        return false;
+
+    while ( hdr_offset < nfit_table->header.length )
+    {
+        struct acpi_nfit_header *hdr = (void *)nfit_table + hdr_offset;
+        struct acpi_nfit_system_address *spa;
+        unsigned long pmem_saddr, pmem_eaddr;
+
+        hdr_offset += hdr->length;
+
+        if ( hdr->type != ACPI_NFIT_TYPE_SYSTEM_ADDRESS )
+            continue;
+
+        spa = (struct acpi_nfit_system_address *)hdr;
+        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
+            continue;
+
+        pmem_saddr = spa->address;
+        pmem_eaddr = pmem_saddr + spa->length;
+        if ( pmem_saddr >= eaddr || pmem_eaddr <= saddr )
+            continue;
+
+        if ( ret_saddr != ret_eaddr && ret_saddr < pmem_saddr )
+            continue;
+        ret_saddr = pmem_saddr;
+        ret_eaddr = pmem_eaddr;
+    }
+
+    if ( ret_saddr == ret_eaddr )
+        return false;
+
+    *ret_smfn = paddr_to_pfn(ret_saddr);
+    *ret_emfn = paddr_to_pfn(ret_eaddr);
+
+    return true;
+}
diff --git a/xen/include/acpi/actbl1.h b/xen/include/acpi/actbl1.h
index e1991362dc..94d8d7775c 100644
--- a/xen/include/acpi/actbl1.h
+++ b/xen/include/acpi/actbl1.h
@@ -71,6 +71,7 @@
 #define ACPI_SIG_SBST           "SBST"    /* Smart Battery Specification Table */
 #define ACPI_SIG_SLIT           "SLIT"    /* System Locality Distance Information Table */
 #define ACPI_SIG_SRAT           "SRAT"    /* System Resource Affinity Table */
+#define ACPI_SIG_NFIT           "NFIT"    /* NVDIMM Firmware Interface Table */

 /*
  * All tables must be byte-packed to match the ACPI specification, since
@@ -903,6 +904,48 @@ struct acpi_msct_proximity {
     u64 memory_capacity;    /* In bytes */
 };

+/*******************************************************************************
+ *
+ * NFIT - NVDIMM Interface Table (ACPI 6.0+)
+ *        Version 1
+ *
+ ******************************************************************************/
+
+struct acpi_table_nfit {
+    struct acpi_table_header header;    /* Common ACPI table header */
+    u32 reserved;                       /* Reserved, must be zero */
+};
+
+/* Subtable header for NFIT */
+
+struct acpi_nfit_header {
+    u16 type;
+    u16 length;
+};
+
+/* Values for subtable type in struct acpi_nfit_header */
+enum acpi_nfit_type {
+    ACPI_NFIT_TYPE_SYSTEM_ADDRESS = 0,
+    ACPI_NFIT_TYPE_MEMORY_MAP = 1,
+};
+
+/*
+ * NFIT Subtables
+ */
+
+/* 0: System Physical Address Range Structure */
+struct acpi_nfit_system_address {
+    struct acpi_nfit_header header;
+    u16 range_index;
+    u16 flags;
+    u32 reserved;    /* Reserved, must be zero */
+    u32 proximity_domain;
+    u8 range_guid[16];
+    u64 address;
+    u64 length;
+    u64 memory_mapping;
+};
+
 /*******************************************************************************
  *
  * SBST - Smart Battery Specification Table
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 9409350f05..1bd8f9f4e4 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -180,4 +180,11 @@ void acpi_reboot(void);
 void acpi_dmar_zap(void);
 void acpi_dmar_reinstate(void);

+#ifdef CONFIG_NVDIMM_PMEM
+void acpi_nfit_boot_init(void);
+bool acpi_nfit_boot_search_pmem(unsigned long smfn, unsigned long emfn,
+                                unsigned long *ret_smfn,
+                                unsigned long *ret_emfn);
+#endif /* CONFIG_NVDIMM_PMEM */
+
 #endif /*_LINUX_ACPI_H*/
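
Note for reviewers: the contract documented for
acpi_nfit_boot_search_pmem() (report the lowest pmem region that
overlaps a given range, or report that none does) can be exercised
outside the hypervisor. The following is only a minimal standalone
sketch under stated assumptions, not the code in this patch: struct
range and lowest_overlap() are hypothetical names, ranges are treated
as half-open [start, end), and a plain array stands in for the NFIT
SPA subtables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one NFIT SPA range, half-open [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Find the lowest-starting entry of ranges[] that overlaps [s, e).
 * Returns true and fills *out on success; returns false and leaves
 * *out untouched if nothing overlaps.
 */
static bool lowest_overlap(const struct range *ranges, size_t n,
                           uint64_t s, uint64_t e, struct range *out)
{
    bool found = false;
    size_t i;

    assert(out != NULL);

    for ( i = 0; i < n; i++ )
    {
        const struct range *r = &ranges[i];

        /* Skip empty ranges and ranges entirely outside [s, e). */
        if ( r->start >= r->end || r->start >= e || r->end <= s )
            continue;

        /* Keep the first hit, then only hits that start lower. */
        if ( found && out->start <= r->start )
            continue;

        *out = *r;
        found = true;
    }

    return found;
}
```

The sketch tracks "nothing found yet" with an explicit flag rather than
a reserved address value, so a region starting at address 0 needs no
special handling.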