From patchwork Thu Jul 4 17:57:32 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrew Cooper
X-Patchwork-Id: 11031787
From: Andrew Cooper
To: Xen-devel
Cc: Juergen Gross, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné
Date: Thu, 4 Jul 2019 18:57:32 +0100
Message-ID: <20190704175732.5943-1-andrew.cooper3@citrix.com>
X-Mailer: git-send-email 2.11.0
Subject: [Xen-devel] [PATCH] x86/ctxt-switch: Document and improve GDT handling
List-Id: Xen developer discussion

write_full_gdt_ptes() has a latent bug. Using virt_to_mfn() and iterating
with (mfn + i) is wrong, because of PDX compression. The context switch path
only functions correctly because NR_RESERVED_GDT_PAGES is 1.

As this is exceedingly unlikely to change moving forward, drop the loop
rather than inserting a BUILD_BUG_ON(NR_RESERVED_GDT_PAGES != 1).

With the loop dropped, write_full_gdt_ptes() becomes more obviously a poor
name, so rename it to update_xen_slot_in_full_gdt().

Furthermore, calling virt_to_mfn() in the context switch path is a lot of
wasted cycles for a result which is constant after boot.

Begin by documenting how Xen handles the GDTs across context switch. From
this, we observe that load_full_gdt() is completely independent of the current
CPU, and load_default_gdt() only gets passed the current CPU's regular GDT.

Add two extra per-cpu variables which cache the L1e for the regular and compat
GDT, calculated in cpu_smpboot_alloc()/trap_init() as appropriate, so
update_xen_slot_in_full_gdt() doesn't need to waste time performing the same
calculation on every context switch.

Signed-off-by: Andrew Cooper
Tested-by: Juergen Gross
Reviewed-by: Juergen Gross
---
CC: Jan Beulich
CC: Wei Liu
CC: Roger Pau Monné
CC: Juergen Gross

Slightly RFC. I'm fairly confident this is better, but Juergen says that
some of his scheduling perf tests notice large differences from subtle changes
in __context_switch(), so it would be useful to get some numbers from this
change.

The delta from this change is:

  add/remove: 2/0 grow/shrink: 1/1 up/down: 320/-127 (193)
  Function                                     old     new   delta
  cpu_smpboot_callback                        1152    1456    +304
  per_cpu__gdt_table_l1e                         -       8      +8
  per_cpu__compat_gdt_table_l1e                  -       8      +8
  __context_switch                            1238    1111    -127
  Total: Before=3339227, After=3339420, chg +0.01%

I'm not overly happy about the special case in trap_init() but I can't think
of a better place to put this.

Also, it should now be very obvious to people that Xen's current GDT handling
for non-PV vcpus is a recipe for subtle bugs, if we ever manage to execute a
stray mov/pop %sreg instruction. We really ought to have Xen's regular GDT in
an area where slots 0-13 are either mapped to the zero page, or not present,
so we don't risk loading a non-faulting garbage selector.
---
 xen/arch/x86/domain.c      | 52 ++++++++++++++++++++++++++++++----------------
 xen/arch/x86/smpboot.c     |  4 ++++
 xen/arch/x86/traps.c       | 10 +++++++++
 xen/include/asm-x86/desc.h |  2 ++
 4 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 84cafbe558..147f96a09e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1635,23 +1635,42 @@ static void _update_runstate_area(struct vcpu *v)
         v->arch.pv.need_update_runstate_area = 1;
 }
 
+/*
+ * Overview of Xen's GDTs.
+ *
+ * Xen maintains per-CPU compat and regular GDTs which are both a single page
+ * in size. Some content is specific to each CPU (the TSS, the per-CPU marker
+ * for #DF handling, and optionally the LDT). The compat and regular GDTs
+ * differ by the layout and content of the guest accessible selectors.
+ *
+ * The Xen selectors live from 0xe000 (slot 14 of 16), and need to always
+ * appear in this position for interrupt/exception handling to work.
+ *
+ * A PV guest may specify GDT frames of their own (slots 0 to 13). Room for a
+ * full GDT exists in the per-domain mappings.
+ *
+ * To schedule a PV vcpu, we point slot 14 of the guest's full GDT at the
+ * current CPU's compat or regular (as appropriate) GDT frame. This is so
+ * that the per-CPU parts still work correctly after switching pagetables and
+ * loading the guest's full GDT into GDTR.
+ *
+ * To schedule Idle or HVM vcpus, we load a GDT base address which causes the
+ * regular per-CPU GDT frame to appear with selectors at the appropriate
+ * offset.
+ */
 static inline bool need_full_gdt(const struct domain *d)
 {
     return is_pv_domain(d) && !is_idle_domain(d);
 }
 
-static void write_full_gdt_ptes(seg_desc_t *gdt, const struct vcpu *v)
+static void update_xen_slot_in_full_gdt(const struct vcpu *v, unsigned int cpu)
 {
-    unsigned long mfn = virt_to_mfn(gdt);
-    l1_pgentry_t *pl1e = pv_gdt_ptes(v);
-    unsigned int i;
-
-    for ( i = 0; i < NR_RESERVED_GDT_PAGES; i++ )
-        l1e_write(pl1e + FIRST_RESERVED_GDT_PAGE + i,
-                  l1e_from_pfn(mfn + i, __PAGE_HYPERVISOR_RW));
+    l1e_write(pv_gdt_ptes(v) + FIRST_RESERVED_GDT_PAGE,
+              !is_pv_32bit_vcpu(v) ? per_cpu(gdt_table_l1e, cpu)
+                                   : per_cpu(compat_gdt_table_l1e, cpu));
 }
 
-static void load_full_gdt(const struct vcpu *v, unsigned int cpu)
+static void load_full_gdt(const struct vcpu *v)
 {
     struct desc_ptr gdt_desc = {
         .limit = LAST_RESERVED_GDT_BYTE,
@@ -1661,11 +1680,12 @@ static void load_full_gdt(const struct vcpu *v, unsigned int cpu)
     lgdt(&gdt_desc);
 }
 
-static void load_default_gdt(const seg_desc_t *gdt, unsigned int cpu)
+static void load_default_gdt(unsigned int cpu)
 {
     struct desc_ptr gdt_desc = {
         .limit = LAST_RESERVED_GDT_BYTE,
-        .base  = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY),
+        .base  = (unsigned long)(per_cpu(gdt_table, cpu) -
+                                 FIRST_RESERVED_GDT_ENTRY),
     };
 
     lgdt(&gdt_desc);
@@ -1678,7 +1698,6 @@ static void __context_switch(void)
     struct vcpu          *p = per_cpu(curr_vcpu, cpu);
     struct vcpu          *n = current;
     struct domain        *pd = p->domain, *nd = n->domain;
-    seg_desc_t           *gdt;
 
     ASSERT(p != n);
     ASSERT(!vcpu_cpu_dirty(n));
@@ -1718,15 +1737,12 @@ static void __context_switch(void)
 
     psr_ctxt_switch_to(nd);
 
-    gdt = !is_pv_32bit_domain(nd) ? per_cpu(gdt_table, cpu) :
-                                    per_cpu(compat_gdt_table, cpu);
-
     if ( need_full_gdt(nd) )
-        write_full_gdt_ptes(gdt, n);
+        update_xen_slot_in_full_gdt(n, cpu);
 
     if ( need_full_gdt(pd) &&
          ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd)) )
-        load_default_gdt(gdt, cpu);
+        load_default_gdt(cpu);
 
     write_ptbase(n);
 
@@ -1739,7 +1755,7 @@ static void __context_switch(void)
 
     if ( need_full_gdt(nd) &&
          ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
-        load_full_gdt(n, cpu);
+        load_full_gdt(n);
 
     if ( pd != nd )
         cpumask_clear_cpu(cpu, pd->dirty_cpumask);
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 730fe141fa..004285d14c 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -985,6 +985,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     if ( gdt == NULL )
         goto out;
     per_cpu(gdt_table, cpu) = gdt;
+    per_cpu(gdt_table_l1e, cpu) =
+        l1e_from_pfn(virt_to_mfn(gdt), __PAGE_HYPERVISOR_RW);
     memcpy(gdt, boot_cpu_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
     BUILD_BUG_ON(NR_CPUS > 0x10000);
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
@@ -992,6 +994,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     per_cpu(compat_gdt_table, cpu) = gdt = alloc_xenheap_pages(order, memflags);
     if ( gdt == NULL )
         goto out;
+    per_cpu(compat_gdt_table_l1e, cpu) =
+        l1e_from_pfn(virt_to_mfn(gdt), __PAGE_HYPERVISOR_RW);
     memcpy(gdt, boot_cpu_compat_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8097ef3bf5..25b4b47e5e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -97,7 +97,9 @@ DEFINE_PER_CPU(uint64_t, efer);
 static DEFINE_PER_CPU(unsigned long, last_extable_addr);
 
 DEFINE_PER_CPU_READ_MOSTLY(seg_desc_t *, gdt_table);
+DEFINE_PER_CPU_READ_MOSTLY(l1_pgentry_t, gdt_table_l1e);
 DEFINE_PER_CPU_READ_MOSTLY(seg_desc_t *, compat_gdt_table);
+DEFINE_PER_CPU_READ_MOSTLY(l1_pgentry_t, compat_gdt_table_l1e);
 
 /* Master table, used by CPU0. */
 idt_entry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
@@ -2059,6 +2061,14 @@ void __init trap_init(void)
         }
     }
 
+    /* Cache {,compat_}gdt_table_l1e now that physical relocation is done. */
+    this_cpu(gdt_table_l1e) =
+        l1e_from_pfn(virt_to_mfn(boot_cpu_gdt_table),
+                     __PAGE_HYPERVISOR_RW);
+    this_cpu(compat_gdt_table_l1e) =
+        l1e_from_pfn(virt_to_mfn(boot_cpu_compat_gdt_table),
+                     __PAGE_HYPERVISOR_RW);
+
     percpu_traps_init();
 
     cpu_init();
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index 85e83bcefb..e565727dc0 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -206,8 +206,10 @@ struct __packed desc_ptr {
 
 extern seg_desc_t boot_cpu_gdt_table[];
 DECLARE_PER_CPU(seg_desc_t *, gdt_table);
+DECLARE_PER_CPU(l1_pgentry_t, gdt_table_l1e);
 extern seg_desc_t boot_cpu_compat_gdt_table[];
 DECLARE_PER_CPU(seg_desc_t *, compat_gdt_table);
+DECLARE_PER_CPU(l1_pgentry_t, compat_gdt_table_l1e);
 
 extern void load_TR(void);
 