From patchwork Mon Feb 1 09:13:34 2016
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 8177051
Message-Id: <56AF2FCE02000078000CCC5F@prv-mh.provo.novell.com>
In-Reply-To: <56AB4D1402000078000CC543@prv-mh.provo.novell.com>
References: <56AB4B6102000078000CC51B@prv-mh.provo.novell.com>
 <56AB4D1402000078000CC543@prv-mh.provo.novell.com>
Date: Mon, 01 Feb 2016 02:13:34 -0700
From: "Jan Beulich" <JBeulich@suse.com>
To: "xen-devel" <xen-devel@lists.xenproject.org>
Cc: Andrew Cooper, Keir Fraser, Harmandeep Kaur, Shuai Ruan
Subject: [Xen-devel] [PATCH v2 3/4] x86/xstate: fix fault behavior on XRSTORS

XRSTORS unconditionally faults when xcomp_bv has bit 63 clear. Instead of
just fixing this issue, overhaul the fault recovery code, which - one of the
many mistakes made when xstate support got introduced - blindly mirrored the
code accompanying FXRSTOR, neglecting the fact that XRSTOR{,S} aren't
all-or-nothing instructions. The new code, first of all, does all the
recovery actions in C, simplifying the inline assembly used. And it does its
work in a multi-stage fashion: upon first seeing a fault, state fixups get
applied strictly based on what architecturally may cause #GP. When seeing
another fault despite the fixups done, state gets fully reset. A third fault
then leads to crashing the domain (instead of hanging the hypervisor in an
infinite loop of recurring faults).

Reported-by: Harmandeep Kaur
Signed-off-by: Jan Beulich
Reviewed-by: Andrew Cooper
---
v2: Fix default MXCSR mask value. Avoid infinite fault recovery loop.
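(Editor's note, not part of the patch: below is a condensed, self-contained
sketch of the staged recovery flow the reworked xrstor() loop implements,
for readers skimming the diff that follows. struct demo_state, try_restore()
and restore_with_recovery() are illustrative stand-ins rather than Xen code,
and the fault counting is simplified relative to the real faults/prev_faults
comparison in the patch.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the guest's XSAVE area. */
struct demo_state {
    uint64_t xstate_bv, xcomp_bv;
    uint32_t mxcsr;
};

/* Stand-in for an XRSTOR{,S} attempt; returns false on a #GP-style fault. */
static bool try_restore(const struct demo_state *s)
{
    return !(s->xcomp_bv & (1ULL << 62));  /* arbitrary "bad bit" for the demo */
}

static bool restore_with_recovery(struct demo_state *s)
{
    for ( unsigned int faults = 0; ; )
    {
        if ( try_restore(s) )
            return true;                   /* no new fault: done */

        switch ( ++faults )
        {
        case 1:                            /* stage 1: targeted fixups only */
            s->xcomp_bv &= ~(1ULL << 62);
            s->mxcsr &= 0x0000ffbf;        /* default MXCSR mask */
            continue;
        case 2:                            /* stage 2: reset all state */
            memset(s, 0, sizeof(*s));
            continue;
        }
        return false;                      /* stage 3: give up (domain_crash) */
    }
}

int main(void)
{
    struct demo_state s = { .xcomp_bv = 1ULL << 62, .mxcsr = ~0u };
    printf("recovered: %d\n", restore_with_recovery(&s));
    return 0;
}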
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -29,6 +29,8 @@ unsigned int *__read_mostly xstate_sizes
 static unsigned int __read_mostly xstate_features;
 static unsigned int __read_mostly xstate_comp_offsets[sizeof(xfeature_mask)*8];
 
+static uint32_t __read_mostly mxcsr_mask = 0x0000ffbf;
+
 /* Cached xcr0 for fast read */
 static DEFINE_PER_CPU(uint64_t, xcr0);
 
@@ -342,6 +344,7 @@ void xrstor(struct vcpu *v, uint64_t mas
     uint32_t hmask = mask >> 32;
     uint32_t lmask = mask;
     struct xsave_struct *ptr = v->arch.xsave_area;
+    unsigned int faults, prev_faults;
 
     /*
      * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception
@@ -361,35 +364,84 @@ void xrstor(struct vcpu *v, uint64_t mas
     /*
      * XRSTOR can fault if passed a corrupted data block. We handle this
      * possibility, which may occur if the block was passed to us by control
-     * tools or through VCPUOP_initialise, by silently clearing the block.
+     * tools or through VCPUOP_initialise, by silently adjusting state.
      */
-    switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
+    for ( prev_faults = faults = 0; ; prev_faults = faults )
     {
+        switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
+        {
 #define XRSTOR(pfx) \
         alternative_io("1: .byte " pfx "0x0f,0xae,0x2f\n" \
+                       "3:\n" \
                        "   .section .fixup,\"ax\"\n" \
-                       "2: mov %[size],%%ecx\n" \
-                       "   xor %[lmask_out],%[lmask_out]\n" \
-                       "   rep stosb\n" \
-                       "   lea %[mem],%[ptr]\n" \
-                       "   mov %[lmask_in],%[lmask_out]\n" \
-                       "   jmp 1b\n" \
+                       "2: inc%z[faults] %[faults]\n" \
+                       "   jmp 3b\n" \
                        "   .previous\n" \
                        _ASM_EXTABLE(1b, 2b), \
                        ".byte " pfx "0x0f,0xc7,0x1f\n", \
                        X86_FEATURE_XSAVES, \
-                       ASM_OUTPUT2([ptr] "+&D" (ptr), [lmask_out] "+&a" (lmask)), \
-                       [mem] "m" (*ptr), [lmask_in] "g" (lmask), \
-                       [hmask] "d" (hmask), [size] "m" (xsave_cntxt_size) \
-                       : "ecx")
-
-    default:
-        XRSTOR("0x48,");
-        break;
-    case 4: case 2:
-        XRSTOR("");
-        break;
+                       ASM_OUTPUT2([mem] "+m" (*ptr), [faults] "+g" (faults)), \
+                       [lmask] "a" (lmask), [hmask] "d" (hmask), \
+                       [ptr] "D" (ptr))
+
+        default:
+            XRSTOR("0x48,");
+            break;
+        case 4: case 2:
+            XRSTOR("");
+            break;
 #undef XRSTOR
+        }
+        if ( likely(faults == prev_faults) )
+            break;
+#ifndef NDEBUG
+        gprintk(XENLOG_WARNING, "fault#%u: mxcsr=%08x\n",
+                faults, ptr->fpu_sse.mxcsr);
+        gprintk(XENLOG_WARNING, "xs=%016lx xc=%016lx\n",
+                ptr->xsave_hdr.xstate_bv, ptr->xsave_hdr.xcomp_bv);
+        gprintk(XENLOG_WARNING, "r0=%016lx r1=%016lx\n",
+                ptr->xsave_hdr.reserved[0], ptr->xsave_hdr.reserved[1]);
+        gprintk(XENLOG_WARNING, "r2=%016lx r3=%016lx\n",
+                ptr->xsave_hdr.reserved[2], ptr->xsave_hdr.reserved[3]);
+        gprintk(XENLOG_WARNING, "r4=%016lx r5=%016lx\n",
+                ptr->xsave_hdr.reserved[4], ptr->xsave_hdr.reserved[5]);
+#endif
+        switch ( faults )
+        {
+        case 1: /* Stage 1: Reset state to be loaded. */
+            ptr->xsave_hdr.xstate_bv &= ~mask;
+            /*
+             * Also try to eliminate fault reasons, even if this shouldn't be
+             * needed here (other code should ensure the sanity of the data).
+             */
+            if ( ((mask & XSTATE_SSE) ||
+                  ((mask & XSTATE_YMM) &&
+                   !(ptr->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED))) )
+                ptr->fpu_sse.mxcsr &= mxcsr_mask;
+            if ( cpu_has_xsaves || cpu_has_xsavec )
+            {
+                ptr->xsave_hdr.xcomp_bv &= this_cpu(xcr0) | this_cpu(xss);
+                ptr->xsave_hdr.xstate_bv &= ptr->xsave_hdr.xcomp_bv;
+                ptr->xsave_hdr.xcomp_bv |= XSTATE_COMPACTION_ENABLED;
+            }
+            else
+            {
+                ptr->xsave_hdr.xstate_bv &= this_cpu(xcr0);
+                ptr->xsave_hdr.xcomp_bv = 0;
+            }
+            memset(ptr->xsave_hdr.reserved, 0, sizeof(ptr->xsave_hdr.reserved));
+            continue;
+
+        case 2: /* Stage 2: Reset all state. */
+            ptr->fpu_sse.mxcsr = MXCSR_DEFAULT;
+            ptr->xsave_hdr.xstate_bv = 0;
+            ptr->xsave_hdr.xcomp_bv = cpu_has_xsaves
+                                      ? XSTATE_COMPACTION_ENABLED : 0;
+            continue;
+        }
+
+        domain_crash(current->domain);
+        return;
     }
 }
 
@@ -496,6 +548,8 @@ void xstate_init(struct cpuinfo_x86 *c)
 
     if ( bsp )
     {
+        static typeof(current->arch.xsave_area->fpu_sse) __initdata ctxt;
+
         xfeature_mask = feature_mask;
         /*
          * xsave_cntxt_size is the max size required by enabled features.
@@ -504,6 +558,10 @@ void xstate_init(struct cpuinfo_x86 *c)
         xsave_cntxt_size = _xstate_ctxt_size(feature_mask);
         printk("%s: using cntxt_size: %#x and states: %#"PRIx64"\n",
                __func__, xsave_cntxt_size, xfeature_mask);
+
+        asm ( "fxsave %0" : "=m" (ctxt) );
+        if ( ctxt.mxcsr_mask )
+            mxcsr_mask = ctxt.mxcsr_mask;
     }
     else
     {