From patchwork Fri Mar 24 13:24:59 2017
X-Patchwork-Submitter: Razvan Cojocaru <rcojocaru@bitdefender.com>
X-Patchwork-Id: 9642823
From: Razvan Cojocaru <rcojocaru@bitdefender.com>
To: xen-devel@lists.xen.org
Cc: andrew.cooper3@citrix.com, paul.durrant@citrix.com,
    Razvan Cojocaru <rcojocaru@bitdefender.com>, jbeulich@suse.com
Date: Fri, 24 Mar 2017 15:24:59 +0200
Message-Id: <1490361899-18303-1-git-send-email-rcojocaru@bitdefender.com>
X-Mailer: git-send-email 1.9.1
Subject: [Xen-devel] [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG

hvmemul_cmpxchg() is not presently SMP-safe, as it ends up doing a
memcpy(). Use an actual CMPXCHG instruction in the implementation.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
---
Questions:

- I've used __cmpxchg(), as it's already there, but it always locks.
  In my tests, and in x86_emulate()'s context, the lock is already
  taken anyway, so this doesn't appear to make a difference at present.
  Have I missed a case where it does, or should we add a lock prefix
  anyway for the future? If so, should I use another version of
  cmpxchg() that already exists in the source code but I've missed,
  modify this one, or bring in another implementation?

- Is there anything I've missed in the implementation that could crash
  the host / guest / cause trouble?

- In the introspection context, before this change a RETRY return in
  hvm_emulate_one_vm_event() would do nothing: when emulating because
  of a mem_access vm_event, the VCPU would simply try to execute the
  instruction again, which would trigger another mem_access event
  (which would hopefully this time result in successful emulation of
  the instruction). However, with these changes, in an SMP scenario
  where a VCPU failed its CMPXCHG, the instruction would be re-executed
  in the guest, which I've found leads to bad things happening (mostly
  BSODs). I've mitigated this in this patch by trying to emulate the
  failed instruction in a loop until it succeeds, but I'm not convinced
  that is a good idea. What are the alternatives?
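For reviewers who want to see the race in isolation: below is a minimal
user-space analogue of the problem (hypothetical names; GCC's __atomic
builtins stand in for Xen's __cmpxchg(), and this is illustrative code,
not part of the patch). Two threads increment a shared counter via
compare-and-swap; substituting the memcmp()/memcpy() variant for the
atomic one loses updates, which is exactly why the old memcpy()-based
hvmemul_cmpxchg() was not SMP-safe.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint64_t counter;

/* Broken analogue of the old behaviour: compare and write are two steps. */
static __attribute__((unused))
int cmpxchg_memcpy(uint64_t *dst, uint64_t old, uint64_t new_val)
{
    if ( memcmp(dst, &old, sizeof(old)) != 0 )
        return 0;                            /* comparison failed */
    memcpy(dst, &new_val, sizeof(new_val));  /* another CPU can write here! */
    return 1;
}

/* SMP-safe analogue of the new behaviour: one atomic compare-and-swap. */
static int cmpxchg_atomic(uint64_t *dst, uint64_t old, uint64_t new_val)
{
    return __atomic_compare_exchange_n(dst, &old, new_val, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

static void *worker(void *arg)
{
    (void)arg;

    for ( int i = 0; i < 1000000; i++ )
    {
        uint64_t old;

        /* Swapping in cmpxchg_memcpy() here makes the final count wrong. */
        do {
            old = __atomic_load_n(&counter, __ATOMIC_RELAXED);
        } while ( !cmpxchg_atomic(&counter, old, old + 1) );
    }

    return NULL;
}

int main(void)
{
    pthread_t t[2];

    for ( int i = 0; i < 2; i++ )
        pthread_create(&t[i], NULL, worker, NULL);
    for ( int i = 0; i < 2; i++ )
        pthread_join(t[i], NULL);

    /* Prints 2000000 with cmpxchg_atomic(); typically less with memcpy. */
    printf("counter = %llu\n", (unsigned long long)counter);

    return 0;
}

Compiled with gcc -pthread, the atomic version reliably prints 2000000;
the memcpy() variant typically prints less, since a write can slip in
between the comparison and the copy.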
---
 xen/arch/x86/hvm/emulate.c | 230 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 181 insertions(+), 49 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 2d92957..b5754a1 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1029,6 +1030,77 @@ static int hvmemul_wbinvd_discard(
     return X86EMUL_OKAY;
 }
 
+static int hvmemul_vaddr_to_mfn(
+    unsigned long addr,
+    mfn_t *mfn,
+    uint32_t pfec,
+    struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t gpa = addr & ~PAGE_MASK;
+    struct page_info *page;
+    p2m_type_t p2mt;
+    unsigned long gfn;
+    struct vcpu *curr = current;
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+    gfn = paging_gva_to_gfn(curr, addr, &pfec);
+
+    if ( gfn == gfn_x(INVALID_GFN) )
+    {
+        pagefault_info_t pfinfo = {};
+
+        if ( ( pfec & PFEC_page_paged ) || ( pfec & PFEC_page_shared ) )
+            return X86EMUL_RETRY;
+
+        pfinfo.linear = addr;
+        pfinfo.ec = pfec;
+
+        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    gpa |= (paddr_t)gfn << PAGE_SHIFT;
+
+    /*
+     * No need to do the P2M lookup for internally handled MMIO, benefiting
+     * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
+     * - newer Windows (like Server 2012) for HPET accesses.
+     */
+    if ( !nestedhvm_vcpu_in_guestmode(curr) && hvm_mmio_internal(gpa) )
+        return X86EMUL_UNHANDLEABLE;
+
+    page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE);
+
+    if ( !page )
+        return X86EMUL_UNHANDLEABLE;
+
+    if ( p2m_is_paging(p2mt) )
+    {
+        put_page(page);
+        p2m_mem_paging_populate(curr->domain, gfn);
+        return X86EMUL_RETRY;
+    }
+
+    if ( p2m_is_shared(p2mt) )
+    {
+        put_page(page);
+        return X86EMUL_RETRY;
+    }
+
+    if ( p2m_is_grant(p2mt) )
+    {
+        put_page(page);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    *mfn = _mfn(page_to_mfn(page));
+
+    put_page(page);
+
+    return X86EMUL_OKAY;
+}
+
 static int hvmemul_cmpxchg(
     enum x86_segment seg,
     unsigned long offset,
@@ -1037,8 +1109,70 @@ static int hvmemul_cmpxchg(
     unsigned int bytes,
     struct x86_emulate_ctxt *ctxt)
 {
-    /* Fix this in case the guest is really relying on r-m-w atomicity. */
-    return hvmemul_write(seg, offset, p_new, bytes, ctxt);
+    unsigned long addr, reps = 1;
+    int rc = X86EMUL_OKAY;
+    unsigned long old = 0, new = 0;
+    uint32_t pfec = PFEC_page_present | PFEC_write_access;
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+    mfn_t mfn[2];
+    void *map = NULL;
+    struct domain *currd = current->domain;
+
+    if ( is_x86_system_segment(seg) )
+        pfec |= PFEC_implicit;
+    else if ( hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3 )
+        pfec |= PFEC_user_mode;
+
+    rc = hvmemul_virtual_to_linear(
+        seg, offset, bytes, &reps, hvm_access_write, hvmemul_ctxt, &addr);
+
+    if ( rc != X86EMUL_OKAY || !bytes )
+        return rc;
+
+    rc = hvmemul_vaddr_to_mfn(addr, &mfn[0], pfec, ctxt);
+
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( likely(((addr + bytes - 1) & PAGE_MASK) == (addr & PAGE_MASK)) )
+    {
+        /* Whole write fits on a single page. */
+        mfn[1] = INVALID_MFN;
+        map = map_domain_page(mfn[0]);
+    }
+    else
+    {
+        rc = hvmemul_vaddr_to_mfn((addr + bytes - 1) & PAGE_MASK, &mfn[1],
+                                  pfec, ctxt);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+
+        map = vmap(mfn, 2);
+    }
+
+    if ( !map )
+        return X86EMUL_UNHANDLEABLE;
+
+    map += (addr & ~PAGE_MASK);
+
+    memcpy(&old, p_old, bytes);
+    memcpy(&new, p_new, bytes);
+
+    if ( __cmpxchg(map, old, new, bytes) != old )
+        rc = X86EMUL_RETRY;
+
+    paging_mark_dirty(currd, mfn[0]);
+
+    if ( unlikely(mfn_valid(mfn[1])) )
+    {
+        paging_mark_dirty(currd, mfn[1]);
+        vunmap((void *)((unsigned long)map & PAGE_MASK));
+    }
+    else
+        unmap_domain_page(map);
+
+    return rc;
 }
 
 static int hvmemul_validate(
@@ -1961,59 +2095,57 @@ int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla)
 void hvm_emulate_one_vm_event(enum emul_kind kind, unsigned int trapnr,
     unsigned int errcode)
 {
-    struct hvm_emulate_ctxt ctx = {{ 0 }};
-    int rc;
+    int rc = X86EMUL_OKAY;
 
-    hvm_emulate_init_once(&ctx, NULL, guest_cpu_user_regs());
+    do {
+        struct hvm_emulate_ctxt ctx = {{ 0 }};
 
-    switch ( kind )
-    {
-    case EMUL_KIND_NOWRITE:
-        rc = _hvm_emulate_one(&ctx, &hvm_emulate_ops_no_write);
-        break;
-    case EMUL_KIND_SET_CONTEXT_INSN: {
-        struct vcpu *curr = current;
-        struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+        hvm_emulate_init_once(&ctx, NULL, guest_cpu_user_regs());
 
-        BUILD_BUG_ON(sizeof(vio->mmio_insn) !=
-                     sizeof(curr->arch.vm_event->emul.insn.data));
-        ASSERT(!vio->mmio_insn_bytes);
+        switch ( kind )
+        {
+        case EMUL_KIND_NOWRITE:
+            rc = _hvm_emulate_one(&ctx, &hvm_emulate_ops_no_write);
+            break;
+        case EMUL_KIND_SET_CONTEXT_INSN: {
+            struct vcpu *curr = current;
+            struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+
+            BUILD_BUG_ON(sizeof(vio->mmio_insn) !=
+                         sizeof(curr->arch.vm_event->emul.insn.data));
+            ASSERT(!vio->mmio_insn_bytes);
+
+            /*
+             * Stash insn buffer into mmio buffer here instead of ctx
+             * to avoid having to add more logic to hvm_emulate_one.
+             */
+            vio->mmio_insn_bytes = sizeof(vio->mmio_insn);
+            memcpy(vio->mmio_insn, curr->arch.vm_event->emul.insn.data,
+                   vio->mmio_insn_bytes);
+        }
+        /* Fall-through */
+        default:
+            ctx.set_context = (kind == EMUL_KIND_SET_CONTEXT_DATA);
+            rc = hvm_emulate_one(&ctx);
+        }
 
-        /*
-         * Stash insn buffer into mmio buffer here instead of ctx
-         * to avoid having to add more logic to hvm_emulate_one.
-         */
-        vio->mmio_insn_bytes = sizeof(vio->mmio_insn);
-        memcpy(vio->mmio_insn, curr->arch.vm_event->emul.insn.data,
-               vio->mmio_insn_bytes);
-    }
-    /* Fall-through */
-    default:
-        ctx.set_context = (kind == EMUL_KIND_SET_CONTEXT_DATA);
-        rc = hvm_emulate_one(&ctx);
-    }
+        switch ( rc )
+        {
+        case X86EMUL_RETRY:
+            break;
+        case X86EMUL_UNHANDLEABLE:
+            hvm_dump_emulation_state(XENLOG_G_DEBUG "Mem event", &ctx);
+            hvm_inject_hw_exception(trapnr, errcode);
+            break;
+        case X86EMUL_EXCEPTION:
+            if ( ctx.ctxt.event_pending )
+                hvm_inject_event(&ctx.ctxt.event);
+            break;
+        }
 
-    switch ( rc )
-    {
-    case X86EMUL_RETRY:
-        /*
-         * This function is called when handling an EPT-related vm_event
-         * reply. As such, nothing else needs to be done here, since simply
-         * returning makes the current instruction cause a page fault again,
-         * consistent with X86EMUL_RETRY.
-         */
-        return;
-    case X86EMUL_UNHANDLEABLE:
-        hvm_dump_emulation_state(XENLOG_G_DEBUG "Mem event", &ctx);
-        hvm_inject_hw_exception(trapnr, errcode);
-        break;
-    case X86EMUL_EXCEPTION:
-        if ( ctx.ctxt.event_pending )
-            hvm_inject_event(&ctx.ctxt.event);
-        break;
-    }
+        hvm_emulate_writeback(&ctx);
 
-    hvm_emulate_writeback(&ctx);
+    } while( rc == X86EMUL_RETRY );
 }
 
 void hvm_emulate_init_once(
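A side note on the single- vs. cross-page split in hvmemul_cmpxchg()
above: the decision rests entirely on one boundary check. Restated as a
stand-alone sketch (hypothetical helper name; PAGE_SHIFT/PAGE_MASK
defined locally to mirror Xen's definitions):

#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* True iff the access [addr, addr + bytes) lies within a single page. */
static bool fits_in_one_page(unsigned long addr, unsigned int bytes)
{
    return ((addr + bytes - 1) & PAGE_MASK) == (addr & PAGE_MASK);
}

int main(void)
{
    assert(fits_in_one_page(0x1000, 8));  /* start of a page */
    assert(fits_in_one_page(0x1ff8, 8));  /* ends exactly on the boundary */
    assert(!fits_in_one_page(0x1ffc, 8)); /* straddles two pages */
    return 0;
}

The "bytes - 1" is what keeps an access that ends exactly at a page
boundary classified as single-page; without it, such an access would
needlessly take the two-page vmap() path.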