From patchwork Thu Apr 6 13:19:06 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhang
X-Patchwork-Id: 9667435
From: Yu Zhang
To: xen-devel@lists.xen.org
Date: Thu, 6 Apr 2017 21:19:06 +0800
Message-Id:
<1491484747-5133-6-git-send-email-yu.c.zhang@linux.intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1491484747-5133-1-git-send-email-yu.c.zhang@linux.intel.com>
References: <1491484747-5133-1-git-send-email-yu.c.zhang@linux.intel.com>
Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper, Paul Durrant,
 zhiyuan.lv@intel.com, Jan Beulich
Subject: [Xen-devel] [PATCH v11 5/6] x86/ioreq server: Asynchronously reset
 outstanding p2m_ioreq_server entries.
List-Id: Xen developer discussion

After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously with the current p2m_change_entry_type_global() interface.

A new field, entry_count, is introduced in struct p2m_domain to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). We therefore do not need to worry about counting
2M/1G sized pages.

This patch disallows mapping of an ioreq server while p2m_ioreq_server
entries are still left in the p2m table. This covers the case where
another mapping request arrives right after the current server has been
unmapped and has released its lock, but before the p2m table has been
synced.

This patch also disallows live migration while p2m_ioreq_server entries
remain in the p2m table. The core reason is that our current
implementation of p2m_change_entry_type_global() lacks the information
needed to resync p2m_ioreq_server entries correctly if global_logdirty
is on.

Signed-off-by: Yu Zhang
Reviewed-by: Paul Durrant
---
Cc: Paul Durrant
Cc: Jan Beulich
Cc: Andrew Cooper
Cc: George Dunlap
Cc: Jun Nakajima
Cc: Kevin Tian

changes in v6:
  - According to comments from Jan & George: move the count from
    p2m_change_type_one() to {ept,p2m_pt}_set_entry.
  - According to comments from George: comments change.

changes in v5:
  - According to comments from Jan: use unsigned long for entry_count;
  - According to comments from Jan: refuse mapping requests when
    p2m_ioreq_server entries remain in the p2m table.
  - Added "Reviewed-by: Paul Durrant"

changes in v4:
  - According to comments from Jan: use ASSERT() instead of 'if'
    condition in p2m_change_type_one().
  - According to comments from Jan: commit message changes to mention
    that the p2m_ioreq_server entries are all based on 4K sized pages.

changes in v3:
  - Move the synchronous resetting logic into patch 5.
  - According to comments from Jan: introduce p2m_check_changeable()
    to clarify the p2m type change code.
  - According to comments from George: use locks in the same order
    to avoid deadlock, call p2m_change_entry_type_global() after unmap
    of the ioreq server is finished.

changes in v2:
  - Move the calculation of ioreq server page entry_count into
    p2m_change_type_one() so that we do not need a separate lock.
    Note: entry_count is also calculated in resolve_misconfig()/
    do_recalc(); fortunately callers of both routines already hold
    the p2m lock.
  - Simplify logic in hvmop_set_mem_type().
  - Introduce routine p2m_finish_type_change() to walk the p2m
    table and do the p2m reset.
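
The counting scheme described in the commit message can be followed in
the small standalone C model below. It is only a sketch under stated
assumptions, not Xen code: every toy_* name, the fixed-size table and
the recalc_pending flag are hypothetical stand-ins, and the place where
the counter is incremented is an assumption, since the hunks in this
patch only show the decrement side.

```c
/* Toy model of the entry_count scheme; all toy_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_type { TOY_RAM_RW, TOY_RAM_LOGDIRTY, TOY_IOREQ_SERVER };

#define TOY_NR_ENTRIES 8

struct toy_p2m {
    enum toy_type entry[TOY_NR_ENTRIES]; /* one slot per 4K page */
    unsigned long entry_count;           /* slots of type TOY_IOREQ_SERVER */
    bool server_mapped;                  /* an ioreq server is mapped */
    bool recalc_pending;                 /* set by the "global" type change */
};

/*
 * Assumed accounting: +1 when a slot becomes TOY_IOREQ_SERVER, -1 when a
 * TOY_IOREQ_SERVER slot changes to any other type.
 */
static void toy_set_entry(struct toy_p2m *p2m, size_t i, enum toy_type t)
{
    enum toy_type old = p2m->entry[i];

    if ( old == TOY_IOREQ_SERVER && old != t )
    {
        assert(p2m->entry_count > 0);
        p2m->entry_count--;
    }
    else if ( old != TOY_IOREQ_SERVER && t == TOY_IOREQ_SERVER )
        p2m->entry_count++;

    p2m->entry[i] = t;
}

/* Mapping is refused while a server is mapped or stale entries remain. */
static int toy_map_server(struct toy_p2m *p2m)
{
    if ( p2m->server_mapped || p2m->entry_count )
        return -EBUSY; /* error value chosen for the toy model */

    p2m->server_mapped = true;
    return 0;
}

/* Unmap returns at once; leftover entries are only flagged for reset. */
static void toy_unmap_server(struct toy_p2m *p2m)
{
    p2m->server_mapped = false;
    if ( p2m->entry_count )
        p2m->recalc_pending = true;
}

/* Lazy reset of one slot, standing in for the recalculation paths. */
static void toy_recalc(struct toy_p2m *p2m, size_t i)
{
    if ( p2m->recalc_pending && p2m->entry[i] == TOY_IOREQ_SERVER )
        toy_set_entry(p2m, i, TOY_RAM_RW);
}

/* Global log-dirty mode is refused while stale entries remain. */
static int toy_enable_log_dirty(const struct toy_p2m *p2m, bool log_global)
{
    return (log_global && p2m->entry_count) ? -EBUSY : 0;
}
```

In this model, mapping a server, marking two slots as TOY_IOREQ_SERVER
and then unmapping leaves entry_count at 2, so both toy_map_server() and
toy_enable_log_dirty(..., true) fail until toy_recalc() has visited the
two slots and the counter has dropped back to zero.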
---
 xen/arch/x86/hvm/ioreq.c  |  8 ++++++++
 xen/arch/x86/mm/hap/hap.c |  9 +++++++++
 xen/arch/x86/mm/p2m-ept.c | 20 +++++++++++++++++++-
 xen/arch/x86/mm/p2m-pt.c  | 28 ++++++++++++++++++++++++++--
 xen/arch/x86/mm/p2m.c     |  9 +++++++++
 xen/include/asm-x86/p2m.h |  9 ++++++++-
 6 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 5bf3b6d..07a6c26 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,

     spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);

+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
     return rc;
 }

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..4b591fe 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
  */
 static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there are outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
     /* turn on PG_log_dirty bit in paging mode */
     paging_lock(d);
     d->arch.paging.mode |= PG_log_dirty;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..c66607a 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                     e.ipat = ipat;
                     if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
                     {
+                         if ( e.sa_p2mt == p2m_ioreq_server )
+                         {
+                             ASSERT(p2m->ioreq.entry_count > 0);
+                             p2m->ioreq.entry_count--;
+                         }
+
                          e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                                      ? p2m_ram_logdirty : p2m_ram_rw;
                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
@@ -816,6 +822,18 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                     old_entry.suppress_ve : 1;

+    /*
+     * p2m_ioreq_server is only used for 4K pages, so
+     * we only need to do the count for leaf entries.
+     */
+    if ( unlikely(ept_entry->sa_p2mt == p2m_ioreq_server) &&
+         ept_entry->sa_p2mt != p2mt &&
+         i == 0 )
+    {
+        ASSERT(p2m->ioreq.entry_count > 0);
+        p2m->ioreq.entry_count--;
+    }
+
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
     if ( unlikely(rc) )
         old_entry.epte = 0;
@@ -965,7 +983,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
     if ( is_epte_valid(ept_entry) )
     {
         if ( (recalc || ept_entry->recalc) &&
-             p2m_is_changeable(ept_entry->sa_p2mt) )
+             p2m_check_changeable(ept_entry->sa_p2mt) )
             *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                       : p2m_ram_rw;
         else
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index c0055f3..5317815 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -436,11 +436,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
          needs_recalc(l1, *pent) )
     {
         l1_pgentry_t e = *pent;
+        p2m_type_t p2mt_old;

         if ( !valid_recalc(l1, e) )
             P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
                       p2m->domain->domain_id, gfn, level);
-        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
+        p2mt_old = p2m_flags_to_type(l1e_get_flags(e));
+        if ( p2m_is_changeable(p2mt_old) )
         {
             unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
@@ -460,6 +462,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                      mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
                 flags |= _PAGE_PSE;
             }
+
+            if ( p2mt_old == p2m_ioreq_server )
+            {
+                ASSERT(p2m->ioreq.entry_count > 0);
+                p2m->ioreq.entry_count--;
+            }
+
             e = l1e_from_pfn(mfn, flags);
             p2m_add_iommu_flags(&e, level,
                                 (p2mt == p2m_ram_rw)
@@ -606,6 +615,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,

     if ( page_order == PAGE_ORDER_4K )
     {
+        p2m_type_t p2mt_old;
+
         rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
                             L2_PAGETABLE_SHIFT - PAGE_SHIFT,
                             L2_PAGETABLE_ENTRIES, PGT_l1_page_table, 1);
@@ -629,6 +640,19 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         if ( entry_content.l1 != 0 )
             p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);

+        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
+
+        /*
+         * p2m_ioreq_server is only used for 4K pages, so
+         * we only need to do the count for level 1 entries.
+         */
+        if ( unlikely(p2mt_old == p2m_ioreq_server) &&
+             p2mt_old != p2mt)
+        {
+            ASSERT(p2m->ioreq.entry_count > 0);
+            p2m->ioreq.entry_count--;
+        }
+
         /* level 1 entry */
         p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
         /* NB: paging_write_p2m_entry() handles tlb flushes properly */
@@ -729,7 +753,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
                                      struct p2m_domain *p2m, unsigned long gfn)
 {
-    if ( !recalc || !p2m_is_changeable(t) )
+    if ( !recalc || !p2m_check_changeable(t) )
         return t;
     return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                 : p2m_ram_rw;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b84add0..4169d18 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
         if ( p2m->ioreq.server != NULL )
             goto out;

+        /*
+         * It is possible that an ioreq server has just been unmapped,
+         * released the spin lock, with some p2m_ioreq_server entries
+         * in p2m table remained. We shall refuse another ioreq server
+         * mapping request in such case.
+         */
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            goto out;
+
         p2m->ioreq.server = s;
         p2m->ioreq.flags = flags;
     }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 4521620..e7e390d 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;

 /* Types that can be subject to bulk transitions. */
 #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
-                              | p2m_to_mask(p2m_ram_logdirty) )
+                              | p2m_to_mask(p2m_ram_logdirty) \
+                              | p2m_to_mask(p2m_ioreq_server) )
+
+#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))

 #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))

@@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
 #define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
 #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
+#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
 #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
 #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
 /* Grant types are *not* considered valid, because they can be
@@ -178,6 +182,8 @@ typedef unsigned int p2m_query_t;

 #define p2m_allows_invalid_mfn(t) (p2m_to_mask(t) & P2M_INVALID_MFN_TYPES)

+#define p2m_check_changeable(t) (p2m_is_changeable(t) && !p2m_is_ioreq(t))
+
 typedef enum {
     p2m_host,
     p2m_nested,
@@ -349,6 +355,7 @@ struct p2m_domain {
          * are to be emulated by an ioreq server.
          */
         unsigned int flags;
+        unsigned long entry_count;
     } ioreq;
 };