From patchwork Tue Apr 16 06:32:44 2013
X-Patchwork-Submitter: Xiao Guangrong
X-Patchwork-Id: 2448051
From: Xiao Guangrong
To: mtosatti@redhat.com
Cc: gleb@redhat.com, avi.kivity@gmail.com, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, Xiao Guangrong
Subject: [PATCH v3 06/15] KVM: MMU: allow concurrently clearing spte on
	remove-only pte-list
Date: Tue, 16 Apr 2013 14:32:44 +0800
Message-Id: <1366093973-2617-7-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
In-Reply-To: <1366093973-2617-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
References: <1366093973-2617-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.7.6
X-Mailing-List: kvm@vger.kernel.org

This patch introduces PTE_LIST_SPTE_SKIP, a placeholder that is written
into the pte-list when a spte is removed, so that the other sptes on the
pte-list are not moved and the pte-list-descs on the pte-list are not
freed.

If the vcpu can not add a spte to the pte-list (e.g. the rmap on an
invalid memslot) and a spte can not be freed during a pte-list walk, then
sptes on the pte-list can be cleared concurrently; the worst case is that
we double-zap a spte, which is safe.
This patch only ensures that concurrently zapping the pte-list is safe;
the spte itself is kept available during concurrent clearing by later
patches in this series.

Signed-off-by: Xiao Guangrong
---
 arch/x86/kvm/mmu.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 99ad2a4..850eab5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -900,6 +900,18 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
 }
 
 /*
+ * It is the placeholder and it will be set on pte-list after removing
+ * a spte so that other sptes on this pte_list are not moved and the
+ * pte-list-descs on the pte-list are not freed.
+ *
+ * If vcpu can not add spte to the pte-list (e.g. the rmap on invalid
+ * memslot) and spte can not be freed during pte-list walk, we can
+ * concurrently clear sptes on the pte-list, the worst case is, we double
+ * zap a spte that is safe.
+ */
+#define PTE_LIST_SPTE_SKIP (u64 *)((~0x0ul) & (~1))
+
+/*
  * Pte mapping structures:
  *
  * If pte_list bit zero is zero, then pte_list point to the spte.
@@ -1003,6 +1015,40 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 	}
 }
 
+static void pte_list_clear_concurrently(u64 *spte, unsigned long *pte_list)
+{
+	struct pte_list_desc *desc;
+	unsigned long pte_value = *pte_list;
+	int i;
+
+	/* Empty pte list stores nothing. */
+	WARN_ON(!pte_value);
+
+	if (!(pte_value & 1)) {
+		if ((u64 *)pte_value == spte) {
+			*pte_list = (unsigned long)PTE_LIST_SPTE_SKIP;
+			return;
+		}
+
+		/* someone has already cleared it. */
+		WARN_ON(pte_value != (unsigned long)PTE_LIST_SPTE_SKIP);
+		return;
+	}
+
+	desc = (struct pte_list_desc *)(pte_value & ~1ul);
+	while (desc) {
+		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
+			if (desc->sptes[i] == spte) {
+				desc->sptes[i] = PTE_LIST_SPTE_SKIP;
+				return;
+			}
+
+		desc = desc->more;
+	}
+
+	return;
+}
+
 typedef void (*pte_list_walk_fn) (u64 *spte);
 static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
 {
@@ -1214,6 +1260,12 @@ spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
 	return false;
 }
 
+/* PTE_LIST_SPTE_SKIP is only used on invalid rmap. */
+static void check_valid_sptep(u64 *sptep)
+{
+	WARN_ON(sptep == PTE_LIST_SPTE_SKIP || !is_rmap_spte(*sptep));
+}
+
 static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
 				 bool pt_protect)
 {
@@ -1222,7 +1274,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
 	bool flush = false;
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
+		check_valid_sptep(sptep);
 		if (spte_write_protect(kvm, sptep, &flush, pt_protect)) {
 			sptep = rmap_get_first(*rmapp, &iter);
 			continue;
@@ -1293,7 +1345,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	int need_tlb_flush = 0;
 
 	while ((sptep = rmap_get_first(*rmapp, &iter))) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
+		check_valid_sptep(sptep);
 		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", sptep, *sptep);
 
 		drop_spte(kvm, sptep);
@@ -1322,7 +1374,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	new_pfn = pte_pfn(*ptep);
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
+		check_valid_sptep(sptep);
 		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", sptep, *sptep);
 
 		need_flush = 1;
@@ -1455,7 +1507,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
 	     sptep = rmap_get_next(&iter)) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
+		check_valid_sptep(sptep);
 
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
@@ -1493,7 +1545,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
 	     sptep = rmap_get_next(&iter)) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
+		check_valid_sptep(sptep);
 
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
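
To make the remove-only idea above concrete, here is a minimal user-space
sketch of the scheme (not part of the patch). It runs under simplifying
assumptions: plain uint64_t values stand in for real sptes, there is no
locking or RCU, and everything except the names pte_list_clear_concurrently,
pte_list_desc, PTE_LIST_EXT and PTE_LIST_SPTE_SKIP (for instance the walker
pte_list_walk_skip and the small main) is invented for illustration rather
than taken from mmu.c.

#include <stdio.h>
#include <stdint.h>

#define PTE_LIST_EXT 4

/* All-ones pointer value with bit 0 cleared, as in the patch. */
#define PTE_LIST_SPTE_SKIP ((uint64_t *)((~0x0ul) & (~1ul)))

struct pte_list_desc {
	uint64_t *sptes[PTE_LIST_EXT];
	struct pte_list_desc *more;
};

/*
 * Mark @spte as removed without moving other entries and without freeing
 * the descriptor, so a concurrent walker never sees the list change shape.
 */
static void pte_list_clear_concurrently(uint64_t *spte, unsigned long *pte_list)
{
	struct pte_list_desc *desc;
	unsigned long pte_value = *pte_list;
	int i;

	if (!(pte_value & 1)) {
		/* Single-entry list: replace the pointer with the placeholder. */
		if ((uint64_t *)pte_value == spte)
			*pte_list = (unsigned long)PTE_LIST_SPTE_SKIP;
		return;
	}

	for (desc = (struct pte_list_desc *)(pte_value & ~1ul); desc; desc = desc->more)
		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; i++)
			if (desc->sptes[i] == spte) {
				desc->sptes[i] = PTE_LIST_SPTE_SKIP;
				return;
			}
}

/* A walker that simply ignores placeholder entries (made-up helper). */
static void pte_list_walk_skip(unsigned long *pte_list, void (*fn)(uint64_t *))
{
	struct pte_list_desc *desc;
	int i;

	if (!*pte_list)
		return;

	if (!(*pte_list & 1)) {
		if ((uint64_t *)*pte_list != PTE_LIST_SPTE_SKIP)
			fn((uint64_t *)*pte_list);
		return;
	}

	for (desc = (struct pte_list_desc *)(*pte_list & ~1ul); desc; desc = desc->more)
		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; i++)
			if (desc->sptes[i] != PTE_LIST_SPTE_SKIP)
				fn(desc->sptes[i]);
}

static void print_spte(uint64_t *sptep)
{
	printf("spte %p = %llx\n", (void *)sptep, (unsigned long long)*sptep);
}

int main(void)
{
	uint64_t sptes[2] = { 0x1007, 0x2007 };
	struct pte_list_desc desc = { .sptes = { &sptes[0], &sptes[1] }, .more = NULL };
	/* Bit 0 set means "points to a descriptor", as in the kernel scheme. */
	unsigned long pte_list = (unsigned long)&desc | 1;

	pte_list_clear_concurrently(&sptes[0], &pte_list);
	pte_list_walk_skip(&pte_list, print_spte);	/* only &sptes[1] is visited */
	return 0;
}

The essential property, compared with pte_list_remove(), is that clearing
never shifts entries and never frees a pte_list_desc, so a concurrent walker
can keep iterating; at worst two paths store the placeholder for the same
spte, i.e. the spte is double zapped, which the changelog notes is safe.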