From patchwork Mon Aug 18 19:56:11 2014
Subject: Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
From: Xiao Guangrong
Date: Tue, 19 Aug 2014 03:56:11 +0800
To: Paolo Bonzini
Cc: gleb@kernel.org, avi.kivity@gmail.com, mtosatti@redhat.com,
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    stable@vger.kernel.org, David Matlack
In-Reply-To: <53F24A49.2010807@redhat.com>
References: <1407999713-3726-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
    <53F20653.2030204@redhat.com>
    <9AD43423-2FF3-422D-A5AD-61CAE6339CCC@linux.vnet.ibm.com>
    <53F24A49.2010807@redhat.com>
List-ID: kvm@vger.kernel.org

On Aug 19, 2014, at 2:47 AM, Paolo Bonzini wrote:
>
>> I think this patch is auditable: a page fault always runs while holding
>> the srcu read lock, so a page fault cannot span synchronize_srcu_expedited().
>> Only these cases can happen:
>>
>> 1) The page fault occurs before synchronize_srcu_expedited().
>>    In this case, the vcpu will generate an mmio-exit for the memslot being
>>    registered by the ioctl. That is ok, since the ioctl has not finished yet.
>>
>> 2) The page fault occurs after synchronize_srcu_expedited() and during
>>    the generation-number increase.
>>    In this case, userspace may get a stale mmio-exit (that happens if
>>    handling the page fault is slower than the ioctl). That is ok too,
>>    since userspace needs to do the check anyway, as I said above.
>>
>> 3) The page fault occurs after the generation-number update.
>>    That is definitely correct. :)
>>
>>> Another alternative could be to use the low bit to mark an in-progress
>>> change, and skip the caching if the low bit is set. Similar to a
>>> seqcount (except that if read_seqcount_retry fails, we just punt and do
>>> not retry anything), you could use it even though the memory barriers
>>> provided by write_seqcount_begin/end are not too useful in this case.
>>
>> I do not see how that bit would work: a page fault may cache the memslot
>> before the bit is set, and cache the generation number after the bit is set.
>>
>> Maybe I missed your idea; could you please describe it in more detail?

Something like this:

- 	update_memslots(slots, new, kvm->memslots->generation);
+ 	/* ensure generation number is always increased. */
+ 	slots->generation = old_memslots->generation + 1;
+ 	update_memslots(slots, new);
  	rcu_assign_pointer(kvm->memslots, slots);
  	synchronize_srcu_expedited(&kvm->srcu);
+ 	slots->generation++;

Then case 1 and 2 will just have a cache miss.
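For reference, here is a standalone model of the two-step bump described
above (an illustration only, not kernel code: SRCU and the RCU publication
of the new slots are elided, and the struct and function names are
invented). A reader that caches the in-between value old_gen + 1 during
the update window can never match the final value old_gen + 2, so cases 1
and 2 degrade to harmless cache misses:

	/* Model of the two-step generation bump; not kernel code. */
	#include <assert.h>
	#include <stdbool.h>

	struct memslots_model {
		unsigned long generation;
	};

	/* Writer: bump once before the (elided) publish, once after
	 * the (elided) grace period. */
	static void install_model(struct memslots_model *slots,
				  unsigned long old_gen)
	{
		slots->generation = old_gen + 1;	/* in-progress value */
		/* rcu_assign_pointer() + synchronize_srcu_expedited() here */
		slots->generation++;			/* final: old_gen + 2 */
	}

	/* Reader: a cached mmio generation is valid only if it matches. */
	static bool cached_gen_valid(const struct memslots_model *slots,
				     unsigned long cached_gen)
	{
		return cached_gen == slots->generation;
	}

	int main(void)
	{
		struct memslots_model s = { .generation = 10 };
		/* A reader in the update window caches old_gen + 1. */
		unsigned long mid_update = s.generation + 1;

		install_model(&s, s.generation);
		assert(!cached_gen_valid(&s, mid_update));	/* always a miss */
		return 0;
	}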
So, case 2 is what concerned you? :) I still think it is ok, but I do not
have a strong opinion on that. How about simplifying it like this:

---
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..9fabf6a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -234,16 +234,22 @@ static unsigned int get_mmio_spte_generation(u64 spte)
 	return gen;
 }
 
-static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+static unsigned int __kvm_current_mmio_generation(struct kvm_memslots *slots)
 {
+
 	/*
 	 * Init kvm generation close to MMIO_MAX_GEN to easily test the
 	 * code of handling generation number wrap-around.
 	 */
-	return (kvm_memslots(kvm)->generation +
+	return (slots->generation +
		      MMIO_MAX_GEN - 150) & MMIO_GEN_MASK;
 }
 
+static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+{
+	return __kvm_current_mmio_generation(kvm_memslots(kvm));
+}
+
 static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
			   unsigned access)
 {
@@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
 
 static bool check_mmio_spte(struct kvm *kvm, u64 spte)
 {
+	struct kvm_memslots *slots = kvm_memslots(kvm);
 	unsigned int kvm_gen, spte_gen;
 
-	kvm_gen = kvm_current_mmio_generation(kvm);
+	if (slots->updated)
+		return false;
+
+	smp_rmb();
+
+	kvm_gen = __kvm_current_mmio_generation(slots);
 	spte_gen = get_mmio_spte_generation(spte);
 
 	trace_check_mmio_spte(spte, kvm_gen, spte_gen);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..1d4e78f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new, u64 last_generation);
+			    struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new,
-			    u64 last_generation)
+			    struct kvm_memory_slot *new)
 {
 	if (new) {
 		int id = new->id;
@@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
 		if (new->npages != npages)
 			sort_memslots(slots);
 	}
-
-	slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 {
 	struct kvm_memslots *old_memslots = kvm->memslots;
 
-	update_memslots(slots, new, kvm->memslots->generation);
+	/* ensure generation number is always increased. */
+	slots->updated = true;
+	slots->generation = old_memslots->generation;
+	update_memslots(slots, new);
 	rcu_assign_pointer(kvm->memslots, slots);
 	synchronize_srcu_expedited(&kvm->srcu);
 
+	slots->generation++;
+	smp_wmb();
+	slots->updated = false;
+
 	kvm_arch_memslots_updated(kvm);
 
 	return old_memslots;
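For reference, a standalone model of the updated-flag scheme in the diff
above (an illustration only, not kernel code: C11 fences stand in for
smp_wmb()/smp_rmb(), update_memslots() and the SRCU machinery are elided,
and the names are invented). The release fence orders the final generation
bump before the flag is cleared, pairing with the acquire fence between
the reader's flag check and its generation read:

	/* Model of the updated-flag + barrier pairing; not kernel code. */
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	struct memslots_model {
		unsigned long generation;
		atomic_bool updated;
	};

	/* Writer side, mirroring install_new_memslots() in the diff. */
	static void install_model(struct memslots_model *slots,
				  unsigned long old_gen)
	{
		atomic_store_explicit(&slots->updated, true,
				      memory_order_relaxed);
		slots->generation = old_gen;
		/* update_memslots(), rcu_assign_pointer(),
		 * synchronize_srcu_expedited() go here */
		slots->generation++;
		atomic_thread_fence(memory_order_release); /* smp_wmb() stand-in */
		atomic_store_explicit(&slots->updated, false,
				      memory_order_relaxed);
	}

	/* Reader side, mirroring check_mmio_spte() in the diff. */
	static bool mmio_spte_valid(struct memslots_model *slots,
				    unsigned long spte_gen)
	{
		if (atomic_load_explicit(&slots->updated,
					 memory_order_relaxed))
			return false;		/* update in progress */
		atomic_thread_fence(memory_order_acquire); /* smp_rmb() stand-in */
		return spte_gen == slots->generation;
	}

	int main(void)
	{
		struct memslots_model s = { .generation = 10, .updated = false };

		install_model(&s, s.generation);
		/* Old generation 10 misses; new generation 11 matches. */
		printf("valid(10): %d, valid(11): %d\n",
		       mmio_spte_valid(&s, 10), mmio_spte_valid(&s, 11));
		return 0;
	}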