From patchwork Wed Jun 26 18:11:35 2013
From: Paul Gortmaker
To: kvm@vger.kernel.org
Cc: Paul Gortmaker, Paolo Bonzini, Gleb Natapov
Subject: [PATCH-next v2] kvm: don't try to take mmu_lock while holding the main raw kvm_lock
Date: Wed, 26 Jun 2013 14:11:35 -0400
Message-ID: <1372270295-16496-1-git-send-email-paul.gortmaker@windriver.com>
In-Reply-To: <51CAA1DE.2020307@redhat.com>
References: <51CAA1DE.2020307@redhat.com>

In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"), the
kvm_lock was made a raw lock.  However, the kvm mmu_shrink() function
tries to grab the (non-raw) mmu_lock while the raw kvm_lock is still
held.  This leads to the following:

  BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
  in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
  Preemption disabled at: [] mmu_shrink+0x5c/0x1b0 [kvm]

  Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
  Call Trace:
   [] __might_sleep+0xfd/0x160
   [] rt_spin_lock+0x24/0x50
   [] mmu_shrink+0xec/0x1b0 [kvm]
   [] shrink_slab+0x17d/0x3a0
   [] ? mem_cgroup_iter+0x130/0x260
   [] balance_pgdat+0x54a/0x730
   [] ? set_pgdat_percpu_threshold+0xa7/0xd0
   [] kswapd+0x18f/0x490
   [] ? get_parent_ip+0x11/0x50
   [] ? __init_waitqueue_head+0x50/0x50
   [] ? balance_pgdat+0x730/0x730
   [] kthread+0xdb/0xe0
   [] ? finish_task_switch+0x52/0x100
   [] kernel_thread_helper+0x4/0x10
   [] ? __init_kthread_worker+0x

Note that the above was seen on an earlier 3.4 preempt-rt kernel, where
the lock distinction (raw vs. non-raw) actually matters.
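To make the problem concrete outside the kernel, the sketch below mimics
the bad nesting in user space: a busy-waiting pthread spinlock stands in
for the raw kvm_lock, and a pthread mutex stands in for mmu_lock, which
PREEMPT_RT turns into a sleeping rt-mutex.  This is an analogue only --
user space will happily allow the ordering, but on PREEMPT_RT the kernel
complains, as the splat above shows.

/*
 * Illustrative user-space analogue only -- not kernel code.  The pthread
 * spinlock plays the role of the raw kvm_lock; the pthread mutex plays
 * the role of mmu_lock, which PREEMPT_RT makes a sleeping lock.
 * Build with: cc -o nest nest.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>

static pthread_spinlock_t raw_lock;                      /* ~ raw kvm_lock  */
static pthread_mutex_t sleeping_lock = PTHREAD_MUTEX_INITIALIZER;
                                                         /* ~ kvm->mmu_lock */

static void bad_nesting(void)
{
	pthread_spin_lock(&raw_lock);      /* outer "atomic" region */

	/*
	 * In the kernel, the equivalent of this inner lock may sleep on
	 * PREEMPT_RT, which is illegal while a raw lock is held -- the
	 * ordering mmu_shrink() used before this patch.
	 */
	pthread_mutex_lock(&sleeping_lock);
	pthread_mutex_unlock(&sleeping_lock);

	pthread_spin_unlock(&raw_lock);
}

int main(void)
{
	pthread_spin_init(&raw_lock, PTHREAD_PROCESS_PRIVATE);
	bad_nesting();
	puts("user space allows this ordering; PREEMPT_RT does not");
	pthread_spin_destroy(&raw_lock);
	return 0;
}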
Since we only use the lock for protecting the vm_list, once we've found
the instance we want, we can shuffle it to the end of the list and then
drop the kvm_lock before taking the mmu_lock.  We can do this because
after the mmu operations are completed, we break -- i.e. we don't
continue list processing, so it doesn't matter if the list changed
around us.  Since the shrinker code runs asynchronously with respect to
KVM, we still need to protect against the users_count going to zero and
kvm_destroy_vm() being called, so we use kvm_get_kvm/kvm_put_kvm, as
suggested by Paolo.

Cc: Paolo Bonzini
Cc: Gleb Natapov
Signed-off-by: Paul Gortmaker
---
[v2: add the kvm_get_kvm, update comments and log appropriately]

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 748e0d8..662b679 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct kvm *kvm;
 	int nr_to_scan = sc->nr_to_scan;
+	int found = 0;
 	unsigned long freed = 0;
 
 	raw_spin_lock(&kvm_lock);
@@ -4349,6 +4350,18 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 			continue;
 
 		idx = srcu_read_lock(&kvm->srcu);
+
+		list_move_tail(&kvm->vm_list, &vm_list);
+		found = 1;
+		/*
+		 * We are done with the list, so drop kvm_lock, as we can't be
+		 * holding a raw lock and take the non-raw mmu_lock.  But we
+		 * don't want to be unprotected from kvm_destroy_vm either,
+		 * so we bump users_count.
+		 */
+		kvm_get_kvm(kvm);
+		raw_spin_unlock(&kvm_lock);
+
 		spin_lock(&kvm->mmu_lock);
 
 		if (kvm_has_zapped_obsolete_pages(kvm)) {
@@ -4363,6 +4376,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 
 unlock:
 		spin_unlock(&kvm->mmu_lock);
+		kvm_put_kvm(kvm);
 		srcu_read_unlock(&kvm->srcu, idx);
 
 		/*
@@ -4370,11 +4384,12 @@ unlock:
 		 * per-vm shrinkers cry out
 		 * sadness comes quickly
 		 */
-		list_move_tail(&kvm->vm_list, &vm_list);
 		break;
 	}
 
-	raw_spin_unlock(&kvm_lock);
+	if (!found)
+		raw_spin_unlock(&kvm_lock);
+
 	return freed;
 }
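To spell out the resulting pattern, here is a small user-space sketch
(all names below -- struct vm, list_lock, obj_lock, vm_get/vm_put -- are
invented for illustration and are not KVM APIs): a reference count pins
the object so it cannot be destroyed, the list lock is dropped, and only
then is the object's own sleeping lock taken, the same ordering the
patch establishes with kvm_get_kvm(), raw_spin_unlock(&kvm_lock) and
spin_lock(&kvm->mmu_lock).

/*
 * Illustrative user-space sketch only -- not KVM code.
 * Build with: cc -o shrink shrink.c -lpthread
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct vm {
	atomic_int users;               /* ~ kvm->users_count */
	pthread_mutex_t obj_lock;       /* ~ kvm->mmu_lock    */
};

static pthread_spinlock_t list_lock;    /* ~ the raw kvm_lock  */
static struct vm *the_vm;               /* one-entry "vm_list" */

static void vm_get(struct vm *vm)
{
	atomic_fetch_add(&vm->users, 1);
}

static void vm_put(struct vm *vm)
{
	if (atomic_fetch_sub(&vm->users, 1) == 1) {
		pthread_mutex_destroy(&vm->obj_lock);
		free(vm);               /* last reference dropped */
	}
}

/* Shrink-style walk: pick a victim under list_lock, work on it unlocked. */
static void shrink_one(void)
{
	struct vm *victim;

	pthread_spin_lock(&list_lock);
	victim = the_vm;
	if (!victim) {
		pthread_spin_unlock(&list_lock);
		return;
	}
	vm_get(victim);                 /* pin it before dropping list_lock */
	pthread_spin_unlock(&list_lock);

	pthread_mutex_lock(&victim->obj_lock);   /* sleeping lock, now safe */
	/* ... reclaim work would happen here ... */
	pthread_mutex_unlock(&victim->obj_lock);

	vm_put(victim);
}

int main(void)
{
	struct vm *vm = calloc(1, sizeof(*vm));

	pthread_spin_init(&list_lock, PTHREAD_PROCESS_PRIVATE);
	pthread_mutex_init(&vm->obj_lock, NULL);
	atomic_init(&vm->users, 1);     /* reference held by the "list" */
	the_vm = vm;

	shrink_one();

	pthread_spin_lock(&list_lock);  /* teardown, ~ kvm_destroy_vm path */
	the_vm = NULL;
	pthread_spin_unlock(&list_lock);
	vm_put(vm);                     /* drop the list's reference */

	pthread_spin_destroy(&list_lock);
	return 0;
}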