From patchwork Tue Jun 25 22:34:03 2013
X-Patchwork-Id: 2781241
From: Paul Gortmaker <paul.gortmaker@windriver.com>
To: Gleb Natapov, Paolo Bonzini
Cc: kvm@vger.kernel.org, Paul Gortmaker
Subject: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock
Date: Tue, 25 Jun 2013 18:34:03 -0400
Message-ID: <1372199643-3936-1-git-send-email-paul.gortmaker@windriver.com>
X-Mailing-List: kvm@vger.kernel.org

In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"), the
kvm_lock was made a raw lock.  However, the kvm mmu_shrink() function
tries to take the (non-raw) mmu_lock while the raw kvm_lock is still
held.  On a preempt-rt kernel, where non-raw spinlocks are sleeping
locks, this triggers the following splat:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
Preemption disabled at:
 [] mmu_shrink+0x5c/0x1b0 [kvm]

Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
Call Trace:
 [] __might_sleep+0xfd/0x160
 [] rt_spin_lock+0x24/0x50
 [] mmu_shrink+0xec/0x1b0 [kvm]
 [] shrink_slab+0x17d/0x3a0
 [] ? mem_cgroup_iter+0x130/0x260
 [] balance_pgdat+0x54a/0x730
 [] ? set_pgdat_percpu_threshold+0xa7/0xd0
 [] kswapd+0x18f/0x490
 [] ? get_parent_ip+0x11/0x50
 [] ? __init_waitqueue_head+0x50/0x50
 [] ? balance_pgdat+0x730/0x730
 [] kthread+0xdb/0xe0
 [] ? finish_task_switch+0x52/0x100
 [] kernel_thread_helper+0x4/0x10
 [] ? __init_kthread_worker+0x

Since kvm_lock only protects the vm_list, once we have found the
instance we want, we can move it to the tail of the list and then drop
the kvm_lock before taking the mmu_lock.  This is safe because we
break out of the loop once the mmu operations are complete -- we never
resume iterating the list, so it does not matter if the list changed
around us.
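The pattern, reduced to its essentials, looks like the sketch below
(illustrative names only -- this is not the actual KVM code, just the
locking shape the patch adopts):

#include <linux/list.h>
#include <linux/spinlock.h>

struct obj {
	struct list_head node;
	spinlock_t inner_lock;		/* sleeping rt-mutex on preempt-rt */
};

static DEFINE_RAW_SPINLOCK(outer_lock);	/* stays atomic on preempt-rt */
static LIST_HEAD(obj_list);

static void scan_one(void)
{
	struct obj *o;

	raw_spin_lock(&outer_lock);
	list_for_each_entry(o, &obj_list, node) {
		/* rotate the entry first, so the next scan is fair */
		list_move_tail(&o->node, &obj_list);
		/* drop the raw lock BEFORE taking the sleeping lock */
		raw_spin_unlock(&outer_lock);

		spin_lock(&o->inner_lock);
		/* ... the real work would go here ... */
		spin_unlock(&o->inner_lock);

		/* never resume the walk; the list may have changed
		 * while outer_lock was dropped */
		return;
	}
	raw_spin_unlock(&outer_lock);
}

Here outer_lock plays the role of kvm_lock and inner_lock the role of
kvm->mmu_lock.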
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---

[Note1: please double-check that this solution makes sense for the
mainline kernel; consider this an RFC patch that wants a review from
people in the know.]

[Note2: you'll need to be running a preempt-rt kernel to actually see
this.  Also note that the above patch is against linux-next.  Alternate
solutions welcome; this seemed to me the obvious fix.]

 arch/x86/kvm/mmu.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 748e0d8..db93a70 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct kvm *kvm;
 	int nr_to_scan = sc->nr_to_scan;
+	int found = 0;
 	unsigned long freed = 0;
 
 	raw_spin_lock(&kvm_lock);
@@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 			continue;
 
 		idx = srcu_read_lock(&kvm->srcu);
+
+		list_move_tail(&kvm->vm_list, &vm_list);
+		found = 1;
+		/* We can't be holding a raw lock and take non-raw mmu_lock */
+		raw_spin_unlock(&kvm_lock);
+
 		spin_lock(&kvm->mmu_lock);
 
 		if (kvm_has_zapped_obsolete_pages(kvm)) {
@@ -4370,11 +4377,12 @@ unlock:
 		 * per-vm shrinkers cry out
 		 * sadness comes quickly
 		 */
-		list_move_tail(&kvm->vm_list, &vm_list);
 		break;
 	}
 
-	raw_spin_unlock(&kvm_lock);
+	if (!found)
+		raw_spin_unlock(&kvm_lock);
+
 	return freed;
 }
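One thing that likely deserves reviewer attention (hence Note1): once
kvm_lock is dropped, the only thing pinning the kvm instance is the
SRCU read-side section the function enters just before the unlock.  As
a refresher, the general SRCU read-side pattern is the following
(hypothetical names, minimal sketch -- demo_srcu merely stands in for
the per-VM kvm->srcu):

#include <linux/srcu.h>

DEFINE_STATIC_SRCU(demo_srcu);

struct demo_obj {
	int payload;
};

static void demo_use(struct demo_obj *obj)
{
	int idx;

	idx = srcu_read_lock(&demo_srcu);
	/*
	 * Anything whose teardown waits via synchronize_srcu() or
	 * call_srcu() on demo_srcu cannot be freed until the matching
	 * srcu_read_unlock() below, even with no spinlock held.
	 */
	obj->payload++;
	srcu_read_unlock(&demo_srcu, idx);
}

Whether kvm's teardown path actually provides that guarantee here is
exactly the sort of thing Note1 asks reviewers to confirm.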