Message ID | 1372199643-3936-1-git-send-email-paul.gortmaker@windriver.com (mailing list archive)
---|---
State | New, archived
On 26/06/2013 00:34, Paul Gortmaker wrote:
> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
> the kvm_lock was made a raw lock. However, the kvm mmu_shrink()
> function tries to grab the (non-raw) mmu_lock within the scope of
> the raw locked kvm_lock being held. This leads to the following:
>
> BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
> in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
> Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
>
> Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
> Call Trace:
> [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
> [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
> [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
> [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
> [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
> [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
> [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
> [<ffffffff811185bf>] kswapd+0x18f/0x490
> [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
> [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
> [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
> [<ffffffff81060d2b>] kthread+0xdb/0xe0
> [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
> [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
> [<ffffffff81060c50>] ? __init_kthread_worker+0x
>
> Since we only use the lock for protecting the vm_list, once we've
> found the instance we want, we can shuffle it to the end of the
> list and then drop the kvm_lock before taking the mmu_lock. We
> can do this because after the mmu operations are completed, we
> break -- i.e. we don't continue list processing, so it doesn't
> matter if the list changed around us.
>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Since the shrinker code is asynchronous with respect to KVM, I think
that the kvm_lock here is also protecting against kvm_destroy_vm
running at the same time. So the patch is almost okay; all that is
missing is a kvm_get_kvm/kvm_put_kvm pair, where the reference is
added just before releasing the kvm_lock.

Paolo
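To make the suggested fix concrete, here is a minimal sketch of the shrink-path loop body with the reference pair added. It is illustrative only: kvm_get_kvm/kvm_put_kvm are the real KVM refcount helpers, but their exact placement here is an assumption, and the page-zapping work is elided.

	/* Sketch: pin the VM before dropping kvm_lock, so a concurrent
	 * kvm_destroy_vm() cannot free it while we sleep on mmu_lock. */
	list_move_tail(&kvm->vm_list, &vm_list);
	kvm_get_kvm(kvm);		/* reference taken under kvm_lock */
	raw_spin_unlock(&kvm_lock);

	spin_lock(&kvm->mmu_lock);
	/* ... zap obsolete/invalid pages, accumulate 'freed' ... */
	spin_unlock(&kvm->mmu_lock);
	srcu_read_unlock(&kvm->srcu, idx);

	kvm_put_kvm(kvm);		/* may run kvm_destroy_vm() here */
	break;

Note that the reference must be dropped only after the last access to the kvm structure (including srcu_read_unlock), since kvm_put_kvm may free it.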
On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),

I am copying Jan, the author of the patch. Commit message says:
"Code under this lock requires non-preemptibility", but which code
exactly is this? Is this still true?

> the kvm_lock was made a raw lock. However, the kvm mmu_shrink()
> function tries to grab the (non-raw) mmu_lock within the scope of
> the raw locked kvm_lock being held. This leads to the following:
>
> BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
> in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
> Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
>
> Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
> Call Trace:
> [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
> [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
> [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
> [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
> [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
> [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
> [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
> [<ffffffff811185bf>] kswapd+0x18f/0x490
> [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
> [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
> [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
> [<ffffffff81060d2b>] kthread+0xdb/0xe0
> [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
> [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
> [<ffffffff81060c50>] ? __init_kthread_worker+0x
>
> Since we only use the lock for protecting the vm_list, once we've
> found the instance we want, we can shuffle it to the end of the
> list and then drop the kvm_lock before taking the mmu_lock. We
> can do this because after the mmu operations are completed, we
> break -- i.e. we don't continue list processing, so it doesn't
> matter if the list changed around us.
>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>
> [Note1: do double check that this solution makes sense for the
> mainline kernel; consider this an RFC patch that does want a
> review from people in the know.]
>
> [Note2: you'll need to be running a preempt-rt kernel to actually
> see this. Also note that the above patch is against linux-next.
> Alternate solutions welcome; this seemed to me the obvious fix.]
>
>  arch/x86/kvm/mmu.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 748e0d8..db93a70 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  {
>  	struct kvm *kvm;
>  	int nr_to_scan = sc->nr_to_scan;
> +	int found = 0;
>  	unsigned long freed = 0;
>
>  	raw_spin_lock(&kvm_lock);
> @@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  			continue;
>
>  		idx = srcu_read_lock(&kvm->srcu);
> +
> +		list_move_tail(&kvm->vm_list, &vm_list);
> +		found = 1;
> +		/* We can't be holding a raw lock and take non-raw mmu_lock */
> +		raw_spin_unlock(&kvm_lock);
> +
>  		spin_lock(&kvm->mmu_lock);
>
>  		if (kvm_has_zapped_obsolete_pages(kvm)) {
> @@ -4370,11 +4377,12 @@ unlock:
>  		 * per-vm shrinkers cry out
>  		 * sadness comes quickly
>  		 */
> -		list_move_tail(&kvm->vm_list, &vm_list);
>  		break;
>  	}
>
> -	raw_spin_unlock(&kvm_lock);
> +	if (!found)
> +		raw_spin_unlock(&kvm_lock);
> +
>  	return freed;
>
> }
> --
> 1.8.1.2

--
Gleb.
On 27/06/2013 13:09, Gleb Natapov wrote:
> On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
> I am copying Jan, the author of the patch. Commit message says:
> "Code under this lock requires non-preemptibility", but which code
> exactly is this? Is this still true?

hardware_enable_nolock/hardware_disable_nolock does.

Paolo

>> the kvm_lock was made a raw lock. However, the kvm mmu_shrink()
>> function tries to grab the (non-raw) mmu_lock within the scope of
>> the raw locked kvm_lock being held. This leads to the following:
>>
>> BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
>> in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
>> Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
>>
>> Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
>> Call Trace:
>> [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
>> [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
>> [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
>> [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
>> [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
>> [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
>> [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
>> [<ffffffff811185bf>] kswapd+0x18f/0x490
>> [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
>> [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
>> [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
>> [<ffffffff81060d2b>] kthread+0xdb/0xe0
>> [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
>> [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
>> [<ffffffff81060c50>] ? __init_kthread_worker+0x
>>
>> Since we only use the lock for protecting the vm_list, once we've
>> found the instance we want, we can shuffle it to the end of the
>> list and then drop the kvm_lock before taking the mmu_lock. We
>> can do this because after the mmu operations are completed, we
>> break -- i.e. we don't continue list processing, so it doesn't
>> matter if the list changed around us.
>>
>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>> ---
>>
>> [Note1: do double check that this solution makes sense for the
>> mainline kernel; consider this an RFC patch that does want a
>> review from people in the know.]
>>
>> [Note2: you'll need to be running a preempt-rt kernel to actually
>> see this. Also note that the above patch is against linux-next.
>> Alternate solutions welcome; this seemed to me the obvious fix.]
>>
>>  arch/x86/kvm/mmu.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 748e0d8..db93a70 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>>  {
>>  	struct kvm *kvm;
>>  	int nr_to_scan = sc->nr_to_scan;
>> +	int found = 0;
>>  	unsigned long freed = 0;
>>
>>  	raw_spin_lock(&kvm_lock);
>> @@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>>  			continue;
>>
>>  		idx = srcu_read_lock(&kvm->srcu);
>> +
>> +		list_move_tail(&kvm->vm_list, &vm_list);
>> +		found = 1;
>> +		/* We can't be holding a raw lock and take non-raw mmu_lock */
>> +		raw_spin_unlock(&kvm_lock);
>> +
>>  		spin_lock(&kvm->mmu_lock);
>>
>>  		if (kvm_has_zapped_obsolete_pages(kvm)) {
>> @@ -4370,11 +4377,12 @@ unlock:
>>  		 * per-vm shrinkers cry out
>>  		 * sadness comes quickly
>>  		 */
>> -		list_move_tail(&kvm->vm_list, &vm_list);
>>  		break;
>>  	}
>>
>> -	raw_spin_unlock(&kvm_lock);
>> +	if (!found)
>> +		raw_spin_unlock(&kvm_lock);
>> +
>>  	return freed;
>>
>> }
>> --
>> 1.8.1.2
>
> --
> Gleb.
On Thu, Jun 27, 2013 at 01:38:29PM +0200, Paolo Bonzini wrote:
> On 27/06/2013 13:09, Gleb Natapov wrote:
>> On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
>>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
>> I am copying Jan, the author of the patch. Commit message says:
>> "Code under this lock requires non-preemptibility", but which code
>> exactly is this? Is this still true?
>
> hardware_enable_nolock/hardware_disable_nolock does.
>
I suspected this would be the answer and prepared another question :)
At a glance, kvm_lock is used to protect those just to avoid creating a
separate lock, so why not create a raw one to protect them and change
kvm_lock back to non-raw? Admittedly I haven't looked too closely into
this yet.

> Paolo
>
>>> the kvm_lock was made a raw lock. However, the kvm mmu_shrink()
>>> function tries to grab the (non-raw) mmu_lock within the scope of
>>> the raw locked kvm_lock being held. This leads to the following:
>>>
>>> BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
>>> in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
>>> Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
>>>
>>> Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
>>> Call Trace:
>>> [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
>>> [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
>>> [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
>>> [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
>>> [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
>>> [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
>>> [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
>>> [<ffffffff811185bf>] kswapd+0x18f/0x490
>>> [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
>>> [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
>>> [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
>>> [<ffffffff81060d2b>] kthread+0xdb/0xe0
>>> [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
>>> [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
>>> [<ffffffff81060c50>] ? __init_kthread_worker+0x
>>>
>>> Since we only use the lock for protecting the vm_list, once we've
>>> found the instance we want, we can shuffle it to the end of the
>>> list and then drop the kvm_lock before taking the mmu_lock. We
>>> can do this because after the mmu operations are completed, we
>>> break -- i.e. we don't continue list processing, so it doesn't
>>> matter if the list changed around us.
>>>
>>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>>> ---
>>>
>>> [Note1: do double check that this solution makes sense for the
>>> mainline kernel; consider this an RFC patch that does want a
>>> review from people in the know.]
>>>
>>> [Note2: you'll need to be running a preempt-rt kernel to actually
>>> see this. Also note that the above patch is against linux-next.
>>> Alternate solutions welcome; this seemed to me the obvious fix.]
>>>
>>>  arch/x86/kvm/mmu.c | 12 ++++++++++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 748e0d8..db93a70 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>>>  {
>>>  	struct kvm *kvm;
>>>  	int nr_to_scan = sc->nr_to_scan;
>>> +	int found = 0;
>>>  	unsigned long freed = 0;
>>>
>>>  	raw_spin_lock(&kvm_lock);
>>> @@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>>>  			continue;
>>>
>>>  		idx = srcu_read_lock(&kvm->srcu);
>>> +
>>> +		list_move_tail(&kvm->vm_list, &vm_list);
>>> +		found = 1;
>>> +		/* We can't be holding a raw lock and take non-raw mmu_lock */
>>> +		raw_spin_unlock(&kvm_lock);
>>> +
>>>  		spin_lock(&kvm->mmu_lock);
>>>
>>>  		if (kvm_has_zapped_obsolete_pages(kvm)) {
>>> @@ -4370,11 +4377,12 @@ unlock:
>>>  		 * per-vm shrinkers cry out
>>>  		 * sadness comes quickly
>>>  		 */
>>> -		list_move_tail(&kvm->vm_list, &vm_list);
>>>  		break;
>>>  	}
>>>
>>> -	raw_spin_unlock(&kvm_lock);
>>> +	if (!found)
>>> +		raw_spin_unlock(&kvm_lock);
>>> +
>>>  	return freed;
>>>
>>> }
>>> --
>>> 1.8.1.2
>>
>> --
>> Gleb.

--
Gleb.
On 27/06/2013 13:43, Gleb Natapov wrote:
>>> I am copying Jan, the author of the patch. Commit message says:
>>> "Code under this lock requires non-preemptibility", but which code
>>> exactly is this? Is this still true?
>>
>> hardware_enable_nolock/hardware_disable_nolock does.
>
> I suspected this would be the answer and prepared another question :)
> At a glance, kvm_lock is used to protect those just to avoid creating a
> separate lock, so why not create a raw one to protect them and change
> kvm_lock back to non-raw? Admittedly I haven't looked too closely into
> this yet.

I was wondering the same, but I think it's fine. There's just a handful
of uses outside virt/kvm/kvm_main.c.

Paolo
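A sketch of the split being discussed, with a hypothetical name (kvm_count_lock) for the new raw lock; hardware_enable_nolock and kvm_lock come from the thread, while the surrounding names (kvm_usage_count, hardware_enable_failed, hardware_disable_all_nolock) are best-effort reconstructions of the existing kvm_main.c code, not quoted from it:

/* Hypothetical split: kvm_lock goes back to a sleepable spinlock and
 * only guards vm_list; a dedicated raw lock covers the genuinely
 * non-preemptible hardware enable/disable path. */
static DEFINE_SPINLOCK(kvm_lock);
static DEFINE_RAW_SPINLOCK(kvm_count_lock);

static int hardware_enable_all(void)
{
	int r = 0;

	raw_spin_lock(&kvm_count_lock);

	kvm_usage_count++;
	if (kvm_usage_count == 1) {
		atomic_set(&hardware_enable_failed, 0);
		on_each_cpu(hardware_enable_nolock, NULL, 1);

		if (atomic_read(&hardware_enable_failed)) {
			hardware_disable_all_nolock();
			r = -EBUSY;
		}
	}

	raw_spin_unlock(&kvm_count_lock);

	return r;
}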
On 2013-06-27 13:38, Paolo Bonzini wrote:
> On 27/06/2013 13:09, Gleb Natapov wrote:
>> On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
>>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
>> I am copying Jan, the author of the patch. Commit message says:
>> "Code under this lock requires non-preemptibility", but which code
>> exactly is this? Is this still true?
>
> hardware_enable_nolock/hardware_disable_nolock does.

IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
reads the processor ID of the caller. That implies the caller cannot be
preempted, but these days a migration lock should be fine as well.

Jan
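For illustration, a sketch of what a "migration lock" could look like around that processor-ID read, assuming the RT tree's migrate_disable()/migrate_enable() primitives; the notifier name is from the thread, the body is hypothetical:

	/* Sketch: keep the task on its current CPU without disabling
	 * preemption; a stable CPU is all smp_processor_id() needs here. */
	migrate_disable();
	me = smp_processor_id();
	/* ... walk vm_list, comparing each vcpu->cpu against freq->cpu
	 * and against 'me' to decide whether an IPI is needed ... */
	migrate_enable();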
On Thu, Jun 27, 2013 at 02:16:07PM +0200, Jan Kiszka wrote:
> On 2013-06-27 13:38, Paolo Bonzini wrote:
>> On 27/06/2013 13:09, Gleb Natapov wrote:
>>> On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
>>>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
>>> I am copying Jan, the author of the patch. Commit message says:
>>> "Code under this lock requires non-preemptibility", but which code
>>> exactly is this? Is this still true?
>>
>> hardware_enable_nolock/hardware_disable_nolock does.
>
> IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
> reads the processor ID of the caller. That implies the caller cannot be
> preempted, but these days a migration lock should be fine as well.

OK, adding Marcelo to the party. This code is called from a cpufreq
notifier. I would expect it to be called from a context that prevents
migration to another CPU.

--
Gleb.
On 27/06/2013 14:32, Gleb Natapov wrote:
>>>>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
>>>> I am copying Jan, the author of the patch. Commit message says:
>>>> "Code under this lock requires non-preemptibility", but which code
>>>> exactly is this? Is this still true?
>>>
>>> hardware_enable_nolock/hardware_disable_nolock does.
>>
>> IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
>> reads the processor ID of the caller. That implies the caller cannot be
>> preempted, but these days a migration lock should be fine as well.
>
> OK, adding Marcelo to the party. This code is called from a cpufreq
> notifier. I would expect it to be called from a context that prevents
> migration to another CPU.

No, the CPU is in freq->cpu and may not even be the CPU that changed
frequency.

But even then I'm not sure the loop needs to be non-preemptible. If it
were, the smp_call_function_single just before/after the loop would have
to be non-preemptible as well. So it is just an optimization and it can
use raw_smp_processor_id() instead.

Paolo
On 27/06/2013 15:00, Paolo Bonzini wrote:
> On 27/06/2013 14:32, Gleb Natapov wrote:
>>>>>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
>>>>> I am copying Jan, the author of the patch. Commit message says:
>>>>> "Code under this lock requires non-preemptibility", but which code
>>>>> exactly is this? Is this still true?
>>>>
>>>> hardware_enable_nolock/hardware_disable_nolock does.
>>>
>>> IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
>>> reads the processor ID of the caller. That implies the caller cannot be
>>> preempted, but these days a migration lock should be fine as well.
>>
>> OK, adding Marcelo to the party. This code is called from a cpufreq
>> notifier. I would expect it to be called from a context that prevents
>> migration to another CPU.
>
> No, the CPU is in freq->cpu and may not even be the CPU that changed
> frequency.

Try again: "No, the CPU is in freq->cpu and smp_processor_id() may not
even be the CPU that changed frequency." It probably makes more sense
now.

Paolo

> But even then I'm not sure the loop needs to be non-preemptible. If it
> were, the smp_call_function_single just before/after the loop would have
> to be non-preemptible as well. So it is just an optimization and it can
> use raw_smp_processor_id() instead.
>
> Paolo
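A sketch of that optimization, assuming the loop shape of kvmclock_cpufreq_notifier; the iteration macros and the request constant are best-effort reconstructions of the surrounding code, not quoted from the thread:

	/* raw_smp_processor_id() may be stale if we migrate mid-loop, but
	 * the value is only used to skip a redundant self-IPI, so no
	 * non-preemptible section is needed. */
	me = raw_smp_processor_id();

	list_for_each_entry(kvm, &vm_list, vm_list) {
		kvm_for_each_vcpu(i, vcpu, kvm) {
			if (vcpu->cpu != freq->cpu)
				continue;
			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
			if (vcpu->cpu != me)
				send_ipi = 1;
		}
	}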
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 748e0d8..db93a70 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct kvm *kvm;
 	int nr_to_scan = sc->nr_to_scan;
+	int found = 0;
 	unsigned long freed = 0;

 	raw_spin_lock(&kvm_lock);
@@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 			continue;

 		idx = srcu_read_lock(&kvm->srcu);
+
+		list_move_tail(&kvm->vm_list, &vm_list);
+		found = 1;
+		/* We can't be holding a raw lock and take non-raw mmu_lock */
+		raw_spin_unlock(&kvm_lock);
+
 		spin_lock(&kvm->mmu_lock);

 		if (kvm_has_zapped_obsolete_pages(kvm)) {
@@ -4370,11 +4377,12 @@ unlock:
 		 * per-vm shrinkers cry out
 		 * sadness comes quickly
 		 */
-		list_move_tail(&kvm->vm_list, &vm_list);
 		break;
 	}

-	raw_spin_unlock(&kvm_lock);
+	if (!found)
+		raw_spin_unlock(&kvm_lock);
+
 	return freed;

 }
In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
the kvm_lock was made a raw lock. However, the kvm mmu_shrink()
function tries to grab the (non-raw) mmu_lock within the scope of
the raw locked kvm_lock being held. This leads to the following:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]

Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
Call Trace:
 [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
 [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
 [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
 [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
 [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
 [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
 [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
 [<ffffffff811185bf>] kswapd+0x18f/0x490
 [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
 [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
 [<ffffffff81060d2b>] kthread+0xdb/0xe0
 [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
 [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
 [<ffffffff81060c50>] ? __init_kthread_worker+0x

Since we only use the lock for protecting the vm_list, once we've
found the instance we want, we can shuffle it to the end of the
list and then drop the kvm_lock before taking the mmu_lock. We
can do this because after the mmu operations are completed, we
break -- i.e. we don't continue list processing, so it doesn't
matter if the list changed around us.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---

[Note1: do double check that this solution makes sense for the
mainline kernel; consider this an RFC patch that does want a
review from people in the know.]

[Note2: you'll need to be running a preempt-rt kernel to actually
see this. Also note that the above patch is against linux-next.
Alternate solutions welcome; this seemed to me the obvious fix.]

 arch/x86/kvm/mmu.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)