
[PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

Message ID 1372199643-3936-1-git-send-email-paul.gortmaker@windriver.com (mailing list archive)
State New, archived

Commit Message

Paul Gortmaker June 25, 2013, 10:34 p.m. UTC
In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
function tries to grab the (non-raw) mmu_lock within the scope of
the raw locked kvm_lock being held.  This leads to the following:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]

Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
Call Trace:
 [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
 [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
 [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
 [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
 [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
 [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
 [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
 [<ffffffff811185bf>] kswapd+0x18f/0x490
 [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
 [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
 [<ffffffff81060d2b>] kthread+0xdb/0xe0
 [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
 [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
 [<ffffffff81060c50>] ? __init_kthread_worker+0x

Since we only use the lock for protecting the vm_list, once we've
found the instance we want, we can shuffle it to the end of the
list and then drop the kvm_lock before taking the mmu_lock.  We
can do this because after the mmu operations are completed, we
break -- i.e. we don't continue list processing, so it doesn't
matter if the list changed around us.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---

[Note1: please double-check that this solution makes sense for the
 mainline kernel; consider this an RFC patch that does want a
 review from people in the know.]

[Note2: you'll need to be running a preempt-rt kernel to actually
 see this.  Also note that the above patch is against linux-next.
 Alternate solutions are welcome; this seemed like the obvious fix to me.]

 arch/x86/kvm/mmu.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Paolo Bonzini June 26, 2013, 8:10 a.m. UTC | #1
On 26/06/2013 00:34, Paul Gortmaker wrote:
> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
> the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
> function tries to grab the (non-raw) mmu_lock within the scope of
> the raw locked kvm_lock being held.  This leads to the following:
> 
> [...]
> 
> Since we only use the lock for protecting the vm_list, once we've
> found the instance we want, we can shuffle it to the end of the
> list and then drop the kvm_lock before taking the mmu_lock.  We
> can do this because after the mmu operations are completed, we
> break -- i.e. we don't continue list processing, so it doesn't
> matter if the list changed around us.
> 
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Since the shrinker code is asynchronous with respect to KVM, I think
that the kvm_lock here is also protecting against kvm_destroy_vm running
at the same time.

So the patch is almost okay; all that is missing is a
kvm_get_kvm/kvm_put_kvm pair, where the reference is added just before
releasing the kvm_lock.

Paolo
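
As a rough, untested sketch of how that could sit on top of the patch
(the exact placement around the existing unlock: label in
mmu_shrink_scan() would need checking), the loop body might become:

    idx = srcu_read_lock(&kvm->srcu);

    list_move_tail(&kvm->vm_list, &vm_list);
    found = 1;
    /* pin the VM so a concurrent kvm_destroy_vm() cannot free it */
    kvm_get_kvm(kvm);
    /* We can't be holding a raw lock and take non-raw mmu_lock */
    raw_spin_unlock(&kvm_lock);

    spin_lock(&kvm->mmu_lock);
    /* ... zap obsolete/invalid pages as before ... */
    spin_unlock(&kvm->mmu_lock);

    srcu_read_unlock(&kvm->srcu, idx);
    /* only now is it safe to drop the reference */
    kvm_put_kvm(kvm);
    break;

The put comes only after srcu_read_unlock(), since kvm_put_kvm() may end
up destroying the VM, including its srcu structure.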
Gleb Natapov June 27, 2013, 11:09 a.m. UTC | #2
On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
I am copying Jan, the author of the patch. Commit message says:
"Code under this lock requires non-preemptibility", but which code
exactly is this? Is this still true?

> the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
> function tries to grab the (non-raw) mmu_lock within the scope of
> the raw locked kvm_lock being held.  This leads to the following:
> 
> BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
> in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
> Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
> 
> Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
> Call Trace:
>  [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
>  [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
>  [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
>  [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
>  [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
>  [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
>  [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
>  [<ffffffff811185bf>] kswapd+0x18f/0x490
>  [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
>  [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
>  [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
>  [<ffffffff81060d2b>] kthread+0xdb/0xe0
>  [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
>  [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
>  [<ffffffff81060c50>] ? __init_kthread_worker+0x
> 
> Since we only use the lock for protecting the vm_list, once we've
> found the instance we want, we can shuffle it to the end of the
> list and then drop the kvm_lock before taking the mmu_lock.  We
> can do this because after the mmu operations are completed, we
> break -- i.e. we don't continue list processing, so it doesn't
> matter if the list changed around us.
> 
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
> 
> [Note1: do double check that this solution makes sense for the
>  mainline kernel; consider this an RFC patch that does want a
>  review from people in the know.]
> 
> [Note2: you'll need to be running a preempt-rt kernel to actually
>  see this.  Also note that the above patch is against linux-next.
>  Alternate solutions welcome ; this seemed to me the obvious fix.]
> 
>  arch/x86/kvm/mmu.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 748e0d8..db93a70 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  {
>  	struct kvm *kvm;
>  	int nr_to_scan = sc->nr_to_scan;
> +	int found = 0;
>  	unsigned long freed = 0;
>  
>  	raw_spin_lock(&kvm_lock);
> @@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  			continue;
>  
>  		idx = srcu_read_lock(&kvm->srcu);
> +
> +		list_move_tail(&kvm->vm_list, &vm_list);
> +		found = 1;
> +		/* We can't be holding a raw lock and take non-raw mmu_lock */
> +		raw_spin_unlock(&kvm_lock);
> +
>  		spin_lock(&kvm->mmu_lock);
>  
>  		if (kvm_has_zapped_obsolete_pages(kvm)) {
> @@ -4370,11 +4377,12 @@ unlock:
>  		 * per-vm shrinkers cry out
>  		 * sadness comes quickly
>  		 */
> -		list_move_tail(&kvm->vm_list, &vm_list);
>  		break;
>  	}
>  
> -	raw_spin_unlock(&kvm_lock);
> +	if (!found)
> +		raw_spin_unlock(&kvm_lock);
> +
>  	return freed;
>  
>  }
> -- 
> 1.8.1.2

--
			Gleb.
Paolo Bonzini June 27, 2013, 11:38 a.m. UTC | #3
On 27/06/2013 13:09, Gleb Natapov wrote:
> On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
> I am copying Jan, the author of the patch. Commit message says:
> "Code under this lock requires non-preemptibility", but which code
> exactly is this? Is this still true?

hardware_enable_nolock/hardware_disable_nolock does.

Paolo

Gleb Natapov June 27, 2013, 11:43 a.m. UTC | #4
On Thu, Jun 27, 2013 at 01:38:29PM +0200, Paolo Bonzini wrote:
> On 27/06/2013 13:09, Gleb Natapov wrote:
> > On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
> >> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
> > I am copying Jan, the author of the patch. Commit message says:
> > "Code under this lock requires non-preemptibility", but which code
> > exactly is this? Is this still true?
> 
> hardware_enable_nolock/hardware_disable_nolock does.
> 
I suspected this would be the answer and prepared another question :)
At a glance, kvm_lock is used to protect those just to avoid creating a
separate lock, so why not create a raw one to protect them and change
kvm_lock back to being non-raw? Admittedly I haven't looked too closely
into this yet.

--
			Gleb.
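
As a very rough sketch of the split being suggested here (the lock name
is made up for illustration, and the exact set of users that must stay
non-preemptible would need auditing), the idea would be a small raw lock
covering only the hardware enable/disable path, with kvm_lock reverting
to an ordinary spinlock:

    /* hypothetical name; covers only the truly non-preemptible bits */
    static DEFINE_RAW_SPINLOCK(kvm_hw_lock);
    /* back to a normal spinlock, so it may sleep on -rt */
    static DEFINE_SPINLOCK(kvm_lock);

    static void hardware_enable(void *junk)
    {
            raw_spin_lock(&kvm_hw_lock);
            hardware_enable_nolock(junk);
            raw_spin_unlock(&kvm_hw_lock);
    }

With a split like that, the shrinker could take the (now ordinary)
kvm_lock and then mmu_lock without the raw-versus-sleeping lock problem
that triggers the splat above.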
Paolo Bonzini June 27, 2013, 11:54 a.m. UTC | #5
On 27/06/2013 13:43, Gleb Natapov wrote:
>>> I am copying Jan, the author of the patch. Commit message says:
>>> "Code under this lock requires non-preemptibility", but which code
>>> exactly is this? Is this still true?
>>
>> hardware_enable_nolock/hardware_disable_nolock does.
>>
> I suspected this would be the answer and prepared another question :)
> At a glance, kvm_lock is used to protect those just to avoid creating a
> separate lock, so why not create a raw one to protect them and change
> kvm_lock back to being non-raw? Admittedly I haven't looked too closely
> into this yet.

I was wondering the same, but I think it's fine.  There's just a handful
of uses outside virt/kvm/kvm_main.c.

Paolo
Jan Kiszka June 27, 2013, 12:16 p.m. UTC | #6
On 2013-06-27 13:38, Paolo Bonzini wrote:
> On 27/06/2013 13:09, Gleb Natapov wrote:
>> On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
>>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
>> I am copying Jan, the author of the patch. Commit message says:
>> "Code under this lock requires non-preemptibility", but which code
>> exactly is this? Is this still true?
> 
> hardware_enable_nolock/hardware_disable_nolock does.

IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
reads the processor ID of the caller. That implies the caller cannot be
preempted, but these days a migration lock should be fine as well.

Jan
Gleb Natapov June 27, 2013, 12:32 p.m. UTC | #7
On Thu, Jun 27, 2013 at 02:16:07PM +0200, Jan Kiszka wrote:
> On 2013-06-27 13:38, Paolo Bonzini wrote:
> > On 27/06/2013 13:09, Gleb Natapov wrote:
> >> On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
> >>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
> >> I am copying Jan, the author of the patch. Commit message says:
> >> "Code under this lock requires non-preemptibility", but which code
> >> exactly is this? Is this still true?
> > 
> > hardware_enable_nolock/hardware_disable_nolock does.
> 
> IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
> reads the processor ID of the caller. That implies the caller cannot be
> preempted, but these days a migration lock should be fine as well.
> 
OK, adding Marcelo to the party. This code is called from cpufreq
notifier. I would expect that it will be called from the context that
prevents migration to another cpu.

--
			Gleb.
Paolo Bonzini June 27, 2013, 1 p.m. UTC | #8
On 27/06/2013 14:32, Gleb Natapov wrote:
>>>>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
>>>> I am copying Jan, the author of the patch. Commit message says:
>>>> "Code under this lock requires non-preemptibility", but which code
>>>> exactly is this? Is this still true?
>>>
>>> hardware_enable_nolock/hardware_disable_nolock does.
>>
>> IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
>> reads the processor ID of the caller. That implies the caller cannot be
>> preempted, but these days a migration lock should be fine as well.
>>
> OK, adding Marcelo to the party. This code is called from cpufreq
> notifier. I would expect that it will be called from the context that
> prevents migration to another cpu.

No, the CPU is in freq->cpu and may not even be the CPU that changed
frequency.

But even then I'm not sure the loop needs to be non-preemptible.  If it
were, the smp_call_function_single just before/after the loop would have
to be non-preemptible as well.  So it is just an optimization and it can
use raw_smp_processor_id() instead.

Paolo
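
For context, the check being discussed is the send-IPI optimization in
the notifier's per-vcpu loop; sketched from memory, with the surrounding
details approximate, the preemption-tolerant variant would simply be:

    /* inside kvmclock_cpufreq_notifier(), per-vcpu loop (approximate) */
    if (vcpu->cpu != freq->cpu)
            continue;
    kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
    /* only an optimization, so a stale CPU id is harmless here */
    if (vcpu->cpu != raw_smp_processor_id())
            send_ipi = 1;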
Paolo Bonzini June 27, 2013, 1:01 p.m. UTC | #9
On 27/06/2013 15:00, Paolo Bonzini wrote:
> On 27/06/2013 14:32, Gleb Natapov wrote:
>>>>>> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
>>>>> I am copying Jan, the author of the patch. Commit message says:
>>>>> "Code under this lock requires non-preemptibility", but which code
>>>>> exactly is this? Is this still true?
>>>>
>>>> hardware_enable_nolock/hardware_disable_nolock does.
>>>
>>> IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
>>> reads the processor ID of the caller. That implies the caller cannot be
>>> preempted, but these days a migration lock should be fine as well.
>>>
>> OK, adding Marcelo to the party. This code is called from cpufreq
>> notifier. I would expect that it will be called from the context that
>> prevents migration to another cpu.
> 
> No, the CPU is in freq->cpu and may not even be the CPU that changed
> frequency.

Try again: "No, the CPU is in freq->cpu and smp_processor_id() may not
even be the CPU that changed frequency".  It probably makes more sense now.

Paolo

> But even then I'm not sure the loop needs to be non-preemptible.  If it
> were, the smp_call_function_single just before/after the loop would have
> to be non-preemptible as well.  So it is just an optimization and it can
> use raw_smp_processor_id() instead.
> 
> Paolo


Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 748e0d8..db93a70 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4322,6 +4322,7 @@  mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct kvm *kvm;
 	int nr_to_scan = sc->nr_to_scan;
+	int found = 0;
 	unsigned long freed = 0;
 
 	raw_spin_lock(&kvm_lock);
@@ -4349,6 +4350,12 @@  mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 			continue;
 
 		idx = srcu_read_lock(&kvm->srcu);
+
+		list_move_tail(&kvm->vm_list, &vm_list);
+		found = 1;
+		/* We can't be holding a raw lock and take non-raw mmu_lock */
+		raw_spin_unlock(&kvm_lock);
+
 		spin_lock(&kvm->mmu_lock);
 
 		if (kvm_has_zapped_obsolete_pages(kvm)) {
@@ -4370,11 +4377,12 @@  unlock:
 		 * per-vm shrinkers cry out
 		 * sadness comes quickly
 		 */
-		list_move_tail(&kvm->vm_list, &vm_list);
 		break;
 	}
 
-	raw_spin_unlock(&kvm_lock);
+	if (!found)
+		raw_spin_unlock(&kvm_lock);
+
 	return freed;
 
 }