
KVM: VMX: switch wakeup_vcpus_on_cpu_lock to raw spinlock

Message ID 20220107175114.GA261406@fuller.cnet (mailing list archive)
State New, archived
Series: KVM: VMX: switch wakeup_vcpus_on_cpu_lock to raw spinlock

Commit Message

Marcelo Tosatti Jan. 7, 2022, 5:51 p.m. UTC
wakeup_vcpus_on_cpu_lock is taken from hard interrupt context
(pi_wakeup_handler), where sleeping is not allowed. On PREEMPT_RT,
spinlock_t is a sleeping lock, so taking it there triggers a
"scheduling while atomic" splat.

Switch it to a raw spinlock.

Fixes the following splat:

[41297.066254] BUG: scheduling while atomic: CPU 0/KVM/635218/0x00010001 
[41297.066323] Preemption disabled at: 
[41297.066324] [<ffffffff902ee47f>] irq_enter_rcu+0xf/0x60 
[41297.066339] Call Trace: 
[41297.066342]  <IRQ> 
[41297.066346]  dump_stack_lvl+0x34/0x44 
[41297.066353]  ? irq_enter_rcu+0xf/0x60 
[41297.066356]  __schedule_bug.cold+0x7d/0x8b 
[41297.066361]  __schedule+0x439/0x5b0 
[41297.066365]  ? task_blocks_on_rt_mutex.constprop.0.isra.0+0x1b0/0x440 
[41297.066369]  schedule_rtlock+0x1e/0x40 
[41297.066371]  rtlock_slowlock_locked+0xf1/0x260 
[41297.066374]  rt_spin_lock+0x3b/0x60 
[41297.066378]  pi_wakeup_handler+0x31/0x90 [kvm_intel] 
[41297.066388]  sysvec_kvm_posted_intr_wakeup_ipi+0x9d/0xd0 
[41297.066392]  </IRQ> 
[41297.066392]  asm_sysvec_kvm_posted_intr_wakeup_ipi+0x12/0x20 
...

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
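
For background (editorial note, not part of the commit message): on PREEMPT_RT kernels, spinlock_t is implemented on top of an rt_mutex and may sleep, while raw_spinlock_t keeps the classic spinning behavior. The trace above shows exactly that failure mode: rt_spin_lock() called from pi_wakeup_handler() in hard-IRQ context ends up in schedule_rtlock(). The pattern the patch applies, as a kernel-style C sketch (not a standalone compilable program; all identifiers are taken from the patch itself, only the commentary is added):

```c
/* On PREEMPT_RT, spin_lock() maps to a sleeping rt_mutex-based lock,
 * which must not be taken in hard-IRQ context.  raw_spin_lock() always
 * spins, on both RT and non-RT kernels, and is therefore safe here.
 */
static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock);

void pi_wakeup_handler(void)            /* runs in hard-IRQ context */
{
	int cpu = smp_processor_id();

	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
	/* ... walk wakeup_vcpus_on_cpu and wake any blocked vCPUs ... */
	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
}
```

The trade-off is that raw spinlock critical sections add latency on RT kernels, so they should stay short, as the list walk here does.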

Comments

Sebastian Sewior Jan. 11, 2022, 11:36 a.m. UTC | #1
On 2022-01-07 14:51:14 [-0300], Marcelo Tosatti wrote:
> [... commit message, trace and Signed-off-by snipped; quoted in full above ...]

I have v5.16 here and there is no wakeup_vcpus_on_cpu_lock. It was also
not removed, so this patch is not intended for an earlier kernel. I also
checked next-20220111: no wakeup_vcpus_on_cpu_lock there either.

Sebastian
Marcelo Tosatti Jan. 11, 2022, 12:13 p.m. UTC | #2
On Tue, Jan 11, 2022 at 12:36:35PM +0100, Sebastian Andrzej Siewior wrote:
> On 2022-01-07 14:51:14 [-0300], Marcelo Tosatti wrote:
> > [... commit message, trace and Signed-off-by snipped ...]
> 
> I have v5.16 here and there is no wakeup_vcpus_on_cpu_lock. It was also
> not removed, so this patch is not intended for an earlier kernel. I also
> checked next-20220111: no wakeup_vcpus_on_cpu_lock there either.
> 
> Sebastian

Sebastian,

The variable has been renamed in kvm.git/queue (sorry for not mentioning
that).

I suppose Paolo can integrate it through kvm.git.
Paolo Bonzini Jan. 18, 2022, 9:56 a.m. UTC | #3
On 1/7/22 18:51, Marcelo Tosatti wrote:
> [... commit message, trace and patch snipped; the patch appears in full below ...]

Queued, thanks.

Paolo
Paolo Bonzini Jan. 18, 2022, 4:24 p.m. UTC | #4
On 1/11/22 12:36, Sebastian Andrzej Siewior wrote:
> On 2022-01-07 14:51:14 [-0300], Marcelo Tosatti wrote:
>> [... commit message, trace and Signed-off-by snipped ...]
> 
> I have v5.16 here and there is no wakeup_vcpus_on_cpu_lock. It was also
> not removed, so this patch is not intended for an earlier kernel. I also
> checked next-20220111: no wakeup_vcpus_on_cpu_lock there either.

Since this patch is good for stable@ too, I did the backport myself and 
queued Marcelo's version through a merge commit.  You can find it now in 
kvm.git's master (5.16 version) and next branches (5.17 version).

kvm/queue has been rebased on top as well.

Paolo

Patch

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index f4169c009400..aa1fe9085d77 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -27,7 +27,7 @@  static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_cpu);
  * CPU.  IRQs must be disabled when taking this lock, otherwise deadlock will
  * occur if a wakeup IRQ arrives and attempts to acquire the lock.
  */
-static DEFINE_PER_CPU(spinlock_t, wakeup_vcpus_on_cpu_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock);
 
 static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
 {
@@ -87,9 +87,9 @@  void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 	 * current pCPU if the task was migrated.
 	 */
 	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
-		spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+		raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
 		list_del(&vmx->pi_wakeup_list);
-		spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+		raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
 	}
 
 	dest = cpu_physical_id(cpu);
@@ -149,10 +149,10 @@  static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
 
 	local_irq_save(flags);
 
-	spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
 	list_add_tail(&vmx->pi_wakeup_list,
 		      &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu));
-	spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
 
 	WARN(pi_desc->sn, "PI descriptor SN field set before blocking");
 
@@ -204,20 +204,20 @@  void pi_wakeup_handler(void)
 	int cpu = smp_processor_id();
 	struct vcpu_vmx *vmx;
 
-	spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
+	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
 	list_for_each_entry(vmx, &per_cpu(wakeup_vcpus_on_cpu, cpu),
 			    pi_wakeup_list) {
 
 		if (pi_test_on(&vmx->pi_desc))
 			kvm_vcpu_wake_up(&vmx->vcpu);
 	}
-	spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
+	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
 }
 
 void __init pi_init_cpu(int cpu)
 {
 	INIT_LIST_HEAD(&per_cpu(wakeup_vcpus_on_cpu, cpu));
-	spin_lock_init(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
+	raw_spin_lock_init(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
 }
 
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu)