[s390] possible deadlock in handle_sigp?


Message ID: 18e8d5ed-c6f7-4617-0426-be203beb1965@de.ibm.com (mailing list archive)
State: New, archived

Headers

To: David Hildenbrand <dahi@linux.vnet.ibm.com>
References: <c9438426-1f1e-c6eb-9b90-79a6b62e537a@redhat.com>
	<a122ac4f-a336-312a-0531-2e2a2a57f5b9@de.ibm.com>
	<e2702dff-0f51-a97e-e186-c1d125f6ab76@redhat.com>
	<33773797-04ec-413f-7ba2-4bb7a4350a44@de.ibm.com>
	<20160915212142.5fd5048e@thinkpad-w530>
From: Christian Borntraeger <borntraeger@de.ibm.com>
Date: Mon, 19 Sep 2016 10:15:08 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
	Thunderbird/45.3.0
MIME-Version: 1.0
In-Reply-To: <20160915212142.5fd5048e@thinkpad-w530>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Message-Id: <18e8d5ed-c6f7-4617-0426-be203beb1965@de.ibm.com>
Subject: Re: [Qemu-devel] [s390] possible deadlock in handle_sigp?
Precedence: list
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>,
	Paolo Bonzini <pbonzini@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, 
	KVM list <kvm@vger.kernel.org>
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Commit Message

Christian Borntraeger Sept. 19, 2016, 8:15 a.m. UTC

On 09/15/2016 09:21 PM, David Hildenbrand wrote:
>> On 09/12/2016 08:03 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 12/09/2016 19:37, Christian Borntraeger wrote:  
>>>> On 09/12/2016 06:44 PM, Paolo Bonzini wrote:  
>>>>> I think that two CPUs doing reciprocal SIGPs could in principle end up
>>>>> waiting on each other to complete their run_on_cpu.  If the SIGP has to
>>>>> be synchronous the fix is not trivial (you'd have to put the CPU in a
>>>>> state similar to cpu->halted = 1), otherwise it's enough to replace
>>>>> run_on_cpu with async_run_on_cpu.  
>>>>
>>>> IIRC the SIGPs are supposed to be serialized by the big QEMU lock. Will
>>>> have a look.
>>>
>>> Yes, but run_on_cpu drops it when it waits on the qemu_work_cond
>>> condition variable.  (Related: I stumbled upon it because I wanted to
>>> remove the BQL from run_on_cpu work items).  
>>
>> Yes, seems you are right. If both CPUs have just exited from KVM doing a
>> crossover sigp, they will do the arch_exit handling before the run_on_cpu
>> stuff which might result in this hang. (luckily it seems quite unlikely 
>> but still we need to fix it).
>> We cannot simply use async as the callbacks also provide the condition
>> code for the initiator, so this requires some rework.
>>
>>
> 
> Smells like having to provide a lock per CPU. Trylock that lock, if that's not
> possible, cc=busy. SIGP SET ARCHITECTURE has to lock all CPUs.
> 
> That was the initial design, until I realized that this was all protected by
> the BQL.
> 
> David
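
The per-CPU variant David describes can be sketched in isolation. This is a minimal, hypothetical model and not QEMU code: `cpu_lock`, `cpu_trylock`, `sigp_single` and `sigp_all` are invented names, the lock is a C11 `atomic_flag` standing in for a real mutex, and `NR_CPUS` is an arbitrary count. An order with a single destination trylocks only the target; an order that touches every CPU (such as SET ARCHITECTURE) trylocks all of them and backs out if any one is already held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS      4   /* arbitrary CPU count for this sketch */
#define SIGP_CC_OK   0   /* order accepted */
#define SIGP_CC_BUSY 2   /* busy, initiator should retry */

/* One flag per CPU; a set flag means an order for that CPU is in flight. */
static atomic_flag cpu_lock[NR_CPUS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Returns true if the lock was taken, false if it was already held. */
static bool cpu_trylock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return !atomic_flag_test_and_set(&cpu_lock[cpu]);
}

static void cpu_unlock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_flag_clear(&cpu_lock[cpu]);
}

/* Order aimed at one CPU: never wait, report busy instead. */
static int sigp_single(int dst)
{
    if (!cpu_trylock(dst)) {
        return SIGP_CC_BUSY;
    }
    /* ... the order would be carried out on dst here ... */
    cpu_unlock(dst);
    return SIGP_CC_OK;
}

/* Order affecting every CPU: take all locks or back out completely. */
static int sigp_all(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        if (!cpu_trylock(i)) {
            /* Release the locks 0..i-1 that were already taken. */
            while (--i >= 0) {
                cpu_unlock(i);
            }
            return SIGP_CC_BUSY;
        }
    }
    /* ... the order would be applied to all CPUs here ... */
    for (int i = 0; i < NR_CPUS; i++) {
        cpu_unlock(i);
    }
    return SIGP_CC_OK;
}
```

Because no path ever blocks on a lock, two CPUs targeting each other cannot wait on one another; the cost is that the all-CPU order can fail spuriously and must be reissued.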

We only do the slow path things in QEMU. Maybe we could just have one lock that
we trylock and return a condition code of 2 (busy) if we fail. That seems the
simplest solution while still being architecturally correct. Something like the
patch below:
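
As a standalone illustration of the single-lock idea (a sketch under assumptions, not the QEMU code): one global lock is tried on entry to the handler, and a second SIGP that arrives while the first is still in flight gets CC 2 instead of waiting. All names here (`handle_sigp_sketch`, `order_fn`, `order_with_crossover`, `crossover_cc`) are hypothetical, and a C11 `atomic_flag` stands in for the mutex. The crossover is simulated by having the order callback issue a SIGP of its own while the outer one still holds the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CC_ACCEPTED 0   /* order accepted */
#define CC_BUSY     2   /* busy, initiator should retry */

typedef int (*order_fn)(int dst);

/* Single global lock guarding all SIGP handling in this sketch. */
static atomic_flag sigp_lock = ATOMIC_FLAG_INIT;

/* Condition code seen by the simulated crossover SIGP. */
static int crossover_cc = -1;

/* Try the lock; on failure report busy right away, never block. */
static int handle_sigp_sketch(int dst, order_fn order)
{
    assert(order != NULL);
    if (atomic_flag_test_and_set(&sigp_lock)) {
        return CC_BUSY;
    }
    int cc = order(dst);
    atomic_flag_clear(&sigp_lock);
    return cc;
}

/* An order that does nothing and is accepted. */
static int order_noop(int dst)
{
    (void)dst;
    return CC_ACCEPTED;
}

/*
 * Simulates the destination issuing its own SIGP back at the initiator
 * while the first order is still being processed.
 */
static int order_with_crossover(int dst)
{
    (void)dst;
    crossover_cc = handle_sigp_sketch(0, order_noop);
    return CC_ACCEPTED;
}
```

Calling `handle_sigp_sketch(1, order_with_crossover)` completes the outer order while the nested one is turned away with `CC_BUSY`, which is the behavior the trylock is meant to provide in place of a mutual wait.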

Comments

David Hildenbrand Sept. 19, 2016, 11:25 a.m. UTC | #1

> On 09/15/2016 09:21 PM, David Hildenbrand wrote:
> >> On 09/12/2016 08:03 PM, Paolo Bonzini wrote:  
> >>>
> >>>
> >>> On 12/09/2016 19:37, Christian Borntraeger wrote:    
> >>>> On 09/12/2016 06:44 PM, Paolo Bonzini wrote:    
> >>>>> I think that two CPUs doing reciprocal SIGPs could in principle end up
> >>>>> waiting on each other to complete their run_on_cpu.  If the SIGP has to
> >>>>> be synchronous the fix is not trivial (you'd have to put the CPU in a
> >>>>> state similar to cpu->halted = 1), otherwise it's enough to replace
> >>>>> run_on_cpu with async_run_on_cpu.    
> >>>>
> >>>> IIRC the SIGPs are supposed to be serialized by the big QEMU lock. Will
> >>>> have a look.
> >>>
> >>> Yes, but run_on_cpu drops it when it waits on the qemu_work_cond
> >>> condition variable.  (Related: I stumbled upon it because I wanted to
> >>> remove the BQL from run_on_cpu work items).    
> >>
> >> Yes, seems you are right. If both CPUs have just exited from KVM doing a
> >> crossover sigp, they will do the arch_exit handling before the run_on_cpu
> >> stuff which might result in this hang. (luckily it seems quite unlikely 
> >> but still we need to fix it).
> >> We cannot simply use async as the callbacks also provide the condition
> >> code for the initiator, so this requires some rework.
> >>
> >>  
> > 
> > Smells like having to provide a lock per CPU. Trylock that lock, if that's not
> > possible, cc=busy. SIGP SET ARCHITECTURE has to lock all CPUs.
> > 
> > That was the initial design, until I realized that this was all protected by
> > the BQL.
> > 
> > David  
> 
> We only do the slow path things in QEMU. Maybe we could just have one lock that
> we trylock and return a condition code of 2 (busy) if we fail. That seems the 
> most simple solution while still being architecturally correct. Something like

According to the architecture, CC=busy is returned in case the access path to
the CPU is busy. So this might not be optimal but should work for now.
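
From the initiator's side, CC 2 only means "try again". A minimal sketch of such a retry loop, assuming the caller simply reissues the order until it gets something other than busy; `sigp_retry`, `sigp_issue_fn` and `busy_then_ok` are hypothetical names, and the retry cap exists only to keep the sketch bounded:

```c
#include <assert.h>
#include <stddef.h>

#define SIGP_CC_BUSY 2   /* busy, initiator should retry */

/* Issues one SIGP order and returns its condition code. */
typedef int (*sigp_issue_fn)(void *opaque);

/*
 * Reissue the order while it reports busy, up to max_tries attempts.
 * Returns the last condition code; stores the attempt count if asked.
 */
static int sigp_retry(sigp_issue_fn issue, void *opaque,
                      int max_tries, int *tries_used)
{
    int cc = SIGP_CC_BUSY;
    int tries = 0;

    assert(issue != NULL && max_tries > 0);
    while (tries < max_tries) {
        cc = issue(opaque);
        tries++;
        if (cc != SIGP_CC_BUSY) {
            break;
        }
    }
    if (tries_used != NULL) {
        *tries_used = tries;
    }
    return cc;
}

/* Stand-in order: busy for *opaque more calls, then accepted (CC 0). */
static int busy_then_ok(void *opaque)
{
    int *remaining_busy = opaque;

    if (*remaining_busy > 0) {
        (*remaining_busy)--;
        return SIGP_CC_BUSY;
    }
    return 0;
}
```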

> 
> 
> diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> index f348745..5706218 100644
> --- a/target-s390x/kvm.c
> +++ b/target-s390x/kvm.c
> @@ -133,6 +133,8 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
>      KVM_CAP_LAST_INFO
>  };
> 
> +static QemuMutex qemu_sigp_mutex;
> +
>  static int cap_sync_regs;
>  static int cap_async_pf;
>  static int cap_mem_op;
> @@ -358,6 +360,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>          rc = compat_disable_facilities(s, fac_mask, ARRAY_SIZE(fac_mask));
>      }
> 
> +    qemu_mutex_init(&qemu_sigp_mutex);
> +
>      return rc;
>  }
> 
> @@ -1845,6 +1849,11 @@ static int handle_sigp(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
>      status_reg = &env->regs[r1];
>      param = (r1 % 2) ? env->regs[r1] : env->regs[r1 + 1];
> 
> +    if (qemu_mutex_trylock(&qemu_sigp_mutex)) {
> +        setcc(cpu, SIGP_CC_BUSY );
> +        return 0;
> +    }
> +
>      switch (order) {
>      case SIGP_SET_ARCH:
>          ret = sigp_set_architecture(cpu, param, status_reg);
> @@ -1854,6 +1863,7 @@ static int handle_sigp(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
>          dst_cpu = s390_cpu_addr2state(env->regs[r3]);
>          ret = handle_sigp_single_dst(dst_cpu, order, param, status_reg);
>      }
> +    qemu_mutex_unlock(&qemu_sigp_mutex);
> 
>      trace_kvm_sigp_finished(order, CPU(cpu)->cpu_index,
>                              dst_cpu ? CPU(dst_cpu)->cpu_index : -1, ret);
> 
> 
> 

This makes SET ARCHITECTURE handling much easier.

David

Christian Borntraeger Sept. 19, 2016, 11:45 a.m. UTC | #2

On 09/19/2016 01:25 PM, David Hildenbrand wrote:
[...]
>>
>> We only do the slow path things in QEMU. Maybe we could just have one lock that
>> we trylock and return a condition code of 2 (busy) if we fail. That seems the 
>> most simple solution while still being architecturally correct. Something like
> 
> According to the architecture, CC=busy is returned in case the access path to
> the CPU is busy. So this might not be optimal but should work for now.
> 
>>
>>
>> diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
>> index f348745..5706218 100644
>> --- a/target-s390x/kvm.c
>> +++ b/target-s390x/kvm.c
>> @@ -133,6 +133,8 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
>>      KVM_CAP_LAST_INFO
>>  };
>>
>> +static QemuMutex qemu_sigp_mutex;
>> +
>>  static int cap_sync_regs;
>>  static int cap_async_pf;
>>  static int cap_mem_op;
>> @@ -358,6 +360,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>          rc = compat_disable_facilities(s, fac_mask, ARRAY_SIZE(fac_mask));
>>      }
>>
>> +    qemu_mutex_init(&qemu_sigp_mutex);
>> +
>>      return rc;
>>  }
>>
>> @@ -1845,6 +1849,11 @@ static int handle_sigp(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
>>      status_reg = &env->regs[r1];
>>      param = (r1 % 2) ? env->regs[r1] : env->regs[r1 + 1];
>>
>> +    if (qemu_mutex_trylock(&qemu_sigp_mutex)) {
>> +        setcc(cpu, SIGP_CC_BUSY );
>> +        return 0;
>> +    }
>> +
>>      switch (order) {
>>      case SIGP_SET_ARCH:
>>          ret = sigp_set_architecture(cpu, param, status_reg);
>> @@ -1854,6 +1863,7 @@ static int handle_sigp(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
>>          dst_cpu = s390_cpu_addr2state(env->regs[r3]);
>>          ret = handle_sigp_single_dst(dst_cpu, order, param, status_reg);
>>      }
>> +    qemu_mutex_unlock(&qemu_sigp_mutex);
>>
>>      trace_kvm_sigp_finished(order, CPU(cpu)->cpu_index,
>>                              dst_cpu ? CPU(dst_cpu)->cpu_index : -1, ret);
>>
>>
>>
> 
> This makes SET ARCHITECTURE handling much easier.

Yes.

I think doing that in a follow-up patch on top is probably safer, to keep this fix minimal (e.g. for backports).

Patch

diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index f348745..5706218 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -133,6 +133,8 @@  const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
     KVM_CAP_LAST_INFO
 };
 
+static QemuMutex qemu_sigp_mutex;
+
 static int cap_sync_regs;
 static int cap_async_pf;
 static int cap_mem_op;
@@ -358,6 +360,8 @@  int kvm_arch_init(MachineState *ms, KVMState *s)
         rc = compat_disable_facilities(s, fac_mask, ARRAY_SIZE(fac_mask));
     }
 
+    qemu_mutex_init(&qemu_sigp_mutex);
+
     return rc;
 }
 
@@ -1845,6 +1849,11 @@  static int handle_sigp(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
     status_reg = &env->regs[r1];
     param = (r1 % 2) ? env->regs[r1] : env->regs[r1 + 1];
 
+    if (qemu_mutex_trylock(&qemu_sigp_mutex)) {
+        setcc(cpu, SIGP_CC_BUSY);
+        return 0;
+    }
+
     switch (order) {
     case SIGP_SET_ARCH:
         ret = sigp_set_architecture(cpu, param, status_reg);
@@ -1854,6 +1863,7 @@  static int handle_sigp(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
         dst_cpu = s390_cpu_addr2state(env->regs[r3]);
         ret = handle_sigp_single_dst(dst_cpu, order, param, status_reg);
     }
+    qemu_mutex_unlock(&qemu_sigp_mutex);
 
     trace_kvm_sigp_finished(order, CPU(cpu)->cpu_index,
                             dst_cpu ? CPU(dst_cpu)->cpu_index : -1, ret);
