From patchwork Mon Sep 19 08:15:08 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Borntraeger
X-Patchwork-Id: 9338757
To: David Hildenbrand
References: <33773797-04ec-413f-7ba2-4bb7a4350a44@de.ibm.com> <20160915212142.5fd5048e@thinkpad-w530>
From: Christian Borntraeger
Date: Mon, 19 Sep 2016 10:15:08 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0
In-Reply-To: <20160915212142.5fd5048e@thinkpad-w530>
Message-Id: <18e8d5ed-c6f7-4617-0426-be203beb1965@de.ibm.com>
Subject: Re: [Qemu-devel] [s390] possible deadlock in handle_sigp?
Cc: Cornelia Huck, Paolo Bonzini, qemu-devel, KVM list

On 09/15/2016 09:21 PM, David Hildenbrand wrote:
>> On 09/12/2016 08:03 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 12/09/2016 19:37, Christian Borntraeger wrote:
>>>> On 09/12/2016 06:44 PM, Paolo Bonzini wrote:
>>>>> I think that two CPUs doing reciprocal SIGPs could in principle end up
>>>>> waiting on each other to complete their run_on_cpu. If the SIGP has to
>>>>> be synchronous the fix is not trivial (you'd have to put the CPU in a
>>>>> state similar to cpu->halted = 1), otherwise it's enough to replace
>>>>> run_on_cpu with async_run_on_cpu.
>>>>
>>>> IIRC the sigps are supposed to be serialized by the big QEMU lock. Will
>>>> have a look.
>>>
>>> Yes, but run_on_cpu drops it when it waits on the qemu_work_cond
>>> condition variable.
>>> (Related: I stumbled upon it because I wanted to
>>> remove the BQL from run_on_cpu work items).
>>
>> Yes, seems you are right. If both CPUs have just exited from KVM doing a
>> crossover sigp, they will do the arch_exit handling before the run_on_cpu
>> stuff which might result in this hang. (luckily it seems quite unlikely
>> but still we need to fix it).
>> We cannot simply use async as the callbacks also provide the condition
>> code for the initiator, so this requires some rework.
>>
>>
>
> Smells like having to provide a lock per CPU. Trylock that lock, if that's not
> possible, cc=busy. SIGP SET ARCHITECTURE has to lock all CPUs.
>
> That was the initial design, until I realized that this was all protected by
> the BQL.
>
> David

We only do the slow path things in QEMU. Maybe we could just have one lock
that we trylock, returning a condition code of 2 (busy) if we fail. That
seems to be the simplest solution while still being architecturally correct.

Something like

diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index f348745..5706218 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -133,6 +133,8 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
     KVM_CAP_LAST_INFO
 };
 
+static QemuMutex qemu_sigp_mutex;
+
 static int cap_sync_regs;
 static int cap_async_pf;
 static int cap_mem_op;
@@ -358,6 +360,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         rc = compat_disable_facilities(s, fac_mask, ARRAY_SIZE(fac_mask));
     }
 
+    qemu_mutex_init(&qemu_sigp_mutex);
+
     return rc;
 }
 
@@ -1845,6 +1849,11 @@ static int handle_sigp(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
     status_reg = &env->regs[r1];
     param = (r1 % 2) ? env->regs[r1] : env->regs[r1 + 1];
 
+    if (qemu_mutex_trylock(&qemu_sigp_mutex)) {
+        setcc(cpu, SIGP_CC_BUSY);
+        return 0;
+    }
+
     switch (order) {
     case SIGP_SET_ARCH:
         ret = sigp_set_architecture(cpu, param, status_reg);
@@ -1854,6 +1863,7 @@ static int handle_sigp(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
         dst_cpu = s390_cpu_addr2state(env->regs[r3]);
         ret = handle_sigp_single_dst(dst_cpu, order, param, status_reg);
     }
+    qemu_mutex_unlock(&qemu_sigp_mutex);
 
     trace_kvm_sigp_finished(order, CPU(cpu)->cpu_index,
                             dst_cpu ? CPU(dst_cpu)->cpu_index : -1, ret);
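
The trylock-or-busy idea from the patch above can be tried outside of QEMU. What follows is only a minimal sketch under stated assumptions: a C11 atomic_flag stands in for QemuMutex so that it builds without QEMU headers, and every sketch_* name as well as SKETCH_CC_ACCEPTED is hypothetical. Only the value 2 for "busy" is taken from the mail. Note that sketch_trylock() returns true on success, whereas qemu_mutex_trylock() in the patch returns nonzero on failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Condition codes for this sketch only: 2 is the "busy" code named in the
 * mail, 0 is assumed here to mean "order accepted". */
enum { SKETCH_CC_ACCEPTED = 0, SKETCH_CC_BUSY = 2 };

/* Hypothetical stand-in for qemu_sigp_mutex: set while one SIGP is in flight. */
static atomic_flag sketch_sigp_lock = ATOMIC_FLAG_INIT;

/* Try to take the lock without waiting; true means we now hold it. */
static bool sketch_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&sketch_sigp_lock,
                                              memory_order_acquire);
}

/* Release the lock taken by a successful sketch_trylock(). */
static void sketch_unlock(void)
{
    atomic_flag_clear_explicit(&sketch_sigp_lock, memory_order_release);
}

/* Handle one order. This never waits for the lock, so two CPUs signalling
 * each other cannot block on one another: the loser is told "busy". */
static int sketch_handle_sigp(int (*order)(void *), void *opaque)
{
    int cc;

    assert(order != NULL);
    if (!sketch_trylock()) {
        return SKETCH_CC_BUSY;
    }
    cc = order(opaque);
    sketch_unlock();
    return cc;
}

/* Trivial order used for demonstration; it always reports "accepted". */
static int sketch_order_ok(void *opaque)
{
    (void)opaque;
    return SKETCH_CC_ACCEPTED;
}
```

While another caller holds the flag, sketch_handle_sigp(sketch_order_ok, NULL) returns SKETCH_CC_BUSY right away instead of waiting. Otherwise it runs the order and passes its condition code back to the initiator.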