From patchwork Mon Apr 1 13:36:59 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vitaly Kuznetsov
X-Patchwork-Id: 10879951
From: Vitaly Kuznetsov
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini, Liran Alon, "Michael S. Tsirkin"
Date: Mon, 1 Apr 2019 15:36:59 +0200
Message-Id: <20190401133659.20421-1-vkuznets@redhat.com>
Subject: [Qemu-devel] [PATCH] ioapic: allow buggy guests mishandling level-triggered interrupts to make progress

It was found that Hyper-V 2016 on KVM in some configurations (q35 machine +
piix4-usb-uhci) hangs on boot. Trace analysis led us to the conclusion that
it is mishandling a level-triggered interrupt: it performs EOI without fixing
the root cause. This causes immediate re-assertion, and the L2 VM (which is
supposedly expected to fix the cause of the interrupt) does not make any
progress.

Gory details: https://www.spinics.net/lists/kvm/msg184484.html

It turns out we have dealt with similar issues before: the in-kernel IOAPIC
implementation has commit 184564efae4d ("kvm: ioapic: conditionally delay
irq delivery duringeoi broadcast"), which describes a very similar issue.

Steal the idea from the above-mentioned commit for the IOAPIC implementation
in QEMU. SUCCESSIVE_IRQ_MAX_COUNT, the delay and the comment are borrowed as
well.

Signed-off-by: Vitaly Kuznetsov
---
 hw/intc/ioapic.c                  | 43 ++++++++++++++++++++++++++++++-
 hw/intc/trace-events              |  1 +
 include/hw/i386/ioapic_internal.h |  3 +++
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 9d75f84d3b..daf45cc8a8 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -139,6 +139,15 @@ static void ioapic_service(IOAPICCommonState *s)
     }
 }
 
+#define SUCCESSIVE_IRQ_MAX_COUNT 10000
+
+static void ioapic_timer(void *opaque)
+{
+    IOAPICCommonState *s = opaque;
+
+    ioapic_service(s);
+}
+
 static void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICCommonState *s = opaque;
@@ -227,7 +236,28 @@ void ioapic_eoi_broadcast(int vector)
                 trace_ioapic_clear_remote_irr(n, vector);
                 s->ioredtbl[n] = entry & ~IOAPIC_LVT_REMOTE_IRR;
                 if (!(entry & IOAPIC_LVT_MASKED) && (s->irr & (1 << n))) {
-                    ioapic_service(s);
+                    bool level = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1)
+                        == IOAPIC_TRIGGER_LEVEL;
+
+                    ++s->irq_reassert[vector];
+                    if (!level ||
+                        s->irq_reassert[vector] < SUCCESSIVE_IRQ_MAX_COUNT) {
+                        ioapic_service(s);
+                    } else {
+                        /*
+                         * Real hardware does not deliver the interrupt
+                         * immediately during eoi broadcast, and this lets a
+                         * buggy guest make slow progress even if it does not
+                         * correctly handle a level-triggered interrupt. Emulate
+                         * this behavior if we detect an interrupt storm.
+                         */
+                        trace_ioapic_eoi_delayed_reassert(vector);
+                        timer_mod(s->timer,
+                                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+                                  NANOSECONDS_PER_SECOND / 100);
+                    }
+                } else {
+                    s->irq_reassert[vector] = 0;
                 }
             }
         }
@@ -401,6 +431,8 @@ static void ioapic_realize(DeviceState *dev, Error **errp)
     memory_region_init_io(&s->io_memory, OBJECT(s), &ioapic_io_ops, s,
                           "ioapic", 0x1000);
 
+    s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ioapic_timer, s);
+
     qdev_init_gpio_in(dev, ioapic_set_irq, IOAPIC_NUM_PINS);
 
     ioapics[ioapic_no] = s;
@@ -408,6 +440,14 @@ static void ioapic_realize(DeviceState *dev, Error **errp)
     qemu_add_machine_init_done_notifier(&s->machine_done);
 }
 
+static void ioapic_unrealize(DeviceState *dev, Error **errp)
+{
+    IOAPICCommonState *s = IOAPIC_COMMON(dev);
+
+    timer_del(s->timer);
+    timer_free(s->timer);
+}
+
 static Property ioapic_properties[] = {
     DEFINE_PROP_UINT8("version", IOAPICCommonState, version, IOAPIC_VER_DEF),
     DEFINE_PROP_END_OF_LIST(),
@@ -419,6 +459,7 @@ static void ioapic_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     k->realize = ioapic_realize;
+    k->unrealize = ioapic_unrealize;
     /*
      * If APIC is in kernel, we need to update the kernel cache after
      * migration, otherwise first 24 gsi routes will be invalid.
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index a28bdce925..90c9d07c1a 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -25,6 +25,7 @@ apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x"
 ioapic_set_remote_irr(int n) "set remote irr for pin %d"
 ioapic_clear_remote_irr(int n, int vector) "clear remote irr for pin %d vector %d"
 ioapic_eoi_broadcast(int vector) "EOI broadcast for vector %d"
+ioapic_eoi_delayed_reassert(int vector) "delayed reassert on EOI broadcast for vector %d"
 ioapic_mem_read(uint8_t addr, uint8_t regsel, uint8_t size, uint32_t val) "ioapic mem read addr 0x%"PRIx8" regsel: 0x%"PRIx8" size 0x%"PRIx8" retval 0x%"PRIx32
 ioapic_mem_write(uint8_t addr, uint8_t regsel, uint8_t size, uint32_t val) "ioapic mem write addr 0x%"PRIx8" regsel: 0x%"PRIx8" size 0x%"PRIx8" val 0x%"PRIx32
 ioapic_set_irq(int vector, int level) "vector: %d level: %d"
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index 9848f391bb..e0ee88db40 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -96,6 +96,7 @@ typedef struct IOAPICCommonClass {
     SysBusDeviceClass parent_class;
 
     DeviceRealize realize;
+    DeviceUnrealize unrealize;
     void (*pre_save)(IOAPICCommonState *s);
     void (*post_load)(IOAPICCommonState *s);
 } IOAPICCommonClass;
@@ -111,6 +112,8 @@ struct IOAPICCommonState {
     uint8_t version;
     uint64_t irq_count[IOAPIC_NUM_PINS];
     int irq_level[IOAPIC_NUM_PINS];
+    int irq_reassert[IOAPIC_NUM_PINS];
+    QEMUTimer *timer;
 };
 
 void ioapic_reset_common(DeviceState *dev);
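
For readers who want to experiment with the throttling decision outside of
QEMU, here is a minimal standalone sketch of the logic that the
ioapic_eoi_broadcast() hunk adds. It is an illustration under stated
assumptions, not the code from the patch: struct storm_state, storm_init(),
storm_on_eoi() and enum eoi_action are hypothetical names, NUM_PINS stands in
for IOAPIC_NUM_PINS, and the QEMUTimer is reduced to a plain deadline field.
Unlike the patch, the sketch keys the counter by pin rather than by vector,
so that the index always stays within the NUM_PINS-sized array, and it
saturates the counter instead of incrementing it without bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PINS 24                      /* stand-in for IOAPIC_NUM_PINS */
#define SUCCESSIVE_IRQ_MAX_COUNT 10000   /* threshold borrowed from the patch */
#define NANOSECONDS_PER_SECOND 1000000000LL

/* Hypothetical stand-in for the two fields the patch adds to the state. */
struct storm_state {
    int irq_reassert[NUM_PINS];  /* successive re-assertions seen per pin */
    int64_t timer_deadline_ns;   /* -1 while no delayed service is pending */
};

enum eoi_action {
    EOI_NOTHING,          /* line no longer pending: nothing to deliver */
    EOI_SERVICE_NOW,      /* re-deliver immediately (pre-patch behaviour) */
    EOI_SERVICE_DELAYED   /* storm detected: re-deliver from a timer */
};

static void storm_init(struct storm_state *s)
{
    for (int i = 0; i < NUM_PINS; i++) {
        s->irq_reassert[i] = 0;
    }
    s->timer_deadline_ns = -1;
}

/*
 * Decide what to do for one pin when its EOI arrives.
 *   still_pending: the pin is unmasked and its line is still raised
 *   level:         the pin is configured as level-triggered
 *   now_ns:        current virtual clock value in nanoseconds
 */
static enum eoi_action storm_on_eoi(struct storm_state *s, int pin,
                                    bool still_pending, bool level,
                                    int64_t now_ns)
{
    assert(pin >= 0 && pin < NUM_PINS);

    if (!still_pending) {
        /* The guest fixed the cause: forget about earlier re-assertions. */
        s->irq_reassert[pin] = 0;
        return EOI_NOTHING;
    }

    /* Saturating increment (a deviation from the patch, see lead-in). */
    if (s->irq_reassert[pin] < SUCCESSIVE_IRQ_MAX_COUNT) {
        ++s->irq_reassert[pin];
    }

    if (!level || s->irq_reassert[pin] < SUCCESSIVE_IRQ_MAX_COUNT) {
        return EOI_SERVICE_NOW;
    }

    /* Interrupt storm: postpone delivery by 1/100 s, i.e. 10 ms. */
    s->timer_deadline_ns = now_ns + NANOSECONDS_PER_SECOND / 100;
    return EOI_SERVICE_DELAYED;
}
```

A caller would invoke storm_on_eoi() once per matching pin on every EOI and
act on the returned value. With the constants above, a level-triggered pin
that stays asserted is serviced immediately for the first 9999 successive
EOIs and is deferred by 10 ms from the 10000th one onwards, until an EOI
finds the line deasserted and resets the counter.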