From patchwork Sat Sep 30 13:58:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 13405133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4B24E77374 for ; Sat, 30 Sep 2023 13:58:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234207AbjI3N6r (ORCPT ); Sat, 30 Sep 2023 09:58:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58586 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231364AbjI3N6q (ORCPT ); Sat, 30 Sep 2023 09:58:46 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D22269E; Sat, 30 Sep 2023 06:58:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=MIME-Version:Content-Type:Date:Cc:To: From:Subject:Message-ID:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:In-Reply-To:References; bh=1vPpvqCVinL9fzTjTNGUlj4myey3yl6y2SPFrWu6ih4=; b=HcEf01AHydDrE8pxRYZKHdPTmx Z9qHuAeBAYxa+TPzHviQ3l2Aen5HCg1wvmHrcVsCcVU0MzmRBrpxtdaJA3QaRjWFYSB+234GW253T 8zkAiDx+e1zSNUL+7FEQGFoaIyhyyjSqtdHbINFSaClgzbqNt06j13VXFG/3qPV87Kf63EcHtgu/r dIrMltvvNLRgD91KeJLUiPGteycM0IXLW9B3Ar+hRsCkB5sB7RA35JCc6U4CKf8g3DZCXLDnROJVh iUAlgPa+hWYItb2kTsqhRKpKE0A8UTdPAv12jm0gGKAoG8m5wmwQTcGTH8VPFk4O8k7ppUMXVvl82 ABNR1tJw==; Received: from 54-240-197-238.amazon.com ([54.240.197.238] helo=u3832b3a9db3152.ant.amazon.com) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1qmaUP-00FEqW-Rx; Sat, 30 Sep 2023 13:58:38 +0000 Message-ID: Subject: [PATCH v3] KVM: x86: Use fast path for Xen timer delivery From: David Woodhouse To: kvm Cc: Paul Durrant , Sean Christopherson , Paolo Bonzini , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , linux-kernel@vger.kernel.org Date: Sat, 30 Sep 2023 14:58:35 +0100 User-Agent: Evolution 3.44.4-0ubuntu2 MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Most of the time there's no need to kick the vCPU and deliver the timer event through kvm_xen_inject_timer_irqs(). Use kvm_xen_set_evtchn_fast() directly from the timer callback, and only fall back to the slow path when it's necessary to do so. This gives a significant improvement in timer latency testing (using nanosleep() for various periods and then measuring the actual time elapsed). However, there was a reason¹ the fast path was dropped when this support was first added. The current code holds vcpu->mutex for all operations on the kvm->arch.timer_expires field, and the fast path introduces a potential race condition. Avoid that race by ensuring the hrtimer is (temporarily) cancelled before making changes in kvm_xen_start_timer(), and also when reading the values out for KVM_XEN_VCPU_ATTR_TYPE_TIMER. ¹ https://lore.kernel.org/kvm/846caa99-2e42-4443-1070-84e49d2f11d2@redhat.com/ Signed-off-by: David Woodhouse Reviewed-by: Paul Durrant --- • v2: Remember, and deal with, those races. • v3: Drop the assertions for vcpu being loaded; those can be done separately if at all. Reorder the code in xen_timer_callback() to make it clearer that kvm->arch.xen.timer_expires is being cleared in the case where the event channel delivery is *complete*, as opposed to the -EWOULDBLOCK deferred path. Drop the 'pending' variable in kvm_xen_vcpu_get_attr() and restart the hrtimer if (kvm->arch.xen.timer_expires), which ought to be exactly the same thing (that's the *point* in cancelling the timer, to make it truthful as we return its value to userspace). Improve comments. arch/x86/kvm/xen.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 40edf4d1974c..75586da134b3 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -134,9 +134,23 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer) { struct kvm_vcpu *vcpu = container_of(timer, struct kvm_vcpu, arch.xen.timer); + struct kvm_xen_evtchn e; + int rc; + if (atomic_read(&vcpu->arch.xen.timer_pending)) return HRTIMER_NORESTART; + e.vcpu_id = vcpu->vcpu_id; + e.vcpu_idx = vcpu->vcpu_idx; + e.port = vcpu->arch.xen.timer_virq; + e.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; + + rc = kvm_xen_set_evtchn_fast(&e, vcpu->kvm); + if (rc != -EWOULDBLOCK) { + vcpu->arch.xen.timer_expires = 0; + return HRTIMER_NORESTART; + } + atomic_inc(&vcpu->arch.xen.timer_pending); kvm_make_request(KVM_REQ_UNBLOCK, vcpu); kvm_vcpu_kick(vcpu); @@ -146,6 +160,14 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer) static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, s64 delta_ns) { + /* + * Avoid races with the old timer firing. Checking timer_expires + * to avoid calling hrtimer_cancel() will only have false positives + * so is fine. + */ + if (vcpu->arch.xen.timer_expires) + hrtimer_cancel(&vcpu->arch.xen.timer); + atomic_set(&vcpu->arch.xen.timer_pending, 0); vcpu->arch.xen.timer_expires = guest_abs; @@ -1019,9 +1041,36 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) break; case KVM_XEN_VCPU_ATTR_TYPE_TIMER: + /* + * Ensure a consistent snapshot of state is captured, with a + * timer either being pending, or the event channel delivered + * to the corresponding bit in the shared_info. Not still + * lurking in the timer_pending flag for deferred delivery. + * Purely as an optimisation, if the timer_expires field is + * zero, that means the timer isn't active (or even in the + * timer_pending flag) and there is no need to cancel it. + */ + if (vcpu->arch.xen.timer_expires) { + hrtimer_cancel(&vcpu->arch.xen.timer); + kvm_xen_inject_timer_irqs(vcpu); + } + data->u.timer.port = vcpu->arch.xen.timer_virq; data->u.timer.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; data->u.timer.expires_ns = vcpu->arch.xen.timer_expires; + + /* + * The hrtimer may trigger and raise the IRQ immediately, + * while the returned state causes it to be set up and + * raised again on the destination system after migration. + * That's fine, as the guest won't even have had a chance + * to run and handle the interrupt. Asserting an already + * pending event channel is idempotent. + */ + if (vcpu->arch.xen.timer_expires) + hrtimer_start_expires(&vcpu->arch.xen.timer, + HRTIMER_MODE_ABS_HARD); + r = 0; break;