From patchwork Mon Jan 7 22:29:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Venu Busireddy X-Patchwork-Id: 10751263 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3C76A91E for ; Mon, 7 Jan 2019 22:33:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2B3A128910 for ; Mon, 7 Jan 2019 22:33:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1C57428A95; Mon, 7 Jan 2019 22:33:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 81D8628910 for ; Mon, 7 Jan 2019 22:33:14 +0000 (UTC) Received: from localhost ([127.0.0.1]:51320 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ggdSL-0007Te-Mj for patchwork-qemu-devel@patchwork.kernel.org; Mon, 07 Jan 2019 17:33:13 -0500 Received: from eggs.gnu.org ([209.51.188.92]:39467) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ggdPo-0005Jl-Qy for qemu-devel@nongnu.org; Mon, 07 Jan 2019 17:30:41 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ggdPl-00050p-HC for qemu-devel@nongnu.org; Mon, 07 Jan 2019 17:30:36 -0500 Received: from aserp2130.oracle.com ([141.146.126.79]:54880) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ggdPl-0004OV-7F for qemu-devel@nongnu.org; Mon, 07 Jan 2019 17:30:33 -0500 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id x07MJ7gc065781; Mon, 7 Jan 2019 22:30:18 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=GYSSbM2dP/ffT9saV7ZsC9XarM+4zqBgClXC9DaD8YE=; b=ZOWgpyTLfvwFA/jgPqdg2IhG+9VUWxfTJgxdse1M8myQB/jhzp4dPVPAJB9i5WQCaBNr sWULaDhTPz4c5C/KuhegUTAf7Q5xkmdTUxjXf68rHJlITa3/t5SYgj8aWoutdRNWMKhx 3MA/cTFHO9k6jJZyXdkKUj8ABQXEA6gs0PEbXwyQ8mNE3CvwzsGMdb//U3b7JEhNrnhP g1w5ZKTBpWNX9oEEVu5wmTKGlDcTMrdBcVjbIvLy28g2d9Op88ZbpVK4G8Db714d0slE qf192TEGzreWltPrdUp11EP1Mx09NttPJT5Xzbp+rTA/TRtyW0MMkzEnPlP+GVmOoP3f Pw== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp2130.oracle.com with ESMTP id 2ptj3drmq3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 07 Jan 2019 22:30:18 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id x07MUHBH016803 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 7 Jan 2019 22:30:17 GMT Received: from abhmp0013.oracle.com (abhmp0013.oracle.com [141.146.116.19]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x07MUGYc004925; Mon, 7 Jan 2019 22:30:17 GMT Received: from ban25x6uut28.us.oracle.com (/10.153.73.28) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 07 Jan 2019 14:30:16 -0800 From: Venu Busireddy To: venu.busireddy@oracle.com, "Michael S. Tsirkin" , Marcel Apfelbaum Date: Mon, 7 Jan 2019 17:29:43 -0500 Message-Id: <1546900184-27403-5-git-send-email-venu.busireddy@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1546900184-27403-1-git-send-email-venu.busireddy@oracle.com> References: <1546900184-27403-1-git-send-email-venu.busireddy@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9129 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1901070183 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [generic] X-Received-From: 141.146.126.79 Subject: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Si-Wei Liu , virtio-dev@lists.oasis-open.org, qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Si-Wei Liu When a VF is hotplugged into the guest, datapath switching will be performed immediately, which is sub-optimal in terms of timing, and could end up with substantial network downtime. One of ways to shorten this downtime is to switch the datapath only after the VF is seen to get enabled by guest, indicated by the bus master bit in VF's PCI config space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted at that time to indicate this condition. Then management stack can kick off datapath switching upon receiving the event. Signed-off-by: Si-Wei Liu Signed-off-by: Venu Busireddy --- hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ qapi/net.json | 26 ++++++++++++++++++++++++++ 2 files changed, 83 insertions(+) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index bd83b58..adcc95a 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -34,6 +34,7 @@ #include "pci.h" #include "trace.h" #include "qapi/error.h" +#include "qapi/qapi-events-net.h" #define MSIX_CAP_LENGTH 12 @@ -42,6 +43,7 @@ static void vfio_disable_interrupts(VFIOPCIDevice *vdev); static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled); +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status); /* * Disabling BAR mmaping can be slow, but toggling it around INTx can @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev, { VFIOPCIDevice *vdev = PCI_VFIO(pdev); uint32_t val_le = cpu_to_le32(val); + bool may_notify = false; + bool master_was = false; trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len); @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev, __func__, vdev->vbasedev.name, addr, val, len); } + /* Bus Master Enabling/Disabling */ + if (pdev->failover_primary && current_cpu && + range_covers_byte(addr, len, PCI_COMMAND)) { + master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) & + PCI_COMMAND_MASTER); + may_notify = true; + } + /* MSI/MSI-X Enabling/Disabling */ if (pdev->cap_present & QEMU_PCI_CAP_MSI && ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) { @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev, /* Write everything to QEMU to keep emulated bits correct */ pci_default_write_config(pdev, addr, val, len); } + + if (may_notify) { + bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) & + PCI_COMMAND_MASTER); + if (master_was != master_now) { + vfio_failover_notify(vdev, master_now); + } + } } /* @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev) vdev->req_enabled = false; } +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status) +{ + PCIDevice *pdev = &vdev->pdev; + const char *n; + gchar *path; + + n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name; + path = object_get_canonical_path(OBJECT(vdev)); + qapi_event_send_failover_primary_changed(!!n, n, path, status); +} + static void vfio_realize(PCIDevice *pdev, Error **errp) { VFIOPCIDevice *vdev = PCI_VFIO(pdev); @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj) vfio_put_group(group); } +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev) +{ + PCIDevice *pdev = &vdev->pdev; + + /* + * Guest driver may not get the chance to disable bus mastering + * before the device object gets to be unrealized. In that event, + * send out a "disabled" notification on behalf of guest driver. + */ + if (pdev->failover_primary && + pdev->bus_master_enable_region.enabled) { + vfio_failover_notify(vdev, false); + } +} + static void vfio_exitfn(PCIDevice *pdev) { VFIOPCIDevice *vdev = PCI_VFIO(pdev); + /* + * During the guest reboot sequence, it is sometimes possible that + * the guest may not get sufficient time to complete the entire driver + * removal sequence, near the end of which a PCI config space write to + * disable bus mastering can be intercepted by device. In such cases, + * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It + * is imperative to generate the event on the guest's behalf if the + * guest fails to make it. + */ + vfio_exit_failover_notify(vdev); + vfio_unregister_req_notifier(vdev); vfio_unregister_err_notifier(vdev); pci_device_set_intx_routing_notifier(&vdev->pdev, NULL); diff --git a/qapi/net.json b/qapi/net.json index 633ac87..a5b8d70 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -757,3 +757,29 @@ ## { 'command': 'query-standby-status', 'data': { '*device': 'str' }, 'returns': ['StandbyStatusInfo'] } + +## +# @FAILOVER_PRIMARY_CHANGED: +# +# Emitted whenever the driver of failover primary is loaded or unloaded +# by the guest. +# +# @device: device name +# +# @path: device path +# +# @enabled: true if driver is loaded thus device is enabled in guest +# +# Since: 3.0 +# +# Example: +# +# <- { "event": "FAILOVER_PRIMARY_CHANGED", +# "data": { "device": "vfio-0", +# "path": "/machine/peripheral/vfio-0" }, +# "enabled": "true" }, +# "timestamp": { "seconds": 1539935213, "microseconds": 753529 } } +# +## +{ 'event': 'FAILOVER_PRIMARY_CHANGED', + 'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }