From patchwork Tue Jul 9 20:58:50 2024
X-Patchwork-Submitter: Steven Sistare
X-Patchwork-Id: 13728523
From: Steve Sistare
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Cedric Le Goater, "Michael S. Tsirkin",
    Marcel Apfelbaum, Peter Xu, Fabiano Rosas, Steve Sistare
Subject: [PATCH V1 1/8] migration: cpr_needed_for_reuse
Date: Tue, 9 Jul 2024 13:58:50 -0700
Message-Id: <1720558737-451106-2-git-send-email-steven.sistare@oracle.com>
In-Reply-To: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>
References: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>

Define a vmstate "needed" helper. This will be moved to the preceding
patch series "Live update: cpr-exec" because it is needed by multiple
devices.

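For illustration, the way a "needed" predicate gates an optional section can be sketched in isolation. This is a standalone model and not QEMU code: `Mode`, `current_mode`, `Section`, and `save_sections` are hypothetical stand-ins for MigMode, migrate_mode(), a vmstate description with a `needed` callback, and the save loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's MigMode; not the real enum. */
typedef enum { MODE_NORMAL, MODE_CPR_EXEC } Mode;

/* Hypothetical stand-in for the value returned by migrate_mode(). */
static Mode current_mode = MODE_NORMAL;

/* Same shape as the helper in this patch: true only in cpr-exec mode. */
static bool needed_for_reuse(void *opaque)
{
    (void)opaque;
    return current_mode == MODE_CPR_EXEC;
}

/* Hypothetical model of an optional state section with a "needed" test. */
typedef struct {
    const char *name;
    bool (*needed)(void *opaque);   /* NULL means "always saved" */
} Section;

/* Count the sections that would be written for the current mode. */
static size_t save_sections(const Section *s, size_t n, void *opaque)
{
    size_t saved = 0;

    for (size_t i = 0; i < n; i++) {
        if (!s[i].needed || s[i].needed(opaque)) {
            saved++;
        }
    }
    return saved;
}
```

In this model, a device that lists `needed_for_reuse` as the predicate for its reuse state has that state skipped in every mode except cpr-exec, which is why one shared helper can serve several devices.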
Signed-off-by: Steve Sistare
---
 include/migration/cpr.h | 1 +
 migration/cpr.c         | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index c6c60f8..8d20d3e 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -24,6 +24,7 @@ void cpr_resave_fd(const char *name, int id, int fd);
 
 int cpr_state_save(Error **errp);
 int cpr_state_load(Error **errp);
+bool cpr_needed_for_reuse(void *opaque);
 
 QEMUFile *cpr_exec_output(Error **errp);
 QEMUFile *cpr_exec_input(Error **errp);
diff --git a/migration/cpr.c b/migration/cpr.c
index f756c15..843241c 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -236,3 +236,8 @@ int cpr_state_load(Error **errp)
 
     return ret;
 }
+bool cpr_needed_for_reuse(void *opaque)
+{
+    MigMode mode = migrate_mode();
+    return mode == MIG_MODE_CPR_EXEC;
+}

From patchwork Tue Jul 9 20:58:51 2024
X-Patchwork-Submitter: Steven Sistare
X-Patchwork-Id: 13728520
From: Steve Sistare
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Cedric Le Goater, "Michael S. Tsirkin",
    Marcel Apfelbaum, Peter Xu, Fabiano Rosas, Steve Sistare
Subject: [PATCH V1 2/8] pci: export msix_is_pending
Date: Tue, 9 Jul 2024 13:58:51 -0700
Message-Id: <1720558737-451106-3-git-send-email-steven.sistare@oracle.com>
In-Reply-To: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>
References: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>

Export msix_is_pending for use by cpr. No functional change.

Signed-off-by: Steve Sistare
Acked-by: Michael S. Tsirkin
---
 hw/pci/msix.c         | 2 +-
 include/hw/pci/msix.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 487e498..17ef2b0 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -71,7 +71,7 @@ static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
     return dev->msix_pba + vector / 8;
 }
 
-static int msix_is_pending(PCIDevice *dev, int vector)
+int msix_is_pending(PCIDevice *dev, unsigned int vector)
 {
     return *msix_pending_byte(dev, vector) & msix_pending_mask(vector);
 }
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 0e6f257..11ef945 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -32,6 +32,7 @@ int msix_present(PCIDevice *dev);
 bool msix_is_masked(PCIDevice *dev, unsigned vector);
 void msix_set_pending(PCIDevice *dev, unsigned vector);
 void msix_clr_pending(PCIDevice *dev, int vector);
+int msix_is_pending(PCIDevice *dev, unsigned vector);
 
 void msix_vector_use(PCIDevice *dev, unsigned vector);
 void msix_vector_unuse(PCIDevice *dev, unsigned vector);

From patchwork Tue Jul 9 20:58:52 2024
X-Patchwork-Submitter: Steven Sistare
X-Patchwork-Id: 13728524
From: Steve Sistare
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Cedric Le Goater, "Michael S. Tsirkin",
    Marcel Apfelbaum, Peter Xu, Fabiano Rosas, Steve Sistare
Subject: [PATCH V1 3/8] vfio-pci: refactor for cpr
Date: Tue, 9 Jul 2024 13:58:52 -0700
Message-Id: <1720558737-451106-4-git-send-email-steven.sistare@oracle.com>
In-Reply-To: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>
References: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>

Refactor vector use into a helper vfio_vector_init. Add vfio_notifier_init
and vfio_notifier_cleanup for named notifiers, and pass additional arguments
to vfio_remove_kvm_msi_virq. All for use by CPR in a subsequent patch.
No functional change.

Signed-off-by: Steve Sistare
---
 hw/vfio/pci.c | 106 +++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 68 insertions(+), 38 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e03d9f3..ca3c22a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -54,6 +54,32 @@ static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 static void vfio_msi_disable_common(VFIOPCIDevice *vdev);
 
+/* Create new or reuse existing eventfd */
+static int vfio_notifier_init(VFIOPCIDevice *vdev, EventNotifier *e,
+                              const char *name, int nr)
+{
+    int fd = -1;   /* placeholder until a subsequent patch */
+    int ret = 0;
+
+    if (fd >= 0) {
+        event_notifier_init_fd(e, fd);
+    } else {
+        ret = event_notifier_init(e, 0);
+        if (ret) {
+            Error *err = NULL;
+            error_setg_errno(&err, -ret, "vfio_notifier_init %s failed", name);
+            error_report_err(err);
+        }
+    }
+    return ret;
+}
+
+static void vfio_notifier_cleanup(VFIOPCIDevice *vdev, EventNotifier *e,
+                                  const char *name, int nr)
+{
+    event_notifier_cleanup(e);
+}
+
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
  * also be a huge overhead.
We try to get the best of both worlds by
@@ -134,8 +160,8 @@ static bool vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
     pci_irq_deassert(&vdev->pdev);
 
     /* Get an eventfd for resample/unmask */
-    if (event_notifier_init(&vdev->intx.unmask, 0)) {
-        error_setg(errp, "event_notifier_init failed eoi");
+    if (vfio_notifier_init(vdev, &vdev->intx.unmask, "intx-unmask", 0)) {
+        error_setg(errp, "vfio_notifier_init intx-unmask failed");
         goto fail;
     }
 
@@ -167,7 +193,7 @@ fail_vfio:
     kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &vdev->intx.interrupt,
                                           vdev->intx.route.irq);
 fail_irqfd:
-    event_notifier_cleanup(&vdev->intx.unmask);
+    vfio_notifier_cleanup(vdev, &vdev->intx.unmask, "intx-unmask", 0);
 fail:
     qemu_set_fd_handler(irq_fd, vfio_intx_interrupt, NULL, vdev);
     vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
@@ -199,7 +225,7 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
     }
 
     /* We only need to close the eventfd for VFIO to cleanup the kernel side */
-    event_notifier_cleanup(&vdev->intx.unmask);
+    vfio_notifier_cleanup(vdev, &vdev->intx.unmask, "intx-unmask", 0);
 
     /* QEMU starts listening for interrupt events. */
     qemu_set_fd_handler(event_notifier_get_fd(&vdev->intx.interrupt),
@@ -266,7 +292,6 @@ static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
     uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
     Error *err = NULL;
     int32_t fd;
-    int ret;
 
 
     if (!pin) {
@@ -289,9 +314,7 @@ static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
     }
 #endif
 
-    ret = event_notifier_init(&vdev->intx.interrupt, 0);
-    if (ret) {
-        error_setg_errno(errp, -ret, "event_notifier_init failed");
+    if (vfio_notifier_init(vdev, &vdev->intx.interrupt, "intx-interrupt", 0)) {
         return false;
     }
     fd = event_notifier_get_fd(&vdev->intx.interrupt);
@@ -300,7 +323,7 @@ static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
     if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
                                 VFIO_IRQ_SET_ACTION_TRIGGER, fd, errp)) {
         qemu_set_fd_handler(fd, NULL, NULL, vdev);
-        event_notifier_cleanup(&vdev->intx.interrupt);
+        vfio_notifier_cleanup(vdev, &vdev->intx.interrupt, "intx-interrupt", 0);
         return false;
     }
 
@@ -327,7 +350,7 @@ static void vfio_intx_disable(VFIOPCIDevice *vdev)
 
     fd = event_notifier_get_fd(&vdev->intx.interrupt);
     qemu_set_fd_handler(fd, NULL, NULL, vdev);
-    event_notifier_cleanup(&vdev->intx.interrupt);
+    vfio_notifier_cleanup(vdev, &vdev->intx.interrupt, "intx-interrupt", 0);
 
     vdev->interrupt = VFIO_INT_NONE;
 
@@ -471,13 +494,15 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
                                              vector_n, &vdev->pdev);
 }
 
-static void vfio_connect_kvm_msi_virq(VFIOMSIVector *vector)
+static void vfio_connect_kvm_msi_virq(VFIOMSIVector *vector, int nr)
 {
+    const char *name = "kvm_interrupt";
+
     if (vector->virq < 0) {
         return;
     }
 
-    if (event_notifier_init(&vector->kvm_interrupt, 0)) {
+    if (vfio_notifier_init(vector->vdev, &vector->kvm_interrupt, name, nr)) {
         goto fail_notifier;
     }
 
@@ -489,19 +514,20 @@ static void vfio_connect_kvm_msi_virq(VFIOMSIVector *vector)
     return;
 
 fail_kvm:
-    event_notifier_cleanup(&vector->kvm_interrupt);
+    vfio_notifier_cleanup(vector->vdev, &vector->kvm_interrupt, name, nr);
 fail_notifier:
     kvm_irqchip_release_virq(kvm_state, vector->virq);
     vector->virq = -1;
 }
 
-static void vfio_remove_kvm_msi_virq(VFIOMSIVector *vector)
+static void vfio_remove_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
+                                     int nr)
 {
     kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &vector->kvm_interrupt,
                                           vector->virq);
     kvm_irqchip_release_virq(kvm_state, vector->virq);
     vector->virq = -1;
-    event_notifier_cleanup(&vector->kvm_interrupt);
+    vfio_notifier_cleanup(vdev, &vector->kvm_interrupt, "kvm_interrupt", nr);
 }
 
 static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
@@ -511,6 +537,20 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
     kvm_irqchip_commit_routes(kvm_state);
 }
 
+static void vfio_vector_init(VFIOPCIDevice *vdev, int nr)
+{
+    VFIOMSIVector *vector = &vdev->msi_vectors[nr];
+    PCIDevice *pdev = &vdev->pdev;
+
+    vector->vdev = vdev;
+    vector->virq = -1;
+    vfio_notifier_init(vdev, &vector->interrupt, "interrupt", nr);
+    vector->use = true;
+    if (vdev->interrupt == VFIO_INT_MSIX) {
+        msix_vector_use(pdev, nr);
+    }
+}
+
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
@@ -524,13 +564,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
     vector = &vdev->msi_vectors[nr];
 
     if (!vector->use) {
-        vector->vdev = vdev;
-        vector->virq = -1;
-        if (event_notifier_init(&vector->interrupt, 0)) {
-            error_report("vfio: Error: event_notifier_init failed");
-        }
-        vector->use = true;
-        msix_vector_use(pdev, nr);
+        vfio_vector_init(vdev, nr);
     }
 
     qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
@@ -542,7 +576,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
      */
     if (vector->virq >= 0) {
         if (!msg) {
-            vfio_remove_kvm_msi_virq(vector);
+            vfio_remove_kvm_msi_virq(vdev, vector, nr);
         } else {
             vfio_update_kvm_msi_virq(vector, *msg, pdev);
         }
@@ -554,7 +588,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                 vfio_route_change = kvm_irqchip_begin_route_changes(kvm_state);
                 vfio_add_kvm_msi_virq(vdev, vector, nr, true);
                 kvm_irqchip_commit_route_changes(&vfio_route_change);
-                vfio_connect_kvm_msi_virq(vector);
+                vfio_connect_kvm_msi_virq(vector, nr);
             }
         }
     }
@@ -661,7 +695,7 @@ static void vfio_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
     kvm_irqchip_commit_route_changes(&vfio_route_change);
 
     for (i = 0; i < vdev->nr_vectors; i++) {
-        vfio_connect_kvm_msi_virq(&vdev->msi_vectors[i]);
+        vfio_connect_kvm_msi_virq(&vdev->msi_vectors[i], i);
     }
 }
 
@@ -741,9 +775,7 @@ retry:
         vector->virq = -1;
         vector->use = true;
 
-        if (event_notifier_init(&vector->interrupt, 0)) {
-            error_report("vfio: Error: event_notifier_init failed");
-        }
+        vfio_notifier_init(vdev, &vector->interrupt, "interrupt", i);
 
         qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
                             vfio_msi_interrupt, NULL, vector);
@@ -797,11 +829,11 @@ static void vfio_msi_disable_common(VFIOPCIDevice *vdev)
         VFIOMSIVector *vector = &vdev->msi_vectors[i];
         if (vdev->msi_vectors[i].use) {
             if (vector->virq >= 0) {
-                vfio_remove_kvm_msi_virq(vector);
+                vfio_remove_kvm_msi_virq(vdev, vector, i);
             }
             qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
                                 NULL, NULL, NULL);
-            event_notifier_cleanup(&vector->interrupt);
+            vfio_notifier_cleanup(vdev, &vector->interrupt, "interrupt", i);
         }
     }
 
@@ -2855,8 +2887,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
         return;
     }
 
-    if (event_notifier_init(&vdev->err_notifier, 0)) {
-        error_report("vfio: Unable to init event notifier for error detection");
+    if (vfio_notifier_init(vdev, &vdev->err_notifier, "err_notifier", 0)) {
         vdev->pci_aer = false;
         return;
     }
@@ -2868,7 +2899,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
                                 VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
         error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
         qemu_set_fd_handler(fd, NULL, NULL, vdev);
-        event_notifier_cleanup(&vdev->err_notifier);
+        vfio_notifier_cleanup(vdev, &vdev->err_notifier, "err_notifier", 0);
         vdev->pci_aer = false;
     }
 }
@@ -2887,7 +2918,7 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
     }
     qemu_set_fd_handler(event_notifier_get_fd(&vdev->err_notifier),
                         NULL, NULL, vdev);
-    event_notifier_cleanup(&vdev->err_notifier);
+    vfio_notifier_cleanup(vdev, &vdev->err_notifier, "err_notifier", 0);
 }
 
 static void vfio_req_notifier_handler(void *opaque)
@@ -2921,8 +2952,7 @@ static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
         return;
     }
 
-    if (event_notifier_init(&vdev->req_notifier, 0)) {
-        error_report("vfio: Unable to init event notifier for device request");
+    if (vfio_notifier_init(vdev, &vdev->req_notifier, "req_notifier", 0)) {
         return;
     }
 
@@ -2933,7 +2963,7 @@ static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
                                 VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
         error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
         qemu_set_fd_handler(fd, NULL, NULL, vdev);
-        event_notifier_cleanup(&vdev->req_notifier);
+        vfio_notifier_cleanup(vdev, &vdev->req_notifier, "req_notifier", 0);
     } else {
         vdev->req_enabled = true;
     }
@@ -2953,7 +2983,7 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     }
     qemu_set_fd_handler(event_notifier_get_fd(&vdev->req_notifier),
                         NULL, NULL, vdev);
-    event_notifier_cleanup(&vdev->req_notifier);
+    vfio_notifier_cleanup(vdev, &vdev->req_notifier, "req_notifier", 0);
 
     vdev->req_enabled = false;
 }

From patchwork Tue Jul 9 20:58:53 2024
X-Patchwork-Submitter: Steven Sistare
X-Patchwork-Id: 13728525
From: Steve Sistare
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Cedric Le Goater, "Michael S. Tsirkin",
    Marcel Apfelbaum, Peter Xu, Fabiano Rosas, Steve Sistare
Subject: [PATCH V1 4/8] vfio-pci: cpr part 1 (fd and dma)
Date: Tue, 9 Jul 2024 13:58:53 -0700
Message-Id: <1720558737-451106-5-git-send-email-steven.sistare@oracle.com>
In-Reply-To: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>
References: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>

Enable vfio-pci devices to be saved and restored across a cpr-exec of QEMU.

At vfio creation time, save the values of the vfio container, group, and
device descriptors in CPR state.

In the container pre_save handler, suspend the use of virtual addresses
in DMA mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest RAM will be
remapped at a different VA after exec. DMA to already-mapped pages
continues. Save the MSI message area as part of vfio-pci vmstate, and
save the interrupt and notifier eventfds in vmstate.

On QEMU restart, vfio_realize() finds the saved descriptors, uses them,
and notes that the device is being reused. Device and IOMMU state is
already configured, so operations in vfio_realize that would modify the
configuration are skipped for a reused device, including vfio ioctls and
writes to PCI configuration space. VFIO PCI device reset is also
suppressed. The result is that vfio_realize constructs QEMU data
structures that reflect the current state of the device. However, the
reconstruction is not complete until migrate_incoming is called.
migrate_incoming loads the MSI data; the vfio post_load handler then
finds the eventfds in CPR state, rebuilds the vector data structures, and
attaches the interrupts to the new KVM instance. The container post_load
handler then invokes the main vfio listener callback, which walks the
flattened ranges of the vfio address space and remaps each one with
VFIO_DMA_MAP_FLAG_VADDR to inform the kernel of the new VAs. Lastly,
migration resumes the VM.

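The save-then-find handling of descriptors described above can be sketched in isolation. This is a minimal standalone model, assuming a simple (name, id) keyed table; `CprState`, `state_find_fd`, `state_save_fd`, `get_or_create_fd`, and `fake_open` are hypothetical names for illustration and are not the CPR state API used by this series.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_SAVED 16

/* Hypothetical model of CPR state: a small (name, id) -> fd table. */
typedef struct {
    char name[32];
    int id;
    int fd;
} SavedFd;

typedef struct {
    SavedFd entry[MAX_SAVED];
    int count;
} CprState;

/* Return the saved descriptor for (name, id), or -1 if none was saved. */
static int state_find_fd(const CprState *s, const char *name, int id)
{
    for (int i = 0; i < s->count; i++) {
        if (s->entry[i].id == id && strcmp(s->entry[i].name, name) == 0) {
            return s->entry[i].fd;
        }
    }
    return -1;
}

/* Record a descriptor so that it can be found again after exec. */
static int state_save_fd(CprState *s, const char *name, int id, int fd)
{
    if (s->count == MAX_SAVED) {
        return -1;
    }
    SavedFd *e = &s->entry[s->count++];
    snprintf(e->name, sizeof(e->name), "%s", name);
    e->id = id;
    e->fd = fd;
    return 0;
}

/* Stand-in for open() or eventfd(): hands out fake descriptor numbers. */
static int next_fake_fd = 100;
static int fake_open(void)
{
    return next_fake_fd++;
}

/*
 * Reuse a descriptor preserved across exec if one exists; otherwise create
 * a new one and record it.  *reused tells the caller whether to skip the
 * configuration steps that would disturb an already-configured device.
 */
static int get_or_create_fd(CprState *s, const char *name, int id,
                            int (*create)(void), bool *reused)
{
    int fd = state_find_fd(s, name, id);

    *reused = (fd >= 0);
    if (fd < 0) {
        fd = create();
        if (fd >= 0 && state_save_fd(s, name, id, fd) < 0) {
            return -1;
        }
    }
    return fd;
}
```

In this model, the first call for a given (name, id) creates and records the descriptor, while a later call (standing in for the restarted QEMU) gets the same descriptor back with `*reused` set, which is the signal to skip ioctls, config space writes, and reset.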
This functionality is delivered by 3 patches for clarity. Part 1 handles device file descriptors and DMA. Part 2 adds eventfd and MSI/MSI-X vector support. Part 3 adds INTX support. Signed-off-by: Steve Sistare --- hw/pci/pci.c | 13 ++++ hw/vfio/common.c | 12 +++ hw/vfio/container.c | 139 ++++++++++++++++++++++++++++------ hw/vfio/cpr-legacy.c | 118 +++++++++++++++++++++++++++++ hw/vfio/cpr.c | 24 +++++- hw/vfio/meson.build | 3 +- hw/vfio/pci.c | 38 ++++++++++ include/hw/vfio/vfio-common.h | 8 ++ include/hw/vfio/vfio-container-base.h | 6 ++ include/migration/vmstate.h | 2 + 10 files changed, 336 insertions(+), 27 deletions(-) create mode 100644 hw/vfio/cpr-legacy.c diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 4c7be52..42513dd 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -32,6 +32,7 @@ #include "hw/pci/pci_host.h" #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" +#include "migration/misc.h" #include "migration/qemu-file-types.h" #include "migration/vmstate.h" #include "net/net.h" @@ -389,6 +390,18 @@ static void pci_reset_regions(PCIDevice *dev) static void pci_do_device_reset(PCIDevice *dev) { + /* + * A PCI device that is resuming for cpr is already configured, so do + * not reset it here when we are called from qemu_system_reset prior to + * cpr load, else interrupts may be lost for vfio-pci devices. It is + * safe to skip this reset for all PCI devices, because cpr load will set + * all fields that would have been set here. 
+ */ + MigMode mode = migrate_mode(); + if (mode == MIG_MODE_CPR_EXEC) { + return; + } + pci_device_deassert_intx(dev); assert(dev->irq_state == 0); diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 7cdb969..72a692a 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -566,6 +566,12 @@ static void vfio_listener_region_add(MemoryListener *listener, { VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase, listener); + vfio_container_region_add(bcontainer, section); +} + +void vfio_container_region_add(VFIOContainerBase *bcontainer, + MemoryRegionSection *section) +{ hwaddr iova, end; Int128 llend, llsize; void *vaddr; @@ -1395,6 +1401,12 @@ const MemoryListener vfio_memory_listener = { .log_sync = vfio_listener_log_sync, }; +void vfio_listener_register(VFIOContainerBase *bcontainer) +{ + bcontainer->listener = vfio_memory_listener; + memory_listener_register(&bcontainer->listener, bcontainer->space->as); +} + void vfio_reset_handler(void *opaque) { VFIODevice *vbasedev; diff --git a/hw/vfio/container.c b/hw/vfio/container.c index 88ede91..9970463 100644 --- a/hw/vfio/container.c +++ b/hw/vfio/container.c @@ -31,6 +31,7 @@ #include "sysemu/reset.h" #include "trace.h" #include "qapi/error.h" +#include "migration/cpr.h" #include "pci.h" VFIOGroupList vfio_group_list = @@ -131,6 +132,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer, int ret; Error *local_err = NULL; + assert(!bcontainer->reused); + if (iotlb && vfio_devices_all_running_and_mig_active(bcontainer)) { if (!vfio_devices_all_device_dirty_tracking(bcontainer) && bcontainer->dirty_pages_supported) { @@ -182,12 +185,24 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova, bcontainer); struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map), - .flags = VFIO_DMA_MAP_FLAG_READ, .vaddr = (__u64)(uintptr_t)vaddr, .iova = iova, .size = size, }; + /* + * Set the new vaddr for any mappings registered during cpr load. 
+ * Reused is cleared thereafter. + */ + if (bcontainer->reused) { + map.flags = VFIO_DMA_MAP_FLAG_VADDR; + if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) { + goto fail; + } + return 0; + } + + map.flags = VFIO_DMA_MAP_FLAG_READ; if (!readonly) { map.flags |= VFIO_DMA_MAP_FLAG_WRITE; } @@ -204,7 +219,11 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova, return 0; } - error_report("VFIO_MAP_DMA failed: %s", strerror(errno)); +fail: + error_report("vfio_dma_map %s (iova %lu, size %ld, va %p): %s", + (bcontainer->reused ? "VADDR" : ""), iova, size, vaddr, + strerror(errno)); + return -errno; } @@ -415,12 +434,28 @@ static bool vfio_set_iommu(int container_fd, int group_fd, } static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group, - Error **errp) + bool reused, Error **errp) { int iommu_type; const char *vioc_name; VFIOContainer *container; + /* + * If container is reused, just set its type and skip the ioctls, as the + * container and group are already configured in the kernel. + * VFIO_TYPE1v2_IOMMU is the only type that supports reuse/cpr. 
+ */ + if (reused) { + if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) { + iommu_type = VFIO_TYPE1v2_IOMMU; + goto skip_iommu; + } else { + error_setg(errp, "container was reused but VFIO_TYPE1v2_IOMMU " + "is not supported"); + return NULL; + } + } + iommu_type = vfio_get_iommu_type(fd, errp); if (iommu_type < 0) { return NULL; @@ -430,10 +465,12 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group, return NULL; } +skip_iommu: vioc_name = vfio_get_iommu_class_name(iommu_type); container = VFIO_IOMMU_LEGACY(object_new(vioc_name)); container->fd = fd; + container->bcontainer.reused = reused; container->iommu_type = iommu_type; return container; } @@ -543,10 +580,13 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, VFIOContainer *container; VFIOContainerBase *bcontainer; int ret, fd; + bool reused; VFIOAddressSpace *space; VFIOIOMMUClass *vioc; space = vfio_get_address_space(as); + fd = cpr_find_fd("vfio_container_for_group", group->groupid); + reused = (fd >= 0); /* * VFIO is currently incompatible with discarding of RAM insofar as the @@ -579,28 +619,50 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, * details once we know which type of IOMMU we are using. */ + /* + * If the container is reused, then the group is already attached in the + * kernel. If a container with matching fd is found, then update the + * userland group list and return. If not, then after the loop, create + * the container struct and group list. 
+ */ + QLIST_FOREACH(bcontainer, &space->containers, next) { container = container_of(bcontainer, VFIOContainer, bcontainer); - if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) { - ret = vfio_ram_block_discard_disable(container, true); - if (ret) { - error_setg_errno(errp, -ret, - "Cannot set discarding of RAM broken"); - if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, - &container->fd)) { - error_report("vfio: error disconnecting group %d from" - " container", group->groupid); - } - return false; + + if (reused) { + if (container->fd != fd) { + continue; } - group->container = container; - QLIST_INSERT_HEAD(&container->group_list, group, container_next); + } else if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) { + continue; + } + + ret = vfio_ram_block_discard_disable(container, true); + if (ret) { + error_setg_errno(errp, -ret, + "Cannot set discarding of RAM broken"); + if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, + &container->fd)) { + error_report("vfio: error disconnecting group %d from" + " container", group->groupid); + + } + goto delete_fd_exit; + } + group->container = container; + QLIST_INSERT_HEAD(&container->group_list, group, container_next); + if (!reused) { vfio_kvm_device_add_group(group); - return true; + cpr_save_fd("vfio_container_for_group", group->groupid, + container->fd); } + return true; + } + + if (!reused) { + fd = qemu_open_old("/dev/vfio/vfio", O_RDWR); } - fd = qemu_open_old("/dev/vfio/vfio", O_RDWR); if (fd < 0) { error_setg_errno(errp, errno, "failed to open /dev/vfio/vfio"); goto put_space_exit; @@ -613,11 +675,12 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, goto close_fd_exit; } - container = vfio_create_container(fd, group, errp); + container = vfio_create_container(fd, group, reused, errp); if (!container) { goto close_fd_exit; } bcontainer = &container->bcontainer; + bcontainer->reused = reused; if (!vfio_cpr_register_container(bcontainer, errp)) { goto 
free_container_exit; @@ -643,8 +706,16 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, group->container = container; QLIST_INSERT_HEAD(&container->group_list, group, container_next); - bcontainer->listener = vfio_memory_listener; - memory_listener_register(&bcontainer->listener, bcontainer->space->as); + /* + * If reused, register the listener later, after all state that may + * affect regions and mapping boundaries has been cpr load'ed. Later, + * the listener will invoke its callback on each flat section and call + * vfio_dma_map to supply the new vaddr, and the calls will match the + * mappings remembered by the kernel. + */ + if (!reused) { + vfio_listener_register(bcontainer); + } if (bcontainer->error) { error_propagate_prepend(errp, bcontainer->error, @@ -653,6 +724,7 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, } bcontainer->initialized = true; + cpr_resave_fd("vfio_container_for_group", group->groupid, fd); return true; listener_release_exit: @@ -679,6 +751,8 @@ close_fd_exit: put_space_exit: vfio_put_address_space(space); +delete_fd_exit: + cpr_delete_fd("vfio_container_for_group", group->groupid); return false; } @@ -690,6 +764,7 @@ static void vfio_disconnect_container(VFIOGroup *group) QLIST_REMOVE(group, container_next); group->container = NULL; + cpr_delete_fd("vfio_container_for_group", group->groupid); /* * Explicitly release the listener first before unset container, @@ -743,7 +818,12 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp) group = g_malloc0(sizeof(*group)); snprintf(path, sizeof(path), "/dev/vfio/%d", groupid); - group->fd = qemu_open_old(path, O_RDWR); + + group->fd = cpr_find_fd("vfio_group", groupid); + if (group->fd < 0) { + group->fd = qemu_open_old(path, O_RDWR); + } + if (group->fd < 0) { error_setg_errno(errp, errno, "failed to open %s", path); goto free_group_exit; @@ -772,6 +852,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace 
*as, Error **errp) } QLIST_INSERT_HEAD(&vfio_group_list, group, next); + cpr_resave_fd("vfio_group", groupid, group->fd); return group; @@ -797,6 +878,7 @@ static void vfio_put_group(VFIOGroup *group) vfio_disconnect_container(group); QLIST_REMOVE(group, next); trace_vfio_put_group(group->fd); + cpr_delete_fd("vfio_group", group->groupid); close(group->fd); g_free(group); } @@ -806,8 +888,14 @@ static bool vfio_get_device(VFIOGroup *group, const char *name, { g_autofree struct vfio_device_info *info = NULL; int fd; + bool reused; + + fd = cpr_find_fd(name, 0); + reused = (fd >= 0); + if (!reused) { + fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name); + } - fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name); if (fd < 0) { error_setg_errno(errp, errno, "error getting device from group %d", group->groupid); @@ -852,6 +940,8 @@ static bool vfio_get_device(VFIOGroup *group, const char *name, vbasedev->num_irqs = info->num_irqs; vbasedev->num_regions = info->num_regions; vbasedev->flags = info->flags; + vbasedev->reused = reused; + cpr_resave_fd(name, 0, fd); trace_vfio_get_device(name, info->flags, info->num_regions, info->num_irqs); @@ -868,6 +958,7 @@ static void vfio_put_base_device(VFIODevice *vbasedev) QLIST_REMOVE(vbasedev, next); vbasedev->group = NULL; trace_vfio_put_base_device(vbasedev->fd); + cpr_delete_fd(vbasedev->name, 0); close(vbasedev->fd); } @@ -1136,6 +1227,8 @@ static void vfio_iommu_legacy_class_init(ObjectClass *klass, void *data) vioc->set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking; vioc->query_dirty_bitmap = vfio_legacy_query_dirty_bitmap; vioc->pci_hot_reset = vfio_legacy_pci_hot_reset; + vioc->cpr_register = vfio_legacy_cpr_register_container; + vioc->cpr_unregister = vfio_legacy_cpr_unregister_container; }; static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque, diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c new file mode 100644 index 0000000..bc51ebe --- /dev/null +++ b/hw/vfio/cpr-legacy.c @@ 
-0,0 +1,118 @@ +/* + * Copyright (c) 2021-2024 Oracle and/or its affiliates. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include +#include +#include "hw/vfio/vfio-common.h" +#include "migration/blocker.h" +#include "migration/cpr.h" +#include "migration/migration.h" +#include "qapi/error.h" +#include "migration/vmstate.h" + +#define VFIO_CONTAINER(base) container_of(base, VFIOContainer, bcontainer) + +static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp) +{ + struct vfio_iommu_type1_dma_unmap unmap = { + .argsz = sizeof(unmap), + .flags = VFIO_DMA_UNMAP_FLAG_VADDR | VFIO_DMA_UNMAP_FLAG_ALL, + .iova = 0, + .size = 0, + }; + if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) { + error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all"); + return false; + } + return true; +} + +static bool vfio_can_cpr_exec(VFIOContainer *container, Error **errp) +{ + if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) { + error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR"); + return false; + + } else if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) { + error_setg(errp, "VFIO container does not support VFIO_UNMAP_ALL"); + return false; + + } else { + return true; + } +} + +static int vfio_container_pre_save(void *opaque) +{ + VFIOContainer *container = opaque; + Error *err = NULL; + + if (!vfio_can_cpr_exec(container, &err) || + !vfio_dma_unmap_vaddr_all(container, &err)) { + error_report_err(err); + return -1; + } + return 0; +} + +static int vfio_container_post_load(void *opaque, int version_id) +{ + VFIOContainer *container = opaque; + VFIOContainerBase *bcontainer = &container->bcontainer; + VFIOGroup *group; + Error *err = NULL; + VFIODevice *vbasedev; + + if (!vfio_can_cpr_exec(container, &err)) { + error_report_err(err); + return -1; + } + vfio_listener_register(bcontainer); + 
bcontainer->reused = false; + + QLIST_FOREACH(group, &container->group_list, container_next) { + QLIST_FOREACH(vbasedev, &group->device_list, next) { + vbasedev->reused = false; + } + } + return 0; +} + +static const VMStateDescription vfio_container_vmstate = { + .name = "vfio-container", + .version_id = 0, + .minimum_version_id = 0, + .pre_save = vfio_container_pre_save, + .post_load = vfio_container_post_load, + .needed = cpr_needed_for_reuse, + .fields = (VMStateField[]) { + VMSTATE_END_OF_LIST() + } +}; + +bool vfio_legacy_cpr_register_container(VFIOContainerBase *bcontainer, + Error **errp) +{ + VFIOContainer *container = VFIO_CONTAINER(bcontainer); + + if (!vfio_can_cpr_exec(container, &bcontainer->cpr_blocker)) { + return migrate_add_blocker_modes(&bcontainer->cpr_blocker, errp, + MIG_MODE_CPR_EXEC, -1) == 0; + } + + vmstate_register(NULL, -1, &vfio_container_vmstate, container); + + return true; +} + +void vfio_legacy_cpr_unregister_container(VFIOContainerBase *bcontainer) +{ + VFIOContainer *container = VFIO_CONTAINER(bcontainer); + + vmstate_unregister(NULL, &vfio_container_vmstate, container); +} diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c index 87e51fc..4474bc3 100644 --- a/hw/vfio/cpr.c +++ b/hw/vfio/cpr.c @@ -6,10 +6,12 @@ */ #include "qemu/osdep.h" +#include +#include #include "hw/vfio/vfio-common.h" -#include "migration/misc.h" +#include "migration/blocker.h" +#include "migration/migration.h" #include "qapi/error.h" -#include "sysemu/runstate.h" static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, MigrationEvent *e, Error **errp) @@ -27,13 +29,29 @@ static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, bool vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp) { + VFIOIOMMUClass *ops = VFIO_IOMMU_GET_CLASS(bcontainer); + migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier, vfio_cpr_reboot_notifier, MIG_MODE_CPR_REBOOT); - return true; + + if (!ops->cpr_register) { + error_setg(&bcontainer->cpr_blocker, 
+ "VFIO container does not support cpr_register"); + return migrate_add_blocker_modes(&bcontainer->cpr_blocker, errp, + MIG_MODE_CPR_EXEC, -1) == 0; + } + + return ops->cpr_register(bcontainer, errp); } void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer) { + VFIOIOMMUClass *ops = VFIO_IOMMU_GET_CLASS(bcontainer); + migration_remove_notifier(&bcontainer->cpr_reboot_notifier); + migrate_del_blocker(&bcontainer->cpr_blocker); + if (ops->cpr_unregister) { + ops->cpr_unregister(bcontainer); + } } diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build index bba776f..5487815 100644 --- a/hw/vfio/meson.build +++ b/hw/vfio/meson.build @@ -5,13 +5,14 @@ vfio_ss.add(files( 'container-base.c', 'container.c', 'migration.c', - 'cpr.c', )) vfio_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr.c')) vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files( 'iommufd.c', )) vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files( + 'cpr.c', + 'cpr-legacy.c', 'display.c', 'pci-quirks.c', 'pci.c', diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index ca3c22a..2485236 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -29,6 +29,8 @@ #include "hw/pci/pci_bridge.h" #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" +#include "migration/misc.h" +#include "migration/cpr.h" #include "migration/vmstate.h" #include "qapi/qmp/qdict.h" #include "qemu/error-report.h" @@ -3326,6 +3328,11 @@ static void vfio_pci_reset(DeviceState *dev) { VFIOPCIDevice *vdev = VFIO_PCI(dev); + /* Do not reset the device during qemu_system_reset prior to cpr load */ + if (vdev->vbasedev.reused) { + return; + } + trace_vfio_pci_reset(vdev->vbasedev.name); vfio_pci_pre_reset(vdev); @@ -3447,6 +3454,36 @@ static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp) } #endif +/* + * The kernel may change non-emulated config bits. Exclude them from the + * changed-bits check in get_pci_config_device. 
+ */ +static int vfio_pci_pre_load(void *opaque) +{ + VFIOPCIDevice *vdev = opaque; + PCIDevice *pdev = &vdev->pdev; + int size = MIN(pci_config_size(pdev), vdev->config_size); + int i; + + for (i = 0; i < size; i++) { + pdev->cmask[i] &= vdev->emulated_config_bits[i]; + } + + return 0; +} + +static const VMStateDescription vfio_pci_vmstate = { + .name = "vfio-pci", + .version_id = 0, + .minimum_version_id = 0, + .priority = MIG_PRI_VFIO_PCI, /* must load before container */ + .pre_load = vfio_pci_pre_load, + .needed = cpr_needed_for_reuse, + .fields = (VMStateField[]) { + VMSTATE_END_OF_LIST() + } +}; + static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -3457,6 +3494,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) #ifdef CONFIG_IOMMUFD object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd); #endif + dc->vmsd = &vfio_pci_vmstate; dc->desc = "VFIO-based PCI device assignment"; set_bit(DEVICE_CATEGORY_MISC, dc->categories); pdc->realize = vfio_realize; diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index e8ddf92..7c4283b 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -122,6 +122,7 @@ typedef struct VFIODevice { bool ram_block_discard_allowed; OnOffAuto enable_migration; bool migration_events; + bool reused; VFIODeviceOps *ops; unsigned int num_irqs; unsigned int num_regions; @@ -240,6 +241,9 @@ int vfio_kvm_device_del_fd(int fd, Error **errp); bool vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp); void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer); +bool vfio_legacy_cpr_register_container(VFIOContainerBase *bcontainer, + Error **errp); +void vfio_legacy_cpr_unregister_container(VFIOContainerBase *bcontainer); extern const MemoryRegionOps vfio_region_ops; typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList; @@ -287,6 +291,10 @@ int 
vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer, int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova, uint64_t size, ram_addr_t ram_addr, Error **errp); +void vfio_container_region_add(VFIOContainerBase *bcontainer, + MemoryRegionSection *section); +void vfio_listener_register(VFIOContainerBase *bcontainer); + /* Returns 0 on success, or a negative errno. */ bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp); void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp); diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h index 419e45e..82ccf0c 100644 --- a/include/hw/vfio/vfio-container-base.h +++ b/include/hw/vfio/vfio-container-base.h @@ -39,6 +39,7 @@ typedef struct VFIOContainerBase { MemoryListener listener; Error *error; bool initialized; + bool reused; uint64_t dirty_pgsizes; uint64_t max_dirty_bitmap_size; unsigned long pgsizes; @@ -50,6 +51,7 @@ typedef struct VFIOContainerBase { QLIST_HEAD(, VFIODevice) device_list; GList *iova_ranges; NotifierWithReturn cpr_reboot_notifier; + Error *cpr_blocker; } VFIOContainerBase; typedef struct VFIOGuestIOMMU { @@ -152,5 +154,9 @@ struct VFIOIOMMUClass { void (*del_window)(VFIOContainerBase *bcontainer, MemoryRegionSection *section); void (*release)(VFIOContainerBase *bcontainer); + + /* CPR */ + bool (*cpr_register)(VFIOContainerBase *bcontainer, Error **errp); + void (*cpr_unregister)(VFIOContainerBase *bcontainer); }; #endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */ diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index f313f2f..87cb5b0 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -162,6 +162,8 @@ typedef enum { MIG_PRI_GICV3_ITS, /* Must happen before PCI devices */ MIG_PRI_GICV3, /* Must happen before the ITS */ MIG_PRI_MAX, + MIG_PRI_VFIO_PCI = + MIG_PRI_DEFAULT + 1, /* Must happen before vfio containers */ } MigrationPriority; struct VMStateField { From 
patchwork Tue Jul 9 20:58:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Sistare X-Patchwork-Id: 13728527 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4FBB5C3DA45 for ; Tue, 9 Jul 2024 21:00:47 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sRHvn-00068l-9Y; Tue, 09 Jul 2024 16:59:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sRHvl-00061B-C6 for qemu-devel@nongnu.org; Tue, 09 Jul 2024 16:59:21 -0400 Received: from mx0b-00069f02.pphosted.com ([205.220.177.32]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sRHvV-0005Wr-Oi for qemu-devel@nongnu.org; Tue, 09 Jul 2024 16:59:21 -0400 Received: from pps.filterd (m0246631.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 469KtqIa005249; Tue, 9 Jul 2024 20:59:03 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h= from:to:cc:subject:date:message-id:in-reply-to:references; s= corp-2023-11-20; bh=4e0y9DLDxOxZNHEqrsO3p0WeiOFJaqMeEs/XFitVQ8Y=; b= e5ZsXFvQ9b6jhP4f3GDIMKijQynfqe+beavZpO18lIo+IouCnDHR28pnX9ZkNkjd krNnlnn/DLgMUiDWz6noLLyRvQ9SCtT9PrpfuENK2EfnbKAAfaPDAq4/t/t3WQJM t0jHFYHY2BTnRysQIBQr2/uXbovx9X95qBpQDARbAZVoKUWnnoF2hjVaXr8aRC+F Od6S22FjnRkVF+6R/a3IwyJKxx+KqB/aG98SjCIkqS3S4649iKvsVdorraq+nJeD tUdz0nxZhDlppNvth7FIJYuehOpB3C71LZNS3cOHoTUoYSMSfIMiLXRjVqhJKIjz HIU43DsQzcDz27vrGwVnOQ== Received: from 
phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta02.appoci.oracle.com [147.154.114.232]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 406wknnym0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Jul 2024 20:59:03 +0000 (GMT) Received: from pps.filterd (phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 469KBpsL008641; Tue, 9 Jul 2024 20:59:02 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 407tve98se-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Jul 2024 20:59:02 +0000 Received: from phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 469KwwD1012128; Tue, 9 Jul 2024 20:59:01 GMT Received: from ca-dev63.us.oracle.com (ca-dev63.us.oracle.com [10.211.8.221]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 407tve98qa-6; Tue, 09 Jul 2024 20:59:01 +0000 From: Steve Sistare To: qemu-devel@nongnu.org Cc: Alex Williamson , Cedric Le Goater , "Michael S. 
Tsirkin" , Marcel Apfelbaum , Peter Xu , Fabiano Rosas , Steve Sistare Subject: [PATCH V1 5/8] vfio-pci: cpr part 2 (msi) Date: Tue, 9 Jul 2024 13:58:54 -0700 Message-Id: <1720558737-451106-6-git-send-email-steven.sistare@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com> References: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.680,FMLib:17.12.28.16 definitions=2024-07-09_09,2024-07-09_01,2024-05-17_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 mlxscore=0 mlxlogscore=999 phishscore=0 malwarescore=0 bulkscore=0 adultscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2406180000 definitions=main-2407090143 X-Proofpoint-GUID: LCqB40SSsphv8mk3KEocqWa8mn0dmMMl X-Proofpoint-ORIG-GUID: LCqB40SSsphv8mk3KEocqWa8mn0dmMMl Received-SPF: pass client-ip=205.220.177.32; envelope-from=steven.sistare@oracle.com; helo=mx0b-00069f02.pphosted.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Finish CPR for vfio-pci MSI/MSI-X devices by preserving eventfd's and vector state. 
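As a rough illustration of how eventfds and vector state could be keyed and reclaimed, the sketch below builds a "<device>_<event>" key and decides, per vector, what to reattach from which preserved descriptors exist. It is a simplified model under stated assumptions: the demo_* names are hypothetical, snprintf stands in for the g_strdup_printf used by the patch, and the virq value is a placeholder rather than a real KVM route.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the "<device>_<event>" key; 0 on success, -1 if it would not fit. */
static int demo_event_fd_name(char *buf, size_t len, const char *dev,
                              const char *event)
{
    int n = snprintf(buf, len, "%s_%s", dev, event);

    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Hypothetical per-vector state rebuilt after the exec. */
struct demo_vector {
    int handler_installed;  /* userspace fd handler attached */
    int virq;               /* placeholder route id, or -1 if none */
};

/*
 * Rebuild one vector from the preserved descriptors. The two checks are
 * independent: a vector may have a userspace handler, a KVM route, both,
 * or neither. A negative fd means "not preserved".
 */
static void demo_claim_vector(struct demo_vector *v, int interrupt_fd,
                              int kvm_interrupt_fd, int nr)
{
    v->handler_installed = (interrupt_fd >= 0);
    v->virq = (kvm_interrupt_fd >= 0) ? nr : -1;
}
```

Keying each descriptor by device name, event name, and vector number keeps lookups unambiguous when several devices preserve eventfds at once.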
Signed-off-by: Steve Sistare --- hw/vfio/pci.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 116 insertions(+), 1 deletion(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 2485236..f0213e0 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -56,11 +56,37 @@ static void vfio_disable_interrupts(VFIOPCIDevice *vdev); static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled); static void vfio_msi_disable_common(VFIOPCIDevice *vdev); +#define EVENT_FD_NAME(vdev, name) \ + g_strdup_printf("%s_%s", (vdev)->vbasedev.name, (name)) + +static void save_event_fd(VFIOPCIDevice *vdev, const char *name, int nr, + EventNotifier *ev) +{ + int fd = event_notifier_get_fd(ev); + + if (fd >= 0) { + g_autofree char *fdname = EVENT_FD_NAME(vdev, name); + cpr_resave_fd(fdname, nr, fd); + } +} + +static int load_event_fd(VFIOPCIDevice *vdev, const char *name, int nr) +{ + g_autofree char *fdname = EVENT_FD_NAME(vdev, name); + return cpr_find_fd(fdname, nr); +} + +static void delete_event_fd(VFIOPCIDevice *vdev, const char *name, int nr) +{ + g_autofree char *fdname = EVENT_FD_NAME(vdev, name); + cpr_delete_fd(fdname, nr); +} + /* Create new or reuse existing eventfd */ static int vfio_notifier_init(VFIOPCIDevice *vdev, EventNotifier *e, const char *name, int nr) { - int fd = -1; /* placeholder until a subsequent patch */ + int fd = load_event_fd(vdev, name, nr); int ret = 0; if (fd >= 0) { @@ -71,6 +97,8 @@ static int vfio_notifier_init(VFIOPCIDevice *vdev, EventNotifier *e, Error *err = NULL; error_setg_errno(&err, -ret, "vfio_notifier_init %s failed", name); error_report_err(err); + } else { + save_event_fd(vdev, name, nr, e); } } return ret; @@ -79,6 +107,7 @@ static int vfio_notifier_init(VFIOPCIDevice *vdev, EventNotifier *e, static void vfio_notifier_cleanup(VFIOPCIDevice *vdev, EventNotifier *e, const char *name, int nr) { + delete_event_fd(vdev, name, nr); event_notifier_cleanup(e); } @@ -561,6 +590,15 @@ static int 
vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr, int ret; bool resizing = !!(vdev->nr_vectors < nr + 1); + /* + * Ignore the callback from msix_set_vector_notifiers during resume. + * The necessary subset of these actions is called from vfio_claim_vectors + * during post load. + */ + if (vdev->vbasedev.reused) { + return 0; + } + trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr); vector = &vdev->msi_vectors[nr]; @@ -2897,6 +2935,11 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev) fd = event_notifier_get_fd(&vdev->err_notifier); qemu_set_fd_handler(fd, vfio_err_notifier_handler, NULL, vdev); + /* Do not alter irq_signaling during vfio_realize for cpr */ + if (vdev->vbasedev.reused) { + return; + } + if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_ERR_IRQ_INDEX, 0, VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) { error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name); @@ -2961,6 +3004,12 @@ static void vfio_register_req_notifier(VFIOPCIDevice *vdev) fd = event_notifier_get_fd(&vdev->req_notifier); qemu_set_fd_handler(fd, vfio_req_notifier_handler, NULL, vdev); + /* Do not alter irq_signaling during vfio_realize for cpr */ + if (vdev->vbasedev.reused) { + vdev->req_enabled = true; + return; + } + if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_REQ_IRQ_INDEX, 0, VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) { error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name); @@ -3454,6 +3503,46 @@ static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp) } #endif +static void vfio_claim_vectors(VFIOPCIDevice *vdev, int nr_vectors, bool msix) +{ + int i, fd; + bool pending = false; + PCIDevice *pdev = &vdev->pdev; + + vdev->nr_vectors = nr_vectors; + vdev->msi_vectors = g_new0(VFIOMSIVector, nr_vectors); + vdev->interrupt = msix ? 
VFIO_INT_MSIX : VFIO_INT_MSI; + + vfio_prepare_kvm_msi_virq_batch(vdev); + + for (i = 0; i < nr_vectors; i++) { + VFIOMSIVector *vector = &vdev->msi_vectors[i]; + + fd = load_event_fd(vdev, "interrupt", i); + if (fd >= 0) { + vfio_vector_init(vdev, i); + qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector); + } + + if (load_event_fd(vdev, "kvm_interrupt", i) >= 0) { + vfio_add_kvm_msi_virq(vdev, vector, i, msix); + } else { + vdev->msi_vectors[i].virq = -1; + } + + if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) { + set_bit(i, vdev->msix->pending); + pending = true; + } + } + + vfio_commit_kvm_msi_virq_batch(vdev); + + if (msix) { + memory_region_set_enabled(&pdev->msix_pba_mmio, pending); + } +} + /* * The kernel may change non-emulated config bits. Exclude them from the * changed-bits check in get_pci_config_device. @@ -3472,14 +3561,40 @@ static int vfio_pci_pre_load(void *opaque) return 0; } +static int vfio_pci_post_load(void *opaque, int version_id) +{ + VFIOPCIDevice *vdev = opaque; + PCIDevice *pdev = &vdev->pdev; + int nr_vectors; + + if (msix_enabled(pdev)) { + msix_set_vector_notifiers(pdev, vfio_msix_vector_use, + vfio_msix_vector_release, NULL); + nr_vectors = vdev->msix->entries; + vfio_claim_vectors(vdev, nr_vectors, true); + + } else if (msi_enabled(pdev)) { + nr_vectors = msi_nr_vectors_allocated(pdev); + vfio_claim_vectors(vdev, nr_vectors, false); + + } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) { + g_assert_not_reached(); /* completed in a subsequent patch */ + } + + return 0; +} + static const VMStateDescription vfio_pci_vmstate = { .name = "vfio-pci", .version_id = 0, .minimum_version_id = 0, .priority = MIG_PRI_VFIO_PCI, /* must load before container */ .pre_load = vfio_pci_pre_load, + .post_load = vfio_pci_post_load, .needed = cpr_needed_for_reuse, .fields = (VMStateField[]) { + VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice), + VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present), VMSTATE_END_OF_LIST() } 
 };

From patchwork Tue Jul 9 20:58:55 2024
X-Patchwork-Submitter: Steven Sistare
X-Patchwork-Id: 13728521
From: Steve Sistare
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Cedric Le Goater, "Michael S. Tsirkin", Marcel Apfelbaum, Peter Xu, Fabiano Rosas, Steve Sistare
Subject: [PATCH V1 6/8] vfio-pci: cpr part 3 (intx)
Date: Tue, 9 Jul 2024 13:58:55 -0700
Message-Id: <1720558737-451106-7-git-send-email-steven.sistare@oracle.com>
In-Reply-To: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>
References: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>

Preserve vfio INTX state across cpr-exec.  Preserve VFIOINTx fields as follows:
  pin : Recover this from the vfio config in kernel space
  interrupt : Preserve its eventfd descriptor across exec.
  unmask : Ditto
  route.irq : This could perhaps be recovered in vfio_pci_post_load by
    calling pci_device_route_intx_to_irq(pin), whose implementation reads
    config space for a bridge device such as ich9.  However, there is no
    guarantee that the bridge vmstate is read before vfio vmstate.  Rather
    than fiddling with MigrationPriority for vmstate handlers, explicitly
    save route.irq in vfio vmstate.
  pending : Save in vfio vmstate.
  mmap_timeout, mmap_timer : Re-initialize
  bool kvm_accel : Re-initialize

In vfio_realize, defer calling vfio_intx_enable until the vmstate is
available, in vfio_pci_post_load.  Modify vfio_intx_enable and
vfio_intx_enable_kvm to skip vfio initialization, but still perform kvm
initialization.

Signed-off-by: Steve Sistare
---
 hw/vfio/pci.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 47 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index f0213e0..b5e7592 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -184,12 +184,17 @@ static bool vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
         return true;
     }

+    if (vdev->vbasedev.reused) {
+        goto skip_state;
+    }
+
     /* Get to a known interrupt state */
     qemu_set_fd_handler(irq_fd, NULL, NULL, vdev);
     vfio_mask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);

+skip_state:
     /* Get an eventfd for resample/unmask */
     if (vfio_notifier_init(vdev, &vdev->intx.unmask, "intx-unmask", 0)) {
         error_setg(errp, "vfio_notifier_init intx-unmask failed");
@@ -204,6 +209,10 @@ static bool vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
         goto fail_irqfd;
     }

+    if (vdev->vbasedev.reused) {
+        goto skip_irq;
+    }
+
     if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
                                 VFIO_IRQ_SET_ACTION_UNMASK,
                                 event_notifier_get_fd(&vdev->intx.unmask),
@@ -214,6 +223,7 @@ static bool vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
     /* Let'em rip */
     vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);

+skip_irq:
     vdev->intx.kvm_accel = true;

     trace_vfio_intx_enable_kvm(vdev->vbasedev.name);
@@ -329,7 +339,13 @@ static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
         return true;
     }

-    vfio_disable_interrupts(vdev);
+    /*
+     * Do not alter interrupt state during vfio_realize and cpr load.  The
+     * reused flag is cleared thereafter.
+     */
+    if (!vdev->vbasedev.reused) {
+        vfio_disable_interrupts(vdev);
+    }

     vdev->intx.pin = pin - 1; /* Pin A (1) -> irq[0] */
     pci_config_set_interrupt_pin(vdev->pdev.config, pin);
@@ -351,7 +367,8 @@ static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
     fd = event_notifier_get_fd(&vdev->intx.interrupt);
     qemu_set_fd_handler(fd, vfio_intx_interrupt, NULL, vdev);

-    if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
+    if (!vdev->vbasedev.reused &&
+        !vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
                                 VFIO_IRQ_SET_ACTION_TRIGGER, fd, errp)) {
         qemu_set_fd_handler(fd, NULL, NULL, vdev);
         vfio_notifier_cleanup(vdev, &vdev->intx.interrupt, "intx-interrupt", 0);
@@ -3262,7 +3279,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                                              vfio_intx_routing_notifier);
         vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
         kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
-        if (!vfio_intx_enable(vdev, errp)) {
+        /* Wait until cpr load reads intx routing data to enable */
+        if (!vdev->vbasedev.reused && !vfio_intx_enable(vdev, errp)) {
             goto out_deregister;
         }
     }
@@ -3578,12 +3596,36 @@ static int vfio_pci_post_load(void *opaque, int version_id)
         vfio_claim_vectors(vdev, nr_vectors, false);

     } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
-        g_assert_not_reached();      /* completed in a subsequent patch */
+        Error *err = NULL;
+        if (!vfio_intx_enable(vdev, &err)) {
+            error_report_err(err);
+            return -1;
+        }
     }

     return 0;
 }

+static const VMStateDescription vfio_intx_vmstate = {
+    .name = "vfio-intx",
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .fields = (VMStateField[]) {
+        VMSTATE_BOOL(pending, VFIOINTx),
+        VMSTATE_UINT32(route.mode, VFIOINTx),
+        VMSTATE_INT32(route.irq, VFIOINTx),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+#define VMSTATE_VFIO_INTX(_field, _state) {                         \
+    .name       = (stringify(_field)),                              \
+    .size       = sizeof(VFIOINTx),                                 \
+    .vmsd       = &vfio_intx_vmstate,                               \
+    .flags      = VMS_STRUCT,                                       \
+    .offset     = vmstate_offset_value(_state, _field, VFIOINTx),   \
+}
+
 static const VMStateDescription vfio_pci_vmstate = {
     .name = "vfio-pci",
     .version_id = 0,
@@ -3595,6 +3637,7 @@ static const VMStateDescription vfio_pci_vmstate = {
     .fields = (VMStateField[]) {
         VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
         VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present),
+        VMSTATE_VFIO_INTX(intx, VFIOPCIDevice),
         VMSTATE_END_OF_LIST()
     }
 };

From patchwork Tue Jul 9 20:58:56 2024
X-Patchwork-Submitter: Steven Sistare
X-Patchwork-Id: 13728528
From: Steve Sistare
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Cedric Le Goater, "Michael S. Tsirkin", Marcel Apfelbaum, Peter Xu, Fabiano Rosas, Steve Sistare
Subject: [PATCH V1 7/8] vfio: vfio_find_ram_discard_listener
Date: Tue, 9 Jul 2024 13:58:56 -0700
Message-Id: <1720558737-451106-8-git-send-email-steven.sistare@oracle.com>
In-Reply-To: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>
References: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>

Define vfio_find_ram_discard_listener as a subroutine so additional calls to
it may be added in a subsequent patch.

Signed-off-by: Steve Sistare
---
 hw/vfio/common.c | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 72a692a..5c7baad 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -561,6 +561,26 @@ static bool vfio_get_section_iova_range(VFIOContainerBase *bcontainer,
     return true;
 }

+static VFIORamDiscardListener *vfio_find_ram_discard_listener(
+    VFIOContainerBase *bcontainer, MemoryRegionSection *section)
+{
+    VFIORamDiscardListener *vrdl = NULL;
+
+    QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
+        if (vrdl->mr == section->mr &&
+            vrdl->offset_within_address_space ==
+            section->offset_within_address_space) {
+            break;
+        }
+    }
+
+    if (!vrdl) {
+        hw_error("vfio: Trying to sync missing RAM discard listener");
+        /* does not return */
+    }
+    return vrdl;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
@@ -1285,19 +1305,8 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
                                             MemoryRegionSection *section)
 {
     RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
-    VFIORamDiscardListener *vrdl = NULL;
-
-    QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
-        if (vrdl->mr == section->mr &&
-            vrdl->offset_within_address_space ==
-            section->offset_within_address_space) {
-            break;
-        }
-    }
-
-    if (!vrdl) {
-        hw_error("vfio: Trying to sync missing RAM discard listener");
-    }
+    VFIORamDiscardListener *vrdl =
+        vfio_find_ram_discard_listener(bcontainer, section);

     /*
      * We only want/can synchronize the bitmap for actually mapped parts -

From patchwork Tue Jul 9 20:58:57 2024
X-Patchwork-Submitter: Steven Sistare
X-Patchwork-Id: 13728526
From: Steve Sistare
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Cedric Le Goater, "Michael S. Tsirkin", Marcel Apfelbaum, Peter Xu, Fabiano Rosas, Steve Sistare
Subject: [PATCH V1 8/8] vfio-pci: recover from unmap-all-vaddr failure
Date: Tue, 9 Jul 2024 13:58:57 -0700
Message-Id: <1720558737-451106-9-git-send-email-steven.sistare@oracle.com>
In-Reply-To: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>
References: <1720558737-451106-1-git-send-email-steven.sistare@oracle.com>

If there are multiple containers and unmap-all fails for some container, we
need to remap vaddr for the other containers for which unmap-all succeeded.
Recover by walking all address ranges of all containers to restore the vaddr
for each.  Do so by invoking the vfio listener callback, and passing a new
"remap" flag that tells it to restore a mapping without re-allocating new
userland data structures.
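
For illustration only, and not part of the patch: a minimal standalone sketch
of the recovery idea described above, assuming simplified stand-in types. The
names Range, Container, region_add, unmap_all_vaddr and recover are
hypothetical and do not exist in QEMU; the real code goes through the memory
listener and vfio_dma_map instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not QEMU types. */
typedef struct Range {
    unsigned long iova;
    bool vaddr_valid;       /* cleared when the vaddr is unmapped */
} Range;

typedef struct Container {
    Range *ranges;
    size_t nr_ranges;
    bool vaddr_unmapped;    /* true once unmap-all succeeded here */
    size_t nr_allocs;       /* counts per-range bookkeeping allocations */
} Container;

/* Plays the role of the listener callback with the new "remap" flag. */
static void region_add(Container *c, Range *r, bool remap)
{
    assert(c && r);
    if (!remap) {
        c->nr_allocs++;     /* first-time add: allocate bookkeeping */
    }
    r->vaddr_valid = true;  /* both paths end by mapping the vaddr */
}

/* Invalidate every vaddr in one container; fail_injected simulates an error. */
static bool unmap_all_vaddr(Container *c, bool fail_injected)
{
    if (fail_injected) {
        return false;
    }
    for (size_t i = 0; i < c->nr_ranges; i++) {
        c->ranges[i].vaddr_valid = false;
    }
    c->vaddr_unmapped = true;
    return true;
}

/* Walk all ranges of every container whose unmap-all succeeded. */
static void recover(Container *cs, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        Container *c = &cs[i];
        if (!c->vaddr_unmapped) {
            continue;
        }
        for (size_t j = 0; j < c->nr_ranges; j++) {
            region_add(c, &c->ranges[j], true);
        }
        c->vaddr_unmapped = false;
    }
}
```

In this sketch the remap path restores vaddr_valid without touching nr_allocs,
which mirrors the intent of restoring a mapping without allocating new
userland data structures.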

Signed-off-by: Steve Sistare
---
 hw/vfio/common.c                      | 45 ++++++++++++++++++++++++++++++++---
 hw/vfio/cpr-legacy.c                  | 44 ++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-common.h         |  4 +++-
 include/hw/vfio/vfio-container-base.h |  1 +
 4 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5c7baad..da2e0ec 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -586,11 +586,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
 {
     VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
                                                  listener);
-    vfio_container_region_add(bcontainer, section);
+    vfio_container_region_add(bcontainer, section, false);
 }

 void vfio_container_region_add(VFIOContainerBase *bcontainer,
-                               MemoryRegionSection *section)
+                               MemoryRegionSection *section,
+                               bool remap)
 {
     hwaddr iova, end;
     Int128 llend, llsize;
@@ -626,6 +627,30 @@ void vfio_container_region_add(VFIOContainerBase *bcontainer,
         int iommu_idx;

         trace_vfio_listener_region_add_iommu(iova, end);
+
+        /*
+         * If remap, then VFIO_DMA_UNMAP_FLAG_VADDR has been called, and we
+         * want to remap the vaddr.  vfio_container_region_add was already
+         * called in the past, so the giommu already exists.  Find it and
+         * replay it, which calls vfio_dma_map further down the stack.
+         */
+
+        if (remap) {
+            hwaddr as_offset = section->offset_within_address_space;
+            hwaddr iommu_offset = as_offset - section->offset_within_region;
+
+            QLIST_FOREACH(giommu, &bcontainer->giommu_list, giommu_next) {
+                if (giommu->iommu_mr == iommu_mr &&
+                    giommu->iommu_offset == iommu_offset) {
+                    memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
+                    return;
+                }
+            }
+            error_report("Container cannot find iommu region %s offset %" HWADDR_PRIx,
+                         memory_region_name(section->mr), iommu_offset);
+            goto fail;
+        }
+
         /*
          * FIXME: For VFIO iommu types which have KVM acceleration to
          * avoid bouncing all map/unmaps through qemu this way, this
@@ -676,7 +701,21 @@ void vfio_container_region_add(VFIOContainerBase *bcontainer,
      * about changes.
      */
     if (memory_region_has_ram_discard_manager(section->mr)) {
-        vfio_register_ram_discard_listener(bcontainer, section);
+        /*
+         * If remap, then VFIO_DMA_UNMAP_FLAG_VADDR has been called, and we
+         * want to remap the vaddr.  vfio_container_region_add was already
+         * called in the past, so the ram discard listener already exists.
+         * Call its populate function directly, which calls vfio_dma_map.
+         */
+        if (remap) {
+            VFIORamDiscardListener *vrdl =
+                vfio_find_ram_discard_listener(bcontainer, section);
+            if (vrdl->listener.notify_populate(&vrdl->listener, section)) {
+                error_report("listener.notify_populate failed");
+            }
+        } else {
+            vfio_register_ram_discard_listener(bcontainer, section);
+        }
         return;
     }

diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index bc51ebe..c4b95a8 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -29,9 +29,18 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
         error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
         return false;
     }
+    container->vaddr_unmapped = true;
     return true;
 }

+static void vfio_region_remap(MemoryListener *listener,
+                              MemoryRegionSection *section)
+{
+    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+                                                 remap_listener);
+    vfio_container_region_add(bcontainer, section, true);
+}
+
 static bool vfio_can_cpr_exec(VFIOContainer *container, Error **errp)
 {
     if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
@@ -95,6 +104,37 @@ static const VMStateDescription vfio_container_vmstate = {
     }
 };

+static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
+                                  MigrationEvent *e, Error **errp)
+{
+    VFIOContainer *container =
+        container_of(notifier, VFIOContainer, cpr_exec_notifier);
+    VFIOContainerBase *bcontainer = &container->bcontainer;
+
+    if (e->type != MIG_EVENT_PRECOPY_FAILED) {
+        return 0;
+    }
+
+    if (container->vaddr_unmapped) {
+        /*
+         * Force a call to vfio_region_remap for each mapped section by
+         * temporarily registering a listener, which calls vfio_dma_map
+         * further down the stack. Set reused so vfio_dma_map restores vaddr.
+         */
+        bcontainer->reused = true;
+        bcontainer->remap_listener = (MemoryListener) {
+            .name = "vfio recover",
+            .region_add = vfio_region_remap
+        };
+        memory_listener_register(&bcontainer->remap_listener,
+                                 bcontainer->space->as);
+        memory_listener_unregister(&bcontainer->remap_listener);
+        bcontainer->reused = false;
+        container->vaddr_unmapped = false;
+    }
+    return 0;
+}
+
 bool vfio_legacy_cpr_register_container(VFIOContainerBase *bcontainer,
                                         Error **errp)
 {
@@ -107,6 +147,9 @@ bool vfio_legacy_cpr_register_container(VFIOContainerBase *bcontainer,

     vmstate_register(NULL, -1, &vfio_container_vmstate, container);

+    migration_add_notifier_mode(&container->cpr_exec_notifier,
+                                vfio_cpr_fail_notifier,
+                                MIG_MODE_CPR_EXEC);
     return true;
 }

@@ -115,4 +158,5 @@ void vfio_legacy_cpr_unregister_container(VFIOContainerBase *bcontainer)
     VFIOContainer *container = VFIO_CONTAINER(bcontainer);

     vmstate_unregister(NULL, &vfio_container_vmstate, container);
+    migration_remove_notifier(&container->cpr_exec_notifier);
 }
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7c4283b..1902c8f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -81,6 +81,8 @@ typedef struct VFIOContainer {
     VFIOContainerBase bcontainer;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
     unsigned iommu_type;
+    NotifierWithReturn cpr_exec_notifier;
+    bool vaddr_unmapped;
     QLIST_HEAD(, VFIOGroup) group_list;
 } VFIOContainer;

@@ -292,7 +294,7 @@ int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
                           uint64_t size, ram_addr_t ram_addr, Error **errp);

 void vfio_container_region_add(VFIOContainerBase *bcontainer,
-                               MemoryRegionSection *section);
+                               MemoryRegionSection *section, bool remap);
 void vfio_listener_register(VFIOContainerBase *bcontainer);

 /* Returns 0 on success, or a negative errno. */
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 82ccf0c..3d30365 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -37,6 +37,7 @@ typedef struct VFIOContainerBase {
     Object parent;
     VFIOAddressSpace *space;
     MemoryListener listener;
+    MemoryListener remap_listener;
     Error *error;
     bool initialized;
     bool reused;