From patchwork Thu Nov 24 17:39:19 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13055230
From: Yishai Hadas
Subject: [PATCH V1 vfio 01/14] net/mlx5: Introduce ifc bits for pre_copy
Date: Thu, 24 Nov 2022 19:39:19 +0200
Message-ID: <20221124173932.194654-2-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com>
References: <20221124173932.194654-1-yishaih@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

Introduce the ifc bits needed to enable PRE_COPY of a VF during migration.

Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 include/linux/mlx5/mlx5_ifc.h | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 5a4e914e2a6f..230a96626a5f 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1882,7 +1882,12 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8         max_reformat_remove_size[0x8];
 	u8         max_reformat_remove_offset[0x8];
 
-	u8         reserved_at_c0[0xe0];
+	u8         reserved_at_c0[0x8];
+	u8         migration_multi_load[0x1];
+	u8         migration_tracking_state[0x1];
+	u8         reserved_at_ca[0x16];
+
+	u8         reserved_at_e0[0xc0];
 
 	u8         reserved_at_1a0[0xb];
 	u8         log_min_mkey_entity_size[0x5];
@@ -11918,7 +11923,8 @@ struct mlx5_ifc_query_vhca_migration_state_in_bits {
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
 
-	u8         reserved_at_40[0x10];
+	u8         incremental[0x1];
+	u8         reserved_at_41[0xf];
 	u8         vhca_id[0x10];
 
 	u8         reserved_at_60[0x20];
@@ -11944,7 +11950,9 @@ struct mlx5_ifc_save_vhca_state_in_bits {
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
 
-	u8         reserved_at_40[0x10];
+	u8         incremental[0x1];
+	u8         set_track[0x1];
+	u8         reserved_at_42[0xe];
 	u8         vhca_id[0x10];
 
 	u8         reserved_at_60[0x20];

From patchwork Thu Nov 24 17:39:20 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13055232
From: Yishai Hadas
Subject: [PATCH V1 vfio 02/14] vfio: Extend the device migration protocol with PRE_COPY
Date: Thu, 24 Nov 2022 19:39:20 +0200
Message-ID: <20221124173932.194654-3-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com>
References: <20221124173932.194654-1-yishaih@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Jason Gunthorpe

The optional PRE_COPY states open the saving data transfer FD before
reaching STOP_COPY and allow the device to dirty track internal state
changes, with the general idea of reducing the volume of data transferred
in the STOP_COPY stage.

While in PRE_COPY the device remains RUNNING, but the saving FD is open.

Only if the device also supports RUNNING_P2P can it support PRE_COPY_P2P,
which halts P2P transfers while the saving FD remains open.
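As an illustration only (this is not part of the patch), the dependency between the optional states described above can be sketched as a small userspace helper. The flag bit values mirror the ones this series defines in include/uapi/linux/vfio.h; the MIG_* names, the struct and the helper are hypothetical and are defined locally so that the sketch builds without updated headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the uapi migration flag bits (values taken from this series). */
#define MIG_STOP_COPY (1u << 0)
#define MIG_P2P       (1u << 1)
#define MIG_PRE_COPY  (1u << 2)

/* Hypothetical summary of which optional FSM states may be requested. */
struct mig_optional_states {
	bool running_p2p;
	bool pre_copy;
	bool pre_copy_p2p;
};

static struct mig_optional_states mig_states_from_flags(uint64_t flags)
{
	struct mig_optional_states s = { false, false, false };

	/* Every documented flag combination includes STOP_COPY. */
	if (!(flags & MIG_STOP_COPY))
		return s;

	s.running_p2p = !!(flags & MIG_P2P);
	s.pre_copy = !!(flags & MIG_PRE_COPY);
	/* PRE_COPY_P2P exists only when both optional features are present. */
	s.pre_copy_p2p = s.running_p2p && s.pre_copy;
	return s;
}
```

In a real program the flags value would come from a VFIO_DEVICE_FEATURE_MIGRATION query; here it is simply passed in as an argument.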
PRE_COPY, with P2P support, requires the driver to implement 7 new arcs
and exists as an optional FSM branch between RUNNING and STOP_COPY:
    RUNNING -> PRE_COPY -> PRE_COPY_P2P -> STOP_COPY

A new ioctl VFIO_MIG_GET_PRECOPY_INFO is provided to allow userspace to
query the progress of the precopy operation in the driver. The idea is
that userspace moves to STOP_COPY no earlier than once the initial data
set has been transferred, and possibly only after the dirty size has
shrunk appropriately.

This ioctl is valid only in PRE_COPY states and the kernel driver should
return -EINVAL from any other migration state.

Compared to the v1 clarification, STOP_COPY -> PRE_COPY is blocked, with
its behavior to be defined in the future.
We also split the pending_bytes report into the initial and sustaining
values, i.e. initial_bytes and dirty_bytes.
initial_bytes: Amount of initial mandatory precopy data.
dirty_bytes: Device state changes relative to data previously retrieved.
These fields are not required to have any bearing on the STOP_COPY phase.
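A minimal sketch of how userspace might apply the two counters described above when deciding to leave PRE_COPY. This is an assumption about one possible policy, not something the patch mandates: struct precopy_info is a local mirror of the fields of struct vfio_precopy_info, and both the helper name and the dirty_limit parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the fields of struct vfio_precopy_info from this patch. */
struct precopy_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/*
 * Hypothetical policy: move to STOP_COPY only once the mandatory initial
 * data has been read, and only when the dirty estimate is at or below a
 * limit chosen by the caller.
 */
static bool precopy_ready_for_stop_copy(const struct precopy_info *info,
					uint64_t dirty_limit)
{
	if (info->initial_bytes != 0)
		return false;
	return info->dirty_bytes <= dirty_limit;
}
```

In a real migration loop the structure would presumably be filled by calling the new ioctl on the migration data FD before each check; that call is left out so the sketch stays independent of kernel headers.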

Signed-off-by: Jason Gunthorpe
Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/vfio_main.c  |  74 ++++++++++++++++++++++-
 include/uapi/linux/vfio.h | 122 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 190 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 662e267a3e13..9c4a752dad4e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1042,7 +1042,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 				     enum vfio_device_mig_state new_fsm,
 				     enum vfio_device_mig_state *next_fsm)
 {
-	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
+	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_PRE_COPY_P2P + 1 };
 	/*
 	 * The coding in this table requires the driver to implement the
 	 * following FSM arcs:
@@ -1057,30 +1057,65 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 	 *         RUNNING_P2P -> RUNNING
 	 *         RUNNING_P2P -> STOP
 	 *         STOP -> RUNNING_P2P
-	 * Without P2P the driver must implement:
+	 *
+	 * If precopy is supported then the driver must support these additional
+	 * FSM arcs:
+	 *         RUNNING -> PRE_COPY
+	 *         PRE_COPY -> RUNNING
+	 *         PRE_COPY -> STOP_COPY
+	 * However, if precopy and P2P are supported together then the driver
+	 * must support these additional arcs beyond the P2P arcs above:
+	 *         PRE_COPY -> RUNNING
+	 *         PRE_COPY -> PRE_COPY_P2P
+	 *         PRE_COPY_P2P -> PRE_COPY
+	 *         PRE_COPY_P2P -> RUNNING_P2P
+	 *         PRE_COPY_P2P -> STOP_COPY
+	 *         RUNNING -> PRE_COPY
+	 *         RUNNING_P2P -> PRE_COPY_P2P
+	 *
+	 * Without P2P and precopy the driver must implement:
 	 *         RUNNING -> STOP
 	 *         STOP -> RUNNING
 	 *
 	 * The coding will step through multiple states for some combination
 	 * transitions; if all optional features are supported, this means the
 	 * following ones:
+	 *         PRE_COPY -> PRE_COPY_P2P -> STOP_COPY
+	 *         PRE_COPY -> RUNNING -> RUNNING_P2P
+	 *         PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP
+	 *         PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP -> RESUMING
+	 *         PRE_COPY_P2P -> RUNNING_P2P -> RUNNING
+	 *         PRE_COPY_P2P -> RUNNING_P2P -> STOP
+	 *         PRE_COPY_P2P -> RUNNING_P2P -> STOP -> RESUMING
 	 *         RESUMING -> STOP -> RUNNING_P2P
+	 *         RESUMING -> STOP -> RUNNING_P2P -> PRE_COPY_P2P
 	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING
+	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY
 	 *         RESUMING -> STOP -> STOP_COPY
+	 *         RUNNING -> RUNNING_P2P -> PRE_COPY_P2P
 	 *         RUNNING -> RUNNING_P2P -> STOP
 	 *         RUNNING -> RUNNING_P2P -> STOP -> RESUMING
 	 *         RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
+	 *         RUNNING_P2P -> RUNNING -> PRE_COPY
 	 *         RUNNING_P2P -> STOP -> RESUMING
 	 *         RUNNING_P2P -> STOP -> STOP_COPY
+	 *         STOP -> RUNNING_P2P -> PRE_COPY_P2P
 	 *         STOP -> RUNNING_P2P -> RUNNING
+	 *         STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY
 	 *         STOP_COPY -> STOP -> RESUMING
 	 *         STOP_COPY -> STOP -> RUNNING_P2P
 	 *         STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
+	 *
+	 * The following transitions are blocked:
+	 *         STOP_COPY -> PRE_COPY
+	 *         STOP_COPY -> PRE_COPY_P2P
 	 */
 	static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
@@ -1089,14 +1124,38 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_RUNNING] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
+		[VFIO_DEVICE_STATE_PRE_COPY] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
+		[VFIO_DEVICE_STATE_PRE_COPY_P2P] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
 		[VFIO_DEVICE_STATE_STOP_COPY] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
@@ -1105,6 +1164,8 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_RESUMING] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
@@ -1113,6 +1174,8 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_RUNNING_P2P] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
@@ -1121,6 +1184,8 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_ERROR] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR,
@@ -1131,6 +1196,11 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 	static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_PRE_COPY] =
+			VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY,
+		[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_MIGRATION_STOP_COPY |
+						   VFIO_MIGRATION_P2P |
+						   VFIO_MIGRATION_PRE_COPY,
 		[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RUNNING_P2P] =
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 3e45dbaf190e..fca8e1b7e619 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -819,12 +819,20 @@ struct vfio_device_feature {
  * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
  * is supported in addition to the STOP_COPY states.
  *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY means that
+ * PRE_COPY is supported in addition to the STOP_COPY states.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY
+ * means that RUNNING_P2P, PRE_COPY and PRE_COPY_P2P are supported
+ * in addition to the STOP_COPY states.
+ *
  * Other combinations of flags have behavior to be defined in the future.
  */
 struct vfio_device_feature_migration {
 	__aligned_u64 flags;
 #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
 #define VFIO_MIGRATION_P2P		(1 << 1)
+#define VFIO_MIGRATION_PRE_COPY		(1 << 2)
 };
 
 #define VFIO_DEVICE_FEATURE_MIGRATION 1
@@ -875,8 +883,13 @@ struct vfio_device_feature_mig_state {
  *  RESUMING - The device is stopped and is loading a new internal state
  *  ERROR - The device has failed and must be reset
  *
- * And 1 optional state to support VFIO_MIGRATION_P2P:
+ * And optional states to support VFIO_MIGRATION_P2P:
  *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ * And VFIO_MIGRATION_PRE_COPY:
+ *  PRE_COPY - The device is running normally but tracking internal state
+ *             changes
+ * And VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY:
+ *  PRE_COPY_P2P - PRE_COPY, except the device cannot do peer to peer DMA
  *
  * The FSM takes actions on the arcs between FSM states. The driver implements
  * the following behavior for the FSM arcs:
@@ -908,20 +921,48 @@ struct vfio_device_feature_mig_state {
  *
  *   To abort a RESUMING session the device must be reset.
  *
+ * PRE_COPY -> RUNNING
  * RUNNING_P2P -> RUNNING
  *   While in RUNNING the device is fully operational, the device may generate
  *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
  *   and the device may advance its internal state.
  *
+ *   The PRE_COPY arc will terminate a data transfer session.
+ *
+ * PRE_COPY_P2P -> RUNNING_P2P
  * RUNNING -> RUNNING_P2P
  * STOP -> RUNNING_P2P
  *   While in RUNNING_P2P the device is partially running in the P2P quiescent
  *   state defined below.
  *
+ *   The PRE_COPY_P2P arc will terminate a data transfer session.
+ *
+ * RUNNING -> PRE_COPY
+ * RUNNING_P2P -> PRE_COPY_P2P
  * STOP -> STOP_COPY
- *   This arc begin the process of saving the device state and will return a
- *   new data_fd.
+ *   PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states
+ *   which share a data transfer session. Moving between these states alters
+ *   what is streamed in session, but does not terminate or otherwise affect
+ *   the associated fd.
+ *
+ *   These arcs begin the process of saving the device state and will return a
+ *   new data_fd. The migration driver may perform actions such as enabling
+ *   dirty logging of device state when entering PRE_COPY or PRE_COPY_P2P.
  *
+ *   Each arc does not change the device operation, the device remains
+ *   RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below
+ *   in PRE_COPY_P2P -> STOP_COPY.
+ *
+ * PRE_COPY -> PRE_COPY_P2P
+ *   Entering PRE_COPY_P2P continues all the behaviors of PRE_COPY above.
+ *   However, while in the PRE_COPY_P2P state, the device is partially running
+ *   in the P2P quiescent state defined below, like RUNNING_P2P.
+ *
+ * PRE_COPY_P2P -> PRE_COPY
+ *   This arc allows returning the device to a full RUNNING behavior while
+ *   continuing all the behaviors of PRE_COPY.
+ *
+ * PRE_COPY_P2P -> STOP_COPY
  *   While in the STOP_COPY state the device has the same behavior as STOP
  *   with the addition that the data transfers session continues to stream the
  *   migration state. End of stream on the FD indicates the entire device
@@ -939,6 +980,13 @@ struct vfio_device_feature_mig_state {
  *   device state for this arc if required to prepare the device to receive the
  *   migration data.
  *
+ * STOP_COPY -> PRE_COPY
+ * STOP_COPY -> PRE_COPY_P2P
+ *   These arcs are not permitted and return an error if requested. Future
+ *   revisions of this API may define behaviors for these arcs; in this case
+ *   support will be discoverable by a new flag in
+ *   VFIO_DEVICE_FEATURE_MIGRATION.
+ *
  * any -> ERROR
  *   ERROR cannot be specified as a device state, however any transition request
  *   can be failed with an errno return and may then move the device_state into
@@ -950,7 +998,7 @@ struct vfio_device_feature_mig_state {
  * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
  * state for the device for the purposes of managing multiple devices within a
  * user context where peer-to-peer DMA between devices may be active. The
- * RUNNING_P2P states must prevent the device from initiating
+ * RUNNING_P2P and PRE_COPY_P2P states must prevent the device from initiating
  * any new P2P DMA transactions. If the device can identify P2P transactions
  * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
  * driver must complete any such outstanding operations prior to completing the
@@ -963,6 +1011,8 @@ struct vfio_device_feature_mig_state {
  * above FSM arcs. As there are multiple paths through the FSM arcs the path
  * should be selected based on the following rules:
  *   - Select the shortest path.
+ *   - The path cannot have saving group states as interior arcs, only
+ *     starting/end states.
  * Refer to vfio_mig_get_next_state() for the result of the algorithm.
  *
  * The automatic transit through the FSM arcs that make up the combination
@@ -976,6 +1026,9 @@ struct vfio_device_feature_mig_state {
  * support them. The user can discover if these states are supported by using
  * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
  * avoid knowing about these optional states if the kernel driver supports them.
+ *
+ * Arcs touching PRE_COPY and PRE_COPY_P2P are removed if support for PRE_COPY
+ * is not present.
  */
 enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_ERROR = 0,
@@ -984,8 +1037,69 @@ enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_STOP_COPY = 3,
 	VFIO_DEVICE_STATE_RESUMING = 4,
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
+	VFIO_DEVICE_STATE_PRE_COPY = 6,
+	VFIO_DEVICE_STATE_PRE_COPY_P2P = 7,
+};
+
+/**
+ * VFIO_MIG_GET_PRECOPY_INFO - _IO(VFIO_TYPE, VFIO_BASE + 21)
+ *
+ * This ioctl is used on the migration data FD in the precopy phase of the
+ * migration data transfer. It returns an estimate of the current data sizes
+ * remaining to be transferred. It allows the user to judge when it is
+ * appropriate to leave PRE_COPY for STOP_COPY.
+ *
+ * This ioctl is valid only in PRE_COPY states and the kernel driver should
+ * return -EINVAL from any other migration state.
+ *
+ * The vfio_precopy_info data structure returned by this ioctl provides
+ * estimates of data available from the device during the PRE_COPY states.
+ * This estimate is split into two categories, initial_bytes and
+ * dirty_bytes.
+ *
+ * The initial_bytes field indicates the amount of initial mandatory precopy
+ * data available from the device. This field should have a non-zero initial
+ * value and decrease as migration data is read from the device.
+ * Userspace must leave PRE_COPY for STOP_COPY only after this field reaches
+ * zero.
+ *
+ * The dirty_bytes field tracks device state changes relative to data
+ * previously retrieved. This field starts at zero and may increase as
+ * the internal device state is modified or decrease as that modified
+ * state is read from the device.
+ *
+ * Userspace may use the combination of these fields to estimate the
+ * potential data size available during the PRE_COPY phases, as well as
+ * trends relative to the rate the device is dirtying its internal
+ * state, but these fields are not required to have any bearing relative
+ * to the data size available during the STOP_COPY phase.
+ * + * Drivers have a lot of flexibility in when and what they transfer during the + * PRE_COPY phase, and how they report this from VFIO_MIG_GET_PRECOPY_INFO. + * + * During pre-copy the migration data FD has a temporary "end of stream" that is + * reached when both initial_bytes and dirty_bytes are zero. For instance, this + * may indicate that the device is idle and not currently dirtying any internal + * state. When read() is done on this temporary end of stream the kernel driver + * should return ENOMSG from read(). Userspace can wait for more data (which may + * never come) by using poll. + * + * Once in STOP_COPY the migration data FD has a permanent end of stream + * signaled in the usual way by read() always returning 0 and poll always + * returning readable. ENOMSG may not be returned in STOP_COPY. Support + * for this ioctl is optional. + * + * Return: 0 on success, -1 and errno set on failure. + */ +struct vfio_precopy_info { + __u32 argsz; + __u32 flags; + __aligned_u64 initial_bytes; + __aligned_u64 dirty_bytes; }; +#define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21) + /* * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power * state with the platform-based power management.
Device use of lower power From patchwork Thu Nov 24 17:39:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 13055233 From: Yishai Hadas Subject: [PATCH V1 vfio 03/14] vfio/mlx5: Enforce a single SAVE command at a time Date: Thu, 24 Nov 2022 19:39:21 +0200 Message-ID: <20221124173932.194654-4-yishaih@nvidia.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com> References: <20221124173932.194654-1-yishaih@nvidia.com>
Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Enforce a single SAVE command at a time. As the SAVE command is asynchronous, only a single command may run at a time. This preserves ordering between multiple calls and protects against races on the migration file data structure. This is required by the following patches in the series, where, as part of PRE_COPY, multiple images may need to be saved and multiple SAVE commands may be issued from different flows. Signed-off-by: Yishai Hadas --- drivers/vfio/pci/mlx5/cmd.c | 6 ++++++ drivers/vfio/pci/mlx5/cmd.h | 1 + drivers/vfio/pci/mlx5/main.c | 2 ++ 3 files changed, 9 insertions(+) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 0848bc905d3e..55ee8036f59c 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -281,6 +281,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work) dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0); mlx5_core_dealloc_pd(mdev, async_data->pdn); kvfree(async_data->out); + complete(&migf->save_comp); fput(migf->filp); } @@ -321,6 +322,10 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, return -ENOTCONN; mdev = mvdev->mdev; + err = wait_for_completion_interruptible(&migf->save_comp); + if (err) + return err; + err = mlx5_core_alloc_pd(mdev, &pdn); if (err) return err; @@ -371,6 +376,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0); err_dma_map: mlx5_core_dealloc_pd(mdev, pdn); + complete(&migf->save_comp); return err; } diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 921d5720a1e5..8ffa7699872c 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -37,6 +37,7 @@ struct mlx5_vf_migration_file { unsigned long last_offset; struct mlx5vf_pci_core_device *mvdev; wait_queue_head_t poll_wait; + struct completion save_comp;
struct mlx5_async_ctx async_ctx; struct mlx5vf_async_data async_data; }; diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 6e9cf2aacc52..4081a0f7e057 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -245,6 +245,8 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) stream_open(migf->filp->f_inode, migf->filp); mutex_init(&migf->lock); init_waitqueue_head(&migf->poll_wait); + init_completion(&migf->save_comp); + complete(&migf->save_comp); mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx); INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb); ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, From patchwork Thu Nov 24 17:39:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 13055234 From: Yishai Hadas Subject: [PATCH V1 vfio 04/14] vfio/mlx5: Refactor PD usage Date: Thu, 24 Nov 2022 19:39:22 +0200 Message-ID: <20221124173932.194654-5-yishaih@nvidia.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com> References: <20221124173932.194654-1-yishaih@nvidia.com>
Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch refactors PD usage so that its life cycle matches that of the migration file, instead of allocating/destroying it upon each SAVE/LOAD command. This is a preparation step towards the PRE_COPY series, where multiple images will be saved/loaded and a single PD can simply be reused. Signed-off-by: Yishai Hadas --- drivers/vfio/pci/mlx5/cmd.c | 53 ++++++++++++++++++++++++------------ drivers/vfio/pci/mlx5/cmd.h | 5 +++- drivers/vfio/pci/mlx5/main.c | 44 ++++++++++++++++++++++-------- 3 files changed, 71 insertions(+), 31 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 55ee8036f59c..a97eac49e3d6 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -279,7 +279,6 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work) mlx5_core_destroy_mkey(mdev, async_data->mkey); dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0); - mlx5_core_dealloc_pd(mdev, async_data->pdn); kvfree(async_data->out); complete(&migf->save_comp); fput(migf->filp); @@ -314,7 +313,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {}; struct mlx5vf_async_data *async_data; struct mlx5_core_dev *mdev; - u32 pdn, mkey; + u32 mkey; int err; lockdep_assert_held(&mvdev->state_mutex); @@ -326,16 +325,12 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, if (err) return err; - err = mlx5_core_alloc_pd(mdev,
&pdn); - if (err) - return err; - err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0); if (err) goto err_dma_map; - err = _create_mkey(mdev, pdn, migf, NULL, &mkey); + err = _create_mkey(mdev, migf->pdn, migf, NULL, &mkey); if (err) goto err_create_mkey; @@ -357,7 +352,6 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, migf->total_length = 0; get_file(migf->filp); async_data->mkey = mkey; - async_data->pdn = pdn; err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in), async_data->out, out_size, mlx5vf_save_callback, @@ -375,7 +369,6 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, err_create_mkey: dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0); err_dma_map: - mlx5_core_dealloc_pd(mdev, pdn); complete(&migf->save_comp); return err; } @@ -386,7 +379,7 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev, struct mlx5_core_dev *mdev; u32 out[MLX5_ST_SZ_DW(load_vhca_state_out)] = {}; u32 in[MLX5_ST_SZ_DW(load_vhca_state_in)] = {}; - u32 pdn, mkey; + u32 mkey; int err; lockdep_assert_held(&mvdev->state_mutex); @@ -400,15 +393,11 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev, } mdev = mvdev->mdev; - err = mlx5_core_alloc_pd(mdev, &pdn); - if (err) - goto end; - err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0); if (err) - goto err_reg; + goto end; - err = _create_mkey(mdev, pdn, migf, NULL, &mkey); + err = _create_mkey(mdev, migf->pdn, migf, NULL, &mkey); if (err) goto err_mkey; @@ -424,13 +413,41 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev, mlx5_core_destroy_mkey(mdev, mkey); err_mkey: dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0); -err_reg: - mlx5_core_dealloc_pd(mdev, pdn); end: mutex_unlock(&migf->lock); return err; } +int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file *migf) +{ + int err; + + lockdep_assert_held(&migf->mvdev->state_mutex); + if 
(migf->mvdev->mdev_detach) + return -ENOTCONN; + + err = mlx5_core_alloc_pd(migf->mvdev->mdev, &migf->pdn); + return err; +} + +void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf) +{ + lockdep_assert_held(&migf->mvdev->state_mutex); + if (migf->mvdev->mdev_detach) + return; + + mlx5_core_dealloc_pd(migf->mvdev->mdev, migf->pdn); +} + +void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf) +{ + lockdep_assert_held(&migf->mvdev->state_mutex); + + WARN_ON(migf->mvdev->mdev_detach); + + mlx5vf_cmd_dealloc_pd(migf); +} + static void combine_ranges(struct rb_root_cached *root, u32 cur_nodes, u32 req_nodes) { diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 8ffa7699872c..ba760f956d53 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -16,7 +16,6 @@ struct mlx5vf_async_data { struct mlx5_async_work cb_work; struct work_struct work; int status; - u32 pdn; u32 mkey; void *out; }; @@ -27,6 +26,7 @@ struct mlx5_vf_migration_file { u8 disabled:1; u8 is_err:1; + u32 pdn; struct sg_append_table table; size_t total_length; size_t allocated_length; @@ -127,6 +127,9 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, struct mlx5_vf_migration_file *migf); int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev, struct mlx5_vf_migration_file *migf); +int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file *migf); +void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf); +void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf); void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev); void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev); void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work); diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 4081a0f7e057..7392a93af96f 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -236,12 +236,15 @@ 
mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) migf->filp = anon_inode_getfile("mlx5vf_mig", &mlx5vf_save_fops, migf, O_RDONLY); if (IS_ERR(migf->filp)) { - int err = PTR_ERR(migf->filp); - - kfree(migf); - return ERR_PTR(err); + ret = PTR_ERR(migf->filp); + goto end; } + migf->mvdev = mvdev; + ret = mlx5vf_cmd_alloc_pd(migf); + if (ret) + goto out_free; + stream_open(migf->filp->f_inode, migf->filp); mutex_init(&migf->lock); init_waitqueue_head(&migf->poll_wait); @@ -252,20 +255,25 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &migf->total_length); if (ret) - goto out_free; + goto out_pd; ret = mlx5vf_add_migration_pages( migf, DIV_ROUND_UP_ULL(migf->total_length, PAGE_SIZE)); if (ret) - goto out_free; + goto out_pd; - migf->mvdev = mvdev; ret = mlx5vf_cmd_save_vhca_state(mvdev, migf); if (ret) - goto out_free; + goto out_save; return migf; +out_save: + mlx5vf_disable_fd(migf); +out_pd: + mlx5vf_cmd_dealloc_pd(migf); out_free: fput(migf->filp); +end: + kfree(migf); return ERR_PTR(ret); } @@ -347,6 +355,7 @@ static struct mlx5_vf_migration_file * mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev) { struct mlx5_vf_migration_file *migf; + int ret; migf = kzalloc(sizeof(*migf), GFP_KERNEL); if (!migf) @@ -355,20 +364,30 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev) migf->filp = anon_inode_getfile("mlx5vf_mig", &mlx5vf_resume_fops, migf, O_WRONLY); if (IS_ERR(migf->filp)) { - int err = PTR_ERR(migf->filp); - - kfree(migf); - return ERR_PTR(err); + ret = PTR_ERR(migf->filp); + goto end; } + + migf->mvdev = mvdev; + ret = mlx5vf_cmd_alloc_pd(migf); + if (ret) + goto out_free; + stream_open(migf->filp->f_inode, migf->filp); mutex_init(&migf->lock); return migf; +out_free: + fput(migf->filp); +end: + kfree(migf); + return ERR_PTR(ret); } void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev) { if (mvdev->resuming_migf) { 
mlx5vf_disable_fd(mvdev->resuming_migf); + mlx5fv_cmd_clean_migf_resources(mvdev->resuming_migf); fput(mvdev->resuming_migf->filp); mvdev->resuming_migf = NULL; } @@ -376,6 +395,7 @@ void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev) mlx5_cmd_cleanup_async_ctx(&mvdev->saving_migf->async_ctx); cancel_work_sync(&mvdev->saving_migf->async_data.work); mlx5vf_disable_fd(mvdev->saving_migf); + mlx5fv_cmd_clean_migf_resources(mvdev->saving_migf); fput(mvdev->saving_migf->filp); mvdev->saving_migf = NULL; } From patchwork Thu Nov 24 17:39:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 13055237 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8D1CC4332F for ; Thu, 24 Nov 2022 17:41:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229632AbiKXRlJ (ORCPT ); Thu, 24 Nov 2022 12:41:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229623AbiKXRlA (ORCPT ); Thu, 24 Nov 2022 12:41:00 -0500 Received: from NAM04-DM6-obe.outbound.protection.outlook.com (mail-dm6nam04on2073.outbound.protection.outlook.com [40.107.102.73]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8AC11369DD for ; Thu, 24 Nov 2022 09:40:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=UgY6pgmEmo6bz/khXl038/pfjfZ8hB+70MPqH/cO06kjFOLDqVco8xfHUks3hkoSKBQtDLrHRF2RZktq2fqHvq+bUtGbg/GAXOufp+z5oIq6kik2bXqMXSJSF2YXx8Hal6C247Pot+YgxwyfUr2Dmbz25rWnrYq6c1NT/iF1p7/3PA+sFezr4j5O9/sudBSmj980+qMed+/ylF9PBqQpYDmjXh95VqEp3rvbtKIVuKlHVrWbzffl4ww3RbPlD9tEWlMR7M+ljjxUbENM/mod14evYgqlu5Yscl67DnK77NW78zogVQ42EWY4Jg3j75ohLvdTxuEMZdY+XlOuD+sxWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=1p+8PTm76bv6J1JHRLWdfnzpOvSl2pW9FP80ZCVj7fE=; b=C5nmfoXBp1ALg5abUbnUq82KVMFBxGtVnqmlptFoDAViNe3gCtqpqAC6BthU/vUvF7nPvUQGGT3otEohyEom8cCRKoXjXSxX2l9QyNakiKcmxfiLrFNkzOoRBHr+Oboi232GJy4KR4E7CKl2T8MK4YjJ3qADSPyOPMepJqsbmyk6qXo62rrrsQMANkeBN44qxRMohSSJzOYAqBRC3YWJ9yGE8ixCNTwE8nBb/ppOAZQFLjA8ew4bAsPef73Ls1qmpy8fBn5puTrfhMGRv4xqfTVW276CrQMMoCHFPjymcZqBRSBeDrQKywT3bqVD4+WmTscYTPdmnESCvXuAs0o6+Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.233) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=1p+8PTm76bv6J1JHRLWdfnzpOvSl2pW9FP80ZCVj7fE=; b=Wab2YnCDrGErV21OodY8+FUW/BSV9dFy49oTbKc9EY+Nlv3pkXDrQJdCYVsUX1/qVZlQ55eW8SuKTKJMyRHFB/gSH1NaknnWCl5lqsRVjs3THMWwpIG5yELhlT2T7DCP3wUW1gJE3NwZFwZC8mJK5HcQAXvLB+nH9h/6aW//0EahDno/ZT7nAm4nKWjwkmEiVHveInZVUs682hVeYuf0zWVldx5UMUj2ILHTnVea2dJC92omuN5Nnv8ZMI1jRZDPkNiurRaqWeLWEKOYQXonfGeNY2N+NRsJN062t5VR551d/ya5mD/lViAVdlVlgujjIOyuYMHafF0ce/KdMzNsQw== Received: from BL0PR02CA0060.namprd02.prod.outlook.com (2603:10b6:207:3d::37) by MW3PR12MB4521.namprd12.prod.outlook.com (2603:10b6:303:53::13) with 
Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19; Thu, 24 Nov 2022 17:40:54 +0000 Received: from BL02EPF0000C405.namprd05.prod.outlook.com (2603:10b6:207:3d:cafe::91) by BL0PR02CA0060.outlook.office365.com (2603:10b6:207:3d::37) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19 via Frontend Transport; Thu, 24 Nov 2022 17:40:54 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.233) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.118.233 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.233; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.118.233) by BL02EPF0000C405.mail.protection.outlook.com (10.167.241.7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.17 via Frontend Transport; Thu, 24 Nov 2022 17:40:53 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by mail.nvidia.com (10.127.129.6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:40:36 -0800 Received: from drhqmail202.nvidia.com (10.126.190.181) by drhqmail203.nvidia.com (10.126.190.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:40:36 -0800 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.126.190.181) with Microsoft SMTP Server id 15.2.986.36 via Frontend Transport; Thu, 24 Nov 2022 09:40:33 -0800 From: Yishai Hadas To: , CC: , , , , , , , , Subject: [PATCH V1 vfio 05/14] vfio/mlx5: Refactor MKEY usage Date: Thu, 24 Nov 2022 19:39:23 +0200 Message-ID: <20221124173932.194654-6-yishaih@nvidia.com> X-Mailer: git-send-email 2.21.0 
In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com> References: <20221124173932.194654-1-yishaih@nvidia.com> MIME-Version: 1.0
Precedence: bulk X-Mailing-List: kvm@vger.kernel.org

This patch refactors MKEY usage so that the MKEY's life cycle follows that of the migration file, instead of allocating and destroying it on each SAVE/LOAD command.

This is a preparation step towards the PRE_COPY series, where multiple images will be SAVED/LOADED.

We achieve this with a new struct, mlx5_vhca_data_buffer, which holds the mkey and its related data such as sg_append_table, allocated_length, etc.

These fields were moved out of the main migration file struct into the dedicated mlx5_vhca_data_buffer struct, with the proper helpers in place.

For now there is a single mlx5_vhca_data_buffer per migration file. However, in coming patches there will be multiple of them to support multiple images.

Signed-off-by: Yishai Hadas
--- drivers/vfio/pci/mlx5/cmd.c | 162 ++++++++++++++++++++++------------- drivers/vfio/pci/mlx5/cmd.h | 37 +++++--- drivers/vfio/pci/mlx5/main.c | 92 +++++++++++--------- 3 files changed, 178 insertions(+), 113 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index a97eac49e3d6..ed4c472d2eae 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -210,11 +210,11 @@ static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_dev *mdev, u16 function_id, } static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn, - struct mlx5_vf_migration_file *migf, + struct mlx5_vhca_data_buffer *buf, struct mlx5_vhca_recv_buf *recv_buf, u32 *mkey) { - size_t npages = migf ? 
DIV_ROUND_UP(migf->total_length, PAGE_SIZE) : + size_t npages = buf ? DIV_ROUND_UP(buf->allocated_length, PAGE_SIZE) : recv_buf->npages; int err = 0, inlen; __be64 *mtt; @@ -232,10 +232,10 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn, DIV_ROUND_UP(npages, 2)); mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt); - if (migf) { + if (buf) { struct sg_dma_page_iter dma_iter; - for_each_sgtable_dma_page(&migf->table.sgt, &dma_iter, 0) + for_each_sgtable_dma_page(&buf->table.sgt, &dma_iter, 0) *mtt++ = cpu_to_be64(sg_page_iter_dma_address(&dma_iter)); } else { int i; @@ -255,20 +255,99 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn, MLX5_SET(mkc, mkc, qpn, 0xffffff); MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT); MLX5_SET(mkc, mkc, translations_octword_size, DIV_ROUND_UP(npages, 2)); - MLX5_SET64(mkc, mkc, len, - migf ? migf->total_length : (npages * PAGE_SIZE)); + MLX5_SET64(mkc, mkc, len, npages * PAGE_SIZE); err = mlx5_core_create_mkey(mdev, mkey, in, inlen); kvfree(in); return err; } +static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) +{ + struct mlx5vf_pci_core_device *mvdev = buf->migf->mvdev; + struct mlx5_core_dev *mdev = mvdev->mdev; + int ret; + + lockdep_assert_held(&mvdev->state_mutex); + if (mvdev->mdev_detach) + return -ENOTCONN; + + if (buf->dmaed || !buf->allocated_length) + return -EINVAL; + + ret = dma_map_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); + if (ret) + return ret; + + ret = _create_mkey(mdev, buf->migf->pdn, buf, NULL, &buf->mkey); + if (ret) + goto err; + + buf->dmaed = true; + + return 0; +err: + dma_unmap_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); + return ret; +} + +void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf) +{ + struct mlx5_vf_migration_file *migf = buf->migf; + struct sg_page_iter sg_iter; + + lockdep_assert_held(&migf->mvdev->state_mutex); + WARN_ON(migf->mvdev->mdev_detach); + + if (buf->dmaed) { + 
mlx5_core_destroy_mkey(migf->mvdev->mdev, buf->mkey); + dma_unmap_sgtable(migf->mvdev->mdev->device, &buf->table.sgt, + buf->dma_dir, 0); + } + + /* Undo alloc_pages_bulk_array() */ + for_each_sgtable_page(&buf->table.sgt, &sg_iter, 0) + __free_page(sg_page_iter_page(&sg_iter)); + sg_free_append_table(&buf->table); + kfree(buf); +} + +struct mlx5_vhca_data_buffer * +mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, + size_t length, + enum dma_data_direction dma_dir) +{ + struct mlx5_vhca_data_buffer *buf; + int ret; + + buf = kzalloc(sizeof(*buf), GFP_KERNEL); + if (!buf) + return ERR_PTR(-ENOMEM); + + buf->dma_dir = dma_dir; + buf->migf = migf; + if (length) { + ret = mlx5vf_add_migration_pages(buf, + DIV_ROUND_UP_ULL(length, PAGE_SIZE)); + if (ret) + goto end; + + ret = mlx5vf_dma_data_buffer(buf); + if (ret) + goto end; + } + + return buf; +end: + mlx5vf_free_data_buffer(buf); + return ERR_PTR(ret); +} + void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work) { struct mlx5vf_async_data *async_data = container_of(_work, struct mlx5vf_async_data, work); struct mlx5_vf_migration_file *migf = container_of(async_data, struct mlx5_vf_migration_file, async_data); - struct mlx5_core_dev *mdev = migf->mvdev->mdev; mutex_lock(&migf->lock); if (async_data->status) { @@ -276,9 +355,6 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work) wake_up_interruptible(&migf->poll_wait); } mutex_unlock(&migf->lock); - - mlx5_core_destroy_mkey(mdev, async_data->mkey); - dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0); kvfree(async_data->out); complete(&migf->save_comp); fput(migf->filp); @@ -292,7 +368,7 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context) struct mlx5_vf_migration_file, async_data); if (!status) { - WRITE_ONCE(migf->total_length, + WRITE_ONCE(migf->buf->length, MLX5_GET(save_vhca_state_out, async_data->out, actual_image_size)); wake_up_interruptible(&migf->poll_wait); @@ -307,39 +383,28 @@ static 
void mlx5vf_save_callback(int status, struct mlx5_async_work *context) } int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, - struct mlx5_vf_migration_file *migf) + struct mlx5_vf_migration_file *migf, + struct mlx5_vhca_data_buffer *buf) { u32 out_size = MLX5_ST_SZ_BYTES(save_vhca_state_out); u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {}; struct mlx5vf_async_data *async_data; - struct mlx5_core_dev *mdev; - u32 mkey; int err; lockdep_assert_held(&mvdev->state_mutex); if (mvdev->mdev_detach) return -ENOTCONN; - mdev = mvdev->mdev; err = wait_for_completion_interruptible(&migf->save_comp); if (err) return err; - err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, - 0); - if (err) - goto err_dma_map; - - err = _create_mkey(mdev, migf->pdn, migf, NULL, &mkey); - if (err) - goto err_create_mkey; - MLX5_SET(save_vhca_state_in, in, opcode, MLX5_CMD_OP_SAVE_VHCA_STATE); MLX5_SET(save_vhca_state_in, in, op_mod, 0); MLX5_SET(save_vhca_state_in, in, vhca_id, mvdev->vhca_id); - MLX5_SET(save_vhca_state_in, in, mkey, mkey); - MLX5_SET(save_vhca_state_in, in, size, migf->total_length); + MLX5_SET(save_vhca_state_in, in, mkey, buf->mkey); + MLX5_SET(save_vhca_state_in, in, size, buf->allocated_length); async_data = &migf->async_data; async_data->out = kvzalloc(out_size, GFP_KERNEL); @@ -348,10 +413,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, goto err_out; } - /* no data exists till the callback comes back */ - migf->total_length = 0; get_file(migf->filp); - async_data->mkey = mkey; err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in), async_data->out, out_size, mlx5vf_save_callback, @@ -365,57 +427,33 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, fput(migf->filp); kvfree(async_data->out); err_out: - mlx5_core_destroy_mkey(mdev, mkey); -err_create_mkey: - dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0); -err_dma_map: complete(&migf->save_comp); return err; } int 
mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev, - struct mlx5_vf_migration_file *migf) + struct mlx5_vf_migration_file *migf, + struct mlx5_vhca_data_buffer *buf) { - struct mlx5_core_dev *mdev; u32 out[MLX5_ST_SZ_DW(load_vhca_state_out)] = {}; u32 in[MLX5_ST_SZ_DW(load_vhca_state_in)] = {}; - u32 mkey; int err; lockdep_assert_held(&mvdev->state_mutex); if (mvdev->mdev_detach) return -ENOTCONN; - mutex_lock(&migf->lock); - if (!migf->total_length) { - err = -EINVAL; - goto end; - } - - mdev = mvdev->mdev; - err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0); + err = mlx5vf_dma_data_buffer(buf); if (err) - goto end; - - err = _create_mkey(mdev, migf->pdn, migf, NULL, &mkey); - if (err) - goto err_mkey; + return err; MLX5_SET(load_vhca_state_in, in, opcode, MLX5_CMD_OP_LOAD_VHCA_STATE); MLX5_SET(load_vhca_state_in, in, op_mod, 0); MLX5_SET(load_vhca_state_in, in, vhca_id, mvdev->vhca_id); - MLX5_SET(load_vhca_state_in, in, mkey, mkey); - MLX5_SET(load_vhca_state_in, in, size, migf->total_length); - - err = mlx5_cmd_exec_inout(mdev, load_vhca_state, in, out); - - mlx5_core_destroy_mkey(mdev, mkey); -err_mkey: - dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0); -end: - mutex_unlock(&migf->lock); - return err; + MLX5_SET(load_vhca_state_in, in, mkey, buf->mkey); + MLX5_SET(load_vhca_state_in, in, size, buf->length); + return mlx5_cmd_exec_inout(mvdev->mdev, load_vhca_state, in, out); } int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file *migf) @@ -445,6 +483,10 @@ void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf) WARN_ON(migf->mvdev->mdev_detach); + if (migf->buf) { + mlx5vf_free_data_buffer(migf->buf); + migf->buf = NULL; + } mlx5vf_cmd_dealloc_pd(migf); } diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index ba760f956d53..b0f08dfc8120 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -12,11 +12,25 @@ #include #include +struct 
mlx5_vhca_data_buffer { + struct sg_append_table table; + loff_t start_pos; + u64 length; + u64 allocated_length; + u32 mkey; + enum dma_data_direction dma_dir; + u8 dmaed:1; + struct mlx5_vf_migration_file *migf; + /* Optimize mlx5vf_get_migration_page() for sequential access */ + struct scatterlist *last_offset_sg; + unsigned int sg_last_entry; + unsigned long last_offset; +}; + struct mlx5vf_async_data { struct mlx5_async_work cb_work; struct work_struct work; int status; - u32 mkey; void *out; }; @@ -27,14 +41,7 @@ struct mlx5_vf_migration_file { u8 is_err:1; u32 pdn; - struct sg_append_table table; - size_t total_length; - size_t allocated_length; - - /* Optimize mlx5vf_get_migration_page() for sequential access */ - struct scatterlist *last_offset_sg; - unsigned int sg_last_entry; - unsigned long last_offset; + struct mlx5_vhca_data_buffer *buf; struct mlx5vf_pci_core_device *mvdev; wait_queue_head_t poll_wait; struct completion save_comp; @@ -124,12 +131,20 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev, void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev); void mlx5vf_cmd_close_migratable(struct mlx5vf_pci_core_device *mvdev); int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, - struct mlx5_vf_migration_file *migf); + struct mlx5_vf_migration_file *migf, + struct mlx5_vhca_data_buffer *buf); int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev, - struct mlx5_vf_migration_file *migf); + struct mlx5_vf_migration_file *migf, + struct mlx5_vhca_data_buffer *buf); int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file *migf); void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf); void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf); +struct mlx5_vhca_data_buffer * +mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, + size_t length, enum dma_data_direction dma_dir); +void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf); +int 
mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, + unsigned int npages); void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev); void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev); void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work); diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 7392a93af96f..38ef8708eca5 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -33,7 +33,7 @@ static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev) } static struct page * -mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf, +mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf, unsigned long offset) { unsigned long cur_offset = 0; @@ -41,20 +41,20 @@ mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf, unsigned int i; /* All accesses are sequential */ - if (offset < migf->last_offset || !migf->last_offset_sg) { - migf->last_offset = 0; - migf->last_offset_sg = migf->table.sgt.sgl; - migf->sg_last_entry = 0; + if (offset < buf->last_offset || !buf->last_offset_sg) { + buf->last_offset = 0; + buf->last_offset_sg = buf->table.sgt.sgl; + buf->sg_last_entry = 0; } - cur_offset = migf->last_offset; + cur_offset = buf->last_offset; - for_each_sg(migf->last_offset_sg, sg, - migf->table.sgt.orig_nents - migf->sg_last_entry, i) { + for_each_sg(buf->last_offset_sg, sg, + buf->table.sgt.orig_nents - buf->sg_last_entry, i) { if (offset < sg->length + cur_offset) { - migf->last_offset_sg = sg; - migf->sg_last_entry += i; - migf->last_offset = cur_offset; + buf->last_offset_sg = sg; + buf->sg_last_entry += i; + buf->last_offset = cur_offset; return nth_page(sg_page(sg), (offset - cur_offset) / PAGE_SIZE); } @@ -63,8 +63,8 @@ mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf, return NULL; } -static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf, - unsigned int npages) +int mlx5vf_add_migration_pages(struct 
mlx5_vhca_data_buffer *buf, + unsigned int npages) { unsigned int to_alloc = npages; struct page **page_list; @@ -85,13 +85,13 @@ static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf, } to_alloc -= filled; ret = sg_alloc_append_table_from_pages( - &migf->table, page_list, filled, 0, + &buf->table, page_list, filled, 0, filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC, GFP_KERNEL); if (ret) goto err; - migf->allocated_length += filled * PAGE_SIZE; + buf->allocated_length += filled * PAGE_SIZE; /* clean input for another bulk allocation */ memset(page_list, 0, filled * sizeof(*page_list)); to_fill = min_t(unsigned int, to_alloc, @@ -108,16 +108,8 @@ static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf, static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf) { - struct sg_page_iter sg_iter; - mutex_lock(&migf->lock); - /* Undo alloc_pages_bulk_array() */ - for_each_sgtable_page(&migf->table.sgt, &sg_iter, 0) - __free_page(sg_page_iter_page(&sg_iter)); - sg_free_append_table(&migf->table); migf->disabled = true; - migf->total_length = 0; - migf->allocated_length = 0; migf->filp->f_pos = 0; mutex_unlock(&migf->lock); } @@ -136,6 +128,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, loff_t *pos) { struct mlx5_vf_migration_file *migf = filp->private_data; + struct mlx5_vhca_data_buffer *vhca_buf = migf->buf; ssize_t done = 0; if (pos) @@ -144,16 +137,16 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, if (!(filp->f_flags & O_NONBLOCK)) { if (wait_event_interruptible(migf->poll_wait, - READ_ONCE(migf->total_length) || migf->is_err)) + READ_ONCE(vhca_buf->length) || migf->is_err)) return -ERESTARTSYS; } mutex_lock(&migf->lock); - if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(migf->total_length)) { + if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(vhca_buf->length)) { done = -EAGAIN; goto out_unlock; } - if (*pos > migf->total_length) { + if (*pos > 
vhca_buf->length) { done = -EINVAL; goto out_unlock; } @@ -162,7 +155,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, goto out_unlock; } - len = min_t(size_t, migf->total_length - *pos, len); + len = min_t(size_t, vhca_buf->length - *pos, len); while (len) { size_t page_offset; struct page *page; @@ -171,7 +164,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, int ret; page_offset = (*pos) % PAGE_SIZE; - page = mlx5vf_get_migration_page(migf, *pos - page_offset); + page = mlx5vf_get_migration_page(vhca_buf, *pos - page_offset); if (!page) { if (done == 0) done = -EINVAL; @@ -208,7 +201,7 @@ static __poll_t mlx5vf_save_poll(struct file *filp, mutex_lock(&migf->lock); if (migf->disabled || migf->is_err) pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; - else if (READ_ONCE(migf->total_length)) + else if (READ_ONCE(migf->buf->length)) pollflags = EPOLLIN | EPOLLRDNORM; mutex_unlock(&migf->lock); @@ -227,6 +220,8 @@ static struct mlx5_vf_migration_file * mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) { struct mlx5_vf_migration_file *migf; + struct mlx5_vhca_data_buffer *buf; + size_t length; int ret; migf = kzalloc(sizeof(*migf), GFP_KERNEL); @@ -252,22 +247,23 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) complete(&migf->save_comp); mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx); INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb); - ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, - &migf->total_length); + ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length); if (ret) goto out_pd; - ret = mlx5vf_add_migration_pages( - migf, DIV_ROUND_UP_ULL(migf->total_length, PAGE_SIZE)); - if (ret) + buf = mlx5vf_alloc_data_buffer(migf, length, DMA_FROM_DEVICE); + if (IS_ERR(buf)) { + ret = PTR_ERR(buf); goto out_pd; + } - ret = mlx5vf_cmd_save_vhca_state(mvdev, migf); + ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf); if (ret) goto 
out_save; + migf->buf = buf; return migf; out_save: - mlx5vf_disable_fd(migf); + mlx5vf_free_data_buffer(buf); out_pd: mlx5vf_cmd_dealloc_pd(migf); out_free: @@ -281,6 +277,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf, size_t len, loff_t *pos) { struct mlx5_vf_migration_file *migf = filp->private_data; + struct mlx5_vhca_data_buffer *vhca_buf = migf->buf; loff_t requested_length; ssize_t done = 0; @@ -301,10 +298,10 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf, goto out_unlock; } - if (migf->allocated_length < requested_length) { + if (vhca_buf->allocated_length < requested_length) { done = mlx5vf_add_migration_pages( - migf, - DIV_ROUND_UP(requested_length - migf->allocated_length, + vhca_buf, + DIV_ROUND_UP(requested_length - vhca_buf->allocated_length, PAGE_SIZE)); if (done) goto out_unlock; @@ -318,7 +315,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf, int ret; page_offset = (*pos) % PAGE_SIZE; - page = mlx5vf_get_migration_page(migf, *pos - page_offset); + page = mlx5vf_get_migration_page(vhca_buf, *pos - page_offset); if (!page) { if (done == 0) done = -EINVAL; @@ -337,7 +334,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf, len -= page_len; done += page_len; buf += page_len; - migf->total_length += page_len; + vhca_buf->length += page_len; } out_unlock: mutex_unlock(&migf->lock); @@ -355,6 +352,7 @@ static struct mlx5_vf_migration_file * mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev) { struct mlx5_vf_migration_file *migf; + struct mlx5_vhca_data_buffer *buf; int ret; migf = kzalloc(sizeof(*migf), GFP_KERNEL); @@ -373,9 +371,18 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev) if (ret) goto out_free; + buf = mlx5vf_alloc_data_buffer(migf, 0, DMA_TO_DEVICE); + if (IS_ERR(buf)) { + ret = PTR_ERR(buf); + goto out_pd; + } + + migf->buf = buf; stream_open(migf->filp->f_inode, 
migf->filp); mutex_init(&migf->lock); return migf; +out_pd: + mlx5vf_cmd_dealloc_pd(migf); out_free: fput(migf->filp); end: @@ -469,7 +476,8 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev, if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) { ret = mlx5vf_cmd_load_vhca_state(mvdev, - mvdev->resuming_migf); + mvdev->resuming_migf, + mvdev->resuming_migf->buf); if (ret) return ERR_PTR(ret); mlx5vf_disable_fds(mvdev); From patchwork Thu Nov 24 17:39:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 13055235 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82B64C4332F for ; Thu, 24 Nov 2022 17:41:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229657AbiKXRlB (ORCPT ); Thu, 24 Nov 2022 12:41:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229644AbiKXRk4 (ORCPT ); Thu, 24 Nov 2022 12:40:56 -0500 Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on2041.outbound.protection.outlook.com [40.107.94.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 340C213E073 for ; Thu, 24 Nov 2022 09:40:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ABDe+HLaKzUJurHCNQdyxOFZSZF8i7Hh6aMc//WEtF6TqkBQuSvCbeHgpL2ojtMOdykz8nqv5+b582ggtIOWJyd5nIOnrdSlS5SuTvVQcpMjtMzI7S2NPSu1iX3bNGY4fA4dkwIcVtHeKGYjhdwboj/nDOvyL4INzB8UL/MdIYBJZbJR9C5lcfUk/Kfk1s1AKZqFl8mSvwhxZbHtSeiKsfHTHrTvqQO2EYJXLEvUe6nnfjo/7RxnWghWOrqw75i+6OoMHRUvUTo2uD9lw1GSKcnnMRNUEbdaCjwcxtpyDOvN/oG3hEMQ1W1gRMQOFOiwy0to4yR7FTzBtlnWZCUvyQ== ARC-Message-Signature: i=1; a=rsa-sha256; 
c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=MuGw/0NK3SUKhRrV2O5MX15cqjR4EAK7Me0zK/QrVpA=; b=b1wGDI6Xo+xY9sV6gSzNlJtiHeefUrU5JWk8wSrtWICvGfoOpESUy2eZQHL2gnBvyq45m2W+zWaoYopQpu5eWdroHasPG6hJMn4+qlvGlSReAyNXv/mxDUnqRJ7Nq24TuUCBU73t8lX9mY97BlawWZdQWydwTEW8IqGJBgTKDIdDyivEEkYWn4XM28RgyLwyStGgt43bNuN5cZS5JHGN8baUoau1DZeOTWM7czwOMoBkO1M7DFXpgaVdC+0mb8WqEXU1d5ETWGSp53m9TBGN92XoAsphwtOlGnspljuDq7kRBj3dkOpdinSs/lK3CfqzqA1aYpDRzQNMHJfzngQy9g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.232) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=MuGw/0NK3SUKhRrV2O5MX15cqjR4EAK7Me0zK/QrVpA=; b=mZRV6m45kaV8t80zy/jiOHfnQ866ss6MGON5JWpoIaYIwhDOv5fMGHq5rK4ANSdFLDc61fZOSD3wL3yW2fqbsLtAVSv+Ilz+H4t032cTv7lWxJ4YAMXAqBHEH0ez8gg6/tCi7bFy/o876lRTowi4re5vf4rTLLu/gZXOLDIiOzUZdE1wzpMQvb7ZnvTlbz2zMYqVLk2eLqSbS9RhGrGYjymu8wBtCc3TC1k8dY+rm78me5Q7qOSxhJkaziz8+k5HREaSb9iAfk3or62Pc+mN/g6aLogZnakkN+gYrO1bgGt8qiTw4dS2ZRtJJN5bXl22I6WRclKVA3whrF8vQ9oicw== Received: from MW4P221CA0013.NAMP221.PROD.OUTLOOK.COM (2603:10b6:303:8b::18) by MN2PR12MB4437.namprd12.prod.outlook.com (2603:10b6:208:26f::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19; Thu, 24 Nov 2022 17:40:51 +0000 Received: from CO1NAM11FT105.eop-nam11.prod.protection.outlook.com (2603:10b6:303:8b:cafe::50) by MW4P221CA0013.outlook.office365.com (2603:10b6:303:8b::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
15.20.5857.19 via Frontend Transport; Thu, 24 Nov 2022 17:40:51 +0000 From: Yishai Hadas Subject: [PATCH V1 vfio 06/14] vfio/mlx5: Refactor migration file state Date: Thu, 24 Nov 2022 19:39:24 +0200 Message-ID: <20221124173932.194654-7-yishaih@nvidia.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com> References: <20221124173932.194654-1-yishaih@nvidia.com> MIME-Version: 1.0
Precedence: bulk X-Mailing-List: kvm@vger.kernel.org

Refactor the migration file state into an enum whose values are mutually exclusive. As part of that, drop the 'disabled' state, since 'error' is functionally the same.

Later patches in the series will extend this enum with other relevant states.

Signed-off-by: Yishai Hadas
--- drivers/vfio/pci/mlx5/cmd.c | 2 +- drivers/vfio/pci/mlx5/cmd.h | 7 +++++-- drivers/vfio/pci/mlx5/main.c | 11 ++++++----- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index ed4c472d2eae..fcba12326185 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -351,7 +351,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work) mutex_lock(&migf->lock); if (async_data->status) { - migf->is_err = true; + migf->state = MLX5_MIGF_STATE_ERROR; wake_up_interruptible(&migf->poll_wait); } mutex_unlock(&migf->lock); diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index b0f08dfc8120..14403e654e4e 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -12,6 +12,10 @@ #include #include +enum mlx5_vf_migf_state { + MLX5_MIGF_STATE_ERROR = 1, +}; + struct mlx5_vhca_data_buffer { struct sg_append_table table; loff_t start_pos; @@ -37,8 +41,7 @@ struct mlx5vf_async_data { struct mlx5_vf_migration_file { struct file *filp; struct mutex lock; - u8 disabled:1; - u8 is_err:1; + enum mlx5_vf_migf_state state; u32 pdn; struct mlx5_vhca_data_buffer *buf; diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 38ef8708eca5..0ee8e509116c 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -109,7 +109,7 @@ int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf) { 
mutex_lock(&migf->lock); - migf->disabled = true; + migf->state = MLX5_MIGF_STATE_ERROR; migf->filp->f_pos = 0; mutex_unlock(&migf->lock); } @@ -137,7 +137,8 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, if (!(filp->f_flags & O_NONBLOCK)) { if (wait_event_interruptible(migf->poll_wait, - READ_ONCE(vhca_buf->length) || migf->is_err)) + READ_ONCE(vhca_buf->length) || + migf->state == MLX5_MIGF_STATE_ERROR)) return -ERESTARTSYS; } @@ -150,7 +151,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, done = -EINVAL; goto out_unlock; } - if (migf->disabled || migf->is_err) { + if (migf->state == MLX5_MIGF_STATE_ERROR) { done = -ENODEV; goto out_unlock; } @@ -199,7 +200,7 @@ static __poll_t mlx5vf_save_poll(struct file *filp, poll_wait(filp, &migf->poll_wait, wait); mutex_lock(&migf->lock); - if (migf->disabled || migf->is_err) + if (migf->state == MLX5_MIGF_STATE_ERROR) pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; else if (READ_ONCE(migf->buf->length)) pollflags = EPOLLIN | EPOLLRDNORM; @@ -293,7 +294,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf, return -ENOMEM; mutex_lock(&migf->lock); - if (migf->disabled) { + if (migf->state == MLX5_MIGF_STATE_ERROR) { done = -ENODEV; goto out_unlock; } From patchwork Thu Nov 24 17:39:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 13055241 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7452EC433FE for ; Thu, 24 Nov 2022 17:41:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229672AbiKXRlO (ORCPT ); Thu, 24 Nov 2022 12:41:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33526 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229653AbiKXRlB (ORCPT ); Thu, 24 Nov 2022 12:41:01 -0500 Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2068.outbound.protection.outlook.com [40.107.223.68]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 919B31400C7 for ; Thu, 24 Nov 2022 09:41:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=CUzgdLFDCL/pvDoHz+oEaHsWbGu2oWthsOQvX+5RkOZYv/OPIEJYBaVx3aPC7KMG1jZuIc3W022j2hUHcawLIIxGYww97+PVvQzpEIRjU4K+y48nfjpHpx4qWkZtISeBQR3m1cEmBsG9mx8P3zv9+naFLIKXKeK7G32UFato04dmwrodWkNdOac0dlk12mB95R02Bub1jN+/9BI1kSq3eB3i9KLTTP7qv/SUcNtTze1JWSlZg15Cr5vrHnGYVyIyYC7Wj6lV+54iPMyWDVGnL3QlnHpWaloIYTkqJ3O+z8zqK0gk932eJu5nBKwGxkA6hN4/JqrtBXLy3+Isbsp3qA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=hx8jq7yvx1Yr/nRbpaDJj045bU9a0MQRrJfvrsHSWTU=; b=MUWwI+/nPCzIMCiyPFP4SbtwQ//hQPyBwFSGCP7Nb3YzZ2S3nH42xvFscgYuskSeCDrmKPDqa5dMm/ucARTUjJmYuB/siwGNxV6RCFONb5UfB9XpYdpICs1pk4WhgdIKS+EeMd6RtVVXpyHN2MLg0G4/Bd5FSleF+EPnpl9xShpSk4BsFetGKaXXg1atiIHatTt48U6jsfm4A0Ne17gRXWyD+hL9zcKaPkJW3TVUfdFmDMWtLP1TubyZlxaAJ+lz3iPN+FAanWdTJHY4NXH8l3N0WJvNLmiDea8bm/V79Ey4upID1NfP6vgnXO4IoORL9li+kBtU6UETYIqYpUOoQw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.233) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=hx8jq7yvx1Yr/nRbpaDJj045bU9a0MQRrJfvrsHSWTU=; 
b=sx4YLNwlDmgiwFR2JhPfQ40GSz0sqHdocbkRhIgyGl9xBRKYM6B3BkkU+ZXmGI3r8JyBp0mpo3/Sby3WOn8puNx/pUGCwy7ZWSnvLY1DNsDxtbH3vhJNG/gZR/Sq2z24pJOsMwnnZIRCxfR9YYFtsfPvmBhQ0+MzN2ckmiIZ734xN/bh+SMM7YUfNRsR7Nzx5ZOzP98UfaMuq5vNZlCBu6wLKVjpL/sKcTefszocwYsX37dzUzZaO5JfFtyM8udbkPLbXeZ3tCHQOCyZveKhlXk/zwNKtNrWYVPUTPPR8wRylLnlxYxRHSFIVC2XN32+WwsV7mYcB+CL/aKy9dRufg== Received: from DS7PR03CA0233.namprd03.prod.outlook.com (2603:10b6:5:3ba::28) by CH2PR12MB4071.namprd12.prod.outlook.com (2603:10b6:610:7b::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5834.15; Thu, 24 Nov 2022 17:40:59 +0000 Received: from DM6NAM11FT087.eop-nam11.prod.protection.outlook.com (2603:10b6:5:3ba:cafe::75) by DS7PR03CA0233.outlook.office365.com (2603:10b6:5:3ba::28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19 via Frontend Transport; Thu, 24 Nov 2022 17:40:59 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.233) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.118.233 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.233; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.118.233) by DM6NAM11FT087.mail.protection.outlook.com (10.13.172.150) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5834.8 via Frontend Transport; Thu, 24 Nov 2022 17:40:58 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by mail.nvidia.com (10.127.129.6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:40:43 -0800 Received: from drhqmail202.nvidia.com (10.126.190.181) by drhqmail203.nvidia.com (10.126.190.182) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:40:42 -0800 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.126.190.181) with Microsoft SMTP Server id 15.2.986.36 via Frontend Transport; Thu, 24 Nov 2022 09:40:40 -0800 From: Yishai Hadas To: , CC: , , , , , , , , Subject: [PATCH V1 vfio 07/14] vfio/mlx5: Refactor to use queue based data chunks Date: Thu, 24 Nov 2022 19:39:25 +0200 Message-ID: <20221124173932.194654-8-yishaih@nvidia.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com> References: <20221124173932.194654-1-yishaih@nvidia.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DM6NAM11FT087:EE_|CH2PR12MB4071:EE_ X-MS-Office365-Filtering-Correlation-Id: f788286e-434e-4629-e88c-08dace430c48 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 0Lov+v6Ak+xZU2Re1xa+4JX9+OcF472sWDkN4wqMV/8v7gBWWEI7/8cBA7MVtDJNmbmBrCzddoCm9JJYWt1abv0JDWM+vWMBpV2bA/6eCG4vTN0RLybI1VD5QMTlShCu/HCRk68v3xJb1TzqSlq5Ftc776ziLri19dq5FhR0/+Qex3Lhf34GquqwxA1xeO8+atGyM2CK/L3LTrjCmXrO4AtQzQUm+fATdYT0yyhYuFgeRxkma2U7/Zx6b8PAWgOgIHFVs2ISkGW9FJKPP6wKB8K30cQnishNf7eum+toHlNq+M9kMgCCi9fU6KF2L1ixMXMufJtsOU8vj3SEhxnnHsAEfXNbeKgjN+p9u3ccGsfyiwHYA11a30H5Ich3tmQWR5Uwg2H/tZd+jNdyIVyS/1IwFQa45TdeBw/YZ7glMP90aIScYWVxHiD2HkPcVWLm8x/q34G5GNRKETvseXH41fMWKuLPHokFnFaRCTGSS1fDvkBf0bCPatESTW6xKmX/FZ1d4U6svsXLarnZj0xoLI32wsRYxL1rzmQwSEBjuO44j6orXDD7zvnPDnCKk0U6TL+EtOy4A+yi68VM+Nfjxc1XCowtxWPhH04Z+qrluWK9tieCMRgWAnvlrDgx+QIOtfdwfjtK2HAuLD6aw3WhyUkdjRzL6Q52/IkfPaPkXk01Ha7wyVp5kWJ5Ez24s5au+HPpN3wGyw7nR7Fq0CsjMg== X-Forefront-Antispam-Report: 
CIP:216.228.118.233;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc7edge2.nvidia.com;CAT:NONE;SFS:(13230022)(4636009)(136003)(346002)(39860400002)(376002)(396003)(451199015)(40470700004)(36840700001)(46966006)(66899015)(36860700001)(36756003)(86362001)(82740400003)(5660300002)(40480700001)(2906002)(186003)(2616005)(1076003)(47076005)(426003)(26005)(40460700003)(83380400001)(356005)(336012)(7636003)(7696005)(82310400005)(70206006)(8676002)(4326008)(41300700001)(110136005)(6636002)(54906003)(8936002)(478600001)(6666004)(316002)(70586007);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 24 Nov 2022 17:40:58.9331 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f788286e-434e-4629-e88c-08dace430c48 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.118.233];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT087.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH2PR12MB4071 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Refactor to use queue-based data chunks on the migration file. The SAVE command adds a chunk to the tail of the queue, while the read() API finds the required chunk and returns its data. In case the queue is empty but the state of the migration file is MLX5_MIGF_STATE_COMPLETE, read() does not block but returns 0 to indicate end of file. This is a step towards maintaining multiple images and their metadata (i.e. headers) on the migration file as part of the next patches in the series. Note: At this point, we still use a single chunk on the migration file, but it becomes ready to support multiple. 
Signed-off-by: Yishai Hadas --- drivers/vfio/pci/mlx5/cmd.c | 24 +++++- drivers/vfio/pci/mlx5/cmd.h | 5 ++ drivers/vfio/pci/mlx5/main.c | 145 +++++++++++++++++++++++++++-------- 3 files changed, 136 insertions(+), 38 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index fcba12326185..0e36b4c8c816 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -351,6 +351,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work) mutex_lock(&migf->lock); if (async_data->status) { + migf->buf = async_data->buf; migf->state = MLX5_MIGF_STATE_ERROR; wake_up_interruptible(&migf->poll_wait); } @@ -368,9 +369,15 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context) struct mlx5_vf_migration_file, async_data); if (!status) { - WRITE_ONCE(migf->buf->length, - MLX5_GET(save_vhca_state_out, async_data->out, - actual_image_size)); + unsigned long flags; + + async_data->buf->length = + MLX5_GET(save_vhca_state_out, async_data->out, + actual_image_size); + spin_lock_irqsave(&migf->list_lock, flags); + list_add_tail(&async_data->buf->buf_elm, &migf->buf_list); + spin_unlock_irqrestore(&migf->list_lock, flags); + migf->state = MLX5_MIGF_STATE_COMPLETE; wake_up_interruptible(&migf->poll_wait); } @@ -407,6 +414,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, MLX5_SET(save_vhca_state_in, in, size, buf->allocated_length); async_data = &migf->async_data; + async_data->buf = buf; async_data->out = kvzalloc(out_size, GFP_KERNEL); if (!async_data->out) { err = -ENOMEM; @@ -479,14 +487,22 @@ void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf) void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf) { - lockdep_assert_held(&migf->mvdev->state_mutex); + struct mlx5_vhca_data_buffer *entry; + lockdep_assert_held(&migf->mvdev->state_mutex); WARN_ON(migf->mvdev->mdev_detach); if (migf->buf) { mlx5vf_free_data_buffer(migf->buf); migf->buf = NULL; } + + while 
((entry = list_first_entry_or_null(&migf->buf_list, + struct mlx5_vhca_data_buffer, buf_elm))) { + list_del(&entry->buf_elm); + mlx5vf_free_data_buffer(entry); + } + mlx5vf_cmd_dealloc_pd(migf); } diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 14403e654e4e..6e594689566e 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -14,6 +14,7 @@ enum mlx5_vf_migf_state { MLX5_MIGF_STATE_ERROR = 1, + MLX5_MIGF_STATE_COMPLETE, }; struct mlx5_vhca_data_buffer { @@ -24,6 +25,7 @@ struct mlx5_vhca_data_buffer { u32 mkey; enum dma_data_direction dma_dir; u8 dmaed:1; + struct list_head buf_elm; struct mlx5_vf_migration_file *migf; /* Optimize mlx5vf_get_migration_page() for sequential access */ struct scatterlist *last_offset_sg; @@ -34,6 +36,7 @@ struct mlx5_vhca_data_buffer { struct mlx5vf_async_data { struct mlx5_async_work cb_work; struct work_struct work; + struct mlx5_vhca_data_buffer *buf; int status; void *out; }; @@ -45,6 +48,8 @@ struct mlx5_vf_migration_file { u32 pdn; struct mlx5_vhca_data_buffer *buf; + spinlock_t list_lock; + struct list_head buf_list; struct mlx5vf_pci_core_device *mvdev; wait_queue_head_t poll_wait; struct completion save_comp; diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 0ee8e509116c..facb5ab6021e 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -124,11 +124,90 @@ static int mlx5vf_release_file(struct inode *inode, struct file *filp) return 0; } +static struct mlx5_vhca_data_buffer * +mlx5vf_get_data_buff_from_pos(struct mlx5_vf_migration_file *migf, loff_t pos, + bool *end_of_data) +{ + struct mlx5_vhca_data_buffer *buf; + bool found = false; + + *end_of_data = false; + spin_lock_irq(&migf->list_lock); + if (list_empty(&migf->buf_list)) { + *end_of_data = true; + goto end; + } + + buf = list_first_entry(&migf->buf_list, struct mlx5_vhca_data_buffer, + buf_elm); + if (pos >= buf->start_pos && + pos < buf->start_pos + buf->length) 
{ + found = true; + goto end; + } + + /* + * As we use a stream based FD we may expect having the data always + * on first chunk + */ + migf->state = MLX5_MIGF_STATE_ERROR; + +end: + spin_unlock_irq(&migf->list_lock); + return found ? buf : NULL; +} + +static ssize_t mlx5vf_buf_read(struct mlx5_vhca_data_buffer *vhca_buf, + char __user **buf, size_t *len, loff_t *pos) +{ + unsigned long offset; + ssize_t done = 0; + size_t copy_len; + + copy_len = min_t(size_t, + vhca_buf->start_pos + vhca_buf->length - *pos, *len); + while (copy_len) { + size_t page_offset; + struct page *page; + size_t page_len; + u8 *from_buff; + int ret; + + offset = *pos - vhca_buf->start_pos; + page_offset = offset % PAGE_SIZE; + offset -= page_offset; + page = mlx5vf_get_migration_page(vhca_buf, offset); + if (!page) + return -EINVAL; + page_len = min_t(size_t, copy_len, PAGE_SIZE - page_offset); + from_buff = kmap_local_page(page); + ret = copy_to_user(*buf, from_buff + page_offset, page_len); + kunmap_local(from_buff); + if (ret) + return -EFAULT; + *pos += page_len; + *len -= page_len; + *buf += page_len; + done += page_len; + copy_len -= page_len; + } + + if (*pos >= vhca_buf->start_pos + vhca_buf->length) { + spin_lock_irq(&vhca_buf->migf->list_lock); + list_del_init(&vhca_buf->buf_elm); + spin_unlock_irq(&vhca_buf->migf->list_lock); + } + + return done; +} + static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, loff_t *pos) { struct mlx5_vf_migration_file *migf = filp->private_data; - struct mlx5_vhca_data_buffer *vhca_buf = migf->buf; + struct mlx5_vhca_data_buffer *vhca_buf; + bool first_loop_call = true; + bool end_of_data; ssize_t done = 0; if (pos) @@ -137,53 +216,47 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, if (!(filp->f_flags & O_NONBLOCK)) { if (wait_event_interruptible(migf->poll_wait, - READ_ONCE(vhca_buf->length) || - migf->state == MLX5_MIGF_STATE_ERROR)) + !list_empty(&migf->buf_list) || + migf->state == 
MLX5_MIGF_STATE_ERROR || + migf->state == MLX5_MIGF_STATE_COMPLETE)) return -ERESTARTSYS; } mutex_lock(&migf->lock); - if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(vhca_buf->length)) { - done = -EAGAIN; - goto out_unlock; - } - if (*pos > vhca_buf->length) { - done = -EINVAL; - goto out_unlock; - } if (migf->state == MLX5_MIGF_STATE_ERROR) { done = -ENODEV; goto out_unlock; } - len = min_t(size_t, vhca_buf->length - *pos, len); while (len) { - size_t page_offset; - struct page *page; - size_t page_len; - u8 *from_buff; - int ret; + ssize_t count; + + vhca_buf = mlx5vf_get_data_buff_from_pos(migf, *pos, + &end_of_data); + if (first_loop_call) { + first_loop_call = false; + if (end_of_data && migf->state != MLX5_MIGF_STATE_COMPLETE) { + if (filp->f_flags & O_NONBLOCK) { + done = -EAGAIN; + goto out_unlock; + } + } + } - page_offset = (*pos) % PAGE_SIZE; - page = mlx5vf_get_migration_page(vhca_buf, *pos - page_offset); - if (!page) { - if (done == 0) - done = -EINVAL; + if (end_of_data) + goto out_unlock; + + if (!vhca_buf) { + done = -EINVAL; goto out_unlock; } - page_len = min_t(size_t, len, PAGE_SIZE - page_offset); - from_buff = kmap_local_page(page); - ret = copy_to_user(buf, from_buff + page_offset, page_len); - kunmap_local(from_buff); - if (ret) { - done = -EFAULT; + count = mlx5vf_buf_read(vhca_buf, &buf, &len, pos); + if (count < 0) { + done = count; goto out_unlock; } - *pos += page_len; - len -= page_len; - done += page_len; - buf += page_len; + done += count; } out_unlock: @@ -202,7 +275,8 @@ static __poll_t mlx5vf_save_poll(struct file *filp, mutex_lock(&migf->lock); if (migf->state == MLX5_MIGF_STATE_ERROR) pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; - else if (READ_ONCE(migf->buf->length)) + else if (!list_empty(&migf->buf_list) || + migf->state == MLX5_MIGF_STATE_COMPLETE) pollflags = EPOLLIN | EPOLLRDNORM; mutex_unlock(&migf->lock); @@ -248,6 +322,8 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) complete(&migf->save_comp); 
mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx); INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb); + INIT_LIST_HEAD(&migf->buf_list); + spin_lock_init(&migf->list_lock); ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length); if (ret) goto out_pd; @@ -261,7 +337,6 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf); if (ret) goto out_save; - migf->buf = buf; return migf; out_save: mlx5vf_free_data_buffer(buf); @@ -381,6 +456,8 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev) migf->buf = buf; stream_open(migf->filp->f_inode, migf->filp); mutex_init(&migf->lock); + INIT_LIST_HEAD(&migf->buf_list); + spin_lock_init(&migf->list_lock); return migf; out_pd: mlx5vf_cmd_dealloc_pd(migf); From patchwork Thu Nov 24 17:39:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 13055236 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74B3FC433FE for ; Thu, 24 Nov 2022 17:41:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229641AbiKXRlI (ORCPT ); Thu, 24 Nov 2022 12:41:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33654 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229624AbiKXRk7 (ORCPT ); Thu, 24 Nov 2022 12:40:59 -0500 Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on2071.outbound.protection.outlook.com [40.107.93.71]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1C5461400E6 for ; Thu, 24 Nov 2022 09:40:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=ndummpaSACvCW/fHuCzIv7FMtlOw8huaglKhYvZdhdxlvN2VEsTLH3EjeNPU8cOaKZTuM8yjRIKKaDGxRk+AHwsnGtlAugiT3FJ4pyd+ienrJGOkrxbqebV7BA3RqRaK/v0eonmtzN/iKiKPf92GAoFCF26XnCLtFbpB/l0DkCC5MpcRQG6TEl4OQCOxNcAUANPu9yu61sIDPGLnuqjiSpJ7EGUgefFRIkX4DNYfv8JFRvH/l3Zz+tfyZAmycN38NtaSuIGH70OMhC7xfOl+N+A+Cgxa+E98GMUjsaC8548fR4sYA78dIXeeuJ/yP3fXfr3dvUOVvdpi/XmUPf9e9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=YwSoQN+MGNgpLMPbT48xpFnp9P3NPEFx1TSUutlBWf8=; b=C6Su40vygIKJWmAztxmAGgLTN9doSE/7OhedveY3EbKbHy0KWZS1Z4UPY8GOIGGM4UBV8rooQ+ChEI3BMn2gwqg/5ENw7BUN8Pz6GUuAIMNSSgIU9r2H2kdY/HnLGTEepMh5SsJiwdmPsdldT5xmlR20lELTDSRGq9s1GVcTX8qezyFdBdHNC/c8hCVm/wLcE/fhnCxIDirkjuqS9E9dTnoeh+2kIczXcQ7aS84h9rFJYQmFjy+6ZroFSLvA28ImYmMRF7OMg6xWLyFJhOG3t/kTghHFRA8lUiO1DJPLOuw8KxjVTlHlThXZbLzq0i5mjpmv2fwX1ZBI4qcKikslaw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.232) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=YwSoQN+MGNgpLMPbT48xpFnp9P3NPEFx1TSUutlBWf8=; b=rAfGGjzfxl7iMmmW8k5aJJQybhzUlA/4MPdUutxev09kkL6pCHqsSPpQHYIl9/mBaqW5YQ5n5S4D0yXiK2eL5IC58QtfI4GZVvyV5bSP40Z/a69221TZMTYQeqEn01IeVhtxHHe+XZtCRoxuvDTannqvDSqX5j91PHUqTGPwBXHJBejQxyum9/tPF1XO6848hmQodH4li2+CueXtljKGVJnHhwCiALtltipBJVnWyqG8U0R6yeI8j1yWYMsrJKPjvo1Lw0D1qgi6TTDOTmiceRtYEBy2gYbLXBrBvt7MzvHpNQtLQuY8ezsOj/wj26i1xwAIhp5v0hgpR1YCo1aEhg== Received: from MW4P221CA0024.NAMP221.PROD.OUTLOOK.COM (2603:10b6:303:8b::29) by DM4PR12MB6009.namprd12.prod.outlook.com (2603:10b6:8:69::10) with 
Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19; Thu, 24 Nov 2022 17:40:54 +0000 Received: from CO1NAM11FT105.eop-nam11.prod.protection.outlook.com (2603:10b6:303:8b:cafe::80) by MW4P221CA0024.outlook.office365.com (2603:10b6:303:8b::29) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19 via Frontend Transport; Thu, 24 Nov 2022 17:40:54 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.232) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.118.232 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.232; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.118.232) by CO1NAM11FT105.mail.protection.outlook.com (10.13.175.159) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19 via Frontend Transport; Thu, 24 Nov 2022 17:40:54 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by mail.nvidia.com (10.127.129.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:40:46 -0800 Received: from drhqmail202.nvidia.com (10.126.190.181) by drhqmail203.nvidia.com (10.126.190.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:40:46 -0800 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.126.190.181) with Microsoft SMTP Server id 15.2.986.36 via Frontend Transport; Thu, 24 Nov 2022 09:40:43 -0800 From: Yishai Hadas To: , CC: , , , , , , , , Subject: [PATCH V1 vfio 08/14] vfio/mlx5: Introduce device transitions of PRE_COPY Date: Thu, 24 Nov 2022 19:39:26 +0200 Message-ID: <20221124173932.194654-9-yishaih@nvidia.com> X-Mailer: 
git-send-email 2.21.0 In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com> References: <20221124173932.194654-1-yishaih@nvidia.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1NAM11FT105:EE_|DM4PR12MB6009:EE_ X-MS-Office365-Filtering-Correlation-Id: 89be676e-b7be-4735-9fe1-08dace43097a X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 8VoO2w4szYHfy8dTvOooeu1hy0+bft+0MT71YLzgz2RvppWZ0Lv7A0c4FJFn1E47xBycHDxPHmPgiDWqc/dhSlz350orTGpLVafePAAAWj1CMJXrwK2twhO1rhZyXFKyeJXpSjFiMkyGEQ7m2TVZx3PWx2BZ+B0W3APCCapzRLth3BuzF2VEfLytJus/QbnK2w7SHy6rC+KTBKmLu20LfxPI1sCqeaPHNSNTgaaJGhXEBQ2dhzZ/JaNh8X6WbNYM+LMh3mJgfHO4lQ05357Lb7k2xvOSqWTePgdewOeU1SmbzsZntFB/KasAsUCn4gPLXi99DDQxxoqCQ0mfFxupmOCJLGp7eOHllc2cLJLm/idud0PloaOc9OCQEXhMw1hEhTc0xgWES/l8Q7HK4Br2rwPYzE8VttX0D7vwNXZPiqS7EuVRsCaVVeC/JXoc8gCYx6GBO0oEyvue6xZhTogPRIxSUM6r4WZlYOl0wHaxM4IkZMGnzMUVReWT+ugDZKm5wZOAFPMcW9qgAV4/azAn5RCmfyUgOKJiqeVPh05UVdqq29M/S4AZdQgrjY5l81HnHBwQbi55jNDIXNOBJDinx7SQf2c2QQMWrAoqZ3U2h0ltRHtKB3yxb3bkNx+Ow91TrD/JO6q3eeEe9LuBOta1dc0WQu5pN7IbU7pfx9Arivltz+Kjwohu077VxgKaumrs+qDcZQdqzv4RDF5PCk/48Q== X-Forefront-Antispam-Report: CIP:216.228.118.232;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc7edge1.nvidia.com;CAT:NONE;SFS:(13230022)(4636009)(396003)(136003)(39860400002)(376002)(346002)(451199015)(40470700004)(36840700001)(46966006)(8936002)(41300700001)(2906002)(86362001)(4326008)(70586007)(8676002)(70206006)(82310400005)(36756003)(7696005)(478600001)(6666004)(26005)(356005)(7636003)(40480700001)(83380400001)(47076005)(426003)(2616005)(1076003)(186003)(110136005)(54906003)(336012)(5660300002)(82740400003)(316002)(6636002)(30864003)(36860700001)(40460700003);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 24 Nov 2022 17:40:54.2571 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 
89be676e-b7be-4735-9fe1-08dace43097a X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.118.232];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT105.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM4PR12MB6009 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org In order to support PRE_COPY, the mlx5 driver transfers multiple states (images) of the device. For example, the source VF can save and transfer multiple states, and the target VF will load them in that order. The device saves three kinds of states: 1) Initial state - when the device moves to the PRE_COPY state. 2) Middle state - during the PRE_COPY phase, via VFIO_MIG_GET_PRECOPY_INFO. There can be multiple states of this type. 3) Final state - when the device moves to the STOP_COPY state. After moving to the PRE_COPY state, the user holds the saving migf FD and can use it. For example, the user can start transferring data via the read() callback. Also, the user can switch from PRE_COPY to STOP_COPY whenever they see fit. This will invoke saving of the final state. This means that the mlx5 VFIO device can be switched to STOP_COPY without transferring any data in the PRE_COPY state. Therefore, when the device moves to STOP_COPY, mlx5 will store the final state in a dedicated queue entry on the list. 
Co-developed-by: Shay Drory Signed-off-by: Shay Drory Signed-off-by: Yishai Hadas --- drivers/vfio/pci/mlx5/cmd.c | 96 +++++++++++++++++++++++++++++++++--- drivers/vfio/pci/mlx5/cmd.h | 16 +++++- drivers/vfio/pci/mlx5/main.c | 90 ++++++++++++++++++++++++++++++--- 3 files changed, 184 insertions(+), 18 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 0e36b4c8c816..5fcece201d4c 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -14,18 +14,36 @@ _mlx5vf_free_page_tracker_resources(struct mlx5vf_pci_core_device *mvdev); int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod) { + struct mlx5_vf_migration_file *migf = mvdev->saving_migf; u32 out[MLX5_ST_SZ_DW(suspend_vhca_out)] = {}; u32 in[MLX5_ST_SZ_DW(suspend_vhca_in)] = {}; + int err; lockdep_assert_held(&mvdev->state_mutex); if (mvdev->mdev_detach) return -ENOTCONN; + /* + * In case PRE_COPY is used, saving_migf is exposed while the device is + * running. Make sure to run only once there is no active save command. + * Running both in parallel, might end-up with a failure in the save + * command once it will try to turn on 'tracking' on a suspended device. 
+ */ + if (migf) { + err = wait_for_completion_interruptible(&migf->save_comp); + if (err) + return err; + } + MLX5_SET(suspend_vhca_in, in, opcode, MLX5_CMD_OP_SUSPEND_VHCA); MLX5_SET(suspend_vhca_in, in, vhca_id, mvdev->vhca_id); MLX5_SET(suspend_vhca_in, in, op_mod, op_mod); - return mlx5_cmd_exec_inout(mvdev->mdev, suspend_vhca, in, out); + err = mlx5_cmd_exec_inout(mvdev->mdev, suspend_vhca, in, out); + if (migf) + complete(&migf->save_comp); + + return err; } int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod) @@ -45,7 +63,7 @@ int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod) } int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev, - size_t *state_size) + size_t *state_size, u8 query_flags) { u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {}; u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {}; @@ -59,6 +77,8 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev, MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE); MLX5_SET(query_vhca_migration_state_in, in, vhca_id, mvdev->vhca_id); MLX5_SET(query_vhca_migration_state_in, in, op_mod, 0); + MLX5_SET(query_vhca_migration_state_in, in, incremental, + query_flags & MLX5VF_QUERY_INC); ret = mlx5_cmd_exec_inout(mvdev->mdev, query_vhca_migration_state, in, out); @@ -342,6 +362,56 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, return ERR_PTR(ret); } +void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf) +{ + spin_lock_irq(&buf->migf->list_lock); + list_add_tail(&buf->buf_elm, &buf->migf->avail_list); + spin_unlock_irq(&buf->migf->list_lock); +} + +struct mlx5_vhca_data_buffer * +mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, + size_t length, enum dma_data_direction dma_dir) +{ + struct mlx5_vhca_data_buffer *buf, *temp_buf; + struct list_head free_list; + + lockdep_assert_held(&migf->mvdev->state_mutex); + if (migf->mvdev->mdev_detach) + return ERR_PTR(-ENOTCONN); + + 
	INIT_LIST_HEAD(&free_list);
+
+	spin_lock_irq(&migf->list_lock);
+	list_for_each_entry_safe(buf, temp_buf, &migf->avail_list, buf_elm) {
+		if (buf->dma_dir == dma_dir) {
+			list_del_init(&buf->buf_elm);
+			if (buf->allocated_length >= length) {
+				spin_unlock_irq(&migf->list_lock);
+				goto found;
+			}
+			/*
+			 * Prevent holding redundant buffers. Put in a free
+			 * list and call at the end not under the spin lock
+			 * (&migf->list_lock) to mlx5vf_free_data_buffer which
+			 * might sleep.
+			 */
+			list_add(&buf->buf_elm, &free_list);
+		}
+	}
+	spin_unlock_irq(&migf->list_lock);
+	buf = mlx5vf_alloc_data_buffer(migf, length, dma_dir);
+
+found:
+	while ((temp_buf = list_first_entry_or_null(&free_list,
+				struct mlx5_vhca_data_buffer, buf_elm))) {
+		list_del(&temp_buf->buf_elm);
+		mlx5vf_free_data_buffer(temp_buf);
+	}
+
+	return buf;
+}
+
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 {
 	struct mlx5vf_async_data *async_data = container_of(_work,
@@ -351,7 +421,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 
 	mutex_lock(&migf->lock);
 	if (async_data->status) {
-		migf->buf = async_data->buf;
+		mlx5vf_put_data_buffer(async_data->buf);
 		migf->state = MLX5_MIGF_STATE_ERROR;
 		wake_up_interruptible(&migf->poll_wait);
 	}
@@ -369,15 +439,19 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 			struct mlx5_vf_migration_file, async_data);
 
 	if (!status) {
+		size_t image_size;
 		unsigned long flags;
 
-		async_data->buf->length =
-			MLX5_GET(save_vhca_state_out, async_data->out,
-				 actual_image_size);
+		image_size = MLX5_GET(save_vhca_state_out, async_data->out,
+				      actual_image_size);
+		async_data->buf->length = image_size;
+		async_data->buf->start_pos = migf->max_pos;
+		migf->max_pos += async_data->buf->length;
 		spin_lock_irqsave(&migf->list_lock, flags);
 		list_add_tail(&async_data->buf->buf_elm, &migf->buf_list);
 		spin_unlock_irqrestore(&migf->list_lock, flags);
-		migf->state = MLX5_MIGF_STATE_COMPLETE;
+		if (async_data->last_chunk)
+			migf->state = MLX5_MIGF_STATE_COMPLETE;
 		wake_up_interruptible(&migf->poll_wait);
 	}
 
@@ -391,7 +465,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 
 int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf,
-			       struct mlx5_vhca_data_buffer *buf)
+			       struct mlx5_vhca_data_buffer *buf, bool inc,
+			       bool track)
 {
 	u32 out_size = MLX5_ST_SZ_BYTES(save_vhca_state_out);
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
@@ -412,9 +487,12 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	MLX5_SET(save_vhca_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(save_vhca_state_in, in, mkey, buf->mkey);
 	MLX5_SET(save_vhca_state_in, in, size, buf->allocated_length);
+	MLX5_SET(save_vhca_state_in, in, incremental, inc);
+	MLX5_SET(save_vhca_state_in, in, set_track, track);
 
 	async_data = &migf->async_data;
 	async_data->buf = buf;
+	async_data->last_chunk = !track;
 	async_data->out = kvzalloc(out_size, GFP_KERNEL);
 	if (!async_data->out) {
 		err = -ENOMEM;
@@ -497,6 +575,8 @@ void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf)
 		migf->buf = NULL;
 	}
 
+	list_splice(&migf->avail_list, &migf->buf_list);
+
 	while ((entry = list_first_entry_or_null(&migf->buf_list,
 				struct mlx5_vhca_data_buffer, buf_elm))) {
 		list_del(&entry->buf_elm);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 6e594689566e..34e61c7aa23d 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -38,6 +38,7 @@ struct mlx5vf_async_data {
 	struct work_struct work;
 	struct mlx5_vhca_data_buffer *buf;
 	int status;
+	u8 last_chunk:1;
 	void *out;
 };
 
@@ -47,9 +48,11 @@ struct mlx5_vf_migration_file {
 	enum mlx5_vf_migf_state state;
 
 	u32 pdn;
+	loff_t max_pos;
 	struct mlx5_vhca_data_buffer *buf;
 	spinlock_t list_lock;
 	struct list_head buf_list;
+	struct list_head avail_list;
 	struct mlx5vf_pci_core_device *mvdev;
 	wait_queue_head_t poll_wait;
 	struct completion save_comp;
@@ -129,10 +132,14 @@ struct mlx5vf_pci_core_device {
 	struct mlx5_core_dev *mdev;
 };
 
+enum {
+	MLX5VF_QUERY_INC = (1UL << 0),
+};
+
 int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
 int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
 int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
-					  size_t *state_size);
+					  size_t *state_size, u8 query_flags);
 void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 			       const struct vfio_migration_ops *mig_ops,
 			       const struct vfio_log_ops *log_ops);
@@ -140,7 +147,8 @@ void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_cmd_close_migratable(struct mlx5vf_pci_core_device *mvdev);
 int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf,
-			       struct mlx5_vhca_data_buffer *buf);
+			       struct mlx5_vhca_data_buffer *buf, bool inc,
+			       bool track);
 int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf,
 			       struct mlx5_vhca_data_buffer *buf);
@@ -151,6 +159,10 @@ struct mlx5_vhca_data_buffer *
 mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
 			 size_t length, enum dma_data_direction dma_dir);
 void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf);
+struct mlx5_vhca_data_buffer *
+mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf,
+		       size_t length, enum dma_data_direction dma_dir);
+void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf);
 int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
 			       unsigned int npages);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index facb5ab6021e..e86489d5dd6e 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -195,6 +195,7 @@ static ssize_t mlx5vf_buf_read(struct mlx5_vhca_data_buffer *vhca_buf,
 	if (*pos >= vhca_buf->start_pos + vhca_buf->length) {
 		spin_lock_irq(&vhca_buf->migf->list_lock);
 		list_del_init(&vhca_buf->buf_elm);
+		list_add_tail(&vhca_buf->buf_elm, &vhca_buf->migf->avail_list);
 		spin_unlock_irq(&vhca_buf->migf->list_lock);
 	}
 
@@ -283,6 +284,16 @@ static __poll_t mlx5vf_save_poll(struct file *filp,
 	return pollflags;
 }
 
+/*
+ * FD is exposed and user can use it after receiving an error.
+ * Mark migf in error, and wake the user.
+ */
+static void mlx5vf_mark_err(struct mlx5_vf_migration_file *migf)
+{
+	migf->state = MLX5_MIGF_STATE_ERROR;
+	wake_up_interruptible(&migf->poll_wait);
+}
+
 static const struct file_operations mlx5vf_save_fops = {
 	.owner = THIS_MODULE,
 	.read = mlx5vf_save_read,
@@ -291,8 +302,42 @@ static const struct file_operations mlx5vf_save_fops = {
 	.llseek = no_llseek,
 };
 
+static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
+{
+	struct mlx5_vf_migration_file *migf = mvdev->saving_migf;
+	struct mlx5_vhca_data_buffer *buf;
+	size_t length;
+	int ret;
+
+	if (migf->state == MLX5_MIGF_STATE_ERROR)
+		return -ENODEV;
+
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length,
+						    MLX5VF_QUERY_INC);
+	if (ret)
+		goto err;
+
+	buf = mlx5vf_get_data_buffer(migf, length, DMA_FROM_DEVICE);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		goto err;
+	}
+
+	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, false);
+	if (ret)
+		goto err_save;
+
+	return 0;
+
+err_save:
+	mlx5vf_put_data_buffer(buf);
+err:
+	mlx5vf_mark_err(migf);
+	return ret;
+}
+
 static struct mlx5_vf_migration_file *
-mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
+mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 {
 	struct mlx5_vf_migration_file *migf;
 	struct mlx5_vhca_data_buffer *buf;
@@ -323,8 +368,9 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx);
 	INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb);
 	INIT_LIST_HEAD(&migf->buf_list);
+	INIT_LIST_HEAD(&migf->avail_list);
 	spin_lock_init(&migf->list_lock);
-	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length);
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, 0);
 	if (ret)
 		goto out_pd;
 
@@ -334,7 +380,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 		goto out_pd;
 	}
 
-	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf);
+	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, false, track);
 	if (ret)
 		goto out_save;
 	return migf;
@@ -457,6 +503,7 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	INIT_LIST_HEAD(&migf->buf_list);
+	INIT_LIST_HEAD(&migf->avail_list);
 	spin_lock_init(&migf->list_lock);
 	return migf;
 out_pd:
@@ -509,7 +556,8 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 		return NULL;
 	}
 
-	if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) {
+	if ((cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) ||
+	    (cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_PRE_COPY_P2P)) {
 		ret = mlx5vf_cmd_suspend_vhca(mvdev,
 			MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_INITIATOR);
 		if (ret)
@@ -517,7 +565,8 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 		return NULL;
 	}
 
-	if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) {
+	if ((cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) ||
+	    (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P && new == VFIO_DEVICE_STATE_PRE_COPY)) {
 		ret = mlx5vf_cmd_resume_vhca(mvdev,
 			MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_INITIATOR);
 		if (ret)
@@ -528,7 +577,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_STOP_COPY) {
 		struct mlx5_vf_migration_file *migf;
 
-		migf = mlx5vf_pci_save_device_data(mvdev);
+		migf = mlx5vf_pci_save_device_data(mvdev, false);
 		if (IS_ERR(migf))
 			return ERR_CAST(migf);
 		get_file(migf->filp);
@@ -536,7 +585,10 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 		return migf->filp;
 	}
 
-	if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP)) {
+	if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP) ||
+	    (cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_RUNNING) ||
+	    (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P &&
+	     new == VFIO_DEVICE_STATE_RUNNING_P2P)) {
 		mlx5vf_disable_fds(mvdev);
 		return NULL;
 	}
@@ -562,6 +614,28 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 		return NULL;
 	}
 
+	if ((cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_PRE_COPY) ||
+	    (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
+	     new == VFIO_DEVICE_STATE_PRE_COPY_P2P)) {
+		struct mlx5_vf_migration_file *migf;
+
+		migf = mlx5vf_pci_save_device_data(mvdev, true);
+		if (IS_ERR(migf))
+			return ERR_CAST(migf);
+		get_file(migf->filp);
+		mvdev->saving_migf = migf;
+		return migf->filp;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P && new == VFIO_DEVICE_STATE_STOP_COPY) {
+		ret = mlx5vf_cmd_suspend_vhca(mvdev,
+			MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_RESPONDER);
+		if (ret)
+			return ERR_PTR(ret);
+		ret = mlx5vf_pci_save_device_inc_data(mvdev);
+		return ret ? ERR_PTR(ret) : NULL;
+	}
+
 	/*
 	 * vfio_mig_get_next_state() does not use arcs other than the above
 	 */
@@ -630,7 +704,7 @@ static int mlx5vf_pci_get_data_size(struct vfio_device *vdev,
 
 	mutex_lock(&mvdev->state_mutex);
 	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
-						    &state_size);
+						    &state_size, 0);
 	if (!ret)
 		*stop_copy_length = state_size;
 	mlx5vf_state_mutex_unlock(mvdev);

From patchwork Thu Nov 24 17:39:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13055238
From: Yishai Hadas
Subject: [PATCH V1 vfio 09/14] vfio/mlx5: Introduce SW headers for migration states
Date: Thu, 24 Nov 2022 19:39:27 +0200
Message-ID: <20221124173932.194654-10-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com>
References: <20221124173932.194654-1-yishaih@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

As mentioned in the previous patches, mlx5 transfers multiple states
when the PRE_COPY protocol is used. This multi-state mechanism requires
the target VM to know the size of each state in order to execute
multiple loads.

Therefore, add a SW header, carrying the needed information, to each
saved state that the source VM transfers to the target VM.

This patch implements the source VM handling of the headers; the
following patch will implement the target VM handling of the headers.

Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 56 ++++++++++++++++++++++++++++++++++--
 drivers/vfio/pci/mlx5/cmd.h  | 10 +++++++
 drivers/vfio/pci/mlx5/main.c |  2 +-
 3 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 5fcece201d4c..0af1205e6363 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -351,9 +351,11 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
 		if (ret)
 			goto end;
 
-		ret = mlx5vf_dma_data_buffer(buf);
-		if (ret)
-			goto end;
+		if (dma_dir != DMA_NONE) {
+			ret = mlx5vf_dma_data_buffer(buf);
+			if (ret)
+				goto end;
+		}
 	}
 
 	return buf;
@@ -422,6 +424,8 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 	mutex_lock(&migf->lock);
 	if (async_data->status) {
 		mlx5vf_put_data_buffer(async_data->buf);
+		if (async_data->header_buf)
+			mlx5vf_put_data_buffer(async_data->header_buf);
 		migf->state = MLX5_MIGF_STATE_ERROR;
 		wake_up_interruptible(&migf->poll_wait);
 	}
@@ -431,6 +435,32 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 	fput(migf->filp);
 }
 
+static int add_buf_header(struct mlx5_vhca_data_buffer *header_buf,
+			  size_t image_size)
+{
+	struct mlx5_vf_migration_file *migf = header_buf->migf;
+	struct mlx5_vf_migration_header header = {};
+	unsigned long flags;
+	struct page *page;
+	u8 *to_buff;
+
+	header.image_size = cpu_to_le64(image_size);
+	page = mlx5vf_get_migration_page(header_buf, 0);
+	if (!page)
+		return -EINVAL;
+	to_buff = kmap_local_page(page);
+	memcpy(to_buff, &header, sizeof(header));
+	kunmap_local(to_buff);
+	header_buf->length = sizeof(header);
+	header_buf->header_image_size = image_size;
+	header_buf->start_pos = header_buf->migf->max_pos;
+	migf->max_pos += header_buf->length;
+	spin_lock_irqsave(&migf->list_lock, flags);
+	list_add_tail(&header_buf->buf_elm, &migf->buf_list);
+	spin_unlock_irqrestore(&migf->list_lock, flags);
+	return 0;
+}
+
 static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 {
 	struct mlx5vf_async_data *async_data = container_of(context,
@@ -444,6 +474,11 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 
 		image_size = MLX5_GET(save_vhca_state_out, async_data->out,
 				      actual_image_size);
+		if (async_data->header_buf) {
+			status = add_buf_header(async_data->header_buf, image_size);
+			if (status)
+				goto err;
+		}
 		async_data->buf->length = image_size;
 		async_data->buf->start_pos = migf->max_pos;
 		migf->max_pos += async_data->buf->length;
@@ -455,6 +490,7 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		wake_up_interruptible(&migf->poll_wait);
 	}
 
+err:
 	/*
 	 * The error and the cleanup flows can't run from an
 	 * interrupt context
@@ -470,6 +506,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 {
 	u32 out_size = MLX5_ST_SZ_BYTES(save_vhca_state_out);
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
+	struct mlx5_vhca_data_buffer *header_buf = NULL;
 	struct mlx5vf_async_data *async_data;
 	int err;
 
@@ -499,6 +536,16 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		goto err_out;
 	}
 
+	if (track || inc) {
+		header_buf = mlx5vf_get_data_buffer(migf,
+			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
+		if (IS_ERR(header_buf)) {
+			err = PTR_ERR(header_buf);
+			goto err_free;
+		}
+	}
+
+	async_data->header_buf = header_buf;
 	get_file(migf->filp);
 	err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
 			       async_data->out,
@@ -510,7 +557,10 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	return 0;
 
 err_exec:
+	if (header_buf)
+		mlx5vf_put_data_buffer(header_buf);
 	fput(migf->filp);
+err_free:
 	kvfree(async_data->out);
 err_out:
 	complete(&migf->save_comp);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 34e61c7aa23d..2b77e2ab9cd2 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -17,11 +17,18 @@ enum mlx5_vf_migf_state {
 	MLX5_MIGF_STATE_COMPLETE,
 };
 
+struct mlx5_vf_migration_header {
+	__le64 image_size;
+	/* For future use in case we may need to change the kernel protocol */
+	__le64 flags;
+};
+
 struct mlx5_vhca_data_buffer {
 	struct sg_append_table table;
 	loff_t start_pos;
 	u64 length;
 	u64 allocated_length;
+	u64 header_image_size;
 	u32 mkey;
 	enum dma_data_direction dma_dir;
 	u8 dmaed:1;
@@ -37,6 +44,7 @@ struct mlx5vf_async_data {
 	struct mlx5_async_work cb_work;
 	struct work_struct work;
 	struct mlx5_vhca_data_buffer *buf;
+	struct mlx5_vhca_data_buffer *header_buf;
 	int status;
 	u8 last_chunk:1;
 	void *out;
@@ -165,6 +173,8 @@ mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf,
 void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf);
 int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
 			       unsigned int npages);
+struct page *mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf,
+				       unsigned long offset);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index e86489d5dd6e..ec52c8c4533a 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -32,7 +32,7 @@ static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev)
 			    core_device);
 }
 
-static struct page *
+struct page *
 mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf,
 			  unsigned long offset)
 {

From patchwork Thu Nov 24 17:39:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13055239
From: Yishai Hadas
Subject: [PATCH V1 vfio 10/14] vfio/mlx5: Introduce vfio precopy ioctl implementation
Date: Thu, 24 Nov 2022 19:39:28 +0200
Message-ID: <20221124173932.194654-11-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com>
References: <20221124173932.194654-1-yishaih@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

The vfio precopy ioctl returns an estimate of the data available for
transfer from the device.

Whenever a user calls VFIO_MIG_GET_PRECOPY_INFO, track the current
state of the device and, if needed, append the dirty data to the
transfer FD data. This is done by saving a middle state.

As mlx5 runs the SAVE command asynchronously, make sure to query for
incremental data only once there is no active SAVE command. Running
both in parallel might end up with a failure in the incremental query
command on an un-tracked vhca.

Also, a middle state is saved only after the previous state has
finished its SAVE command and has been fully transferred; this prevents
unbounded use of resources.

Co-developed-by: Shay Drory
Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  |  16 +++++
 drivers/vfio/pci/mlx5/main.c | 111 +++++++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 0af1205e6363..65764ca1b31a 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -67,12 +67,25 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 {
 	u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
+	bool inc = query_flags & MLX5VF_QUERY_INC;
 	int ret;
 
 	lockdep_assert_held(&mvdev->state_mutex);
 	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
+	/*
+	 * In case PRE_COPY is used, saving_migf is exposed while device is
+	 * running. Make sure to run only once there is no active save command.
+	 * Running both in parallel, might end-up with a failure in the
+	 * incremental query command on un-tracked vhca.
+	 */
+	if (inc) {
+		ret = wait_for_completion_interruptible(&mvdev->saving_migf->save_comp);
+		if (ret)
+			return ret;
+	}
+
 	MLX5_SET(query_vhca_migration_state_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE);
 	MLX5_SET(query_vhca_migration_state_in, in, vhca_id, mvdev->vhca_id);
@@ -82,6 +95,9 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 
 	ret = mlx5_cmd_exec_inout(mvdev->mdev, query_vhca_migration_state, in,
 				  out);
+	if (inc)
+		complete(&mvdev->saving_migf->save_comp);
+
 	if (ret)
 		return ret;
 
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index ec52c8c4533a..2f5f83f2b2a4 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -294,10 +294,121 @@ static void mlx5vf_mark_err(struct mlx5_vf_migration_file *migf)
 	wake_up_interruptible(&migf->poll_wait);
 }
 
+static ssize_t mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
+				    unsigned long arg)
+{
+	struct mlx5_vf_migration_file *migf = filp->private_data;
+	struct mlx5vf_pci_core_device *mvdev = migf->mvdev;
+	struct mlx5_vhca_data_buffer *buf;
+	struct vfio_precopy_info info = {};
+	loff_t *pos = &filp->f_pos;
+	unsigned long minsz;
+	size_t inc_length = 0;
+	bool end_of_data;
+	int ret;
+
+	if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
+		return -ENOTTY;
+
+	minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
+
+	if (copy_from_user(&info, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (info.argsz < minsz)
+		return -EINVAL;
+
+	mutex_lock(&mvdev->state_mutex);
+	if (mvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
+	    mvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY_P2P) {
+		ret = -EINVAL;
+		goto err_state_unlock;
+	}
+
+	/*
+	 * We can't issue a SAVE command when the device is suspended, so as
+	 * part of VFIO_DEVICE_STATE_PRE_COPY_P2P no reason to query for extra
+	 * bytes that can't be read.
+	 */
+	if (mvdev->mig_state == VFIO_DEVICE_STATE_PRE_COPY) {
+		/*
+		 * Once the query returns it's guaranteed that there is no
+		 * active SAVE command.
+		 * As so, the other code below is safe with the proper locks.
+		 */
+		ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &inc_length,
+							    MLX5VF_QUERY_INC);
+		if (ret)
+			goto err_state_unlock;
+	}
+
+	mutex_lock(&migf->lock);
+	if (migf->state == MLX5_MIGF_STATE_ERROR) {
+		ret = -ENODEV;
+		goto err_migf_unlock;
+	}
+
+	buf = mlx5vf_get_data_buff_from_pos(migf, *pos, &end_of_data);
+	if (buf) {
+		if (buf->start_pos == 0) {
+			info.initial_bytes = buf->header_image_size - *pos;
+		} else if (buf->start_pos ==
+				sizeof(struct mlx5_vf_migration_header)) {
+			/* First data buffer following the header */
+			info.initial_bytes = buf->start_pos +
+						buf->length - *pos;
+		} else {
+			info.dirty_bytes = buf->start_pos + buf->length - *pos;
+		}
+	} else {
+		if (!end_of_data) {
+			ret = -EINVAL;
+			goto err_migf_unlock;
+		}
+
+		info.dirty_bytes = inc_length;
+	}
+
+	if (!end_of_data || !inc_length) {
+		mutex_unlock(&migf->lock);
+		goto done;
+	}
+
+	mutex_unlock(&migf->lock);
+	/*
+	 * We finished transferring the current state and the device has a
+	 * dirty state, save a new state to be ready for.
+	 */
+	buf = mlx5vf_get_data_buffer(migf, inc_length, DMA_FROM_DEVICE);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		mlx5vf_mark_err(migf);
+		goto err_state_unlock;
+	}
+
+	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, true);
+	if (ret) {
+		mlx5vf_mark_err(migf);
+		mlx5vf_put_data_buffer(buf);
+		goto err_state_unlock;
+	}
+
+done:
+	mlx5vf_state_mutex_unlock(mvdev);
+	return copy_to_user((void __user *)arg, &info, minsz);
+err_migf_unlock:
+	mutex_unlock(&migf->lock);
+err_state_unlock:
+	mlx5vf_state_mutex_unlock(mvdev);
+	return ret;
+}
+
 static const struct file_operations mlx5vf_save_fops = {
 	.owner = THIS_MODULE,
 	.read = mlx5vf_save_read,
 	.poll = mlx5vf_save_poll,
+	.unlocked_ioctl = mlx5vf_precopy_ioctl,
+	.compat_ioctl = compat_ptr_ioctl,
 	.release = mlx5vf_release_file,
 	.llseek = no_llseek,
 };

From patchwork Thu Nov 24 17:39:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13055242
From: Yishai Hadas
Subject: [PATCH V1 vfio 11/14] vfio/mlx5: Consider temporary end of stream as part of PRE_COPY
Date: Thu, 24 Nov 2022 19:39:29 +0200
Message-ID: <20221124173932.194654-12-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com>
References: <20221124173932.194654-1-yishaih@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

During PRE_COPY the migration data FD may have a temporary "end of stream"
that is reached when the initial_bytes were read and no other dirty data
exists yet. For instance, this may indicate that the device is idle and not
currently dirtying any internal state.

When read() is done on this temporary end of stream, the kernel driver
should return ENOMSG from read(). Userspace can wait for more data or
consider moving to STOP_COPY.

To avoid blocking the user in read() and to let it get ENOMSG, add a new
state named MLX5_MIGF_STATE_PRE_COPY on the migration file.

In addition, add the MLX5_MIGF_STATE_SAVE_LAST state to block read() once
the last SAVE is issued upon moving to STOP_COPY.

Any further error will be marked with MLX5_MIGF_STATE_ERROR and the user
won't be blocked.
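
For illustration only (a minimal userspace sketch, not part of this patch):
the helper below classifies the result of a read() on the migration data FD
along the lines described above. The names precopy_read_result and
classify_precopy_read() are hypothetical, and treating a return value of 0
as the final end of stream is an assumption based on ordinary read()
semantics.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical names, for illustration only; not part of the driver or uAPI. */
enum precopy_read_result {
	PRECOPY_READ_DATA,	/* read() returned migration data */
	PRECOPY_READ_TEMP_EOS,	/* ENOMSG: temporary end of stream */
	PRECOPY_READ_FINAL_EOS,	/* 0 bytes: assumed final end of stream */
	PRECOPY_READ_RETRY,	/* EAGAIN or EINTR: try again later */
	PRECOPY_READ_ERROR,	/* any other failure */
};

/*
 * Classify the outcome of read() on the migration data FD.
 * n is the return value of read(); err is errno sampled right after it.
 */
static enum precopy_read_result classify_precopy_read(ssize_t n, int err)
{
	assert(n >= -1);	/* read() reports failure as exactly -1 */

	if (n > 0)
		return PRECOPY_READ_DATA;
	if (n == 0)
		return PRECOPY_READ_FINAL_EOS;

	switch (err) {
	case ENOMSG:
		return PRECOPY_READ_TEMP_EOS;
	case EAGAIN:
	case EINTR:
		return PRECOPY_READ_RETRY;
	default:
		return PRECOPY_READ_ERROR;
	}
}
```

A caller would pass the return value of read() together with errno. On
PRECOPY_READ_TEMP_EOS it can either wait for more data or move the device
to STOP_COPY, which is the choice this patch leaves to userspace.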
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 7 +++++--
 drivers/vfio/pci/mlx5/cmd.h  | 2 ++
 drivers/vfio/pci/mlx5/main.c | 7 +++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 65764ca1b31a..6ec71bc6be83 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -501,8 +501,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		spin_lock_irqsave(&migf->list_lock, flags);
 		list_add_tail(&async_data->buf->buf_elm, &migf->buf_list);
 		spin_unlock_irqrestore(&migf->list_lock, flags);
-		if (async_data->last_chunk)
-			migf->state = MLX5_MIGF_STATE_COMPLETE;
+		migf->state = async_data->last_chunk ?
+			MLX5_MIGF_STATE_COMPLETE : MLX5_MIGF_STATE_PRE_COPY;
 		wake_up_interruptible(&migf->poll_wait);
 	}
 
@@ -561,6 +561,9 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		}
 	}
 
+	if (async_data->last_chunk)
+		migf->state = MLX5_MIGF_STATE_SAVE_LAST;
+
 	async_data->header_buf = header_buf;
 	get_file(migf->filp);
 	err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 2b77e2ab9cd2..67bc77605bc5 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -14,6 +14,8 @@
 
 enum mlx5_vf_migf_state {
 	MLX5_MIGF_STATE_ERROR = 1,
+	MLX5_MIGF_STATE_PRE_COPY,
+	MLX5_MIGF_STATE_SAVE_LAST,
 	MLX5_MIGF_STATE_COMPLETE,
 };
 
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 2f5f83f2b2a4..28185085008f 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -219,6 +219,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 		if (wait_event_interruptible(migf->poll_wait,
 				!list_empty(&migf->buf_list) ||
 				migf->state == MLX5_MIGF_STATE_ERROR ||
+				migf->state == MLX5_MIGF_STATE_PRE_COPY ||
 				migf->state == MLX5_MIGF_STATE_COMPLETE))
 			return -ERESTARTSYS;
 	}
@@ -236,6 +237,12 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 							 &end_of_data);
 		if (first_loop_call) {
 			first_loop_call = false;
+			/* Temporary end of file as part of PRE_COPY */
+			if (end_of_data && migf->state == MLX5_MIGF_STATE_PRE_COPY) {
+				done = -ENOMSG;
+				goto out_unlock;
+			}
+
 			if (end_of_data && migf->state != MLX5_MIGF_STATE_COMPLETE) {
 				if (filp->f_flags & O_NONBLOCK) {
 					done = -EAGAIN;

From patchwork Thu Nov 24 17:39:30 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13055240
From: Yishai Hadas
Subject: [PATCH V1 vfio 12/14] vfio/mlx5: Introduce multiple loads
Date: Thu, 24 Nov 2022 19:39:30 +0200
Message-ID: <20221124173932.194654-13-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com>
References: <20221124173932.194654-1-yishaih@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

In order to support PRE_COPY, the mlx5 driver transfers multiple states
(images) of the device. For example, the source VF can save and transfer
multiple states, and the target VF will load them in that order.

This patch implements the changes for the target VF to decompose the
header for each state and to write and load multiple states.

Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  |  13 +-
 drivers/vfio/pci/mlx5/cmd.h  |  10 ++
 drivers/vfio/pci/mlx5/main.c | 282 +++++++++++++++++++++++++++++------
 3 files changed, 260 insertions(+), 45 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 6ec71bc6be83..49a852a84283 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -598,9 +598,11 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
-	err = mlx5vf_dma_data_buffer(buf);
-	if (err)
-		return err;
+	if (!buf->dmaed) {
+		err = mlx5vf_dma_data_buffer(buf);
+		if (err)
+			return err;
+	}
 
 	MLX5_SET(load_vhca_state_in, in, opcode,
 		 MLX5_CMD_OP_LOAD_VHCA_STATE);
@@ -644,6 +646,11 @@ void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf)
 		migf->buf = NULL;
 	}
 
+	if (migf->buf_header) {
+		mlx5vf_free_data_buffer(migf->buf_header);
+		migf->buf_header = NULL;
+	}
+
 	list_splice(&migf->avail_list, &migf->buf_list);
 
 	while ((entry = list_first_entry_or_null(&migf->buf_list,
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 67bc77605bc5..5ba094cabb2d 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -19,6 +19,14 @@ enum mlx5_vf_migf_state {
 	MLX5_MIGF_STATE_COMPLETE,
 };
 
+enum mlx5_vf_load_state {
+	MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER,
+	MLX5_VF_LOAD_STATE_READ_HEADER,
+	MLX5_VF_LOAD_STATE_PREP_IMAGE,
+	MLX5_VF_LOAD_STATE_READ_IMAGE,
+	MLX5_VF_LOAD_STATE_LOAD_IMAGE,
+};
+
 struct mlx5_vf_migration_header {
 	__le64 image_size;
 	/* For future use in case we may need to change the kernel protocol */
@@ -57,9 +65,11 @@ struct mlx5_vf_migration_file {
 	struct mutex lock;
 	enum mlx5_vf_migf_state state;
 
+	enum mlx5_vf_load_state load_state;
 	u32 pdn;
 	loff_t max_pos;
 	struct mlx5_vhca_data_buffer *buf;
+	struct mlx5_vhca_data_buffer *buf_header;
 	spinlock_t list_lock;
 	struct list_head buf_list;
 	struct list_head avail_list;
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 28185085008f..0caaf4e8e1e9 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -202,6 +202,9 @@ static ssize_t mlx5vf_buf_read(struct mlx5_vhca_data_buffer *vhca_buf,
 	return done;
 }
 
+#define VFIO_PRE_COPY_SUPP(mvdev) \
+	((mvdev)->core_device.vdev.migration_flags & VFIO_MIGRATION_PRE_COPY)
+
 static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 			       loff_t *pos)
 {
@@ -513,13 +516,162 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 	return ERR_PTR(ret);
 }
 
+static int
+mlx5vf_append_page_to_mig_buf(struct mlx5_vhca_data_buffer *vhca_buf,
+			      const char __user **buf, size_t *len,
+			      loff_t *pos, ssize_t *done)
+{
+	unsigned long offset;
+	size_t page_offset;
+	struct page *page;
+	size_t page_len;
+	u8 *to_buff;
+	int ret;
+
+	offset = *pos - vhca_buf->start_pos;
+	page_offset = offset % PAGE_SIZE;
+
+	page = mlx5vf_get_migration_page(vhca_buf, offset - page_offset);
+	if (!page)
+		return -EINVAL;
+	page_len = min_t(size_t, *len, PAGE_SIZE - page_offset);
+	to_buff = kmap_local_page(page);
+	ret = copy_from_user(to_buff + page_offset, *buf, page_len);
+	kunmap_local(to_buff);
+	if (ret)
+		return -EFAULT;
+
+	*pos += page_len;
+	*done += page_len;
+	*buf += page_len;
+	*len -= page_len;
+	vhca_buf->length += page_len;
+	return 0;
+}
+
+static int
+mlx5vf_resume_read_image_no_header(struct mlx5_vhca_data_buffer *vhca_buf,
+				   loff_t requested_length,
+				   const char __user **buf, size_t *len,
+				   loff_t *pos, ssize_t *done)
+{
+	int ret;
+
+	if (requested_length > MAX_MIGRATION_SIZE)
+		return -ENOMEM;
+
+	if (vhca_buf->allocated_length < requested_length) {
+		ret = mlx5vf_add_migration_pages(
+			vhca_buf,
+			DIV_ROUND_UP(requested_length - vhca_buf->allocated_length,
+				     PAGE_SIZE));
+		if (ret)
+			return ret;
+	}
+
+	while (*len) {
+		ret = mlx5vf_append_page_to_mig_buf(vhca_buf, buf, len, pos,
+						    done);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static ssize_t
+mlx5vf_resume_read_image(struct mlx5_vf_migration_file *migf,
+			 struct mlx5_vhca_data_buffer *vhca_buf,
+			 size_t image_size, const char __user **buf,
+			 size_t *len, loff_t *pos, ssize_t *done,
+			 bool *has_work)
+{
+	size_t copy_len, to_copy;
+	int ret;
+
+	to_copy = min_t(size_t, *len, image_size - vhca_buf->length);
+	copy_len = to_copy;
+	while (to_copy) {
+		ret = mlx5vf_append_page_to_mig_buf(vhca_buf, buf, &to_copy, pos,
+						    done);
+		if (ret)
+			return ret;
+	}
+
+	*len -= copy_len;
+	if (vhca_buf->length == image_size) {
+		migf->load_state = MLX5_VF_LOAD_STATE_LOAD_IMAGE;
+		migf->max_pos += image_size;
+		*has_work = true;
+	}
+
+	return 0;
+}
+
+static int
+mlx5vf_resume_read_header(struct mlx5_vf_migration_file *migf,
+			  struct mlx5_vhca_data_buffer *vhca_buf,
+			  const char __user **buf,
+			  size_t *len, loff_t *pos,
+			  ssize_t *done, bool *has_work)
+{
+	struct page *page;
+	size_t copy_len;
+	u8 *to_buff;
+	int ret;
+
+	copy_len = min_t(size_t, *len,
+		sizeof(struct mlx5_vf_migration_header) - vhca_buf->length);
+	page = mlx5vf_get_migration_page(vhca_buf, 0);
+	if (!page)
+		return -EINVAL;
+	to_buff = kmap_local_page(page);
+	ret = copy_from_user(to_buff + vhca_buf->length, *buf, copy_len);
+	if (ret) {
+		ret = -EFAULT;
+		goto end;
+	}
+
+	*buf += copy_len;
+	*pos += copy_len;
+	*done += copy_len;
+	*len -= copy_len;
+	vhca_buf->length += copy_len;
+	if (vhca_buf->length == sizeof(struct mlx5_vf_migration_header)) {
+		u64 flags;
+
+		vhca_buf->header_image_size = le64_to_cpup((__le64 *)to_buff);
+		if (vhca_buf->header_image_size > MAX_MIGRATION_SIZE) {
+			ret = -ENOMEM;
+			goto end;
+		}
+
+		flags = le64_to_cpup((__le64 *)(to_buff +
+			    offsetof(struct mlx5_vf_migration_header, flags)));
+		if (flags) {
+			ret = -EOPNOTSUPP;
+			goto end;
+		}
+
+		migf->load_state = MLX5_VF_LOAD_STATE_PREP_IMAGE;
+		migf->max_pos += vhca_buf->length;
+		*has_work = true;
+	}
+end:
+	kunmap_local(to_buff);
+	return ret;
+}
+
 static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 				   size_t len, loff_t *pos)
 {
 	struct mlx5_vf_migration_file *migf = filp->private_data;
 	struct mlx5_vhca_data_buffer *vhca_buf = migf->buf;
+	struct mlx5_vhca_data_buffer *vhca_buf_header = migf->buf_header;
 	loff_t requested_length;
+	bool has_work = false;
 	ssize_t done = 0;
+	int ret = 0;
 
 	if (pos)
 		return -ESPIPE;
@@ -529,56 +681,83 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 	    check_add_overflow((loff_t)len, *pos, &requested_length))
 		return -EINVAL;
 
-	if (requested_length > MAX_MIGRATION_SIZE)
-		return -ENOMEM;
-
+	mutex_lock(&migf->mvdev->state_mutex);
 	mutex_lock(&migf->lock);
 	if (migf->state == MLX5_MIGF_STATE_ERROR) {
-		done = -ENODEV;
+		ret = -ENODEV;
 		goto out_unlock;
 	}
 
-	if (vhca_buf->allocated_length < requested_length) {
-		done = mlx5vf_add_migration_pages(
-			vhca_buf,
-			DIV_ROUND_UP(requested_length - vhca_buf->allocated_length,
-				     PAGE_SIZE));
-		if (done)
-			goto out_unlock;
-	}
+	while (len || has_work) {
+		has_work = false;
+		switch (migf->load_state) {
+		case MLX5_VF_LOAD_STATE_READ_HEADER:
+			ret = mlx5vf_resume_read_header(migf, vhca_buf_header,
+							&buf, &len, pos,
+							&done, &has_work);
+			if (ret)
+				goto out_unlock;
+			break;
+		case MLX5_VF_LOAD_STATE_PREP_IMAGE:
+		{
+			u64 size = vhca_buf_header->header_image_size;
+
+			if (vhca_buf->allocated_length < size) {
+				mlx5vf_free_data_buffer(vhca_buf);
+
+				migf->buf = mlx5vf_alloc_data_buffer(migf,
+							size, DMA_TO_DEVICE);
+				if (IS_ERR(migf->buf)) {
+					ret = PTR_ERR(migf->buf);
+					migf->buf = NULL;
+					goto out_unlock;
+				}
 
-	while (len) {
-		size_t page_offset;
-		struct page *page;
-		size_t page_len;
-		u8 *to_buff;
-		int ret;
+				vhca_buf = migf->buf;
+			}
 
-		page_offset = (*pos) % PAGE_SIZE;
-		page = mlx5vf_get_migration_page(vhca_buf, *pos - page_offset);
-		if (!page) {
-			if (done == 0)
-				done = -EINVAL;
-			goto out_unlock;
+			vhca_buf->start_pos = migf->max_pos;
+			migf->load_state = MLX5_VF_LOAD_STATE_READ_IMAGE;
+			break;
 		}
+		case MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER:
+			ret = mlx5vf_resume_read_image_no_header(vhca_buf,
+						requested_length,
+						&buf, &len, pos, &done);
+			if (ret)
+				goto out_unlock;
+			break;
+		case MLX5_VF_LOAD_STATE_READ_IMAGE:
+			ret = mlx5vf_resume_read_image(migf, vhca_buf,
+						vhca_buf_header->header_image_size,
+						&buf, &len, pos, &done, &has_work);
+			if (ret)
+				goto out_unlock;
+			break;
+		case MLX5_VF_LOAD_STATE_LOAD_IMAGE:
+			ret = mlx5vf_cmd_load_vhca_state(migf->mvdev, migf, vhca_buf);
+			if (ret)
+				goto out_unlock;
+			migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER;
 
-		page_len = min_t(size_t, len, PAGE_SIZE - page_offset);
-		to_buff = kmap_local_page(page);
-		ret = copy_from_user(to_buff + page_offset, buf, page_len);
-		kunmap_local(to_buff);
-		if (ret) {
-			done = -EFAULT;
-			goto out_unlock;
+			/* prep header buf for next image */
+			vhca_buf_header->length = 0;
+			vhca_buf_header->header_image_size = 0;
+			/* prep data buf for next image */
+			vhca_buf->length = 0;
+
+			break;
+		default:
+			break;
 		}
-		*pos += page_len;
-		len -= page_len;
-		done += page_len;
-		buf += page_len;
-		vhca_buf->length += page_len;
 	}
+
 out_unlock:
+	if (ret)
+		migf->state = MLX5_MIGF_STATE_ERROR;
 	mutex_unlock(&migf->lock);
-	return done;
+	mlx5vf_state_mutex_unlock(migf->mvdev);
+	return ret ? ret : done;
 }
 
 static const struct file_operations mlx5vf_resume_fops = {
@@ -618,12 +797,29 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	}
 
 	migf->buf = buf;
+	if (VFIO_PRE_COPY_SUPP(mvdev)) {
+		buf = mlx5vf_alloc_data_buffer(migf,
+			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
+		if (IS_ERR(buf)) {
+			ret = PTR_ERR(buf);
+			goto out_buf;
+		}
+
+		migf->buf_header = buf;
+		migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER;
+	} else {
+		/* Initial state will be to read the image */
+		migf->load_state = MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER;
+	}
+
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	INIT_LIST_HEAD(&migf->buf_list);
 	INIT_LIST_HEAD(&migf->avail_list);
 	spin_lock_init(&migf->list_lock);
 	return migf;
+out_buf:
+	mlx5vf_free_data_buffer(buf);
 out_pd:
 	mlx5vf_cmd_dealloc_pd(migf);
 out_free:
@@ -723,11 +919,13 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
-		ret = mlx5vf_cmd_load_vhca_state(mvdev,
-						 mvdev->resuming_migf,
-						 mvdev->resuming_migf->buf);
-		if (ret)
-			return ERR_PTR(ret);
+		if (!VFIO_PRE_COPY_SUPP(mvdev)) {
+			ret = mlx5vf_cmd_load_vhca_state(mvdev,
+							 mvdev->resuming_migf,
+							 mvdev->resuming_migf->buf);
+			if (ret)
+				return ERR_PTR(ret);
+		}
 		mlx5vf_disable_fds(mvdev);
 		return NULL;
 	}

From patchwork Thu Nov 24 17:39:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13055243
lindbergh.monkeyblade.net ([23.128.96.19]:33656 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229682AbiKXRlH (ORCPT ); Thu, 24 Nov 2022 12:41:07 -0500 Received: from NAM12-DM6-obe.outbound.protection.outlook.com (mail-dm6nam12on2064.outbound.protection.outlook.com [40.107.243.64]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAF6813C707 for ; Thu, 24 Nov 2022 09:41:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VyvBXJ+hlRA5CuGYjNy7jSklgYa9M/X+04T5lS5mZxYhCnd9O9cSchOIarS1NuLNRwQxTEC6IlesVub6qxg+EBnqpylOlp05KzoO5xGpvLmGOKZXn3O3Y2SDXp2mMzbFzht14WifwQ9k8YQIyO2LM5fTnFB1K7F9LMjq/HtsIP8wSo5Wza14enkLR3d2t3Vfoy0ong5tiuBUmg6MNd+PTRFlD0tT4xvPstsj25vxNRdXYF0HdyK01N29weGmu6K8t3s9+2shB0Aqo4YF1unbYvZptQ6/nIbrycCLglVZ6T3kPbHbSunhB2FC+8SwYDe/+Xhzp/3Ger7YzdGIRWLUjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=lChmR0EjT2JYvf5s+0aX8se3MpR/4+HS1LVBpOCGRgA=; b=VowIu94HDNjjS/5PYSe2EgNaiAblYNZt2uwyreWvppQKUVAJHK0qsZBOy0KFR9oQwo4/VoVnxr2VaodsBPfOT24g2Wf2Vun6ABKHfBUDc0pbjVb2bVIJcICD6G52+NgcfEVWBuuQRyL2WkvPlddpeus3BRAxpOkNke0BfJ2mV+GuVnHQLNTRpyKPyaUSL2ifDdU6riaf0OGw4XFrhpERyOdTvpCn1+XW7cv7gQLgDiuvdOFd6nPF53wkDJjiyYwMZBB5UMWTznAVvTq1DZ7KbbszXiPHlFesiK0pGaVmn8nt+iN4FGYuj5B+FLUBwTJg6Y2uwlrQgkcBmSfo6uXazg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.232) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; 
bh=lChmR0EjT2JYvf5s+0aX8se3MpR/4+HS1LVBpOCGRgA=; b=Zzd0RWdSshF7tbiaHHWajOGezHhHW7a+kKJpYpmvH//562sBi6RwIQJLEyY4w1glbcw0eh2np3Ax/qWOZCDuntieJNzqP5rp0NJtHJK9SSzXdWwsgjFPHARxdLkgwRSSk0oCxZLoz8bdFS3kK/ZM6r+GGnKyEDGe3clEQ7QRErEyLlBJKgV/8ypNh9XsAgAXL6Xu+KqeAMxby1Rm1g5RSOxBfpNaczHvNfoV2YK2fRrO30kVlUQeLuQA+7IZ0E8yyNMOa6UHxOnLB8RPk6Y2F8jHr5qJqQ6VHrb+ZiQxknfye0cZO1yhdGpMuJlT4r6uGq//qwQqrpQjAH2cZm26Jg== Received: from MW4PR04CA0243.namprd04.prod.outlook.com (2603:10b6:303:88::8) by LV2PR12MB6013.namprd12.prod.outlook.com (2603:10b6:408:171::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5834.15; Thu, 24 Nov 2022 17:41:04 +0000 Received: from CO1NAM11FT058.eop-nam11.prod.protection.outlook.com (2603:10b6:303:88:cafe::53) by MW4PR04CA0243.outlook.office365.com (2603:10b6:303:88::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19 via Frontend Transport; Thu, 24 Nov 2022 17:41:03 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.232) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.118.232 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.232; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.118.232) by CO1NAM11FT058.mail.protection.outlook.com (10.13.174.164) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5834.8 via Frontend Transport; Thu, 24 Nov 2022 17:41:03 +0000 Received: from drhqmail201.nvidia.com (10.126.190.180) by mail.nvidia.com (10.127.129.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:41:03 -0800 Received: from drhqmail202.nvidia.com (10.126.190.181) by drhqmail201.nvidia.com (10.126.190.180) with 
Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:41:03 -0800 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.126.190.181) with Microsoft SMTP Server id 15.2.986.36 via Frontend Transport; Thu, 24 Nov 2022 09:40:59 -0800 From: Yishai Hadas To: , CC: , , , , , , , , Subject: [PATCH V1 vfio 13/14] vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error Date: Thu, 24 Nov 2022 19:39:31 +0200 Message-ID: <20221124173932.194654-14-yishaih@nvidia.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com> References: <20221124173932.194654-1-yishaih@nvidia.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1NAM11FT058:EE_|LV2PR12MB6013:EE_ X-MS-Office365-Filtering-Correlation-Id: a7df6397-6ec0-49a5-b047-08dace430f20 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: o5dpeFOROHjRnnVh5sCgeaOqNoydM75sm1Cb5eUKpFqQlKSisghxaHg80kVYFtZiG5Enj5wEhzNp6iR2Wnr5F9WMJHVnWny2douM/f/dDeghwYeJCEimSAiUVqrR5x8rnZddP8f8fvMuQXgMUmH068datv/69Q1GSaIraEIuULLNDHUY+TkmVEsQYfimyYz+tYMS7hIzBsCCdfN7FkEiICH6TzBj+7CPLJeBDtpkswN2mZavPe0kryyxHcIQZRLU6jO0uqhNwe2G0OT5RQdSmP6zdkkJemgCW2Y6Q2KAVKEsscJgytTgvDhDFS3HLTBWscleofsdjOxtzXUv0+eNlFKwbbTLl5a/VdJwet+g7A2inxNljjKMFnnCNIUByoHsVftZ06gZidWUCWoMSffpqVtxqpnxZ5xHKDLxvLfaT7lvVglEk8c01OULf7OI33Ytdx5PCykFdZe8lfRCDTmn4y9JywZY2MZS0IYgvX5Ijbw4FH7ATUtr0/20dshq2nhG8GUTYYjtOpfDDBnVNCluDTV/QKggxapAWu7AmIYfjha58p+IiNKOHOLAMt+x9c3xb+lxUSl53DwhwAIAzjA/3ApCxx8dStXVh1gDVagCJWCwDUdGkIKEUwX17jHedEnDH1g9vg42s1luAknSEbNJ3whB2zr5VsiCRUDsyt519DuWNzE4HvNmSYRwHHH+FKZtbLqQzhF6T6zx+WHfpp01XA== X-Forefront-Antispam-Report: 
CIP:216.228.118.232;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc7edge1.nvidia.com;CAT:NONE;SFS:(13230022)(4636009)(376002)(346002)(39860400002)(396003)(136003)(451199015)(36840700001)(40470700004)(46966006)(36756003)(36860700001)(86362001)(47076005)(82740400003)(26005)(5660300002)(336012)(40480700001)(2906002)(83380400001)(186003)(426003)(1076003)(40460700003)(356005)(7636003)(2616005)(7696005)(82310400005)(70206006)(8676002)(4326008)(41300700001)(110136005)(6636002)(54906003)(8936002)(478600001)(6666004)(316002)(70586007);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 24 Nov 2022 17:41:03.7340 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: a7df6397-6ec0-49a5-b047-08dace430f20 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.118.232];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT058.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: LV2PR12MB6013 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

Before a SAVE command is issued, a QUERY command is issued to learn the device data size. When PRE_COPY is used, both commands are issued while the device is running. The device state may therefore change significantly between the QUERY and the SAVE, causing the SAVE to fail.

Currently, if a SAVE command fails, the driver fails the migration. In the above case, don't fail the migration, but don't allow new SAVEs to be executed while the device is in the RUNNING state. Once the device is moved to STOP_COPY, SAVE can be executed again and the full device state will be read.

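
For illustration only (this is not part of the patch), the fallback described above can be sketched as a small standalone model. All names prefixed with sketch_ / SKETCH_ are hypothetical stand-ins for the driver's state, flags and status codes, and the numeric values are arbitrary. The sketch assumes that a "bad resource state" status from SAVE is the only error treated as recoverable, and that a non-final QUERY after such an error reports a size of 0 instead of reaching the device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's migration file states. */
enum sketch_migf_state {
	SKETCH_STATE_PRE_COPY,
	SKETCH_STATE_PRE_COPY_ERROR,	/* recoverable: retry at STOP_COPY */
	SKETCH_STATE_ERROR,		/* fatal: migration fails */
};

/* Hypothetical query flags, modeled on the INC/FINAL pair in the patch. */
enum {
	SKETCH_QUERY_INC   = 1u << 0,
	SKETCH_QUERY_FINAL = 1u << 1,
};

/* Hypothetical status codes; the values are arbitrary assumptions. */
#define SKETCH_STAT_BAD_RES_STATE	1
#define SKETCH_STAT_OTHER		2

/* Map a failed SAVE status to the next state of the migration file. */
enum sketch_migf_state sketch_on_save_failure(int status)
{
	if (status == SKETCH_STAT_BAD_RES_STATE)
		return SKETCH_STATE_PRE_COPY_ERROR;
	return SKETCH_STATE_ERROR;
}

/*
 * Decide whether a QUERY should reach the device.
 * After a PRE_COPY error, a non-final query is skipped and reports size 0;
 * a final query proceeds but drops the incremental flag so that the full
 * image is read. Returns true if the device should be queried.
 */
bool sketch_prepare_query(enum sketch_migf_state state, unsigned int *flags,
			  size_t *state_size)
{
	if (state == SKETCH_STATE_PRE_COPY_ERROR) {
		if (!(*flags & SKETCH_QUERY_FINAL)) {
			*state_size = 0;
			return false;
		}
		*flags &= ~SKETCH_QUERY_INC;
	}
	return true;
}
```

In this model, a caller in the RUNNING state passes only the incremental flag and sees a zero size after the error, while the STOP_COPY path passes both flags and ends up issuing a full, non-incremental query.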
Signed-off-by: Shay Drory Signed-off-by: Yishai Hadas --- drivers/vfio/pci/mlx5/cmd.c | 26 +++++++++++++++++++++++++- drivers/vfio/pci/mlx5/cmd.h | 2 ++ drivers/vfio/pci/mlx5/main.c | 6 ++++-- 3 files changed, 31 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 49a852a84283..a1dca065b977 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -84,6 +84,18 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev, ret = wait_for_completion_interruptible(&mvdev->saving_migf->save_comp); if (ret) return ret; + if (mvdev->saving_migf->state == + MLX5_MIGF_STATE_PRE_COPY_ERROR) { + /* + * In case we had a PRE_COPY error, only query full + * image for final image + */ + if (!(query_flags & MLX5VF_QUERY_FINAL)) { + *state_size = 0; + return 0; + } + query_flags &= ~MLX5VF_QUERY_INC; + } } MLX5_SET(query_vhca_migration_state_in, in, opcode, @@ -442,7 +454,10 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work) mlx5vf_put_data_buffer(async_data->buf); if (async_data->header_buf) mlx5vf_put_data_buffer(async_data->header_buf); - migf->state = MLX5_MIGF_STATE_ERROR; + if (async_data->status == MLX5_CMD_STAT_BAD_RES_STATE_ERR) + migf->state = MLX5_MIGF_STATE_PRE_COPY_ERROR; + else + migf->state = MLX5_MIGF_STATE_ERROR; wake_up_interruptible(&migf->poll_wait); } mutex_unlock(&migf->lock); @@ -511,6 +526,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context) * The error and the cleanup flows can't run from an * interrupt context */ + if (status == -EREMOTEIO) + status = MLX5_GET(save_vhca_state_out, async_data->out, status); async_data->status = status; queue_work(migf->mvdev->cb_wq, &async_data->work); } @@ -534,6 +551,13 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, if (err) return err; + if (migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR) + /* + * In case we had a PRE_COPY error, SAVE is triggered only for + * 
the final image, read device full image. + */ + inc = false; + MLX5_SET(save_vhca_state_in, in, opcode, MLX5_CMD_OP_SAVE_VHCA_STATE); MLX5_SET(save_vhca_state_in, in, op_mod, 0); diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 5ba094cabb2d..11a6e99a0bc9 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -14,6 +14,7 @@ enum mlx5_vf_migf_state { MLX5_MIGF_STATE_ERROR = 1, + MLX5_MIGF_STATE_PRE_COPY_ERROR, MLX5_MIGF_STATE_PRE_COPY, MLX5_MIGF_STATE_SAVE_LAST, MLX5_MIGF_STATE_COMPLETE, @@ -154,6 +155,7 @@ struct mlx5vf_pci_core_device { enum { MLX5VF_QUERY_INC = (1UL << 0), + MLX5VF_QUERY_FINAL = (1UL << 1), }; int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod); diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 0caaf4e8e1e9..0976fadf212d 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -222,6 +222,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, if (wait_event_interruptible(migf->poll_wait, !list_empty(&migf->buf_list) || migf->state == MLX5_MIGF_STATE_ERROR || + migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR || migf->state == MLX5_MIGF_STATE_PRE_COPY || migf->state == MLX5_MIGF_STATE_COMPLETE)) return -ERESTARTSYS; @@ -241,7 +242,8 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, if (first_loop_call) { first_loop_call = false; /* Temporary end of file as part of PRE_COPY */ - if (end_of_data && migf->state == MLX5_MIGF_STATE_PRE_COPY) { + if (end_of_data && (migf->state == MLX5_MIGF_STATE_PRE_COPY || + migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR)) { done = -ENOMSG; goto out_unlock; } @@ -434,7 +436,7 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev) return -ENODEV; ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, - MLX5VF_QUERY_INC); + MLX5VF_QUERY_INC | MLX5VF_QUERY_FINAL); if (ret) goto err; From patchwork Thu 
Nov 24 17:39:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 13055244 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B852C4332F for ; Thu, 24 Nov 2022 17:41:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229682AbiKXRlU (ORCPT ); Thu, 24 Nov 2022 12:41:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34054 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229633AbiKXRlL (ORCPT ); Thu, 24 Nov 2022 12:41:11 -0500 Received: from NAM04-MW2-obe.outbound.protection.outlook.com (mail-mw2nam04on2040.outbound.protection.outlook.com [40.107.101.40]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8152C13C710 for ; Thu, 24 Nov 2022 09:41:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=lfSqh2T8ZtM3Q+pg1KDMcCzA96VHNuOiku7QJTmVWfh79Vd6siVCa2gEA/XDlFNYI7bERIFw/r+Q0hBVjFFVRdWyh6Q99ILuoS/tKxW4AAfp0xzvgPlhL17/hxlxAlgm84MSZHd37+l8Apj4k3oSQBWDzsCP00HfwsdPizmbzhPEBe22hFJI/mJ88yuozNbCI7HLT3IWywqIiynabIqZQ/6yUu8Yv8NhNs40l7v/xsn/YNDOMaXH6iVoqxjZYjD0EpiSyySjtAcCNLyGIh3MSe3zVMKLwSQ9aZbdqRnBH0BcSRurb0gmSWGV6DSp8rB4yJgo/QXLiLq5vpT1T9chEw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=oeicPMB1a5ZmCLJ9SaKk7KlezHuZMdGanA64A0nPhcE=; 
b=MS3GQ37+g6GlYwajrTUMHEORPGyJXMQaG2Jq4TSfgBnfRIgHcrXMQrmeIDeMEoEzTGnctZh01TTH+POTokoLSW/hjRJ3Dl7ov3R4Qcc+Tyntl0AH0hYFxyDc3TWikkfw9RidxHi7gpCqG3q0QBCF27yNMFOy5UZt2DmDFykmxW8jFAU1IAKIBKUDSy7URozfXdqCQ8KbWh/sRcBQRV6kq0WaxFxS4yASl37phV8tLN/ewXdsAfPb7PvxAcTpnxvRrRq+HsBq2B+iFblZDo1zYEzdMRAhjKhJGfhHSNxcUT0IbJJm3BeUPoB/J8OihmRT1U783zg9VNvheQ014Ter3w== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.233) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=oeicPMB1a5ZmCLJ9SaKk7KlezHuZMdGanA64A0nPhcE=; b=bHoKFyEKNEKPhaSB7Fg/hEqDSmwvKX59BWb+WD/Ew1klYSVZXEq9W7r1vexuwzf2eNCQOw8LWgBPic6Ci5G/0OEFlaHskI/gxaTHBDNzQWJwyuPHpBlBWAdy4qQXbGITbzGzSMngUd/JfJKjpIcaA5GAU9DNnMQyGQShjQKM1PeTU/4+IqRGkK/y+/fFYd9U/osQJHT4LlNUPxEscjkO0ZhUp6EYxyuxiwN/nX1wjtFuWjjTk/nfOEWT9CDYQovLNEBzWg/BqYsye1Ewonq7qn1QKZp+d2lMG8SfqYQobg8pfN0inkCGhd62cvlJKUfQRK9KYQjZRkDIZh6SV59rpw== Received: from DS7PR03CA0269.namprd03.prod.outlook.com (2603:10b6:5:3b3::34) by DM6PR12MB4297.namprd12.prod.outlook.com (2603:10b6:5:211::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19; Thu, 24 Nov 2022 17:41:08 +0000 Received: from DM6NAM11FT113.eop-nam11.prod.protection.outlook.com (2603:10b6:5:3b3:cafe::a9) by DS7PR03CA0269.outlook.office365.com (2603:10b6:5:3b3::34) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.19 via Frontend Transport; Thu, 24 Nov 2022 17:41:08 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.233) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass 
(protection.outlook.com: domain of nvidia.com designates 216.228.118.233 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.233; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.118.233) by DM6NAM11FT113.mail.protection.outlook.com (10.13.173.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.17 via Frontend Transport; Thu, 24 Nov 2022 17:41:08 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by mail.nvidia.com (10.127.129.6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:41:07 -0800 Received: from drhqmail202.nvidia.com (10.126.190.181) by drhqmail203.nvidia.com (10.126.190.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Thu, 24 Nov 2022 09:41:06 -0800 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.126.190.181) with Microsoft SMTP Server id 15.2.986.36 via Frontend Transport; Thu, 24 Nov 2022 09:41:03 -0800 From: Yishai Hadas To: , CC: , , , , , , , , Subject: [PATCH V1 vfio 14/14] vfio/mlx5: Enable MIGRATION_PRE_COPY flag Date: Thu, 24 Nov 2022 19:39:32 +0200 Message-ID: <20221124173932.194654-15-yishaih@nvidia.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com> References: <20221124173932.194654-1-yishaih@nvidia.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DM6NAM11FT113:EE_|DM6PR12MB4297:EE_ X-MS-Office365-Filtering-Correlation-Id: bf02945a-76d0-40d1-2bbf-08dace4311dd X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
8APvwMpxYkkVnrIe/lKl76SPVAPeSarFWOZK07IrprCMs9RWe4UjkRrg3tulW9opW/NdKAuAVlewC2bE9H6+0fWOmnJwC6bBBFde+RQRTohyZSjw0zm6Vn1PUWmjaNoExKa4C50lNgJds1eFB4gQuCr8ox48zjw79P5i2GagWb0EFMDOWhuGphBmf86nYVQDIg0Qap+bPy+OlzS6Y8wudg0Sa8hAaP/mtdAL1jLGpGIUw+u4/8GFgLhYvverkvFBCBQkabC3/b+U08TzGFgxFbeDZsgb59+4VPVgbpwBe2bzKosYKKsg3oHHuCaMMrqpE2n5K7J3eHs0NuMk6EI4bdEb344jATHlIohMguwpQVG47H+n69rbrepNKmPHbyhIGqMTkNTIS+589VyBmf2YQTpMPzAbRokzxJoCIhIQ6Ebs3x+0T0q99Vx3A4dltAqoieQwu4F5PEbbIViHuNd5SBxZiSWBAFJZf1S1SCG3a6JzTwcEiUcGCg7IKywE4glq4wtEVxFek83i5asyh4FTum/zUQmhTu8rJ207ozJfE7QZMzy6EMHRFo6O+1n38EPH2FBslfe6R9BqGQz45A8YdhpK9My7nYJ/Ql11O/qPBbebxY7HgoBHEG1lHzq8nxvPaPqrNN5G/SAuCN7GKjK0WxKvpTRT+YIBFfijPk6h+h1ABzyPt2KOUtkboLhoiV0MSJuJJskWE8U07EAJ8fdFzw== X-Forefront-Antispam-Report: CIP:216.228.118.233;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc7edge2.nvidia.com;CAT:NONE;SFS:(13230022)(4636009)(376002)(39860400002)(346002)(136003)(396003)(451199015)(46966006)(40470700004)(36840700001)(186003)(2616005)(1076003)(426003)(336012)(47076005)(82310400005)(36756003)(5660300002)(41300700001)(4744005)(6666004)(7696005)(8936002)(26005)(86362001)(316002)(54906003)(110136005)(6636002)(8676002)(4326008)(40460700003)(478600001)(70586007)(36860700001)(70206006)(2906002)(82740400003)(40480700001)(356005)(7636003);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 24 Nov 2022 17:41:08.2778 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: bf02945a-76d0-40d1-2bbf-08dace4311dd X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.118.233];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT113.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB4297 
Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Shay Drory Now that everything has been set up for MIGRATION_PRE_COPY, enable it. Signed-off-by: Shay Drory Signed-off-by: Yishai Hadas --- drivers/vfio/pci/mlx5/cmd.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index a1dca065b977..019011b8710e 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -221,6 +221,11 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev, if (MLX5_CAP_GEN(mvdev->mdev, adv_virtualization)) mvdev->core_device.vdev.log_ops = log_ops; + if (MLX5_CAP_GEN_2(mvdev->mdev, migration_multi_load) && + MLX5_CAP_GEN_2(mvdev->mdev, migration_tracking_state)) + mvdev->core_device.vdev.migration_flags |= + VFIO_MIGRATION_PRE_COPY; + end: mlx5_vf_put_core_dev(mvdev->mdev); }