From patchwork Mon Sep 5 10:58:50 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 12965979
From: Yishai Hadas
Subject: [PATCH V6 vfio 08/10] vfio/mlx5: Report dirty pages from tracker
Date: Mon, 5 Sep 2022 13:58:50 +0300
Message-ID: <20220905105852.26398-9-yishaih@nvidia.com>
In-Reply-To: <20220905105852.26398-1-yishaih@nvidia.com>
References: <20220905105852.26398-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

Report dirty pages from the tracker.

This includes:
Querying for dirty pages in a given IOVA range. This is done by
moving the tracker into the reporting state and supplying the
required range.

Using the CQ event completion mechanism to be notified once data is
ready on the CQ/QP to be processed.

Once data is available, turning on the corresponding bits in the
bitmap.

This functionality will be used as part of the 'log_read_and_clear'
driver callback in the next patches.

Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c | 191 ++++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/mlx5/cmd.h |   4 +
 2 files changed, 195 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index f1cad96af6ab..fa9ddd926500 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -5,6 +5,8 @@
 
 #include "cmd.h"
 
+enum { CQ_OK = 0, CQ_EMPTY = -1, CQ_POLL_ERR = -2 };
+
 static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_dev *mdev, u16 function_id,
 				  u16 *vhca_id);
 static void
@@ -157,6 +159,7 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 		VFIO_MIGRATION_STOP_COPY |
 		VFIO_MIGRATION_P2P;
 	mvdev->core_device.vdev.mig_ops = mig_ops;
+	init_completion(&mvdev->tracker_comp);
 
 end:
 	mlx5_vf_put_core_dev(mvdev->mdev);
@@ -552,6 +555,29 @@ static int mlx5vf_cmd_destroy_tracker(struct mlx5_core_dev *mdev,
 	return mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
 }
 
+static int mlx5vf_cmd_modify_tracker(struct mlx5_core_dev *mdev,
+				     u32 tracker_id, unsigned long iova,
+				     unsigned long length, u32 tracker_state)
+{
+	u32 in[MLX5_ST_SZ_DW(modify_page_track_obj_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {};
+	void *obj_context;
+	void *cmd_hdr;
+
+	cmd_hdr = MLX5_ADDR_OF(modify_page_track_obj_in, in, general_obj_in_cmd_hdr);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_PAGE_TRACK);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, tracker_id);
+
+	obj_context = MLX5_ADDR_OF(modify_page_track_obj_in, in, obj_context);
+	MLX5_SET64(page_track, obj_context, modify_field_select, 0x3);
+	MLX5_SET64(page_track, obj_context, range_start_address, iova);
+	MLX5_SET64(page_track, obj_context, length, length);
+	MLX5_SET(page_track, obj_context, state, tracker_state);
+
+	return mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
+}
+
 static int alloc_cq_frag_buf(struct mlx5_core_dev *mdev,
 			     struct mlx5_vhca_cq_buf *buf, int nent,
 			     int cqe_size)
@@ -593,6 +619,16 @@ static void mlx5vf_destroy_cq(struct mlx5_core_dev *mdev,
 	mlx5_db_free(mdev, &cq->db);
 }
 
+static void mlx5vf_cq_complete(struct mlx5_core_cq *mcq,
+			       struct mlx5_eqe *eqe)
+{
+	struct mlx5vf_pci_core_device *mvdev =
+		container_of(mcq, struct mlx5vf_pci_core_device,
+			     tracker.cq.mcq);
+
+	complete(&mvdev->tracker_comp);
+}
+
 static int mlx5vf_create_cq(struct mlx5_core_dev *mdev,
 			    struct mlx5_vhca_page_tracker *tracker,
 			    size_t ncqe)
@@ -643,10 +679,13 @@ static int mlx5vf_create_cq(struct mlx5_core_dev *mdev,
 	MLX5_SET64(cqc, cqc, dbr_addr, cq->db.dma);
 	pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas);
 	mlx5_fill_page_frag_array(&cq->buf.frag_buf, pas);
+	cq->mcq.comp = mlx5vf_cq_complete;
 	err = mlx5_core_create_cq(mdev, &cq->mcq, in, inlen, out, sizeof(out));
 	if (err)
 		goto err_vec;
 
+	mlx5_cq_arm(&cq->mcq, MLX5_CQ_DB_REQ_NOT, tracker->uar->map,
+		    cq->mcq.cons_index);
 	kvfree(in);
 	return 0;
 
@@ -1109,3 +1148,155 @@ int mlx5vf_start_page_tracker(struct vfio_device *vdev,
 	mlx5vf_state_mutex_unlock(mvdev);
 	return err;
 }
+
+static void
+set_report_output(u32 size, int index, struct mlx5_vhca_qp *qp,
+		  struct iova_bitmap *dirty)
+{
+	u32 entry_size = MLX5_ST_SZ_BYTES(page_track_report_entry);
+	u32 nent = size / entry_size;
+	struct page *page;
+	u64 addr;
+	u64 *buf;
+	int i;
+
+	if (WARN_ON(index >= qp->recv_buf.npages ||
+		    (nent > qp->max_msg_size / entry_size)))
+		return;
+
+	page = qp->recv_buf.page_list[index];
+	buf = kmap_local_page(page);
+	for (i = 0; i < nent; i++) {
+		addr = MLX5_GET(page_track_report_entry, buf + i,
+				dirty_address_low);
+		addr |= (u64)MLX5_GET(page_track_report_entry, buf + i,
+				      dirty_address_high) << 32;
+		iova_bitmap_set(dirty, addr, qp->tracked_page_size);
+	}
+	kunmap_local(buf);
+}
+
+static void
+mlx5vf_rq_cqe(struct mlx5_vhca_qp *qp, struct mlx5_cqe64 *cqe,
+	      struct iova_bitmap *dirty, int *tracker_status)
+{
+	u32 size;
+	int ix;
+
+	qp->rq.cc++;
+	*tracker_status = be32_to_cpu(cqe->immediate) >> 28;
+	size = be32_to_cpu(cqe->byte_cnt);
+	ix = be16_to_cpu(cqe->wqe_counter) & (qp->rq.wqe_cnt - 1);
+
+	/* zero length CQE, no data */
+	WARN_ON(!size && *tracker_status == MLX5_PAGE_TRACK_STATE_REPORTING);
+	if (size)
+		set_report_output(size, ix, qp, dirty);
+
+	qp->recv_buf.next_rq_offset = ix * qp->max_msg_size;
+	mlx5vf_post_recv(qp);
+}
+
+static void *get_cqe(struct mlx5_vhca_cq *cq, int n)
+{
+	return mlx5_frag_buf_get_wqe(&cq->buf.fbc, n);
+}
+
+static struct mlx5_cqe64 *get_sw_cqe(struct mlx5_vhca_cq *cq, int n)
+{
+	void *cqe = get_cqe(cq, n & (cq->ncqe - 1));
+	struct mlx5_cqe64 *cqe64;
+
+	cqe64 = (cq->mcq.cqe_sz == 64) ? cqe : cqe + 64;
+
+	if (likely(get_cqe_opcode(cqe64) != MLX5_CQE_INVALID) &&
+	    !((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^ !!(n & (cq->ncqe)))) {
+		return cqe64;
+	} else {
+		return NULL;
+	}
+}
+
+static int
+mlx5vf_cq_poll_one(struct mlx5_vhca_cq *cq, struct mlx5_vhca_qp *qp,
+		   struct iova_bitmap *dirty, int *tracker_status)
+{
+	struct mlx5_cqe64 *cqe;
+	u8 opcode;
+
+	cqe = get_sw_cqe(cq, cq->mcq.cons_index);
+	if (!cqe)
+		return CQ_EMPTY;
+
+	++cq->mcq.cons_index;
+	/*
+	 * Make sure we read CQ entry contents after we've checked the
+	 * ownership bit.
+	 */
+	rmb();
+	opcode = get_cqe_opcode(cqe);
+	switch (opcode) {
+	case MLX5_CQE_RESP_SEND_IMM:
+		mlx5vf_rq_cqe(qp, cqe, dirty, tracker_status);
+		return CQ_OK;
+	default:
+		return CQ_POLL_ERR;
+	}
+}
+
+int mlx5vf_tracker_read_and_clear(struct vfio_device *vdev, unsigned long iova,
+				  unsigned long length,
+				  struct iova_bitmap *dirty)
+{
+	struct mlx5vf_pci_core_device *mvdev = container_of(
+		vdev, struct mlx5vf_pci_core_device, core_device.vdev);
+	struct mlx5_vhca_page_tracker *tracker = &mvdev->tracker;
+	struct mlx5_vhca_cq *cq = &tracker->cq;
+	struct mlx5_core_dev *mdev;
+	int poll_err, err;
+
+	mutex_lock(&mvdev->state_mutex);
+	if (!mvdev->log_active) {
+		err = -EINVAL;
+		goto end;
+	}
+
+	if (mvdev->mdev_detach) {
+		err = -ENOTCONN;
+		goto end;
+	}
+
+	mdev = mvdev->mdev;
+	err = mlx5vf_cmd_modify_tracker(mdev, tracker->id, iova, length,
+					MLX5_PAGE_TRACK_STATE_REPORTING);
+	if (err)
+		goto end;
+
+	tracker->status = MLX5_PAGE_TRACK_STATE_REPORTING;
+	while (tracker->status == MLX5_PAGE_TRACK_STATE_REPORTING) {
+		poll_err = mlx5vf_cq_poll_one(cq, tracker->host_qp, dirty,
+					      &tracker->status);
+		if (poll_err == CQ_EMPTY) {
+			mlx5_cq_arm(&cq->mcq, MLX5_CQ_DB_REQ_NOT, tracker->uar->map,
+				    cq->mcq.cons_index);
+			poll_err = mlx5vf_cq_poll_one(cq, tracker->host_qp,
+						      dirty, &tracker->status);
+			if (poll_err == CQ_EMPTY) {
+				wait_for_completion(&mvdev->tracker_comp);
+				continue;
+			}
+		}
+		if (poll_err == CQ_POLL_ERR) {
+			err = -EIO;
+			goto end;
+		}
+		mlx5_cq_set_ci(&cq->mcq);
+	}
+
+	if (tracker->status == MLX5_PAGE_TRACK_STATE_ERROR)
+		err = -EIO;
+
+end:
+	mlx5vf_state_mutex_unlock(mvdev);
+	return err;
+}
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 658925ba5459..fa1f9ab4d3d0 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -86,6 +86,7 @@ struct mlx5_vhca_page_tracker {
 	struct mlx5_vhca_cq cq;
 	struct mlx5_vhca_qp *host_qp;
 	struct mlx5_vhca_qp *fw_qp;
+	int status;
 };
 
 struct mlx5vf_pci_core_device {
@@ -96,6 +97,7 @@ struct mlx5vf_pci_core_device {
 	u8 deferred_reset:1;
 	u8 mdev_detach:1;
 	u8 log_active:1;
+	struct completion tracker_comp;
 	/* protect migration state */
 	struct mutex state_mutex;
 	enum vfio_device_mig_state mig_state;
@@ -127,4 +129,6 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work);
 int mlx5vf_start_page_tracker(struct vfio_device *vdev,
 		struct rb_root_cached *ranges, u32 nnodes, u64 *page_size);
 int mlx5vf_stop_page_tracker(struct vfio_device *vdev);
+int mlx5vf_tracker_read_and_clear(struct vfio_device *vdev, unsigned long iova,
+			unsigned long length, struct iova_bitmap *dirty);
 #endif /* MLX5_VFIO_CMD_H */
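
A note on the ownership test in get_sw_cqe(): the expression compares the
CQE owner bit with the parity of the current pass over the ring, taken from
the bit just above the index mask of the free-running consumer index. The
standalone userspace sketch below illustrates only that idea. Every name in
it (toy_cqe, TOY_RING_SIZE, TOY_OWNER_MASK, toy_ring_init, toy_pass,
toy_hw_write, toy_sw_owns) is hypothetical and is not part of this patch or
of mlx5. It also assumes, for simplicity, that the ring starts with every
owner bit set to 1, whereas the code in this patch additionally checks the
opcode against MLX5_CQE_INVALID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy ring; the size must be a power of two. */
#define TOY_RING_SIZE 4u
/* Stand-in for MLX5_CQE_OWNER_MASK; only bit 0 is modelled here. */
#define TOY_OWNER_MASK 0x1u

/* Hypothetical miniature CQE that carries nothing but the owner bit. */
struct toy_cqe {
	uint8_t op_own;
};

/* Assumption of this sketch: all entries start out owned by the "device". */
static void toy_ring_init(struct toy_cqe *ring)
{
	for (uint32_t i = 0; i < TOY_RING_SIZE; i++)
		ring[i].op_own = TOY_OWNER_MASK;
}

/* Parity of the pass over the ring for a free-running index. */
static uint8_t toy_pass(uint32_t index)
{
	return !!(index & TOY_RING_SIZE);
}

/* Simulated device write: stamp the entry with the producer's pass parity. */
static void toy_hw_write(struct toy_cqe *ring, uint32_t pi)
{
	ring[pi & (TOY_RING_SIZE - 1)].op_own = toy_pass(pi);
}

/*
 * Software may consume the entry at consumer index ci only when its owner
 * bit matches the parity of the pass that ci is on. This mirrors the shape
 * of "!((op_own & MASK) ^ !!(n & ncqe))" in get_sw_cqe().
 */
static bool toy_sw_owns(const struct toy_cqe *ring, uint32_t ci)
{
	const struct toy_cqe *cqe = &ring[ci & (TOY_RING_SIZE - 1)];

	assert((TOY_RING_SIZE & (TOY_RING_SIZE - 1)) == 0);
	return !((cqe->op_own & TOY_OWNER_MASK) ^ toy_pass(ci));
}
```

With a ring of four entries, index 4 maps back to slot 0 but expects owner
bit 1, so an entry left over from the first pass (owner bit 0) is reported
as not yet available until the simulated device rewrites it.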