From patchwork Tue Nov 12 17:03:36 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11239845
From: Kirti Wankhede
Subject: [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state
Date: Tue, 12 Nov 2019 22:33:36 +0530
Message-ID: <1573578220-7530-2-git-send-email-kwankhede@nvidia.com>
X-Mailer: git-send-email 2.7.0
In-Reply-To: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
References: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

- Defined MIGRATION region type and sub-type.
- Used 3 bits to define VFIO device states.
    Bit 0 => _RUNNING
    Bit 1 => _SAVING
    Bit 2 => _RESUMING
    Combination of these bits defines VFIO device's state during migration
    _RUNNING => Normal VFIO device running state. When it is cleared, it
                indicates _STOPPED state. When the device is changed to
                _STOPPED, the driver should stop the device before write()
                returns.
    _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
                          starts saving state of device, i.e. pre-copy state
    _SAVING  => vCPUs are stopped, VFIO device should be stopped, and
                save device state, i.e. stop-and-copy state
    _RESUMING => VFIO device resuming state.
    _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states
    Bits 3 - 31 are reserved for future use. User should perform
    read-modify-write operation on this field.
- Defined vfio_device_migration_info structure which will be placed at 0th
  offset of migration region to get/set VFIO device related information.
  Defined members of structure and usage on read/write access:
    * device_state: (read/write)
        To convey VFIO device state to be transitioned to. Only 3 bits are
        used as of now, Bits 3 - 31 are reserved for future use.
    * pending_bytes: (read only)
        To get pending bytes yet to be migrated for VFIO device.
    * data_offset: (read only)
        To get data offset in migration region from where data exists
        during _SAVING and from where data should be written by user space
        application during _RESUMING state.
    * data_size: (read/write)
        To get and set size in bytes of data copied in migration region
        during _SAVING and _RESUMING state.

Migration region looks like:
 ------------------------------------------------------------------
|vfio_device_migration_info|    data section                      |
|                          |     ///////////////////////////////  |
 ------------------------------------------------------------------
 ^                              ^
 offset 0-trapped part        data_offset

Structure vfio_device_migration_info is always followed by data section
in the region, so data_offset will always be non-0. Offset from where data
is to be copied is decided by kernel driver, data section can be trapped or
mapped depending on how kernel driver defines data section.
Data section partition can be defined as mapped by sparse mmap capability.
If mmapped, then data_offset should be page aligned, whereas the initial
section which contains vfio_device_migration_info structure might not end
at an offset which is page aligned.
Vendor driver should decide whether to partition data section and how to
partition the data section. Vendor driver should return data_offset
accordingly.

For user application, data is opaque. User should write data in the same
order as received.
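
The state rules above can be sketched in user space as follows. This is only an illustration under stated assumptions: the DEMO_STATE_* macros mirror the bit values described in this patch, and the helper names migration_state_is_valid() and migration_state_rmw() are hypothetical, not part of the proposed interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the three state bits described above (assumption:
 * same bit positions as VFIO_DEVICE_STATE_* in this patch). */
#define DEMO_STATE_RUNNING	(1u << 0)
#define DEMO_STATE_SAVING	(1u << 1)
#define DEMO_STATE_RESUMING	(1u << 2)
#define DEMO_STATE_MASK		(DEMO_STATE_RUNNING | \
				 DEMO_STATE_SAVING | \
				 DEMO_STATE_RESUMING)

/* Hypothetical helper: _RESUMING may not be combined with _SAVING or
 * _RUNNING; every other combination of the three bits is accepted. */
static bool migration_state_is_valid(uint32_t state)
{
	uint32_t s = state & DEMO_STATE_MASK;

	if ((s & DEMO_STATE_RESUMING) &&
	    (s & (DEMO_STATE_SAVING | DEMO_STATE_RUNNING)))
		return false;
	return true;
}

/* Hypothetical helper: read-modify-write that replaces only bits 0-2
 * and leaves the reserved bits 3-31 as they were read. */
static uint32_t migration_state_rmw(uint32_t current, uint32_t wanted)
{
	return (current & ~DEMO_STATE_MASK) | (wanted & DEMO_STATE_MASK);
}
```

A caller would read device_state, pass the value through migration_state_rmw() with the desired bits, check the result with migration_state_is_valid(), and only then write it back.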
Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 include/uapi/linux/vfio.h | 108 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..35b09427ad9f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
 #define VFIO_REGION_TYPE_GFX			(1)
 #define VFIO_REGION_TYPE_CCW			(2)
+#define VFIO_REGION_TYPE_MIGRATION		(3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,113 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD	(1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION		(1)
+
+/*
+ * Structure vfio_device_migration_info is placed at 0th offset of
+ * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related migration
+ * information. Field accesses from this structure are only supported at their
+ * native width and alignment, otherwise the result is undefined and vendor
+ * drivers should return an error.
+ *
+ * device_state: (read/write)
+ *      To indicate vendor driver the state VFIO device should be transitioned
+ *      to. If device state transition fails, write on this field returns error.
+ *      It consists of 3 bits:
+ *      - If bit 0 set, indicates _RUNNING state. When it is cleared, that
+ *        indicates _STOPPED state. When device is changed to _STOPPED, driver
+ *        should stop device before write() returns.
+ *      - If bit 1 set, indicates _SAVING state. When set, that indicates driver
+ *        should start gathering device state information which will be provided
+ *        to VFIO user space application to save device's state.
+ *      - If bit 2 set, indicates _RESUMING state. When set, that indicates
+ *        prepare to resume device, data provided through migration region
+ *        should be used to resume device.
+ *      Bits 3 - 31 are reserved for future use. User should perform
+ *      read-modify-write operation on this field.
+ *      _SAVING and _RESUMING bits set at the same time is invalid state.
+ *      Similarly _RUNNING and _RESUMING bits set is invalid state.
+ *
+ * pending_bytes: (read only)
+ *      Number of pending bytes yet to be migrated from vendor driver
+ *
+ * data_offset: (read only)
+ *      User application should read data_offset in migration region from where
+ *      user application should read device data during _SAVING state or write
+ *      device data during _RESUMING state. See below for detail of sequence to
+ *      be followed.
+ *
+ * data_size: (read/write)
+ *      User application should read data_size to get size of data copied in
+ *      bytes in migration region during _SAVING state and write size of data
+ *      copied in bytes in migration region during _RESUMING state.
+ *
+ * Migration region looks like:
+ *  ------------------------------------------------------------------
+ * |vfio_device_migration_info|    data section                      |
+ * |                          |     ///////////////////////////////  |
+ *  ------------------------------------------------------------------
+ *   ^                              ^
+ *  offset 0-trapped part        data_offset
+ *
+ * Structure vfio_device_migration_info is always followed by data section in
+ * the region, so data_offset will always be non-0. Offset from where data is
+ * copied is decided by kernel driver, data section can be trapped or mapped
+ * or partitioned, depending on how kernel driver defines data section.
+ * Data section partition can be defined as mapped by sparse mmap capability.
+ * If mmapped, then data_offset should be page aligned, whereas initial section
+ * which contains vfio_device_migration_info structure might not end at offset
+ * which is page aligned.
+ * Vendor driver should decide whether to partition data section and how to
+ * partition the data section. Vendor driver should return data_offset
+ * accordingly.
+ *
+ * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy phase
+ * and for _SAVING device state or stop-and-copy phase:
+ * a. read pending_bytes. If pending_bytes > 0, go through below steps.
+ * b. read data_offset, indicates kernel driver to write data to staging buffer.
+ *    Kernel driver should return this read operation only after writing data to
+ *    staging buffer is done.
+ * c. read data_size, amount of data in bytes written by vendor driver in
+ *    migration region.
+ * d. read data_size bytes of data from data_offset in the migration region.
+ * e. process data.
+ * f. Loop through a to e. Next read on pending_bytes indicates that read data
+ *    operation from migration region for previous iteration is done.
+ *
+ * Sequence to be followed while _RESUMING device state:
+ * While data for this device is available, repeat below steps:
+ * a. read data_offset from where user application should write data.
+ * b. write data of data_size to migration region from data_offset.
+ * c. write data_size which indicates vendor driver that data is written in
+ *    staging buffer. Vendor driver should read this data from migration
+ *    region and resume device's state.
+ *
+ * For user application, data is opaque. User should write data in the same
+ * order as received.
+ */
+
+struct vfio_device_migration_info {
+	__u32 device_state;	/* VFIO device state */
+#define VFIO_DEVICE_STATE_RUNNING	(1 << 0)
+#define VFIO_DEVICE_STATE_SAVING	(1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING	(1 << 2)
+#define VFIO_DEVICE_STATE_MASK		(VFIO_DEVICE_STATE_RUNNING | \
+					 VFIO_DEVICE_STATE_SAVING | \
+					 VFIO_DEVICE_STATE_RESUMING)
+
+#define VFIO_DEVICE_STATE_INVALID_CASE1	(VFIO_DEVICE_STATE_SAVING | \
+					 VFIO_DEVICE_STATE_RESUMING)
+
+#define VFIO_DEVICE_STATE_INVALID_CASE2	(VFIO_DEVICE_STATE_RUNNING | \
+					 VFIO_DEVICE_STATE_RESUMING)
+	__u32 reserved;
+	__u64 pending_bytes;
+	__u64 data_offset;
+	__u64 data_size;
+} __attribute__((packed));
+
 /*
  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
  * which allows direct access to non-MSIX registers which happened to be within

From patchwork Tue Nov 12 17:03:37 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11239847
From: Kirti Wankhede
Subject: [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap.
Date: Tue, 12 Nov 2019 22:33:37 +0530
Message-ID: <1573578220-7530-3-git-send-email-kwankhede@nvidia.com>
X-Mailer: git-send-email 2.7.0
In-Reply-To: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
References: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

All pages pinned by vendor driver through vfio_pin_pages API should be
considered as dirty during migration. IOMMU container maintains a list of
all such pinned pages. Added an ioctl definition to get bitmap of such
pinned pages for requested IO virtual address range.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 include/uapi/linux/vfio.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 35b09427ad9f..6fd3822aa610 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -902,6 +902,29 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_GET_DIRTY_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ *                                     struct vfio_iommu_type1_dirty_bitmap)
+ *
+ * IOCTL to get dirty pages bitmap for IOMMU container during migration.
+ * Get dirty pages bitmap of given IO virtual addresses range using
+ * struct vfio_iommu_type1_dirty_bitmap. Caller sets argsz, which is size of
+ * struct vfio_iommu_type1_dirty_bitmap. User should allocate memory to get
+ * bitmap and should set size of allocated memory in bitmap_size field.
+ * One bit is used to represent per page consecutively starting from iova
+ * offset. Bit set indicates page at that offset from iova is dirty.
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+	__u32        argsz;
+	__u32        flags;
+	__u64        iova;		/* IO virtual address */
+	__u64        size;		/* Size of iova range */
+	__u64        bitmap_size;	/* in bytes */
+	void __user *bitmap;		/* one bit per page */
+};
+
+#define VFIO_IOMMU_GET_DIRTY_BITMAP	_IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*

From patchwork Tue Nov 12 17:03:38 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11239849
From: Kirti Wankhede
Subject: [PATCH v9 Kernel 3/5] vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap
Date: Tue, 12 Nov 2019 22:33:38 +0530
Message-ID: <1573578220-7530-4-git-send-email-kwankhede@nvidia.com>
X-Mailer: git-send-email 2.7.0
In-Reply-To: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
References: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

With vIOMMU, during pre-copy phase of migration, while CPUs are still
running, IO virtual address unmap can happen while the device still holds
references to guest pfns. Those pages should be reported as dirty before
unmap, so that VFIO user space application can copy content of those pages
from source to destination.

The ioctl definition added here adds a bitmap pointer, size and flag. If
flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set, bitmap memory is allocated
and bitmap_size is set, then the ioctl will create a bitmap of pinned pages
and then unmap those.
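
How a user space application might walk a returned bitmap to find the pages it has to copy can be sketched as below. This is a rough illustration, not code from this series: collect_dirty_iovas() is a hypothetical helper, and it assumes the bitmap is laid out as 64-bit words with bit i of the range stored in word i / 64 at position i % 64, as on a 64-bit little-endian host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate set bits into page-aligned IOVAs.
 * Bit i stands for the page at iova + i * page_size. Returns the number
 * of dirty pages found; at most out_cap addresses are stored in out. */
static size_t collect_dirty_iovas(const uint64_t *bitmap, uint64_t npages,
				  uint64_t iova, uint64_t page_size,
				  uint64_t *out, size_t out_cap)
{
	size_t found = 0;
	uint64_t i;

	for (i = 0; i < npages; i++) {
		/* assumption: 64 bits per bitmap word */
		if (bitmap[i / 64] & (UINT64_C(1) << (i % 64))) {
			if (found < out_cap)
				out[found] = iova + i * page_size;
			found++;
		}
	}
	return found;
}
```

The return value can exceed out_cap, which lets a caller size a second pass if its output array was too small.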
Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 include/uapi/linux/vfio.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6fd3822aa610..72fd297baf52 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -925,6 +925,39 @@ struct vfio_iommu_type1_dirty_bitmap {
 
 #define VFIO_IOMMU_GET_DIRTY_BITMAP	_IO(VFIO_TYPE, VFIO_BASE + 17)
 
+/**
+ * VFIO_IOMMU_UNMAP_DMA_GET_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
+ *				      struct vfio_iommu_type1_dma_unmap_bitmap)
+ *
+ * Unmap IO virtual addresses using the provided struct
+ * vfio_iommu_type1_dma_unmap_bitmap. Caller sets argsz.
+ * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get dirty bitmap
+ * before unmapping IO virtual addresses. If this flag is not set, only IO
+ * virtual address are unmapped without creating pinned pages bitmap, that
+ * is, behave same as VFIO_IOMMU_UNMAP_DMA ioctl.
+ * User should allocate memory to get bitmap and should set size of allocated
+ * memory in bitmap_size field. One bit in bitmap is used to represent per page
+ * consecutively starting from iova offset. Bit set indicates page at that
+ * offset from iova is dirty.
+ * The actual unmapped size is returned in the size field and bitmap of pages
+ * in the range of unmapped size is returned in bitmap if flag
+ * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set.
+ *
+ * No guarantee is made to the user that arbitrary unmaps of iova or size
+ * different from those used in the original mapping call will succeed.
+ */
+struct vfio_iommu_type1_dma_unmap_bitmap {
+	__u32        argsz;
+	__u32        flags;
+#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
+	__u64        iova;		/* IO virtual address */
+	__u64        size;		/* Size of mapping (bytes) */
+	__u64        bitmap_size;	/* in bytes */
+	void __user *bitmap;		/* one bit per page */
+};
+
+#define VFIO_IOMMU_UNMAP_DMA_GET_BITMAP	_IO(VFIO_TYPE, VFIO_BASE + 18)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*

From patchwork Tue Nov 12 17:03:39 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11239857
From: Kirti Wankhede
Subject: [PATCH v9 Kernel 4/5] vfio iommu: Implementation of ioctl to get dirty pages bitmap.
Date: Tue, 12 Nov 2019 22:33:39 +0530
Message-ID: <1573578220-7530-5-git-send-email-kwankhede@nvidia.com>
X-Mailer: git-send-email 2.7.0
In-Reply-To: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
References: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

IOMMU container maintains a list of externally pinned pages. A bitmap of
pinned pages for the input IO virtual address range is created and returned.
The IO virtual address range should be from a single mapping created by a
map request. Input bitmap_size is validated by calculating the size of the
requested range.
This ioctl returns a bitmap of dirty pages; it is the user space
application's responsibility to copy the content of dirty pages from source
to destination during migration.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 drivers/vfio/vfio_iommu_type1.c | 92 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ada8e6cdb88..ac176e672857 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -850,6 +850,81 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
 	return bitmap;
 }
 
+/*
+ * start_iova is the reference from where bitmaping started. This is called
+ * from DMA_UNMAP where start_iova can be different than iova
+ */
+
+static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
+				  size_t size, dma_addr_t start_iova,
+				  unsigned long *bitmap)
+{
+	struct vfio_dma *dma;
+	dma_addr_t temp_iova = iova;
+
+	dma = vfio_find_dma(iommu, iova, size);
+	if (!dma)
+		return -EINVAL;
+
+	/*
+	 * Range should be from a single mapping created by map request.
+	 */
+
+	if ((iova < dma->iova) ||
+	    ((dma->iova + dma->size) < (iova + size)))
+		return -EINVAL;
+
+	while (temp_iova < iova + size) {
+		struct vfio_pfn *vpfn = NULL;
+
+		vpfn = vfio_find_vpfn(dma, temp_iova);
+		if (vpfn)
+			__bitmap_set(bitmap, (vpfn->iova - start_iova) >> PAGE_SHIFT, 1);
+
+		temp_iova += PAGE_SIZE;
+	}
+
+	return 0;
+}
+
+static int verify_bitmap_size(unsigned long npages, unsigned long bitmap_size)
+{
+	unsigned long bsize = ALIGN(npages, BITS_PER_LONG) / 8;
+
+	if ((bitmap_size == 0) || (bitmap_size < bsize))
+		return -EINVAL;
+	return 0;
+}
+
+static int vfio_iova_get_dirty_bitmap(struct vfio_iommu *iommu,
+				struct vfio_iommu_type1_dirty_bitmap *range)
+{
+	unsigned long *bitmap;
+	int ret;
+
+	ret = verify_bitmap_size(range->size >> PAGE_SHIFT, range->bitmap_size);
+	if (ret)
+		return ret;
+
+	/* one bit per page */
+	bitmap = bitmap_zalloc(range->size >> PAGE_SHIFT, GFP_KERNEL);
+	if (!bitmap)
+		return -ENOMEM;
+
+	mutex_lock(&iommu->lock);
+	ret = vfio_iova_dirty_bitmap(iommu, range->iova, range->size,
+				     range->iova, bitmap);
+	mutex_unlock(&iommu->lock);
+
+	if (!ret) {
+		if (copy_to_user(range->bitmap, bitmap, range->bitmap_size))
+			ret = -EFAULT;
+	}
+
+	bitmap_free(bitmap);
+	return ret;
+}
+
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 			     struct vfio_iommu_type1_dma_unmap *unmap)
 {
@@ -2297,6 +2372,23 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+	} else if (cmd == VFIO_IOMMU_GET_DIRTY_BITMAP) {
+		struct vfio_iommu_type1_dirty_bitmap range;
+
+		/* Supported for v2 version only */
+		if (!iommu->v2)
+			return -EACCES;
+
+		minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
+						bitmap);
+
+		if (copy_from_user(&range, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (range.argsz < minsz)
+			return -EINVAL;
+
+		return vfio_iova_get_dirty_bitmap(iommu, &range);
 	}
 
 	return -ENOTTY;

From patchwork Tue Nov 12 17:03:40 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11239867
From: Kirti Wankhede
Subject: [PATCH v9 Kernel 5/5] vfio iommu: Implementation of ioctl to get dirty bitmap before unmap
Date: Tue, 12 Nov 2019 22:33:40 +0530
Message-ID: <1573578220-7530-6-git-send-email-kwankhede@nvidia.com>
X-Mailer: git-send-email 2.7.0
In-Reply-To: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
References: <1573578220-7530-1-git-send-email-kwankhede@nvidia.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

If pages are pinned by an external interface for the requested IO virtual
address range, a bitmap of such pages is created and then that range is
unmapped.
To get the bitmap during unmap, the user should set flag
VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, bitmap memory should be allocated and
bitmap_size should be set. If the flag is not set, then it behaves the same
as the VFIO_IOMMU_UNMAP_DMA ioctl.
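
Preparing such a request in user space could look roughly like the sketch below. It is an assumption-laden illustration rather than the interface itself: struct demo_unmap_bitmap and DEMO_UNMAP_FLAG_GET_DIRTY_BITMAP are local stand-ins that mirror the fields and flag proposed in patch 3/5, demo_unmap_bitmap_init() is a hypothetical helper, and the buffer size assumes one bit per page rounded up to whole 64-bit words, matching ALIGN(npages, BITS_PER_LONG) / 8 on a 64-bit kernel.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_UNMAP_FLAG_GET_DIRTY_BITMAP (1u << 0)

/* Local stand-in for the proposed unmap-with-bitmap argument struct. */
struct demo_unmap_bitmap {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova;		/* IO virtual address */
	uint64_t size;		/* size of mapping in bytes */
	uint64_t bitmap_size;	/* in bytes */
	void *bitmap;		/* one bit per page */
};

/* Hypothetical helper: fill in the request and allocate a zeroed bitmap.
 * Returns 0 on success, -1 on bad arguments or allocation failure.
 * On success the caller owns req->bitmap and must free() it. */
static int demo_unmap_bitmap_init(struct demo_unmap_bitmap *req,
				  uint64_t iova, uint64_t size,
				  uint64_t page_size)
{
	uint64_t npages, bytes;

	if (page_size == 0 || size == 0 || size % page_size != 0)
		return -1;

	npages = size / page_size;
	bytes = ((npages + 63) / 64) * 8;	/* whole 64-bit words */

	memset(req, 0, sizeof(*req));
	req->argsz = sizeof(*req);
	req->flags = DEMO_UNMAP_FLAG_GET_DIRTY_BITMAP;
	req->iova = iova;
	req->size = size;
	req->bitmap_size = bytes;
	req->bitmap = calloc(1, bytes);
	return req->bitmap ? 0 : -1;
}
```

Leaving flags at 0 instead would request a plain unmap, in which case no bitmap buffer needs to be allocated.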
Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 drivers/vfio/vfio_iommu_type1.c | 71 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ac176e672857..d6b988452ba6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -926,7 +926,8 @@ static int vfio_iova_get_dirty_bitmap(struct vfio_iommu *iommu,
 }
 
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
-			     struct vfio_iommu_type1_dma_unmap *unmap)
+			     struct vfio_iommu_type1_dma_unmap *unmap,
+			     unsigned long *bitmap)
 {
 	uint64_t mask;
 	struct vfio_dma *dma, *dma_last = NULL;
@@ -1026,6 +1027,12 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 						    &nb_unmap);
 			goto again;
 		}
+
+		if (bitmap) {
+			vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
+					       unmap->iova, bitmap);
+		}
+
 		unmapped += dma->size;
 		vfio_remove_dma(iommu, dma);
 	}
@@ -1039,6 +1046,43 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_dma_do_unmap_bitmap(struct vfio_iommu *iommu,
+		struct vfio_iommu_type1_dma_unmap_bitmap *unmap_bitmap)
+{
+	struct vfio_iommu_type1_dma_unmap unmap;
+	unsigned long *bitmap = NULL;
+	int ret;
+
+	/* check bitmap size */
+	if ((unmap_bitmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)) {
+		ret = verify_bitmap_size(unmap_bitmap->size >> PAGE_SHIFT,
+					 unmap_bitmap->bitmap_size);
+		if (ret)
+			return ret;
+
+		/* one bit per page */
+		bitmap = bitmap_zalloc(unmap_bitmap->size >> PAGE_SHIFT,
+					GFP_KERNEL);
+		if (!bitmap)
+			return -ENOMEM;
+	}
+
+	unmap.iova = unmap_bitmap->iova;
+	unmap.size = unmap_bitmap->size;
+	ret = vfio_dma_do_unmap(iommu, &unmap, bitmap);
+	if (!ret)
+		unmap_bitmap->size = unmap.size;
+
+	if (bitmap) {
+		if (!ret && copy_to_user(unmap_bitmap->bitmap, bitmap,
+					 unmap_bitmap->bitmap_size))
+			ret = -EFAULT;
+		bitmap_free(bitmap);
+	}
+
+	return ret;
+}
+
 static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 			  unsigned long pfn, long npage, int prot)
 {
@@ -2366,7 +2410,7 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		if (unmap.argsz < minsz || unmap.flags)
 			return -EINVAL;
 
-		ret = vfio_dma_do_unmap(iommu, &unmap);
+		ret = vfio_dma_do_unmap(iommu, &unmap, NULL);
 		if (ret)
 			return ret;
 
@@ -2389,6 +2433,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 			return -EINVAL;
 
 		return vfio_iova_get_dirty_bitmap(iommu, &range);
+	} else if (cmd == VFIO_IOMMU_UNMAP_DMA_GET_BITMAP) {
+		struct vfio_iommu_type1_dma_unmap_bitmap unmap_bitmap;
+		long ret;
+
+		/* Supported for v2 version only */
+		if (!iommu->v2)
+			return -EACCES;
+
+		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap_bitmap,
+				    bitmap);
+
+		if (copy_from_user(&unmap_bitmap, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (unmap_bitmap.argsz < minsz)
+			return -EINVAL;
+
+		ret = vfio_dma_do_unmap_bitmap(iommu, &unmap_bitmap);
+		if (ret)
+			return ret;
+
+		return copy_to_user((void __user *)arg, &unmap_bitmap, minsz) ?
+			-EFAULT : 0;
 	}
 
 	return -ENOTTY;