From patchwork Tue Dec 17 17:10:46 2019
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11298337
From: Kirti Wankhede
Subject: [PATCH v11 Kernel 1/6] vfio: KABI for migration interface for device state
Date: Tue, 17 Dec 2019 22:40:46 +0530
Message-ID: <1576602651-15430-2-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
References: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com,
 yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com,
 ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com,
 shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com,
 mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede,
 eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com,
 changpeng.liu@intel.com, Ken.Xue@amd.com

- Defined MIGRATION region type and sub-type.
- Defined vfio_device_migration_info structure which will be placed at 0th
  offset of migration region to get/set VFIO device related information.
  Defined members of structure and usage on read/write access.
- Defined device states and added state transition details in the comment.
- Added sequence to be followed while saving and resuming VFIO device state.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 include/uapi/linux/vfio.h | 187 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 187 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..b7ac8f7c0d3c 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
 #define VFIO_REGION_TYPE_GFX			(1)
 #define VFIO_REGION_TYPE_CCW			(2)
+#define VFIO_REGION_TYPE_MIGRATION		(3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,192 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD	(1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION		(1)
+
+/*
+ * Structure vfio_device_migration_info is placed at 0th offset of
+ * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related migration
+ * information. Field accesses from this structure are only supported at their
+ * native width and alignment, otherwise the result is undefined and vendor
+ * drivers should return an error.
+ *
+ * device_state: (read/write)
+ *    To indicate vendor driver the state VFIO device should be transitioned
+ *    to. If device state transition fails, write on this field returns error.
+ *    It consists of 3 bits:
+ *    - If bit 0 set, indicates _RUNNING state. When it's clear, that
+ *      indicates _STOP state. When device is changed to _STOP, driver should
+ *      stop device before write() returns.
+ *    - If bit 1 set, indicates _SAVING state. When set, that indicates driver
+ *      should start gathering device state information which will be provided
+ *      to VFIO user space application to save device's state.
+ *    - If bit 2 set, indicates _RESUMING state. When set, that indicates
+ *      prepare to resume device, data provided through migration region
+ *      should be used to resume device.
+ *    Bits 3 - 31 are reserved for future use. User should perform
+ *    read-modify-write operation on this field.
+ *
+ *  +------- _RESUMING
+ *  |+------ _SAVING
+ *  ||+----- _RUNNING
+ *  |||
+ *  000b => Device Stopped, not saving or resuming
+ *  001b => Device running state, default state
+ *  010b => Stop Device & save device state, stop-and-copy state
+ *  011b => Device running and save device state, pre-copy state
+ *  100b => Device stopped and device state is resuming
+ *  101b => Invalid state
+ *  110b => Invalid state
+ *  111b => Invalid state
+ *
+ * State transitions:
+ *
+ *             _RESUMING  _RUNNING     Pre-copy    Stop-and-copy   _STOP
+ *              (100b)     (001b)       (011b)        (010b)      (000b)
+ * 0. Running or Default state
+ *                            |
+ *
+ * 1. Normal Shutdown (optional)
+ *                            |------------------------------------->|
+ *
+ * 2. Save state or Suspend
+ *                            |------------------------->|---------->|
+ *
+ * 3. Save state during live migration
+ *                            |----------->|------------>|---------->|
+ *
+ * 4. Resuming
+ *                 |<---------|
+ *
+ * 5. Resumed
+ *                 |--------->|
+ *
+ * 0. Default state of VFIO device is _RUNNING when VFIO application starts.
+ * 1. During normal VFIO application shutdown, vfio device state changes
+ *    from _RUNNING to _STOP. This is optional, user space application may or
+ *    may not perform this state transition and vendor driver may not need.
+ * 2. When VFIO application saves state or suspends application, VFIO device
+ *    state transition is from _RUNNING to stop-and-copy state and then to
+ *    _STOP.
+ *    On state transition from _RUNNING to stop-and-copy, driver must
+ *    stop device, save device state and send it to application through
+ *    migration region.
+ *    On _RUNNING to stop-and-copy state transition failure, application should
+ *    set VFIO device state to _RUNNING.
+ * 3. In VFIO application live migration, state transition is from _RUNNING
+ *    to pre-copy to stop-and-copy to _STOP.
+ *    On state transition from _RUNNING to pre-copy, driver should start
+ *    gathering device state while application is still running and send device
+ *    state data to application through migration region.
+ *    On state transition from pre-copy to stop-and-copy, driver must stop
+ *    device, save device state and send it to application through migration
+ *    region.
+ *    On any failure during any of these state transitions, VFIO device state
+ *    should be set to _RUNNING.
+ * 4. To start resuming phase, VFIO device state should be transitioned from
+ *    _RUNNING to _RESUMING state.
+ *    In _RESUMING state, driver should use received device state data through
+ *    migration region to resume device.
+ *    On failure during this state transition, application should set _RUNNING
+ *    state.
+ * 5. On providing saved device data to driver, application should change state
+ *    from _RESUMING to _RUNNING.
+ *    On failure to transition to _RUNNING state, VFIO application should reset
+ *    the device and set _RUNNING state so that device doesn't remain in unknown
+ *    or bad state. On reset, driver must reset device and device should be
+ *    available in default initial state, _RUNNING.
+ *
+ * pending_bytes: (read only)
+ *    Number of pending bytes yet to be migrated from vendor driver
+ *
+ * data_offset: (read only)
+ *    User application should read data_offset in migration region from where
+ *    user application should read device data during _SAVING state or write
+ *    device data during _RESUMING state. See below for detail of sequence to
+ *    be followed.
+ *
+ * data_size: (read/write)
+ *    User application should read data_size to get size of data copied in
+ *    bytes in migration region during _SAVING state and write size of data
+ *    copied in bytes in migration region during _RESUMING state.
+ *
+ * Migration region looks like:
+ * ------------------------------------------------------------------
+ * |vfio_device_migration_info|    data section                     |
+ * |                          |     /////////////////////////////// |
+ * ------------------------------------------------------------------
+ * ^                          ^
+ * offset 0-trapped part      data_offset
+ *
+ * Structure vfio_device_migration_info is always followed by data section in
+ * the region, so data_offset will always be non-0. Offset from where data is
+ * copied is decided by kernel driver, data section can be trapped or mapped
+ * or partitioned, depending on how kernel driver defines data section.
+ * Data section partition can be defined as mapped by sparse mmap capability.
+ * If mmapped, then data_offset should be page aligned, whereas initial section
+ * which contains vfio_device_migration_info structure might not end at offset
+ * which is page aligned. The user is not required to access via mmap regardless
+ * of the region mmap capabilities.
+ * Vendor driver should decide whether to partition data section and how to
+ * partition the data section. Vendor driver should return data_offset
+ * accordingly.
+ *
+ * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy phase
+ * and for _SAVING device state or stop-and-copy phase:
+ * a. read pending_bytes, indicates start of new iteration to get device data.
+ *    If pending_bytes > 0, go through below steps.
+ * b. read data_offset, indicates kernel driver to make data available through
+ *    data section. Kernel driver should return this read operation only after
+ *    data is available from (region + data_offset) to (region + data_offset +
+ *    data_size).
+ * c. read data_size, amount of data in bytes available through migration
+ *    region.
+ * d. read data of data_size bytes from (region + data_offset) from migration
+ *    region.
+ * e. process data.
+ * f. read pending_bytes, this read operation indicates data from previous
+ *    iteration has been read. If pending_bytes > 0, goto step b.
+ *
+ * User can transition from _SAVING|_RUNNING (pre-copy state) to _SAVING
+ * (stop-and-copy) state regardless of pending bytes.
+ * User should iterate in _SAVING (stop-and-copy) until pending_bytes is 0.
+ *
+ * Sequence to be followed while _RESUMING device state:
+ * While data for this device is available, repeat below steps:
+ * a. read data_offset from where user application should write data.
+ * b. write data of data_size to migration region from data_offset. Data size
+ *    could be data packet size at source during _SAVING or migration region
+ *    data section size whichever is less.
+ * c. write data_size which indicates vendor driver that data is written in
+ *    staging buffer. Vendor driver should read this data from migration
+ *    region and resume device's state.
+ *
+ * For user application, data is opaque. User should write data in the same
+ * order as received.
+ */
+
+struct vfio_device_migration_info {
+	__u32 device_state;         /* VFIO device state */
+#define VFIO_DEVICE_STATE_STOP      (0)
+#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
+#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)
+#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
+				     VFIO_DEVICE_STATE_SAVING |  \
+				     VFIO_DEVICE_STATE_RESUMING)
+
+#define VFIO_DEVICE_STATE_INVALID_CASE1    (VFIO_DEVICE_STATE_SAVING | \
+					    VFIO_DEVICE_STATE_RESUMING)
+
+#define VFIO_DEVICE_STATE_INVALID_CASE2    (VFIO_DEVICE_STATE_RUNNING | \
+					    VFIO_DEVICE_STATE_RESUMING)
+	__u32 reserved;
+	__u64 pending_bytes;
+	__u64 data_offset;
+	__u64 data_size;
+} __attribute__((packed));
+
 /*
  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
  * which allows direct access to non-MSIX registers which happened to be within

From patchwork Tue Dec 17 17:10:47 2019
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11298333
From: Kirti Wankhede
Subject: [PATCH v11 Kernel 2/6] vfio iommu: Add ioctl definition for dirty pages tracking.
Date: Tue, 17 Dec 2019 22:40:47 +0530
Message-ID: <1576602651-15430-3-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
References: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>

IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
All pages pinned by vendor driver through this API should be considered as
dirty during migration. When container consists of IOMMU capable device and
all pages are pinned and mapped, then all pages are marked dirty.
Added support to start/stop unpinned pages tracking and to get bitmap of
all dirtied pages for requested IO virtual address range.
Unpinned page tracking is cleared either when bitmap is read by user
application or unpinned page tracking is stopped.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 include/uapi/linux/vfio.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index b7ac8f7c0d3c..8268634e7e08 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -981,6 +981,49 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ *                                     struct vfio_iommu_type1_dirty_bitmap)
+ * IOCTL is used for dirty pages tracking. Caller sets argsz, which is size of
+ * struct vfio_iommu_type1_dirty_bitmap. Caller sets flags depending on which
+ * operation to perform, details as below:
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
+ * migration is active and IOMMU module should track pages which are being
+ * unpinned. Unpinned pages are tracked until bitmap for that range is queried
+ * or tracking is stopped by user application by setting
+ * VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
+ * IOMMU should stop tracking unpinned pages and also free previously tracked
+ * unpinned pages data.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
+ * IOCTL returns dirty pages bitmap for IOMMU container during migration for
+ * given IOVA range. User must allocate memory to get bitmap, zero the bitmap
+ * memory and set size of allocated memory in bitmap_size field. One bit is
+ * used to represent one page consecutively starting from iova offset. User
+ * should provide page size in 'pgsize'. Bit set in bitmap indicates page at
+ * that offset from iova is dirty.
+ *
+ * Only one flag should be set at a time.
+ *
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+	__u32        argsz;
+	__u32        flags;
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
+	__u64        iova;		/* IO virtual address */
+	__u64        size;		/* Size of iova range */
+	__u64        pgsize;		/* page size for bitmap */
+	__u64        bitmap_size;	/* in bytes */
+	void __user *bitmap;		/* one bit per page */
+};
+
+#define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*

From patchwork Tue Dec 17 17:10:48 2019
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11298335
From: Kirti Wankhede
Subject: [PATCH v11 Kernel 3/6] vfio iommu: Implementation of ioctl for dirty pages tracking.
Date: Tue, 17 Dec 2019 22:40:48 +0530 Message-ID: <1576602651-15430-4-git-send-email-kwankhede@nvidia.com> X-Mailer: git-send-email 2.7.0 In-Reply-To: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com> References: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com> X-NVConfidentiality: public MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1576604372; bh=QOnEmR+EkkyOxeGaIH1jBaSXyczDPhA7hX51JeJjvWs=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:X-NVConfidentiality:MIME-Version: Content-Type; b=OmlvCHUonHtgDtIm0vQQnzqdUAlMt6ndmApLDNzAEM73LOV8r2ulZPKV8iqW4aCeE 3/NovsASmFyg43oFWzRnLlFDGkuQX8+ZR/fR6HvIDwpF6wM0gW9R9nhok23txc20o5 MncDH9sqWgoEBytnDehZIe1XTly9iYsUm07O//42eh2NMYW9lw0ZDqrIQ/z+PPqwe8 THNw6pQy/2/tP7MiUioBYqC/rW0W0bZagNcTcfPKefEfJt0buMgGl5YXCKsfPcBr76 mEMX1g7yaiPnDAl1jWAFQIyFV20NmCLBgKyLZk1AWj3W3goJvMhke5IkPG3N81hEn+ 7DrL9vEBV/0gQ== X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy] X-Received-From: 216.228.121.143 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com, shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede , eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com, changpeng.liu@intel.com, Ken.Xue@amd.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations: - Start unpinned pages dirty pages tracking while migration is active and device is running, i.e. during pre-copy phase. - Stop unpinned pages dirty pages tracking. 
This is required to stop unpinned dirty pages tracking if migration failed or cancelled during pre-copy phase. Unpinned pages tracking is clear. - Get dirty pages bitmap. Stop unpinned dirty pages tracking and clear unpinned pages information on bitmap read. This ioctl returns bitmap of dirty pages, its user space application responsibility to copy content of dirty pages from source to destination during migration. Signed-off-by: Kirti Wankhede Reviewed-by: Neo Jia --- drivers/vfio/vfio_iommu_type1.c | 218 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 209 insertions(+), 9 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 2ada8e6cdb88..215aecb25453 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -70,6 +70,7 @@ struct vfio_iommu { unsigned int dma_avail; bool v2; bool nesting; + bool dirty_page_tracking; }; struct vfio_domain { @@ -112,6 +113,7 @@ struct vfio_pfn { dma_addr_t iova; /* Device address */ unsigned long pfn; /* Host pfn */ atomic_t ref_count; + bool unpinned; }; struct vfio_regions { @@ -244,6 +246,32 @@ static void vfio_remove_from_pfn_list(struct vfio_dma *dma, kfree(vpfn); } +static void vfio_remove_unpinned_from_pfn_list(struct vfio_dma *dma, bool warn) +{ + struct rb_node *n = rb_first(&dma->pfn_list); + + for (; n; n = rb_next(n)) { + struct vfio_pfn *vpfn = rb_entry(n, struct vfio_pfn, node); + + if (warn) + WARN_ON_ONCE(vpfn->unpinned); + + if (vpfn->unpinned) + vfio_remove_from_pfn_list(dma, vpfn); + } +} + +static void vfio_remove_unpinned_from_dma_list(struct vfio_iommu *iommu) +{ + struct rb_node *n = rb_first(&iommu->dma_list); + + for (; n; n = rb_next(n)) { + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node); + + vfio_remove_unpinned_from_pfn_list(dma, false); + } +} + static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct vfio_dma *dma, unsigned long iova) { @@ -254,13 +282,17 @@ static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct 
vfio_dma *dma, return vpfn; } -static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn) +static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn, + bool dirty_tracking) { int ret = 0; if (atomic_dec_and_test(&vpfn->ref_count)) { ret = put_pfn(vpfn->pfn, dma->prot); - vfio_remove_from_pfn_list(dma, vpfn); + if (dirty_tracking) + vpfn->unpinned = true; + else + vfio_remove_from_pfn_list(dma, vpfn); } return ret; } @@ -504,7 +536,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr, } static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova, - bool do_accounting) + bool do_accounting, bool dirty_tracking) { int unlocked; struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova); @@ -512,7 +544,10 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova, if (!vpfn) return 0; - unlocked = vfio_iova_put_vfio_pfn(dma, vpfn); + if (vpfn->unpinned) + return 0; + + unlocked = vfio_iova_put_vfio_pfn(dma, vpfn, dirty_tracking); if (do_accounting) vfio_lock_acct(dma, -unlocked, true); @@ -571,8 +606,12 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, vpfn = vfio_iova_get_vfio_pfn(dma, iova); if (vpfn) { - phys_pfn[i] = vpfn->pfn; - continue; + if (vpfn->unpinned) + vfio_remove_from_pfn_list(dma, vpfn); + else { + phys_pfn[i] = vpfn->pfn; + continue; + } } remote_vaddr = dma->vaddr + iova - dma->iova; @@ -583,7 +622,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]); if (ret) { - vfio_unpin_page_external(dma, iova, do_accounting); + vfio_unpin_page_external(dma, iova, do_accounting, + false); goto pin_unwind; } } @@ -598,7 +638,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, iova = user_pfn[j] << PAGE_SHIFT; dma = vfio_find_dma(iommu, iova, PAGE_SIZE); - vfio_unpin_page_external(dma, iova, do_accounting); + vfio_unpin_page_external(dma, iova, do_accounting, false); phys_pfn[j] = 0; } pin_done: @@ 
-632,7 +672,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data, dma = vfio_find_dma(iommu, iova, PAGE_SIZE); if (!dma) goto unpin_exit; - vfio_unpin_page_external(dma, iova, do_accounting); + vfio_unpin_page_external(dma, iova, do_accounting, + iommu->dirty_page_tracking); } unpin_exit: @@ -850,6 +891,88 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) return bitmap; } +/* + * start_iova is the reference from where bitmaping started. This is called + * from DMA_UNMAP where start_iova can be different than iova + */ + +static void vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova, + size_t size, uint64_t pgsize, + dma_addr_t start_iova, unsigned long *bitmap) +{ + struct vfio_dma *dma; + dma_addr_t i = iova; + unsigned long pgshift = __ffs(pgsize); + + while ((dma = vfio_find_dma(iommu, i, pgsize))) { + /* mark all pages dirty if all pages are pinned and mapped. */ + if (dma->iommu_mapped) { + dma_addr_t iova_limit; + + iova_limit = (dma->iova + dma->size) < (iova + size) ? 
+ (dma->iova + dma->size) : (iova + size); + + for (; i < iova_limit; i += pgsize) { + unsigned int start; + + start = (i - start_iova) >> pgshift; + + __bitmap_set(bitmap, start, 1); + } + if (i >= iova + size) + return; + } else { + struct rb_node *n = rb_first(&dma->pfn_list); + bool found = false; + + for (; n; n = rb_next(n)) { + struct vfio_pfn *vpfn = rb_entry(n, + struct vfio_pfn, node); + if (vpfn->iova >= i) { + found = true; + break; + } + } + + if (!found) { + i += dma->size; + continue; + } + + for (; n; n = rb_next(n)) { + unsigned int start; + struct vfio_pfn *vpfn = rb_entry(n, + struct vfio_pfn, node); + + if (vpfn->iova >= iova + size) + return; + + start = (vpfn->iova - start_iova) >> pgshift; + + __bitmap_set(bitmap, start, 1); + + i = vpfn->iova + pgsize; + } + } + vfio_remove_unpinned_from_pfn_list(dma, false); + } +} + +static long verify_bitmap_size(unsigned long npages, unsigned long bitmap_size) +{ + long bsize; + + if (!bitmap_size || bitmap_size > SIZE_MAX) + return -EINVAL; + + bsize = ALIGN(npages, BITS_PER_LONG) / sizeof(unsigned long); + + if (bitmap_size < bsize) + return -EINVAL; + + return bsize; +} + static int vfio_dma_do_unmap(struct vfio_iommu *iommu, struct vfio_iommu_type1_dma_unmap *unmap) { @@ -2297,6 +2420,83 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, return copy_to_user((void __user *)arg, &unmap, minsz) ? 
-EFAULT : 0; + } else if (cmd == VFIO_IOMMU_DIRTY_PAGES) { + struct vfio_iommu_type1_dirty_bitmap range; + uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START | + VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP | + VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP; + int ret; + + if (!iommu->v2) + return -EACCES; + + minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap, + bitmap); + + if (copy_from_user(&range, (void __user *)arg, minsz)) + return -EFAULT; + + if (range.argsz < minsz || range.flags & ~mask) + return -EINVAL; + + if (range.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) { + iommu->dirty_page_tracking = true; + return 0; + } else if (range.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP) { + iommu->dirty_page_tracking = false; + + mutex_lock(&iommu->lock); + vfio_remove_unpinned_from_dma_list(iommu); + mutex_unlock(&iommu->lock); + return 0; + + } else if (range.flags & + VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) { + uint64_t iommu_pgmask; + unsigned long pgshift = __ffs(range.pgsize); + unsigned long *bitmap; + long bsize; + + iommu_pgmask = + ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1; + + if (((range.pgsize - 1) & iommu_pgmask) != + (range.pgsize - 1)) + return -EINVAL; + + if (range.iova & iommu_pgmask) + return -EINVAL; + if (!range.size || range.size > SIZE_MAX) + return -EINVAL; + if (range.iova + range.size < range.iova) + return -EINVAL; + + bsize = verify_bitmap_size(range.size >> pgshift, + range.bitmap_size); + if (bsize < 0) + return bsize; + + bitmap = kmalloc(bsize, GFP_KERNEL); + if (!bitmap) + return -ENOMEM; + + ret = copy_from_user(bitmap, + (void __user *)range.bitmap, bsize) ? -EFAULT : 0; + if (ret) + goto bitmap_exit; + + iommu->dirty_page_tracking = false; + mutex_lock(&iommu->lock); + vfio_iova_dirty_bitmap(iommu, range.iova, range.size, + range.pgsize, range.iova, bitmap); + mutex_unlock(&iommu->lock); + + ret = copy_to_user((void __user *)range.bitmap, bitmap, + bsize) ?
-EFAULT : 0; +bitmap_exit: + kfree(bitmap); + return ret; + } } return -ENOTTY;

From patchwork Tue Dec 17 17:10:49 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11298349
From: Kirti Wankhede
Subject: [PATCH v11 Kernel 4/6] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
Date: Tue, 17 Dec 2019 22:40:49 +0530
Message-ID: <1576602651-15430-5-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
References: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com, shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede, eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com, changpeng.liu@intel.com, Ken.Xue@amd.com

Pages pinned through the external interface for a requested IO virtual
address range might be unpinned and unmapped while migration is active and
the device is still running, that is, in the pre-copy phase, while the guest
driver can still access those pages. The host device can write to these
pages while they are mapped. Such pages should be marked dirty so that,
after migration, the guest driver is still able to complete the operation.

To get the bitmap during unmap, the user should set the flag
VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP. The bitmap memory should be allocated
and zeroed by the user space application, which should also set the bitmap
size and the page size.

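As a rough illustration of the user space side described in the commit
message above, the sketch below sizes and zeroes a bitmap and fills in an
unmap request. It is only a sketch under stated assumptions: struct
demo_dma_unmap, DEMO_UNMAP_FLAG_GET_DIRTY_BITMAP and the demo_* helpers are
hypothetical local stand-ins that mirror the fields proposed in this patch,
not the real uAPI header, and the size rule assumes a 64-bit kernel, where
verify_bitmap_size() rounds the page count up to whole 64-bit words.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical mirror of struct vfio_iommu_type1_dma_unmap as proposed in
 * this patch. Defined locally so the sketch builds without the patched
 * <linux/vfio.h>.
 */
struct demo_dma_unmap {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova;          /* IO virtual address */
	uint64_t size;          /* size of mapping (bytes) */
	uint64_t bitmap_pgsize; /* page size for bitmap */
	uint64_t bitmap_size;   /* in bytes */
	void *bitmap;           /* one bit per page */
};

#define DEMO_UNMAP_FLAG_GET_DIRTY_BITMAP (1u << 0)

/* Bytes needed for one bit per page, rounded up to whole 64-bit words. */
static uint64_t demo_bitmap_bytes(uint64_t size, uint64_t pgsize)
{
	uint64_t npages;

	/* The page size must be a non-zero power of two. */
	assert(pgsize && (pgsize & (pgsize - 1)) == 0);

	npages = size / pgsize;
	return ((npages + 63) / 64) * 8;
}

/* Fill the request; returns 0 on success, -1 if the allocation fails. */
static int demo_prepare_unmap(struct demo_dma_unmap *req, uint64_t iova,
			      uint64_t size, uint64_t pgsize)
{
	memset(req, 0, sizeof(*req));
	req->argsz = sizeof(*req);
	req->flags = DEMO_UNMAP_FLAG_GET_DIRTY_BITMAP;
	req->iova = iova;
	req->size = size;
	req->bitmap_pgsize = pgsize;
	req->bitmap_size = demo_bitmap_bytes(size, pgsize);

	/* calloc() returns zeroed memory, as the commit message requires. */
	req->bitmap = calloc(1, req->bitmap_size);
	return req->bitmap ? 0 : -1;
}
```

With the patched header, the filled request would be passed to the
VFIO_IOMMU_UNMAP_DMA ioctl on the container fd; afterwards bit N of the
bitmap would describe the page at iova + N * bitmap_pgsize.
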
Signed-off-by: Kirti Wankhede Reviewed-by: Neo Jia --- drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++++----- include/uapi/linux/vfio.h | 12 ++++++++ 2 files changed, 68 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 215aecb25453..101c2b1e72b4 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -974,7 +974,8 @@ static long verify_bitmap_size(unsigned long npages, unsigned long bitmap_size) } static int vfio_dma_do_unmap(struct vfio_iommu *iommu, - struct vfio_iommu_type1_dma_unmap *unmap) + struct vfio_iommu_type1_dma_unmap *unmap, + unsigned long *bitmap) { uint64_t mask; struct vfio_dma *dma, *dma_last = NULL; @@ -1049,6 +1050,15 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, if (dma->task->mm != current->mm) break; + if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) && + (dma_last != dma)) + vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size, + unmap->bitmap_pgsize, unmap->iova, + bitmap); + else + vfio_remove_unpinned_from_pfn_list(dma, true); + + if (!RB_EMPTY_ROOT(&dma->pfn_list)) { struct vfio_iommu_type1_dma_unmap nb_unmap; @@ -1074,6 +1084,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, &nb_unmap); goto again; } + unmapped += dma->size; vfio_remove_dma(iommu, dma); } @@ -2404,22 +2415,60 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, } else if (cmd == VFIO_IOMMU_UNMAP_DMA) { struct vfio_iommu_type1_dma_unmap unmap; - long ret; + unsigned long *bitmap = NULL; + long ret, bsize; minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size); - if (copy_from_user(&unmap, (void __user *)arg, minsz)) + if (copy_from_user(&unmap, (void __user *)arg, sizeof(unmap))) return -EFAULT; - if (unmap.argsz < minsz || unmap.flags) + if (unmap.argsz < minsz || + unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) return -EINVAL; - ret = vfio_dma_do_unmap(iommu, &unmap); + if (unmap.flags & 
VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) { + unsigned long pgshift = __ffs(unmap.bitmap_pgsize); + uint64_t iommu_pgmask = + ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1; + + if (((unmap.bitmap_pgsize - 1) & iommu_pgmask) != + (unmap.bitmap_pgsize - 1)) + return -EINVAL; + + bsize = verify_bitmap_size(unmap.size >> pgshift, + unmap.bitmap_size); + if (bsize < 0) + return bsize; + + bitmap = kmalloc(bsize, GFP_KERNEL); + if (!bitmap) + return -ENOMEM; + + if (copy_from_user(bitmap, (void __user *)unmap.bitmap, + bsize)) { + ret = -EFAULT; + goto unmap_exit; + } + } + + ret = vfio_dma_do_unmap(iommu, &unmap, bitmap); if (ret) - return ret; + goto unmap_exit; - return copy_to_user((void __user *)arg, &unmap, minsz) ? + if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) { + if (copy_to_user((void __user *)unmap.bitmap, bitmap, + bsize)) { + ret = -EFAULT; + goto unmap_exit; + } + } + + ret = copy_to_user((void __user *)arg, &unmap, minsz) ? -EFAULT : 0; +unmap_exit: + kfree(bitmap); + return ret; } else if (cmd == VFIO_IOMMU_DIRTY_PAGES) { struct vfio_iommu_type1_dirty_bitmap range; uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START | diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 8268634e7e08..e8e044c4974d 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -964,12 +964,24 @@ struct vfio_iommu_type1_dma_map { * field. No guarantee is made to the user that arbitrary unmaps of iova * or size different from those used in the original mapping call will * succeed. + * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get the dirty bitmap + * before unmapping IO virtual addresses. When this flag is set, the user must + * allocate memory for the bitmap, zero it, and set the size of the allocated + * memory in the bitmap_size field. Each bit in the bitmap represents one page + * of the user-provided page size in 'bitmap_pgsize', consecutively starting + * from the iova offset. A set bit indicates that the page at that offset from + * iova is dirty. The bitmap covering the pages in the unmapped range is + * returned in bitmap. */ struct vfio_iommu_type1_dma_unmap { __u32 argsz; __u32 flags; +#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0) __u64 iova; /* IO virtual address */ __u64 size; /* Size of mapping (bytes) */ + __u64 bitmap_pgsize; /* page size for bitmap */ + __u64 bitmap_size; /* in bytes */ + void __user *bitmap; /* one bit per page */ }; #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)

From patchwork Tue Dec 17 17:10:50 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11298339
From: Kirti Wankhede
Subject: [PATCH v11 Kernel 5/6] vfio iommu: Adds flag to indicate dirty pages tracking capability support
Date: Tue, 17 Dec 2019 22:40:50 +0530
Message-ID: <1576602651-15430-6-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
References: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com, shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede, eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com, changpeng.liu@intel.com, Ken.Xue@amd.com

The flag VFIO_IOMMU_INFO_DIRTY_PGS in VFIO_IOMMU_GET_INFO indicates that the
driver supports dirty page tracking.

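A user space consumer would presumably test this bit in the flags returned
by VFIO_IOMMU_GET_INFO before relying on dirty page tracking. The sketch
below shows only that check. The DEMO_* constants are local copies of the
bit positions proposed in this series, and demo_supports_dirty_tracking()
is a hypothetical helper, not part of any existing API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copies of the flag bits of struct vfio_iommu_type1_info as proposed
 * in this patch, so the sketch builds without the patched uAPI header.
 */
#define DEMO_IOMMU_INFO_PGSIZES   (1u << 0) /* supported page sizes info */
#define DEMO_IOMMU_INFO_CAPS      (1u << 1) /* info supports caps */
#define DEMO_IOMMU_INFO_DIRTY_PGS (1u << 2) /* supports dirty page tracking */

/* Returns true if the reported flags advertise dirty page tracking. */
static bool demo_supports_dirty_tracking(uint32_t info_flags)
{
	return (info_flags & DEMO_IOMMU_INFO_DIRTY_PGS) != 0;
}
```

The argument would be the flags field filled in by the ioctl; a caller that
sees the bit clear should fall back to treating all device-accessible
memory as dirty.
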
Signed-off-by: Kirti Wankhede Reviewed-by: Neo Jia --- drivers/vfio/vfio_iommu_type1.c | 3 ++- include/uapi/linux/vfio.h | 5 +++-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 101c2b1e72b4..68d8ed3b2665 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2368,7 +2368,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, info.cap_offset = 0; /* output, no-recopy necessary */ } - info.flags = VFIO_IOMMU_INFO_PGSIZES; + info.flags = VFIO_IOMMU_INFO_PGSIZES | + VFIO_IOMMU_INFO_DIRTY_PGS; info.iova_pgsizes = vfio_pgsize_bitmap(iommu); diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index e8e044c4974d..bdd07e8429e3 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -907,8 +907,9 @@ struct vfio_device_ioeventfd { struct vfio_iommu_type1_info { __u32 argsz; __u32 flags; -#define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ -#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */ +#define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */ +#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2) /* supports dirty page tracking */ __u64 iova_pgsizes; /* Bitmap of supported page sizes */ __u32 cap_offset; /* Offset within info struct of first cap */ };

From patchwork Tue Dec 17 17:10:51 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11298351
From: Kirti Wankhede
Subject: [PATCH v11 Kernel 6/6] vfio: Selective dirty page tracking if IOMMU backed device pins pages
Date: Tue, 17 Dec 2019 22:40:51 +0530
Message-ID: <1576602651-15430-7-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
References: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com, shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede, eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com, changpeng.liu@intel.com, Ken.Xue@amd.com

Track the dirty pages reporting capability of each vfio_device by setting
the capability flag when vfio_pin_pages() is called for that device.

In the vfio_iommu_type1 module, while creating the dirty pages bitmap, check
whether an IOMMU backed device is present in the container. If one is
present, check the dirty pages reporting capability of each vfio device in
the container. If all vfio devices are capable of reporting dirty pages by
pinning pages through the external API, create the bitmap from pinned pages
only. If an IOMMU backed device is present in the container and any device
is not able to report dirty pages, mark all pages as dirty.

Signed-off-by: Kirti Wankhede Reviewed-by: Neo Jia --- drivers/vfio/vfio.c | 33 +++++++++++++++++++++++++++++++ drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++-- include/linux/vfio.h | 3 ++- 3 files changed, 77 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index c8482624ca34..9d2fbe09768a 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -96,6 +96,8 @@ struct vfio_device { struct vfio_group *group; struct list_head group_next; void *device_data; + /* dirty pages reporting capable */ + bool dirty_pages_cap; }; #ifdef CONFIG_VFIO_NOIOMMU @@ -1866,6 +1868,29 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs, } EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare); +int vfio_device_is_dirty_reporting_capable(struct device *dev, bool *cap) +{ + struct vfio_device *device; + struct vfio_group *group; + + if (!dev || !cap) + return -EINVAL; + + group = vfio_group_get_from_dev(dev); + if (!group) + return -ENODEV; + + device = vfio_group_get_device(group, dev); + if (!device) + return -ENODEV; + + *cap = device->dirty_pages_cap; + vfio_device_put(device); + vfio_group_put(group); + return 0; +}
+EXPORT_SYMBOL(vfio_device_is_dirty_reporting_capable); + /* * Pin a set of guest PFNs and return their associated host PFNs for local * domain only. @@ -1907,6 +1932,14 @@ int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, int npage, else ret = -ENOTTY; + if (ret > 0) { + struct vfio_device *device = vfio_group_get_device(group, dev); + + if (device) { + device->dirty_pages_cap = true; + vfio_device_put(device); + } + } vfio_group_try_dissolve_container(group); err_pin_pages: diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 68d8ed3b2665..ef56f31f4e73 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -891,6 +891,39 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) return bitmap; } +static int vfio_is_dirty_pages_reporting_capable(struct device *dev, void *data) +{ + bool new; + int ret; + + ret = vfio_device_is_dirty_reporting_capable(dev, &new); + if (ret) + return ret; + + *(bool *)data = *(bool *)data && new; + + return 0; +} + +static bool vfio_dirty_pages_reporting_capable(struct vfio_iommu *iommu) +{ + struct vfio_domain *d; + struct vfio_group *g; + bool capable = true; + int ret; + + list_for_each_entry(d, &iommu->domain_list, next) { + list_for_each_entry(g, &d->group_list, next) { + ret = iommu_group_for_each_dev(g->iommu_group, &capable, + vfio_is_dirty_pages_reporting_capable); + if (ret) + return false; + } + } + + return capable; +} + /* * start_iova is the reference from where bitmaping started. 
This is called * from DMA_UNMAP where start_iova can be different than iova @@ -903,10 +936,17 @@ static void vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova, struct vfio_dma *dma; dma_addr_t i = iova; unsigned long pgshift = __ffs(pgsize); + bool dirty_report_cap = true; + + if (IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) + dirty_report_cap = vfio_dirty_pages_reporting_capable(iommu); while ((dma = vfio_find_dma(iommu, i, pgsize))) { - /* mark all pages dirty if all pages are pinned and mapped. */ - if (dma->iommu_mapped) { + /* + * mark all pages dirty if any IOMMU capable device is not able + * to report dirty pages and all pages are pinned and mapped. + */ + if (!dirty_report_cap && dma->iommu_mapped) { dma_addr_t iova_limit; iova_limit = (dma->iova + dma->size) < (iova + size) ? diff --git a/include/linux/vfio.h b/include/linux/vfio.h index e42a711a2800..ed3832ea10a1 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -148,7 +148,8 @@ extern int vfio_info_add_capability(struct vfio_info_cap *caps, extern int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs, int max_irq_type, size_t *data_size); - +extern int vfio_device_is_dirty_reporting_capable(struct device *dev, + bool *cap); struct pci_dev; #if IS_ENABLED(CONFIG_VFIO_SPAPR_EEH) extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
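
The selection rule from the 6/6 commit message can be modelled outside the
kernel in a few lines. This is a simplified sketch, not the patch's code:
struct demo_device and both demo_* functions are hypothetical, and the
container is reduced to a flat array of devices. It reproduces only the
condition that vfio_iova_dirty_bitmap() uses to decide between marking a
whole IOMMU-mapped range dirty and reporting pinned pages only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one vfio device in the container. */
struct demo_device {
	bool dirty_pages_cap; /* set once the device has pinned pages */
};

/* True only if every device can report dirty pages through pinning. */
static bool demo_all_devices_capable(const struct demo_device *devs, size_t n)
{
	bool capable = true;

	for (size_t i = 0; i < n; i++)
		capable = capable && devs[i].dirty_pages_cap;

	return capable;
}

/*
 * Decide whether every page of a range must be marked dirty. This happens
 * only when an IOMMU backed device is present, at least one device cannot
 * report dirty pages, and the range is IOMMU mapped. Otherwise only the
 * pinned pages are reported.
 */
static bool demo_mark_all_dirty(bool iommu_backed, bool iommu_mapped,
				const struct demo_device *devs, size_t n)
{
	bool dirty_report_cap = true;

	if (iommu_backed)
		dirty_report_cap = demo_all_devices_capable(devs, n);

	return !dirty_report_cap && iommu_mapped;
}
```

A single device that has never called vfio_pin_pages() is therefore enough
to make an IOMMU-mapped range fall back to the conservative all-dirty
report.
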