From patchwork Mon Nov 5 16:55:51 2018
X-Patchwork-Submitter: Daniel Jordan
X-Patchwork-Id: 10668665
From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: linux-mm@kvack.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: aarcange@redhat.com, aaron.lu@intel.com, akpm@linux-foundation.org,
    alex.williamson@redhat.com, bsd@redhat.com, daniel.m.jordan@oracle.com,
    darrick.wong@oracle.com, dave.hansen@linux.intel.com, jgg@mellanox.com,
    jwadams@google.com, jiangshanlai@gmail.com, mhocko@kernel.org,
    mike.kravetz@oracle.com, Pavel.Tatashin@microsoft.com,
    prasad.singamsetty@oracle.com, rdunlap@infradead.org,
    steven.sistare@oracle.com, tim.c.chen@intel.com, tj@kernel.org,
    vbabka@suse.cz
Subject: [RFC PATCH v4 06/13] vfio: parallelize vfio_pin_map_dma
Date: Mon, 5 Nov 2018 11:55:51 -0500
Message-Id: <20181105165558.11698-7-daniel.m.jordan@oracle.com>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20181105165558.11698-1-daniel.m.jordan@oracle.com>
References: <20181105165558.11698-1-daniel.m.jordan@oracle.com>
When starting a large-memory kvm guest, it takes an excessively long
time to start the boot process because qemu must pin all guest pages to
accommodate DMA when VFIO is in use.  Currently just one CPU is
responsible for the page pinning, which usually boils down to page
clearing time-wise, so the ways to optimize this are buying a faster
CPU ;-) or using more of the CPUs you already have.

Parallelize with ktask.  Refactor so that workqueue workers pin with
the mm of the calling thread, and enable an undo callback for ktask to
handle errors during page pinning.

Performance results appear later in the series.
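The pattern the patch applies (split the range to pin into fixed-size chunks, run the chunks on multiple workers, aggregate the result) can be sketched in userspace. ktask was an RFC kernel API that never landed in mainline, so the sketch below uses plain pthreads instead; all names here (`parallel_pin`, `pin_chunk`, the `DEMO_*` constants) are illustrative stand-ins, not part of VFIO or ktask:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel constants (not the kernel's macros). */
#define DEMO_PAGE_SHIFT 12UL
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PMD_SIZE   (DEMO_PAGE_SIZE * 512)  /* 2 MiB, like x86_64 PMD_SIZE */

struct chunk {
	unsigned long start;  /* inclusive start offset */
	unsigned long end;    /* exclusive end offset */
};

static atomic_ulong pages_pinned;

/* Worker: "pin" each page of its chunk by bumping a shared counter. */
static void *pin_chunk(void *arg)
{
	struct chunk *c = arg;
	unsigned long off;

	for (off = c->start; off < c->end; off += DEMO_PAGE_SIZE)
		atomic_fetch_add(&pages_pinned, 1);
	return NULL;
}

/* Split [0, size) on PMD-sized boundaries and process chunks in parallel. */
unsigned long parallel_pin(unsigned long size)
{
	enum { MAX_THREADS = 64 };
	pthread_t tids[MAX_THREADS];
	struct chunk chunks[MAX_THREADS];
	size_t i, n = 0;

	atomic_store(&pages_pinned, 0);
	for (unsigned long off = 0; off < size && n < MAX_THREADS; n++) {
		unsigned long end = off + DEMO_PMD_SIZE;

		if (end > size)
			end = size;
		chunks[n] = (struct chunk){ .start = off, .end = end };
		pthread_create(&tids[n], NULL, pin_chunk, &chunks[n]);
		off = end;
	}
	for (i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
	return atomic_load(&pages_pinned);
}
```

Chunking on a large, aligned boundary (PMD_SIZE in the patch) keeps each worker's range huge-page friendly and amortizes the per-chunk overhead; the patch's undo callback plays the role of unwinding already-completed chunks when one chunk fails, which the sketch omits.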
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 106 +++++++++++++++++++++++---------
 1 file changed, 76 insertions(+), 30 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d9fd3188615d..e7cfbf0c8071 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include <linux/ktask.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -395,7 +396,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
  */
 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 				  long npage, unsigned long *pfn_base,
-				  unsigned long limit)
+				  unsigned long limit, struct mm_struct *mm)
 {
 	unsigned long pfn = 0;
 	long ret, pinned = 0, lock_acct = 0;
@@ -403,10 +404,10 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
 
 	/* This code path is only user initiated */
-	if (!current->mm)
+	if (!mm)
 		return -ENODEV;
 
-	ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, pfn_base);
+	ret = vaddr_get_pfn(mm, vaddr, dma->prot, pfn_base);
 	if (ret)
 		return ret;
 
@@ -418,7 +419,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	 * pages are already counted against the user.
 	 */
 	if (!rsvd && !vfio_find_vpfn(dma, iova)) {
-		if (!dma->lock_cap && current->mm->locked_vm + 1 > limit) {
+		if (!dma->lock_cap && mm->locked_vm + 1 > limit) {
 			put_pfn(*pfn_base, dma->prot);
 			pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
 				limit << PAGE_SHIFT);
@@ -433,7 +434,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	/* Lock all the consecutive pages from pfn_base */
 	for (vaddr += PAGE_SIZE, iova += PAGE_SIZE; pinned < npage;
 	     pinned++, vaddr += PAGE_SIZE, iova += PAGE_SIZE) {
-		ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, &pfn);
+		ret = vaddr_get_pfn(mm, vaddr, dma->prot, &pfn);
 		if (ret)
 			break;
 
@@ -445,7 +446,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 
 		if (!rsvd && !vfio_find_vpfn(dma, iova)) {
 			if (!dma->lock_cap &&
-			    current->mm->locked_vm + lock_acct + 1 > limit) {
+			    mm->locked_vm + lock_acct + 1 > limit) {
 				put_pfn(pfn, dma->prot);
 				pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
 					__func__, limit << PAGE_SHIFT);
@@ -752,15 +753,15 @@ static size_t unmap_unpin_slow(struct vfio_domain *domain,
 }
 
 static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
+			     dma_addr_t iova, dma_addr_t end,
 			     bool do_accounting)
 {
-	dma_addr_t iova = dma->iova, end = dma->iova + dma->size;
 	struct vfio_domain *domain, *d;
 	LIST_HEAD(unmapped_region_list);
 	int unmapped_region_cnt = 0;
 	long unlocked = 0;
 
-	if (!dma->size)
+	if (iova == end)
 		return 0;
 
 	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
@@ -777,7 +778,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 				     struct vfio_domain, next);
 
 	list_for_each_entry_continue(d, &iommu->domain_list, next) {
-		iommu_unmap(d->domain, dma->iova, dma->size);
+		iommu_unmap(d->domain, iova, end - iova);
 		cond_resched();
 	}
 
@@ -818,8 +819,6 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 		}
 	}
 
-	dma->iommu_mapped = false;
-
 	if (unmapped_region_cnt)
 		unlocked += vfio_sync_unpin(dma, domain,
 					    &unmapped_region_list);
@@ -830,14 +829,21 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	return unlocked;
 }
 
-static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
+static void vfio_remove_dma_finish(struct vfio_iommu *iommu,
+				   struct vfio_dma *dma)
 {
-	vfio_unmap_unpin(iommu, dma, true);
+	dma->iommu_mapped = false;
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
 	kfree(dma);
 }
 
+static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
+{
+	vfio_unmap_unpin(iommu, dma, dma->iova, dma->iova + dma->size, true);
+	vfio_remove_dma_finish(iommu, dma);
+}
+
 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
 {
 	struct vfio_domain *domain;
@@ -1031,20 +1037,29 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 	return ret;
 }
 
-static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
-			    size_t map_size)
+struct vfio_pin_args {
+	struct vfio_iommu *iommu;
+	struct vfio_dma *dma;
+	unsigned long limit;
+	struct mm_struct *mm;
+};
+
+static int vfio_pin_map_dma_chunk(unsigned long start_vaddr,
+				  unsigned long end_vaddr,
+				  struct vfio_pin_args *args)
 {
-	dma_addr_t iova = dma->iova;
-	unsigned long vaddr = dma->vaddr;
-	size_t size = map_size;
+	struct vfio_dma *dma = args->dma;
+	dma_addr_t iova = dma->iova + (start_vaddr - dma->vaddr);
+	unsigned long unmapped_size = end_vaddr - start_vaddr;
+	unsigned long pfn, mapped_size = 0;
 	long npage;
-	unsigned long pfn, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 	int ret = 0;
 
-	while (size) {
+	while (unmapped_size) {
 		/* Pin a contiguous chunk of memory */
-		npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
-					      size >> PAGE_SHIFT, &pfn, limit);
+		npage = vfio_pin_pages_remote(dma, start_vaddr + mapped_size,
+					      unmapped_size >> PAGE_SHIFT,
+					      &pfn, args->limit, args->mm);
 		if (npage <= 0) {
 			WARN_ON(!npage);
 			ret = (int)npage;
@@ -1052,22 +1067,50 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
 		}
 
 		/* Map it! */
-		ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
-				     dma->prot);
+		ret = vfio_iommu_map(args->iommu, iova + mapped_size, pfn,
+				     npage, dma->prot);
 		if (ret) {
-			vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
+			vfio_unpin_pages_remote(dma, iova + mapped_size, pfn,
 						npage, true);
 			break;
 		}
 
-		size -= npage << PAGE_SHIFT;
-		dma->size += npage << PAGE_SHIFT;
+		unmapped_size -= npage << PAGE_SHIFT;
+		mapped_size += npage << PAGE_SHIFT;
 	}
 
+	return (ret == 0) ? KTASK_RETURN_SUCCESS : ret;
+}
+
+static void vfio_pin_map_dma_undo(unsigned long start_vaddr,
+				  unsigned long end_vaddr,
+				  struct vfio_pin_args *args)
+{
+	struct vfio_dma *dma = args->dma;
+	dma_addr_t iova = dma->iova + (start_vaddr - dma->vaddr);
+	dma_addr_t end = dma->iova + (end_vaddr - dma->vaddr);
+
+	vfio_unmap_unpin(args->iommu, args->dma, iova, end, true);
+}
+
+static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
+			    size_t map_size)
+{
+	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	int ret = 0;
+	struct vfio_pin_args args = { iommu, dma, limit, current->mm };
+	/* Stay on PMD boundary in case THP is being used. */
+	DEFINE_KTASK_CTL(ctl, vfio_pin_map_dma_chunk, &args, PMD_SIZE);
+
+	ktask_ctl_set_undo_func(&ctl, vfio_pin_map_dma_undo);
+	ret = ktask_run((void *)dma->vaddr, map_size, &ctl);
+
 	dma->iommu_mapped = true;
 	if (ret)
-		vfio_remove_dma(iommu, dma);
+		vfio_remove_dma_finish(iommu, dma);
+	else
+		dma->size += map_size;
 
 	return ret;
 }
@@ -1229,7 +1272,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 
 			npage = vfio_pin_pages_remote(dma, vaddr,
 						      n >> PAGE_SHIFT,
-						      &pfn, limit);
+						      &pfn, limit,
+						      current->mm);
 			if (npage <= 0) {
 				WARN_ON(!npage);
 				ret = (int)npage;
@@ -1497,7 +1541,9 @@ static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
 		long locked = 0, unlocked = 0;
 
 		dma = rb_entry(n, struct vfio_dma, node);
-		unlocked += vfio_unmap_unpin(iommu, dma, false);
+		unlocked += vfio_unmap_unpin(iommu, dma, dma->iova,
+					     dma->iova + dma->size, false);
+		dma->iommu_mapped = false;
 		p = rb_first(&dma->pfn_list);
 		for (; p; p = rb_next(p)) {
 			struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,