From patchwork Mon Feb 26 19:19:30 2018
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 10243097
Date: Mon, 26 Feb 2018 12:19:30 -0700
From: Alex Williamson
To: jason
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, gnehzuil@linux.alibaba.com
Subject: Re: [RFC] vfio iommu type1: improve memory pinning process for raw PFN mapping
Message-ID: <20180226121930.5e1f6300@w520.home>
In-Reply-To: <7090CB2E-8D63-44B1-A739-932FFA649BC9@linux.alibaba.com>
References: <7090CB2E-8D63-44B1-A739-932FFA649BC9@linux.alibaba.com>
X-Mailing-List: kvm@vger.kernel.org

On Sat, 24 Feb 2018 13:44:07 +0800
jason wrote:

> When using vfio to pass through a PCIe device (e.g. a GPU card) that
> has a huge BAR (e.g. 16GB), a lot of cycles are wasted on memory
> pinning because the PFNs of the PCI BAR are not backed by struct page,
> and the corresponding VMA has the flags VM_IO|VM_PFNMAP.
>
> With this change, the memory pinning process first tries to figure
> out whether the corresponding region is a raw PFN mapping, and if so
> it skips the unnecessary user memory pinning process.
>
> Even though it comes with a little overhead -- finding the vma and
> testing its flags on each call -- it can significantly improve a VM's
> boot-up time when passing through devices via VFIO.
Needs a Sign-off, see Documentation/process/submitting-patches.rst

> ---
>  drivers/vfio/vfio_iommu_type1.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index e30e29ae4819..1a471ece3f9c 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -374,6 +374,24 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  	return ret;
>  }
>
> +static int try_io_pfnmap(struct mm_struct *mm, unsigned long vaddr, long npage,
> +			 unsigned long *pfn)
> +{
> +	struct vm_area_struct *vma;
> +	int pinned = 0;
> +
> +	down_read(&mm->mmap_sem);
> +	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> +	if (vma && vma->vm_flags & (VM_IO | VM_PFNMAP)) {
> +		*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> +		if (is_invalid_reserved_pfn(*pfn))
> +			pinned = min(npage, (long)vma_pages(vma));
> +	}
> +	up_read(&mm->mmap_sem);
> +
> +	return pinned;
> +}
> +
>  /*
>   * Attempt to pin pages.  We really don't want to track all the pfns and
>   * the iommu can only map chunks of consecutive pfns anyway, so get the
> @@ -392,6 +410,10 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
>  	if (!current->mm)
>  		return -ENODEV;
>
> +	ret = try_io_pfnmap(current->mm, vaddr, npage, pfn_base);
> +	if (ret)
> +		return ret;
> +
>  	ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, pfn_base);
>  	if (ret)
>  		return ret;

I like the idea, but couldn't we integrate it better?  For instance,
does it really make sense to test for this first?  The majority of
users are going to have more regular mappings than PFNMAP mappings.
If we were to do the above optimization, don't the rsvd bits in the
remainder of the code become cruft?  What if we optimized from the
point where we test the return of vaddr_get_pfn() for a
reserved/invalid page?  Perhaps something like the below (untested,
uncompiled) patch.  Also curious why the above tests VM_IO|VM_PFNMAP
while vaddr_get_pfn() only tests VM_PFNMAP; we should at least be
consistent, but also correct the existing function if it's missing a
case.  Thanks,

Alex

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e113b2c43be2..425922393316 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -399,7 +399,6 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 {
 	unsigned long pfn = 0;
 	long ret, pinned = 0, lock_acct = 0;
-	bool rsvd;
 	dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
 
 	/* This code path is only user initiated */
@@ -410,14 +409,23 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	if (ret)
 		return ret;
 
+	if (is_invalid_reserved_pfn(*pfn_base)) {
+		struct vm_area_struct *vma;
+
+		down_read(&mm->mmap_sem);
+		vma = find_vma_intersection(mm, vaddr, vaddr + 1);
+		pinned = min(npage, (long)vma_pages(vma));
+		up_read(&mm->mmap_sem);
+		return pinned;
+	}
+
 	pinned++;
-	rsvd = is_invalid_reserved_pfn(*pfn_base);
 
 	/*
 	 * Reserved pages aren't counted against the user, externally pinned
 	 * pages are already counted against the user.
 	 */
-	if (!rsvd && !vfio_find_vpfn(dma, iova)) {
+	if (!vfio_find_vpfn(dma, iova)) {
 		if (!lock_cap && current->mm->locked_vm + 1 > limit) {
 			put_pfn(*pfn_base, dma->prot);
 			pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
@@ -437,13 +445,12 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 		if (ret)
 			break;
 
-		if (pfn != *pfn_base + pinned ||
-		    rsvd != is_invalid_reserved_pfn(pfn)) {
+		if (pfn != *pfn_base + pinned) {
 			put_pfn(pfn, dma->prot);
 			break;
 		}
 
-		if (!rsvd && !vfio_find_vpfn(dma, iova)) {
+		if (!vfio_find_vpfn(dma, iova)) {
 			if (!lock_cap &&
 			    current->mm->locked_vm + lock_acct + 1 > limit) {
 				put_pfn(pfn, dma->prot);
@@ -461,10 +468,8 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 
 unpin_out:
 	if (ret) {
-		if (!rsvd) {
-			for (pfn = *pfn_base ; pinned ; pfn++, pinned--)
-				put_pfn(pfn, dma->prot);
-		}
+		for (pfn = *pfn_base ; pinned ; pfn++, pinned--)
+			put_pfn(pfn, dma->prot);
 
 		return ret;
 	}
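
As an aside on the VM_IO|VM_PFNMAP vs. VM_PFNMAP question raised above, a
minimal, untested sketch of what a consistent vma walk might look like is
below.  The helper name follow_io_pfnmap() is hypothetical and not part of
either patch in this thread; the body only re-uses pieces already shown here
(find_vma_intersection(), the pfn computation from vm_pgoff, and
is_invalid_reserved_pfn()), under the mmap_sem locking used elsewhere in
this file at the time.

	/*
	 * Untested sketch only: a vma walk that tests VM_IO | VM_PFNMAP so
	 * it matches the check proposed in try_io_pfnmap() above.  Returns
	 * 0 and fills *pfn on success, -EFAULT otherwise.
	 */
	static int follow_io_pfnmap(struct mm_struct *mm, unsigned long vaddr,
				    unsigned long *pfn)
	{
		struct vm_area_struct *vma;
		int ret = -EFAULT;

		down_read(&mm->mmap_sem);
		vma = find_vma_intersection(mm, vaddr, vaddr + 1);
		if (vma && vma->vm_flags & (VM_IO | VM_PFNMAP)) {
			/* Translate the offset into the vma to a raw PFN */
			*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) +
			       vma->vm_pgoff;
			if (is_invalid_reserved_pfn(*pfn))
				ret = 0;
		}
		up_read(&mm->mmap_sem);

		return ret;
	}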