From patchwork Wed Oct 28 17:37:38 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Williamson <alex.williamson@redhat.com>
X-Patchwork-Id: 7513511
Message-ID: <1446053858.8018.406.camel@redhat.com>
Subject: Re: [RFC] vfio/type1: handle case where IOMMU does not support
	PAGE_SIZE size
From: Alex Williamson <alex.williamson@redhat.com>
To: Eric Auger <eric.auger@linaro.org>
Date: Wed, 28 Oct 2015 11:37:38 -0600
In-Reply-To: <563101A0.7020404@linaro.org>
References: <1446037965-2341-1-git-send-email-eric.auger@linaro.org>
	<1446049648.8018.397.camel@redhat.com> <563101A0.7020404@linaro.org>
Cc: eric.auger@st.com, kvm@vger.kernel.org, patches@linaro.org,
	will.deacon@arm.com, linux-kernel@vger.kernel.org,
	christoffer.dall@linaro.org,
	suravee.suthikulpanit@amd.com, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org

On Wed, 2015-10-28 at 18:10 +0100, Eric Auger wrote:
> Hi Alex,
> On 10/28/2015 05:27 PM, Alex Williamson wrote:
> > On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
> >> Current vfio_pgsize_bitmap code hides the supported IOMMU page
> >> sizes smaller than PAGE_SIZE. As a result, in case the IOMMU
> >> does not support PAGE_SIZE page, the alignment check on map/unmap
> >> is done with larger page sizes, if any. This can fail although
> >> mapping could be done with pages smaller than PAGE_SIZE.
> >>
> >> vfio_pgsize_bitmap is modified to expose the IOMMU page sizes,
> >> supported by all domains, even those smaller than PAGE_SIZE. The
> >> alignment check on map is performed against PAGE_SIZE if the minimum
> >> IOMMU size is less than PAGE_SIZE or against the min page size greater
> >> than PAGE_SIZE.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>
> >> ---
> >>
> >> This was tested on AMD Seattle with a 64kB page host. The ARM MMU 401
> >> currently exposes 4kB, 2MB and 1GB page support. With a 64kB page host,
> >> the map/unmap check is done against 2MB. Some alignment checks fail,
> >> so VFIO_IOMMU_MAP_DMA fails although we could map using the 4kB IOMMU
> >> page size.
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 25 +++++++++++--------------
> >>  1 file changed, 11 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 57d8c37..13fb974 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> >>  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
> >>  {
> >>  	struct vfio_domain *domain;
> >> -	unsigned long bitmap = PAGE_MASK;
> >> +	unsigned long bitmap = ULONG_MAX;
> > 
> > Isn't this and removing the WARN_ON()s the only real change in this
> > patch?  The rest looks like conversion to use IS_ALIGNED and the
> > following test, that I don't really understand...
> Yes basically you're right.


Ok, so hopefully with a corrected understanding of what this does, isn't
this effectively the same as the diff at the end of this message?

This would also expose to the user that we're accepting PAGE_SIZE, which
we weren't before, so it was not quite right to just let them do it
anyway.  I don't think we even need to get rid of the WARN_ONs, do we?
Thanks,

Alex

> > 
> >>  
> >>  	mutex_lock(&iommu->lock);
> >>  	list_for_each_entry(domain, &iommu->domain_list, next)
> >> @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
> >>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> >>  			     struct vfio_iommu_type1_dma_unmap *unmap)
> >>  {
> >> -	uint64_t mask;
> >>  	struct vfio_dma *dma;
> >>  	size_t unmapped = 0;
> >>  	int ret = 0;
> >> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
> >> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
> >> +						PAGE_SIZE : min_pagesz;
> > 
> > This one.  If we're going to support sub-PAGE_SIZE mappings, why do we
> > care to cap alignment at PAGE_SIZE?
> My intent in this patch isn't to allow user space to map/unmap
> sub-PAGE_SIZE buffers. The new test makes sure the mapped area is at
> least as large as a host page, whatever the supported page sizes.
> 
> I noticed that chunk construction, pinning and many other things are
> based on PAGE_SIZE and far be it from me to change that code! I want to
> keep that minimal granularity for all those computations.
> 
> However on the iommu side, I would like to rely on the fact that the iommu
> driver is clever enough to choose the right page size, and even to choose
> a size that is smaller than PAGE_SIZE if the latter is not supported.
> > 
> >> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> >> -
> >> -	if (unmap->iova & mask)
> >> +	if (!IS_ALIGNED(unmap->iova, requested_alignment))
> >>  		return -EINVAL;
> >> -	if (!unmap->size || unmap->size & mask)
> >> +	if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
> >>  		return -EINVAL;
> >>  
> >> -	WARN_ON(mask & PAGE_MASK);
> >> -
> >>  	mutex_lock(&iommu->lock);
> >>  
> >>  	/*
> >> @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>  	size_t size = map->size;
> >>  	long npage;
> >>  	int ret = 0, prot = 0;
> >> -	uint64_t mask;
> >>  	struct vfio_dma *dma;
> >>  	unsigned long pfn;
> >> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
> >> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
> >> +						PAGE_SIZE : min_pagesz;
> >>  
> >>  	/* Verify that none of our __u64 fields overflow */
> >>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
> >>  		return -EINVAL;
> >>  
> >> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> >> -
> >> -	WARN_ON(mask & PAGE_MASK);
> >> -
> >>  	/* READ/WRITE from device perspective */
> >>  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
> >>  		prot |= IOMMU_WRITE;
> >>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
> >>  		prot |= IOMMU_READ;
> >>  
> >> -	if (!prot || !size || (size | iova | vaddr) & mask)
> >> +	if (!prot || !size ||
> >> +		!IS_ALIGNED(size | iova | vaddr, requested_alignment))
> >>  		return -EINVAL;
> >>  
> >>  	/* Don't allow IOVA or virtual address wrap */
> > 
> > This is mostly ignoring the problems with sub-PAGE_SIZE mappings.  For
> > instance, we can only pin on PAGE_SIZE and therefore we only do
> > accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
> > page, that page gets pinned and accounted 16 times.  Are we going to
> > tell users that their locked memory limit needs to be 16x now?  The rest
> > of the code would need an audit as well to see what other sub-page bugs
> > might be hiding.  Thanks,
> So if the user is not allowed to map sub-PAGE_SIZE buffers, accounting
> is still based on PAGE_SIZE while iommu mapping can be based on
> sub-PAGE_SIZE pages. Am I misunderstanding something?
> 
> Best Regards
> 
> Eric
> > 
> > Alex
> > 
> > 
> > 
>
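
To put numbers on the accounting concern quoted above (4K mappings across
a 64K host page), here is a minimal userspace sketch. The helper names
pins_per_host_page() and accounted_bytes() are hypothetical and exist
only for illustration; they are not part of vfio_iommu_type1.c. The
sketch assumes every sub-page mapping pins and accounts the whole host
page, and that both sizes are powers of two.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not kernel code): number of times one host page
 * would be pinned if user space mapped it in chunks of map_size bytes,
 * assuming each mapping pins the full host page.
 */
static unsigned long pins_per_host_page(unsigned long host_page_size,
                                        unsigned long map_size)
{
        if (map_size == 0)
                return 0;               /* nothing mapped, nothing pinned */
        if (map_size >= host_page_size)
                return 1;               /* one mapping covers the page */
        return host_page_size / map_size;
}

/*
 * Hypothetical helper: bytes charged against the locked memory limit
 * for one fully mapped host page under the same assumption.
 */
static unsigned long accounted_bytes(unsigned long host_page_size,
                                     unsigned long map_size)
{
        return pins_per_host_page(host_page_size, map_size) * host_page_size;
}
```

With a 64kB host page and 4kB mappings, pins_per_host_page(65536, 4096)
is 16, so a single 64kB page would be charged as 1MB of locked memory
under these assumptions.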

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 57d8c37..7db4f5a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -403,13 +403,19 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, stru
 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
 {
        struct vfio_domain *domain;
-       unsigned long bitmap = PAGE_MASK;
+       unsigned long bitmap = ULONG_MAX;
 
        mutex_lock(&iommu->lock);
        list_for_each_entry(domain, &iommu->domain_list, next)
                bitmap &= domain->domain->ops->pgsize_bitmap;
        mutex_unlock(&iommu->lock);
 
+       /* Some comment about how the IOMMU API splits requests */
+       if (bitmap & ~PAGE_MASK) {
+               bitmap &= PAGE_MASK;
+               bitmap |= PAGE_SIZE;
+       }
+
        return bitmap;
 }
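
For illustration only, the effect of the clamp in the diff above can be
checked in user space. This is a sketch under stated assumptions, not
the kernel code: DEMO_PAGE_SHIFT, DEMO_PAGE_SIZE and DEMO_PAGE_MASK are
hypothetical stand-ins for the kernel's PAGE_SIZE/PAGE_MASK on a 64kB
page host, and min_align_mask() stands in for the existing
((uint64_t)1 << __ffs(bitmap)) - 1 computation in the map/unmap paths.

```c
#include <stddef.h>

/* Hypothetical stand-ins for PAGE_SIZE/PAGE_MASK, assuming 64kB pages */
#define DEMO_PAGE_SHIFT 16
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/*
 * Same logic as the clamp in the diff: if the IOMMU supports any size
 * smaller than the host page, hide those sizes and advertise the host
 * page size instead.
 */
static unsigned long clamp_pgsize_bitmap(unsigned long bitmap)
{
        if (bitmap & ~DEMO_PAGE_MASK) {
                bitmap &= DEMO_PAGE_MASK;
                bitmap |= DEMO_PAGE_SIZE;
        }
        return bitmap;
}

/*
 * Alignment mask derived from the smallest advertised page size,
 * i.e. (lowest set bit) - 1. The bitmap must be non-zero.
 */
static unsigned long min_align_mask(unsigned long bitmap)
{
        return (bitmap & (~bitmap + 1)) - 1;
}
```

Taking the sizes reported in the thread (4kB, 2MB and 1GB, i.e. bits 12,
21 and 30), clamp_pgsize_bitmap() yields 0x40210000 (64kB, 2MB, 1GB) and
min_align_mask() of that is 0xffff, so 64kB-aligned requests pass the
check. With the old PAGE_MASK initialisation the bitmap would be
0x40200000 and the mask 0x1fffff, which is the 2MB alignment requirement
Eric ran into.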