From patchwork Tue Jul 18 08:26:04 2023
X-Patchwork-Id: 13316893
From: Vivek Kasireddy <vivek.kasireddy@intel.com>
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Subject: [PATCH v2 1/2] udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap
Date: Tue, 18 Jul 2023 01:26:04 -0700
Message-Id: <20230718082605.1570740-2-vivek.kasireddy@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230718082605.1570740-1-vivek.kasireddy@intel.com>
References: <20230718082605.1570740-1-vivek.kasireddy@intel.com>
Cc: Dongwon Kim, David Hildenbrand, Junxiao Chang, Hugh Dickins,
    Vivek Kasireddy, Peter Xu, Gerd Hoffmann, Jason Gunthorpe, Mike Kravetz

Add VM_PFNMAP to vm_flags in the mmap handler to ensure that the
mappings are managed without using struct page. And, in the vm_fault
handler, use vmf_insert_pfn to share the page's pfn with userspace
instead of directly sharing the page (via struct page *).
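For illustration only (not part of this patch): a minimal userspace sketch,
assuming glibc 2.27+ for memfd_create(), that creates a udmabuf from a
shmem-backed memfd and writes through an mmap of the returned dma-buf fd.
With this change, every page touched through that mapping is installed by
udmabuf_vm_fault() via vmf_insert_pfn() into a VM_PFNMAP vma. Most error
handling is omitted for brevity.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

int main(void)
{
        const size_t size = 2 * 1024 * 1024;

        /* shmem-backed memfd; udmabuf requires the F_SEAL_SHRINK seal */
        int memfd = memfd_create("udmabuf-shmem", MFD_ALLOW_SEALING);
        ftruncate(memfd, size);
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

        int devfd = open("/dev/udmabuf", O_RDWR);
        struct udmabuf_create create = {
                .memfd  = memfd,
                .flags  = UDMABUF_FLAGS_CLOEXEC,
                .offset = 0,
                .size   = size,
        };
        int buffd = ioctl(devfd, UDMABUF_CREATE, &create);
        if (buffd < 0)
                return 1;

        /*
         * mmap of the dma-buf fd goes through mmap_udmabuf(); each page
         * touched below faults through udmabuf_vm_fault() -> vmf_insert_pfn().
         */
        uint8_t *map = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_SHARED, buffd, 0);
        if (map == MAP_FAILED)
                return 1;
        memset(map, 0xab, size);

        munmap(map, size);
        close(buffd);
        close(devfd);
        close(memfd);
        return 0;
}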
Cc: David Hildenbrand
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Suggested-by: David Hildenbrand
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Acked-by: David Hildenbrand
---
 drivers/dma-buf/udmabuf.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index c40645999648..820c993c8659 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -35,12 +35,13 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
         struct vm_area_struct *vma = vmf->vma;
         struct udmabuf *ubuf = vma->vm_private_data;
         pgoff_t pgoff = vmf->pgoff;
+        unsigned long pfn;

         if (pgoff >= ubuf->pagecount)
                 return VM_FAULT_SIGBUS;
-        vmf->page = ubuf->pages[pgoff];
-        get_page(vmf->page);
-        return 0;
+
+        pfn = page_to_pfn(ubuf->pages[pgoff]);
+        return vmf_insert_pfn(vma, vmf->address, pfn);
 }

 static const struct vm_operations_struct udmabuf_vm_ops = {
@@ -56,6 +57,7 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)

         vma->vm_ops = &udmabuf_vm_ops;
         vma->vm_private_data = ubuf;
+        vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
         return 0;
 }


From patchwork Tue Jul 18 08:26:05 2023
X-Patchwork-Id: 13316894
From: Vivek Kasireddy <vivek.kasireddy@intel.com>
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Subject: [PATCH v2 2/2] udmabuf: Add back support for mapping hugetlb pages (v2)
Date: Tue, 18 Jul 2023 01:26:05 -0700
Message-Id: <20230718082605.1570740-3-vivek.kasireddy@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230718082605.1570740-1-vivek.kasireddy@intel.com>
References: <20230718082605.1570740-1-vivek.kasireddy@intel.com>
Cc: Dongwon Kim, David Hildenbrand, Junxiao Chang, Hugh Dickins,
    Vivek Kasireddy, Peter Xu, Gerd Hoffmann, Jason Gunthorpe, Mike Kravetz

A user or admin can configure a VMM (Qemu) Guest's memory to be backed
by hugetlb pages for various reasons. However, a Guest OS would still
allocate (and pin) buffers that are backed by regular 4k-sized pages.
In order to map these buffers and create dma-bufs for them on the Host,
we first need to find the hugetlb pages where the buffer allocations
are located and then determine the offsets of individual chunks (within
those pages) and use this information to eventually populate a
scatterlist.

Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
were passed to the Host kernel and Qemu was launched with these
relevant options:
qemu-system-x86_64 -m 4096m....
-device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
-display gtk,gl=on
-object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
-machine memory-backend=mem1

Replacing -display gtk,gl=on with -display gtk,gl=off above would
exercise the mmap handler.

v2: Updated get_sg_table() to manually populate the scatterlist for
    both huge page and non-huge-page cases.

Cc: David Hildenbrand
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Acked-by: Mike Kravetz
---
 drivers/dma-buf/udmabuf.c | 84 +++++++++++++++++++++++++++++++++------
 1 file changed, 71 insertions(+), 13 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 820c993c8659..10c47bf77fb5 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include <linux/hugetlb.h>
 #include
 #include
 #include
@@ -28,6 +29,7 @@ struct udmabuf {
         struct page **pages;
         struct sg_table *sg;
         struct miscdevice *device;
+        pgoff_t *offsets;
 };

 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -41,6 +43,10 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
                 return VM_FAULT_SIGBUS;

         pfn = page_to_pfn(ubuf->pages[pgoff]);
+        if (ubuf->offsets) {
+                pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+        }
+
         return vmf_insert_pfn(vma, vmf->address, pfn);
 }

@@ -90,23 +96,31 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
         struct udmabuf *ubuf = buf->priv;
         struct sg_table *sg;
+        struct scatterlist *sgl;
+        pgoff_t offset;
+        unsigned long i = 0;
         int ret;

         sg = kzalloc(sizeof(*sg), GFP_KERNEL);
         if (!sg)
                 return ERR_PTR(-ENOMEM);
-        ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
-                                        0, ubuf->pagecount << PAGE_SHIFT,
-                                        GFP_KERNEL);
+
+        ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
         if (ret < 0)
-                goto err;
+                goto err_alloc;
+
+        for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) {
+                offset = ubuf->offsets ? ubuf->offsets[i] : 0;
+                sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, offset);
+        }
+
         ret = dma_map_sgtable(dev, sg, direction, 0);
         if (ret < 0)
-                goto err;
+                goto err_map;
         return sg;

-err:
+err_map:
         sg_free_table(sg);
+err_alloc:
         kfree(sg);
         return ERR_PTR(ret);
 }

@@ -143,6 +157,7 @@ static void release_udmabuf(struct dma_buf *buf)

         for (pg = 0; pg < ubuf->pagecount; pg++)
                 put_page(ubuf->pages[pg]);
+        kfree(ubuf->offsets);
         kfree(ubuf->pages);
         kfree(ubuf);
 }

@@ -206,7 +221,9 @@ static long udmabuf_create(struct miscdevice *device,
         struct udmabuf *ubuf;
         struct dma_buf *buf;
         pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
-        struct page *page;
+        struct page *page, *hpage = NULL;
+        pgoff_t hpoff, chunkoff, maxchunks;
+        struct hstate *hpstate;
         int seals, ret = -EINVAL;
         u32 i, flags;

@@ -242,7 +259,7 @@ static long udmabuf_create(struct miscdevice *device,
                 if (!memfd)
                         goto err;
                 mapping = memfd->f_mapping;
-                if (!shmem_mapping(mapping))
+                if (!shmem_mapping(mapping) && !is_file_hugepages(memfd))
                         goto err;
                 seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
                 if (seals == -EINVAL)
@@ -253,16 +270,56 @@ static long udmabuf_create(struct miscdevice *device,
                         goto err;
                 pgoff = list[i].offset >> PAGE_SHIFT;
                 pgcnt = list[i].size >> PAGE_SHIFT;
+                if (is_file_hugepages(memfd)) {
+                        if (!ubuf->offsets) {
+                                ubuf->offsets = kmalloc_array(ubuf->pagecount,
+                                                              sizeof(*ubuf->offsets),
+                                                              GFP_KERNEL);
+                                if (!ubuf->offsets) {
+                                        ret = -ENOMEM;
+                                        goto err;
+                                }
+                        }
+                        hpstate = hstate_file(memfd);
+                        hpoff = list[i].offset >> huge_page_shift(hpstate);
+                        chunkoff = (list[i].offset &
+                                    ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
+                        maxchunks = huge_page_size(hpstate) >> PAGE_SHIFT;
+                }
                 for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-                        page = shmem_read_mapping_page(mapping, pgoff + pgidx);
-                        if (IS_ERR(page)) {
-                                ret = PTR_ERR(page);
-                                goto err;
+                        if (is_file_hugepages(memfd)) {
+                                if (!hpage) {
+                                        hpage = find_get_page_flags(mapping, hpoff,
+                                                                    FGP_ACCESSED);
+                                        if (!hpage) {
+                                                ret = -EINVAL;
+                                                goto err;
+                                        }
+                                }
+                                get_page(hpage);
+                                ubuf->pages[pgbuf] = hpage;
+                                ubuf->offsets[pgbuf++] = chunkoff << PAGE_SHIFT;
+                                if (++chunkoff == maxchunks) {
+                                        put_page(hpage);
+                                        hpage = NULL;
+                                        chunkoff = 0;
+                                        hpoff++;
+                                }
+                        } else {
+                                page = shmem_read_mapping_page(mapping, pgoff + pgidx);
+                                if (IS_ERR(page)) {
+                                        ret = PTR_ERR(page);
+                                        goto err;
+                                }
+                                ubuf->pages[pgbuf++] = page;
                         }
-                        ubuf->pages[pgbuf++] = page;
                 }
                 fput(memfd);
                 memfd = NULL;
+                if (hpage) {
+                        put_page(hpage);
+                        hpage = NULL;
+                }
         }

         exp_info.ops = &udmabuf_ops;
@@ -287,6 +344,7 @@ static long udmabuf_create(struct miscdevice *device,
                 put_page(ubuf->pages[--pgbuf]);
         if (memfd)
                 fput(memfd);
+        kfree(ubuf->offsets);
         kfree(ubuf->pages);
         kfree(ubuf);
         return ret;
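For illustration only (not part of this patch): a similar userspace sketch of
the hugetlb path added above. It assumes a 2M default hugepage size with
enough hugepages reserved (e.g. via vm.nr_hugepages), and it touches the
memfd first so the huge pages are present in the page cache, since the
find_get_page_flags() lookup in udmabuf_create() does not allocate them.
Most error handling is omitted for brevity.

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

int main(void)
{
        /* assumes a 2M default hugepage size: two hugetlb pages */
        const size_t size = 4 * 1024 * 1024;

        int memfd = memfd_create("udmabuf-hugetlb",
                                 MFD_HUGETLB | MFD_ALLOW_SEALING);
        if (memfd < 0)
                return 1;
        ftruncate(memfd, size);

        /* Fault the huge pages in so udmabuf_create() can look them up. */
        void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, memfd, 0);
        if (mem == MAP_FAILED)
                return 1;
        memset(mem, 0, size);

        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

        int devfd = open("/dev/udmabuf", O_RDWR);
        struct udmabuf_create create = {
                .memfd  = memfd,
                .flags  = UDMABUF_FLAGS_CLOEXEC,
                .offset = 0,
                .size   = size, /* PAGE_SIZE chunks spanning both huge pages */
        };
        int buffd = ioctl(devfd, UDMABUF_CREATE, &create);
        if (buffd < 0)
                return 1;

        /* buffd is a dma-buf whose 4k entries point into the hugetlb pages. */
        close(buffd);
        close(devfd);
        munmap(mem, size);
        close(memfd);
        return 0;
}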