From patchwork Tue Dec  5 05:35:06 2023
X-Patchwork-Submitter: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
X-Patchwork-Id: 13479377
From: Vivek Kasireddy <vivek.kasireddy@intel.com>
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, David Hildenbrand, Daniel Vetter, Mike Kravetz,
 Hugh Dickins, Peter Xu, Jason Gunthorpe, Gerd Hoffmann, Dongwon Kim,
 Junxiao Chang
Subject: [PATCH v6 2/5] udmabuf: Add back support for mapping hugetlb pages (v5)
Date: Mon, 4 Dec 2023 21:35:06 -0800
Message-Id: <20231205053509.2342169-3-vivek.kasireddy@intel.com>
In-Reply-To: <20231205053509.2342169-1-vivek.kasireddy@intel.com>
References: <20231205053509.2342169-1-vivek.kasireddy@intel.com>

A user or admin can configure a VMM (Qemu) Guest's memory to be
backed by hugetlb pages for various reasons. However, a Guest OS
would still allocate (and pin) buffers that are backed by regular
4k-sized pages. To map these buffers and create dma-bufs for them
on the Host, we first need to find the hugetlb pages where the
buffer allocations are located, then determine the offsets of the
individual chunks (within those pages), and use this information
to eventually populate a scatterlist.
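To make the arithmetic concrete, here is a rough standalone
illustration (not part of the patch) of how a byte offset into a
hugetlb-backed memfd splits into a huge page index and a 4k chunk
offset. The 2M geometry and all names are assumptions made only
for this example:

#include <stdio.h>

#define PAGE_SHIFT	12
#define HPAGE_SHIFT	21			/* assume 2M huge pages */
#define HPAGE_SIZE	(1UL << HPAGE_SHIFT)
#define HPAGE_MASK	(~(HPAGE_SIZE - 1))
#define HPAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)

int main(void)
{
	/* offset of a 4k chunk: 3 subpages into the 6th huge page */
	unsigned long offset = (5UL << HPAGE_SHIFT) + (3UL << PAGE_SHIFT);

	/*
	 * Huge page number, rescaled to a PAGE_SIZE-granular mapping
	 * index so that it is always an order-n page multiple (the
	 * v5 fix below).
	 */
	unsigned long mapidx = (offset >> HPAGE_SHIFT) << HPAGE_ORDER;

	/* 4k chunk number within the huge page */
	unsigned long chunkoff = (offset & ~HPAGE_MASK) >> PAGE_SHIFT;

	printf("mapidx=%lu chunkoff=%lu\n", mapidx, chunkoff);	/* 2560 3 */
	return 0;
}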
Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
were passed to the Host kernel, and Qemu was launched with these
relevant options:

qemu-system-x86_64 -m 4096m....
-device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
-display gtk,gl=on
-object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
-machine memory-backend=mem1

Replacing -display gtk,gl=on with -display gtk,gl=off above would
exercise the mmap handler.

v2: Updated get_sg_table() to manually populate the scatterlist for
    both huge page and non-huge-page cases.

v3: s/offsets/subpgoff/g
    s/hpoff/mapidx/g

v4: Replaced find_get_page_flags() with __filemap_get_folio() to
    ensure that we only obtain head pages from the mapping.

v5: Fix the calculation of mapidx to ensure that it is an order-n
    page multiple.

Cc: David Hildenbrand
Cc: Daniel Vetter
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Acked-by: Mike Kravetz (v2)
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/dma-buf/udmabuf.c | 88 +++++++++++++++++++++++++++++++++------
 1 file changed, 75 insertions(+), 13 deletions(-)
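Before the diff itself, here is a rough userspace simulation (again,
not part of the patch; names and the 2M geometry are illustrative
assumptions) of the main pinning loop added to udmabuf_create()
below, showing how consecutive 4k subpages map to (huge page index,
subpage offset) pairs, and how the offset wraps at a huge page
boundary:

#include <stdio.h>

#define PAGE_SHIFT	12
#define HPAGE_SHIFT	21			/* assume 2M huge pages */
#define PAGES_PER_HPAGE	(1UL << (HPAGE_SHIFT - PAGE_SHIFT))

int main(void)
{
	/* buffer of 4 subpages starting 8k before a huge page boundary */
	unsigned long offset = (1UL << HPAGE_SHIFT) - (2UL << PAGE_SHIFT);
	unsigned long pgcnt = 4;
	unsigned long mapidx = (offset >> HPAGE_SHIFT) * PAGES_PER_HPAGE;
	unsigned long chunkoff = (offset >> PAGE_SHIFT) & (PAGES_PER_HPAGE - 1);
	unsigned long pgidx;

	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
		printf("subpage %lu -> huge page idx %lu, subpgoff 0x%lx\n",
		       pgidx, mapidx, chunkoff << PAGE_SHIFT);
		/* crossing into the next huge page resets the offset */
		if (++chunkoff == PAGES_PER_HPAGE) {
			chunkoff = 0;
			mapidx += PAGES_PER_HPAGE;
		}
	}
	return 0;
}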
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 820c993c8659..1d1cc5e7e613 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -10,6 +10,7 @@
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/shmem_fs.h>
+#include <linux/hugetlb.h>
 #include <linux/slab.h>
 #include <linux/udmabuf.h>
 #include <linux/vmalloc.h>
@@ -28,6 +29,7 @@ struct udmabuf {
 	struct page **pages;
 	struct sg_table *sg;
 	struct miscdevice *device;
+	pgoff_t *subpgoff;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -41,6 +43,10 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 		return VM_FAULT_SIGBUS;
 
 	pfn = page_to_pfn(ubuf->pages[pgoff]);
+	if (ubuf->subpgoff) {
+		pfn += ubuf->subpgoff[pgoff] >> PAGE_SHIFT;
+	}
+
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
@@ -90,23 +96,31 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct sg_table *sg;
+	struct scatterlist *sgl;
+	pgoff_t offset;
+	unsigned long i = 0;
 	int ret;
 
 	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
 	if (!sg)
 		return ERR_PTR(-ENOMEM);
-	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
-					0, ubuf->pagecount << PAGE_SHIFT,
-					GFP_KERNEL);
+
+	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
 	if (ret < 0)
-		goto err;
+		goto err_alloc;
+
+	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) {
+		offset = ubuf->subpgoff ? ubuf->subpgoff[i] : 0;
+		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, offset);
+	}
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
-		goto err;
+		goto err_map;
 	return sg;
 
-err:
+err_map:
 	sg_free_table(sg);
+err_alloc:
 	kfree(sg);
 	return ERR_PTR(ret);
 }
@@ -143,6 +157,7 @@ static void release_udmabuf(struct dma_buf *buf)
 	for (pg = 0; pg < ubuf->pagecount; pg++)
 		put_page(ubuf->pages[pg]);
 
+	kfree(ubuf->subpgoff);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 }
@@ -206,7 +221,10 @@ static long udmabuf_create(struct miscdevice *device,
 	struct udmabuf *ubuf;
 	struct dma_buf *buf;
 	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
-	struct page *page;
+	struct page *page, *hpage = NULL;
+	struct folio *folio;
+	pgoff_t mapidx, chunkoff, maxchunks;
+	struct hstate *hpstate;
 	int seals, ret = -EINVAL;
 	u32 i, flags;
 
@@ -242,7 +260,7 @@ static long udmabuf_create(struct miscdevice *device,
 		if (!memfd)
 			goto err;
 		mapping = memfd->f_mapping;
-		if (!shmem_mapping(mapping))
+		if (!shmem_mapping(mapping) && !is_file_hugepages(memfd))
 			goto err;
 		seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
 		if (seals == -EINVAL)
@@ -253,16 +271,59 @@ static long udmabuf_create(struct miscdevice *device,
 			goto err;
 		pgoff = list[i].offset >> PAGE_SHIFT;
 		pgcnt = list[i].size >> PAGE_SHIFT;
+		if (is_file_hugepages(memfd)) {
+			if (!ubuf->subpgoff) {
+				ubuf->subpgoff = kmalloc_array(ubuf->pagecount,
+							       sizeof(*ubuf->subpgoff),
+							       GFP_KERNEL);
+				if (!ubuf->subpgoff) {
+					ret = -ENOMEM;
+					goto err;
+				}
+			}
+			hpstate = hstate_file(memfd);
+			mapidx = list[i].offset >> huge_page_shift(hpstate);
+			mapidx <<= huge_page_order(hpstate);
+			chunkoff = (list[i].offset &
+				    ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
+			maxchunks = huge_page_size(hpstate) >> PAGE_SHIFT;
+		}
 		for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-			page = shmem_read_mapping_page(mapping, pgoff + pgidx);
-			if (IS_ERR(page)) {
-				ret = PTR_ERR(page);
-				goto err;
+			if (is_file_hugepages(memfd)) {
+				if (!hpage) {
+					folio = __filemap_get_folio(mapping, mapidx,
+								    FGP_ACCESSED, 0);
+					hpage = IS_ERR(folio) ? NULL : &folio->page;
+					if (!hpage) {
+						ret = -EINVAL;
+						goto err;
+					}
+				}
+				get_page(hpage);
+				ubuf->pages[pgbuf] = hpage;
+				ubuf->subpgoff[pgbuf++] = chunkoff << PAGE_SHIFT;
+				if (++chunkoff == maxchunks) {
+					put_page(hpage);
+					hpage = NULL;
+					chunkoff = 0;
+					mapidx += pages_per_huge_page(hpstate);
+				}
+			} else {
+				mapidx = pgoff + pgidx;
+				page = shmem_read_mapping_page(mapping, mapidx);
+				if (IS_ERR(page)) {
+					ret = PTR_ERR(page);
+					goto err;
+				}
+				ubuf->pages[pgbuf++] = page;
 			}
-			ubuf->pages[pgbuf++] = page;
 		}
 		fput(memfd);
 		memfd = NULL;
+		if (hpage) {
+			put_page(hpage);
+			hpage = NULL;
+		}
 	}
 
 	exp_info.ops = &udmabuf_ops;
@@ -287,6 +348,7 @@ static long udmabuf_create(struct miscdevice *device,
 		put_page(ubuf->pages[--pgbuf]);
 	if (memfd)
 		fput(memfd);
+	kfree(ubuf->subpgoff);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 	return ret;
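For completeness, a minimal userspace sequence that would exercise
this path (a sketch, not part of the patch) looks roughly like the
following. It assumes 2M is the default huge page size, as in the
Testcase above, and keeps error handling to a bare minimum:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/udmabuf.h>

int main(void)
{
	const size_t size = 2UL << 20;	/* one 2M huge page */
	struct udmabuf_create create;
	int memfd, devfd, dmabuf;

	/* hugetlb-backed memfd; sealing must be possible for udmabuf */
	memfd = memfd_create("hugetlb-buf", MFD_ALLOW_SEALING | MFD_HUGETLB);
	if (memfd < 0 || ftruncate(memfd, size) < 0)
		return 1;

	/* udmabuf requires F_SEAL_SHRINK (and no F_SEAL_WRITE) */
	if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
		return 1;

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		return 1;

	memset(&create, 0, sizeof(create));
	create.memfd = memfd;
	create.offset = 0;	/* must be PAGE_SIZE aligned */
	create.size = size;

	/* on success, the ioctl returns a new dma-buf fd */
	dmabuf = ioctl(devfd, UDMABUF_CREATE, &create);
	if (dmabuf < 0)
		return 1;

	printf("dma-buf fd %d from hugetlb memfd %d\n", dmabuf, memfd);
	return 0;
}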