From patchwork Mon Jun 24 06:36:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
X-Patchwork-Id: 13709046
From: Vivek Kasireddy <vivek.kasireddy@intel.com>
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, David Hildenbrand, Daniel Vetter, Mike Kravetz,
 Hugh Dickins, Peter Xu, Jason Gunthorpe, Gerd Hoffmann, Dongwon Kim,
 Junxiao Chang, Dave Airlie
Subject: [PATCH v16 6/9] udmabuf: Add back support for mapping hugetlb pages
Date: Sun, 23 Jun 2024 23:36:14 -0700
Message-ID: <20240624063952.1572359-7-vivek.kasireddy@intel.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240624063952.1572359-1-vivek.kasireddy@intel.com>
References: <20240624063952.1572359-1-vivek.kasireddy@intel.com>
MIME-Version: 1.0

A user or admin can configure a VMM (Qemu) Guest's memory to be backed
by hugetlb pages for various reasons. However, a Guest OS would still
allocate (and pin) buffers that are backed by regular 4k-sized pages.
In order to map these buffers and create dma-bufs for them on the Host,
we first need to find the hugetlb pages where the buffer allocations
are located, then determine the offsets of the individual chunks within
those pages, and use this information to populate a scatterlist.
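For reference, the userspace side of this flow can be sketched as
follows: create a hugetlb memfd, seal it as the udmabuf driver
requires, and hand a region of it to the UDMABUF_CREATE ioctl. This is
an illustrative sketch, not part of the patch; the buffer sizes and
the 2M default huge page size are assumptions, and error handling is
abbreviated:

/*
 * Sketch (illustrative only): export part of a hugetlb-backed memfd
 * as a dma-buf through /dev/udmabuf. Assumes 2M default huge pages.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

int main(void)
{
	/* hugetlb-backed memfd; sealing is required by the udmabuf driver */
	int memfd = memfd_create("guest-ram", MFD_HUGETLB | MFD_ALLOW_SEALING);

	ftruncate(memfd, 4UL << 20);		/* two 2M huge pages (assumed) */
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	struct udmabuf_create create = {
		.memfd	= memfd,
		.flags	= UDMABUF_FLAGS_CLOEXEC,
		.offset	= 0,			/* must be PAGE_SIZE aligned */
		.size	= 2UL << 20,		/* must be a PAGE_SIZE multiple */
	};

	int devfd = open("/dev/udmabuf", O_RDWR);
	int dmabuf = ioctl(devfd, UDMABUF_CREATE, &create);

	if (dmabuf < 0)
		perror("UDMABUF_CREATE");
	else
		printf("dma-buf fd: %d\n", dmabuf);

	return 0;
}

Note that offset and size only need PAGE_SIZE alignment, not huge page
alignment, which is exactly the case the kernel-side changes below
handle.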
Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
were passed to the Host kernel and Qemu was launched with these
relevant options:
qemu-system-x86_64 -m 4096m....
-device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
-display gtk,gl=on
-object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
-machine memory-backend=mem1
Replacing -display gtk,gl=on with -display gtk,gl=off above would
exercise the mmap handler.

Cc: David Hildenbrand
Cc: Daniel Vetter
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Acked-by: Mike Kravetz (v2)
Acked-by: Dave Airlie
Acked-by: Gerd Hoffmann
Signed-off-by: Vivek Kasireddy
---
 drivers/dma-buf/udmabuf.c | 122 +++++++++++++++++++++++++++++++-------
 1 file changed, 101 insertions(+), 21 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 820c993c8659..274defd3fa3e 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -10,6 +10,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/hugetlb.h>
 #include <...>
 #include <...>
 #include <...>
@@ -28,6 +29,7 @@ struct udmabuf {
 	struct page **pages;
 	struct sg_table *sg;
 	struct miscdevice *device;
+	pgoff_t *offsets;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -41,6 +43,8 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 		return VM_FAULT_SIGBUS;
 
 	pfn = page_to_pfn(ubuf->pages[pgoff]);
+	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
@@ -90,23 +94,29 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct sg_table *sg;
+	struct scatterlist *sgl;
+	unsigned int i = 0;
 	int ret;
 
 	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
 	if (!sg)
 		return ERR_PTR(-ENOMEM);
-	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
-					0, ubuf->pagecount << PAGE_SHIFT,
-					GFP_KERNEL);
+
+	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
 	if (ret < 0)
-		goto err;
+		goto err_alloc;
+
+	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
+		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]);
+
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
-		goto err;
+		goto err_map;
 	return sg;
 
-err:
+err_map:
 	sg_free_table(sg);
+err_alloc:
 	kfree(sg);
 	return ERR_PTR(ret);
 }
@@ -143,6 +153,7 @@ static void release_udmabuf(struct dma_buf *buf)
 	for (pg = 0; pg < ubuf->pagecount; pg++)
 		put_page(ubuf->pages[pg]);
 
+	kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 }
@@ -196,17 +207,77 @@ static const struct dma_buf_ops udmabuf_ops = {
 #define SEALS_WANTED (F_SEAL_SHRINK)
 #define SEALS_DENIED (F_SEAL_WRITE)
 
+static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd,
+				pgoff_t offset, pgoff_t pgcnt,
+				pgoff_t *pgbuf)
+{
+	struct hstate *hpstate = hstate_file(memfd);
+	pgoff_t mapidx = offset >> huge_page_shift(hpstate);
+	pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
+	pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
+	struct page *hpage = NULL;
+	struct folio *folio;
+	pgoff_t pgidx;
+
+	mapidx <<= huge_page_order(hpstate);
+	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
+		if (!hpage) {
+			folio = __filemap_get_folio(memfd->f_mapping,
+						    mapidx,
+						    FGP_ACCESSED, 0);
+			if (IS_ERR(folio))
+				return PTR_ERR(folio);
+
+			hpage = &folio->page;
+		}
+
+		get_page(hpage);
+		ubuf->pages[*pgbuf] = hpage;
+		ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;
+		(*pgbuf)++;
+		if (++subpgoff == maxsubpgs) {
+			put_page(hpage);
+			hpage = NULL;
+			subpgoff = 0;
+			mapidx += pages_per_huge_page(hpstate);
+		}
+	}
+
+	if (hpage)
+		put_page(hpage);
+
+	return 0;
+}
+
+static int handle_shmem_pages(struct udmabuf *ubuf, struct file *memfd,
+			      pgoff_t offset, pgoff_t pgcnt,
+			      pgoff_t *pgbuf)
+{
+	pgoff_t pgidx, pgoff = offset >> PAGE_SHIFT;
+	struct page *page;
+
+	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
+		page = shmem_read_mapping_page(memfd->f_mapping,
+					       pgoff + pgidx);
+		if (IS_ERR(page))
+			return PTR_ERR(page);
+
+		ubuf->pages[*pgbuf] = page;
+		(*pgbuf)++;
+	}
+
+	return 0;
+}
+
 static long udmabuf_create(struct miscdevice *device,
 			   struct udmabuf_create_list *head,
 			   struct udmabuf_create_item *list)
 {
 	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 	struct file *memfd = NULL;
-	struct address_space *mapping = NULL;
 	struct udmabuf *ubuf;
 	struct dma_buf *buf;
-	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
-	struct page *page;
+	pgoff_t pgcnt, pgbuf = 0, pglimit;
 	int seals, ret = -EINVAL;
 	u32 i, flags;
 
@@ -234,6 +305,12 @@ static long udmabuf_create(struct miscdevice *device,
 		ret = -ENOMEM;
 		goto err;
 	}
+	ubuf->offsets = kcalloc(ubuf->pagecount, sizeof(*ubuf->offsets),
+				GFP_KERNEL);
+	if (!ubuf->offsets) {
+		ret = -ENOMEM;
+		goto err;
+	}
 
 	pgbuf = 0;
 	for (i = 0; i < head->count; i++) {
@@ -241,8 +318,7 @@ static long udmabuf_create(struct miscdevice *device,
 		memfd = fget(list[i].memfd);
 		if (!memfd)
 			goto err;
-		mapping = memfd->f_mapping;
-		if (!shmem_mapping(mapping))
+		if (!shmem_file(memfd) && !is_file_hugepages(memfd))
 			goto err;
 		seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
 		if (seals == -EINVAL)
@@ -251,16 +327,19 @@ static long udmabuf_create(struct miscdevice *device,
 		if ((seals & SEALS_WANTED) != SEALS_WANTED ||
 		    (seals & SEALS_DENIED) != 0)
 			goto err;
-		pgoff = list[i].offset >> PAGE_SHIFT;
-		pgcnt = list[i].size >> PAGE_SHIFT;
-		for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-			page = shmem_read_mapping_page(mapping, pgoff + pgidx);
-			if (IS_ERR(page)) {
-				ret = PTR_ERR(page);
-				goto err;
-			}
-			ubuf->pages[pgbuf++] = page;
-		}
+
+		pgcnt = list[i].size >> PAGE_SHIFT;
+		if (is_file_hugepages(memfd))
+			ret = handle_hugetlb_pages(ubuf, memfd,
+						   list[i].offset,
+						   pgcnt, &pgbuf);
+		else
+			ret = handle_shmem_pages(ubuf, memfd,
+						 list[i].offset,
+						 pgcnt, &pgbuf);
+		if (ret < 0)
+			goto err;
+
 		fput(memfd);
 		memfd = NULL;
 	}
@@ -287,6 +366,7 @@ static long udmabuf_create(struct miscdevice *device,
 		put_page(ubuf->pages[--pgbuf]);
 	if (memfd)
 		fput(memfd);
+	kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 	return ret;
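For reference, the index arithmetic at the heart of
handle_hugetlb_pages() can be sketched standalone. This is an
illustrative sketch only: the 2M huge page and 4K base page sizes are
assumptions standing in for the kernel's hstate helpers
(huge_page_shift(), huge_page_mask() and friends):

/*
 * Standalone sketch (illustrative only) of the offset arithmetic in
 * handle_hugetlb_pages(). 2M huge pages and 4K base pages are assumed.
 */
#include <stdio.h>

#define PAGE_SHIFT	12	/* 4K base pages (assumed) */
#define HPAGE_SHIFT	21	/* 2M huge pages (assumed) */

int main(void)
{
	/* a buffer chunk starting 5M + 3 pages into the memfd */
	unsigned long offset = (5UL << 20) + (3UL << PAGE_SHIFT);

	/* which huge page the chunk lives in (index 2 here) */
	unsigned long mapidx = offset >> HPAGE_SHIFT;

	/* which 4K subpage inside that huge page (259 of 512 here) */
	unsigned long subpgoff =
		(offset & ((1UL << HPAGE_SHIFT) - 1)) >> PAGE_SHIFT;
	unsigned long maxsubpgs = 1UL << (HPAGE_SHIFT - PAGE_SHIFT);

	printf("huge page index %lu, subpage %lu of %lu\n",
	       mapidx, subpgoff, maxsubpgs);

	/* ubuf->offsets[] stores subpgoff << PAGE_SHIFT, which
	 * udmabuf_vm_fault() later adds back as a pfn delta on top of
	 * the head page's pfn */
	printf("offsets[] entry 0x%lx, pfn delta %lu\n",
	       subpgoff << PAGE_SHIFT, subpgoff);

	return 0;
}

This per-subpage offset is also why get_sg_table() switches from
sg_alloc_table_from_pages() to an explicit sg_set_page() loop: every
PAGE_SIZE chunk of a hugetlb-backed buffer may begin at a nonzero
offset within its head page, so each scatterlist entry needs its own
offset rather than one computed from a contiguous page run.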