From patchwork Tue Jul 18 08:26:05 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vivek Kasireddy
X-Patchwork-Id: 13316888
From: Vivek Kasireddy
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, David Hildenbrand, Mike Kravetz, Hugh Dickins,
 Peter Xu, Jason Gunthorpe, Gerd Hoffmann, Dongwon Kim, Junxiao Chang
Subject: [PATCH v2 2/2] udmabuf: Add back support for mapping hugetlb pages (v2)
Date: Tue, 18 Jul 2023 01:26:05 -0700
Message-Id: <20230718082605.1570740-3-vivek.kasireddy@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230718082605.1570740-1-vivek.kasireddy@intel.com>
References: <20230718082605.1570740-1-vivek.kasireddy@intel.com>

A user or admin can configure a VMM (Qemu) Guest's memory to be
backed by hugetlb pages for various reasons. However, a Guest OS
would still allocate (and pin) buffers that are backed by regular
4k sized pages. In order to map these buffers and create dma-bufs
for them on the Host, we first need to find the hugetlb pages where
the buffer allocations are located and then determine the offsets
of individual chunks (within those pages) and use this information
to eventually populate a scatterlist.

Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
were passed to the Host kernel and Qemu was launched with these
relevant options: qemu-system-x86_64 -m 4096m....
-device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
-display gtk,gl=on
-object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
-machine memory-backend=mem1

Replacing -display gtk,gl=on with -display gtk,gl=off above would
exercise the mmap handler.

v2: Updated get_sg_table() to manually populate the scatterlist for
    both huge page and non-huge-page cases.

Cc: David Hildenbrand
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Signed-off-by: Vivek Kasireddy
Acked-by: Mike Kravetz
---
 drivers/dma-buf/udmabuf.c | 84 +++++++++++++++++++++++++++++++++------
 1 file changed, 71 insertions(+), 13 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 820c993c8659..10c47bf77fb5 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -28,6 +29,7 @@ struct udmabuf {
 	struct page **pages;
 	struct sg_table *sg;
 	struct miscdevice *device;
+	pgoff_t *offsets;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -41,6 +43,10 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 		return VM_FAULT_SIGBUS;
 
 	pfn = page_to_pfn(ubuf->pages[pgoff]);
+	if (ubuf->offsets) {
+		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+	}
+
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
@@ -90,23 +96,31 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct sg_table *sg;
+	struct scatterlist *sgl;
+	pgoff_t offset;
+	unsigned long i = 0;
 	int ret;
 
 	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
 	if (!sg)
 		return ERR_PTR(-ENOMEM);
-	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
-					0, ubuf->pagecount << PAGE_SHIFT,
-					GFP_KERNEL);
+
+	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
 	if (ret < 0)
-		goto err;
+		goto err_alloc;
+
+	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) {
+		offset = ubuf->offsets ? ubuf->offsets[i] : 0;
+		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, offset);
+	}
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
-		goto err;
+		goto err_map;
 	return sg;
 
-err:
+err_map:
 	sg_free_table(sg);
+err_alloc:
 	kfree(sg);
 	return ERR_PTR(ret);
 }
@@ -143,6 +157,7 @@ static void release_udmabuf(struct dma_buf *buf)
 
 	for (pg = 0; pg < ubuf->pagecount; pg++)
 		put_page(ubuf->pages[pg]);
+	kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 }
@@ -206,7 +221,9 @@ static long udmabuf_create(struct miscdevice *device,
 	struct udmabuf *ubuf;
 	struct dma_buf *buf;
 	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
-	struct page *page;
+	struct page *page, *hpage = NULL;
+	pgoff_t hpoff, chunkoff, maxchunks;
+	struct hstate *hpstate;
 	int seals, ret = -EINVAL;
 	u32 i, flags;
 
@@ -242,7 +259,7 @@ static long udmabuf_create(struct miscdevice *device,
 		if (!memfd)
 			goto err;
 		mapping = memfd->f_mapping;
-		if (!shmem_mapping(mapping))
+		if (!shmem_mapping(mapping) && !is_file_hugepages(memfd))
 			goto err;
 		seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
 		if (seals == -EINVAL)
@@ -253,16 +270,56 @@ static long udmabuf_create(struct miscdevice *device,
 			goto err;
 		pgoff = list[i].offset >> PAGE_SHIFT;
 		pgcnt = list[i].size >> PAGE_SHIFT;
+		if (is_file_hugepages(memfd)) {
+			if (!ubuf->offsets) {
+				ubuf->offsets = kmalloc_array(ubuf->pagecount,
+							      sizeof(*ubuf->offsets),
+							      GFP_KERNEL);
+				if (!ubuf->offsets) {
+					ret = -ENOMEM;
+					goto err;
+				}
+			}
+			hpstate = hstate_file(memfd);
+			hpoff = list[i].offset >> huge_page_shift(hpstate);
+			chunkoff = (list[i].offset &
+				    ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
+			maxchunks = huge_page_size(hpstate) >> PAGE_SHIFT;
+		}
 		for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-			page = shmem_read_mapping_page(mapping, pgoff + pgidx);
-			if (IS_ERR(page)) {
-				ret = PTR_ERR(page);
-				goto err;
+			if (is_file_hugepages(memfd)) {
+				if (!hpage) {
+					hpage = find_get_page_flags(mapping, hpoff,
+								    FGP_ACCESSED);
+					if (!hpage) {
+						ret = -EINVAL;
+						goto err;
+					}
+				}
+				get_page(hpage);
+				ubuf->pages[pgbuf] = hpage;
+				ubuf->offsets[pgbuf++] = chunkoff << PAGE_SHIFT;
+				if (++chunkoff == maxchunks) {
+					put_page(hpage);
+					hpage = NULL;
+					chunkoff = 0;
+					hpoff++;
+				}
+			} else {
+				page = shmem_read_mapping_page(mapping, pgoff + pgidx);
+				if (IS_ERR(page)) {
+					ret = PTR_ERR(page);
+					goto err;
+				}
+				ubuf->pages[pgbuf++] = page;
 			}
-			ubuf->pages[pgbuf++] = page;
 		}
 		fput(memfd);
 		memfd = NULL;
+		if (hpage) {
+			put_page(hpage);
+			hpage = NULL;
+		}
 	}
 
 	exp_info.ops = &udmabuf_ops;
@@ -287,6 +344,7 @@ static long udmabuf_create(struct miscdevice *device,
 		put_page(ubuf->pages[--pgbuf]);
 	if (memfd)
 		fput(memfd);
+	kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 	return ret;
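
For readers who want to check the chunk arithmetic outside the kernel, the
sketch below mirrors how udmabuf_create() in this patch derives hpoff,
chunkoff and maxchunks for a hugetlb-backed range. It is a user-space
illustration only, not kernel code: the SKETCH_* constants (4 KiB base pages,
2 MiB huge pages, matching the testcase), struct chunk_ref and map_chunks()
are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12u	/* assumption: 4 KiB base pages */
#define SKETCH_HPAGE_SHIFT 21u	/* assumption: 2 MiB huge pages */

/* Hypothetical record: which huge page a 4 KiB chunk lives in, and where. */
struct chunk_ref {
	uint64_t hpoff;		/* index of the huge page within the file */
	uint64_t offset;	/* byte offset of the chunk inside that huge page */
};

/*
 * Walk a byte range in 4 KiB steps and record, for each step, the huge
 * page index and the byte offset inside it. Returns the number of
 * entries written, at most max_out.
 */
static size_t map_chunks(uint64_t file_offset, uint64_t size,
			 struct chunk_ref *out, size_t max_out)
{
	uint64_t hpage_size = 1ull << SKETCH_HPAGE_SHIFT;
	uint64_t hpoff = file_offset >> SKETCH_HPAGE_SHIFT;
	/* same role as (offset & ~huge_page_mask(h)) >> PAGE_SHIFT */
	uint64_t chunkoff = (file_offset & (hpage_size - 1)) >> SKETCH_PAGE_SHIFT;
	uint64_t maxchunks = hpage_size >> SKETCH_PAGE_SHIFT;
	uint64_t pgcnt = size >> SKETCH_PAGE_SHIFT;
	size_t n = 0;

	assert(out != NULL);

	for (uint64_t pgidx = 0; pgidx < pgcnt && n < max_out; pgidx++) {
		out[n].hpoff = hpoff;
		out[n].offset = chunkoff << SKETCH_PAGE_SHIFT;
		n++;
		/* last chunk of this huge page: continue in the next one */
		if (++chunkoff == maxchunks) {
			chunkoff = 0;
			hpoff++;
		}
	}
	return n;
}
```

Calling map_chunks() on a 16 KiB range that starts 8 KiB before the end of
the first huge page yields two entries in huge page 0 followed by two
entries in huge page 1, which is the boundary case the chunkoff/maxchunks
rollover in the patch handles.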