From patchwork Thu Aug 17 06:46:23 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vivek Kasireddy
X-Patchwork-Id: 13356021
From: Vivek Kasireddy <vivek.kasireddy@intel.com>
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, David Hildenbrand, Daniel Vetter, Mike Kravetz,
 Hugh Dickins, Peter Xu, Jason Gunthorpe, Gerd Hoffmann, Dongwon Kim,
 Junxiao Chang
Subject: [PATCH v3 2/2] udmabuf: Add back support for mapping hugetlb pages (v3)
Date: Wed, 16 Aug 2023 23:46:23 -0700
Message-Id: <20230817064623.3424348-3-vivek.kasireddy@intel.com>
In-Reply-To: <20230817064623.3424348-1-vivek.kasireddy@intel.com>
References: <20230817064623.3424348-1-vivek.kasireddy@intel.com>

A user or admin can configure a VMM (Qemu) Guest's memory to be
backed by hugetlb pages for various reasons. However, a Guest OS
would still allocate (and pin) buffers that are backed by regular
4k sized pages. In order to map these buffers and create dma-bufs
for them on the Host, we first need to find the hugetlb pages where
the buffer allocations are located and then determine the offsets
of individual chunks (within those pages) and use this information
to eventually populate a scatterlist.

Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500
options were passed to the Host kernel and Qemu was launched with
these relevant options: qemu-system-x86_64 -m 4096m....
-device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
-display gtk,gl=on
-object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
-machine memory-backend=mem1

Replacing -display gtk,gl=on with -display gtk,gl=off above would
exercise the mmap handler.

v2: Updated get_sg_table() to manually populate the scatterlist for
    both huge page and non-huge-page cases.
v3: s/offsets/subpgoff/g
    s/hpoff/mapidx/g

Cc: David Hildenbrand
Cc: Daniel Vetter
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Acked-by: Mike Kravetz (v2)
Signed-off-by: Vivek Kasireddy
---
 drivers/dma-buf/udmabuf.c | 85 +++++++++++++++++++++++++++++++++------
 1 file changed, 72 insertions(+), 13 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 820c993c8659..1a41c4a069ea 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -28,6 +29,7 @@ struct udmabuf {
 	struct page **pages;
 	struct sg_table *sg;
 	struct miscdevice *device;
+	pgoff_t *subpgoff;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -41,6 +43,10 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 		return VM_FAULT_SIGBUS;
 
 	pfn = page_to_pfn(ubuf->pages[pgoff]);
+	if (ubuf->subpgoff) {
+		pfn += ubuf->subpgoff[pgoff] >> PAGE_SHIFT;
+	}
+
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
@@ -90,23 +96,31 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct sg_table *sg;
+	struct scatterlist *sgl;
+	pgoff_t offset;
+	unsigned long i = 0;
 	int ret;
 
 	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
 	if (!sg)
 		return ERR_PTR(-ENOMEM);
-	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
-					0, ubuf->pagecount << PAGE_SHIFT,
-					GFP_KERNEL);
+
+	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
 	if (ret < 0)
-		goto err;
+		goto err_alloc;
+
+	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) {
+		offset = ubuf->subpgoff ? ubuf->subpgoff[i] : 0;
+		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, offset);
+	}
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
-		goto err;
+		goto err_map;
 	return sg;
 
-err:
+err_map:
 	sg_free_table(sg);
+err_alloc:
 	kfree(sg);
 	return ERR_PTR(ret);
 }
@@ -143,6 +157,7 @@ static void release_udmabuf(struct dma_buf *buf)
 
 	for (pg = 0; pg < ubuf->pagecount; pg++)
 		put_page(ubuf->pages[pg]);
+	kfree(ubuf->subpgoff);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 }
@@ -206,7 +221,9 @@ static long udmabuf_create(struct miscdevice *device,
 	struct udmabuf *ubuf;
 	struct dma_buf *buf;
 	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
-	struct page *page;
+	struct page *page, *hpage = NULL;
+	pgoff_t mapidx, chunkoff, maxchunks;
+	struct hstate *hpstate;
 	int seals, ret = -EINVAL;
 	u32 i, flags;
 
@@ -242,7 +259,7 @@ static long udmabuf_create(struct miscdevice *device,
 		if (!memfd)
 			goto err;
 		mapping = memfd->f_mapping;
-		if (!shmem_mapping(mapping))
+		if (!shmem_mapping(mapping) && !is_file_hugepages(memfd))
 			goto err;
 		seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
 		if (seals == -EINVAL)
@@ -253,16 +270,57 @@ static long udmabuf_create(struct miscdevice *device,
 			goto err;
 		pgoff = list[i].offset >> PAGE_SHIFT;
 		pgcnt = list[i].size >> PAGE_SHIFT;
+		if (is_file_hugepages(memfd)) {
+			if (!ubuf->subpgoff) {
+				ubuf->subpgoff = kmalloc_array(ubuf->pagecount,
+							       sizeof(*ubuf->subpgoff),
+							       GFP_KERNEL);
+				if (!ubuf->subpgoff) {
+					ret = -ENOMEM;
+					goto err;
+				}
+			}
+			hpstate = hstate_file(memfd);
+			mapidx = list[i].offset >> huge_page_shift(hpstate);
+			chunkoff = (list[i].offset &
+				    ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
+			maxchunks = huge_page_size(hpstate) >> PAGE_SHIFT;
+		}
 		for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-			page = shmem_read_mapping_page(mapping, pgoff + pgidx);
-			if (IS_ERR(page)) {
-				ret = PTR_ERR(page);
-				goto err;
+			if (is_file_hugepages(memfd)) {
+				if (!hpage) {
+					hpage = find_get_page_flags(mapping, mapidx,
+								    FGP_ACCESSED);
+					if (!hpage) {
+						ret = -EINVAL;
+						goto err;
+					}
+				}
+				get_page(hpage);
+				ubuf->pages[pgbuf] = hpage;
+				ubuf->subpgoff[pgbuf++] = chunkoff << PAGE_SHIFT;
+				if (++chunkoff == maxchunks) {
+					put_page(hpage);
+					hpage = NULL;
+					chunkoff = 0;
+					mapidx++;
+				}
+			} else {
+				mapidx = pgoff + pgidx;
+				page = shmem_read_mapping_page(mapping, mapidx);
+				if (IS_ERR(page)) {
+					ret = PTR_ERR(page);
+					goto err;
+				}
+				ubuf->pages[pgbuf++] = page;
 			}
-			ubuf->pages[pgbuf++] = page;
 		}
 		fput(memfd);
 		memfd = NULL;
+		if (hpage) {
+			put_page(hpage);
+			hpage = NULL;
+		}
 	}
 
 	exp_info.ops = &udmabuf_ops;
@@ -287,6 +345,7 @@ static long udmabuf_create(struct miscdevice *device,
 		put_page(ubuf->pages[--pgbuf]);
 	if (memfd)
 		fput(memfd);
+	kfree(ubuf->subpgoff);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 	return ret;
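
The offset arithmetic in udmabuf_create() can be tried outside the kernel. The following is a minimal userspace sketch, not the driver code: split_into_chunks(), struct chunk_ref and SKETCH_PAGE_SHIFT are hypothetical names introduced here for illustration, and a 4 KiB base page is assumed. It mirrors how the patch derives mapidx, chunkoff and maxchunks from a byte offset, then walks the range one base page at a time and moves to the next huge page once chunkoff reaches maxchunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as in the patch description. */
#define SKETCH_PAGE_SHIFT 12u

/* Hypothetical record: one entry per base-page-sized chunk. */
struct chunk_ref {
	uint64_t mapidx;   /* index of the huge page within the file */
	uint64_t subpgoff; /* byte offset of the chunk inside that huge page */
};

/*
 * Hypothetical helper: split [offset, offset + size) into base-page chunks,
 * recording for each chunk which huge page it lives in and where.
 * Returns the number of entries written to out (at most out_len).
 */
static size_t split_into_chunks(uint64_t offset, uint64_t size,
				unsigned int huge_shift,
				struct chunk_ref *out, size_t out_len)
{
	uint64_t huge_size, mapidx, chunkoff, maxchunks, pgcnt;
	size_t n = 0;

	assert(huge_shift >= SKETCH_PAGE_SHIFT && huge_shift < 64);

	huge_size = (uint64_t)1 << huge_shift;
	/* Same derivation as mapidx/chunkoff/maxchunks in the patch. */
	mapidx = offset >> huge_shift;
	chunkoff = (offset & (huge_size - 1)) >> SKETCH_PAGE_SHIFT;
	maxchunks = huge_size >> SKETCH_PAGE_SHIFT;
	pgcnt = size >> SKETCH_PAGE_SHIFT;

	for (uint64_t pgidx = 0; pgidx < pgcnt && n < out_len; pgidx++) {
		out[n].mapidx = mapidx;
		out[n].subpgoff = chunkoff << SKETCH_PAGE_SHIFT;
		n++;
		/* Crossing a huge page boundary: restart at its first chunk. */
		if (++chunkoff == maxchunks) {
			chunkoff = 0;
			mapidx++;
		}
	}
	return n;
}
```

For example, with 2M huge pages (huge_shift of 21), an offset of 0x1FE000 and a size of 0x4000, the first two chunks fall at the end of huge page 0 (sub-page offsets 0x1FE000 and 0x1FF000) and the remaining two at the start of huge page 1 (offsets 0 and 0x1000).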