From patchwork Sat Dec 16 06:05:32 2023
X-Patchwork-Submitter: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
X-Patchwork-Id: 13495482
From: Vivek Kasireddy <vivek.kasireddy@intel.com>
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, David Hildenbrand, Daniel Vetter, Mike Kravetz,
    Hugh Dickins, Peter Xu, Jason Gunthorpe, Gerd Hoffmann, Dongwon Kim,
    Junxiao Chang
Subject: [PATCH v8 2/6] udmabuf: Add back support for mapping hugetlb pages (v6)
Date: Fri, 15 Dec 2023 22:05:32 -0800
Message-Id: <20231216060536.3716466-3-vivek.kasireddy@intel.com>
In-Reply-To: <20231216060536.3716466-1-vivek.kasireddy@intel.com>
References: <20231216060536.3716466-1-vivek.kasireddy@intel.com>
MIME-Version: 1.0
A user or admin can configure a VMM (Qemu) Guest's memory to be backed by
hugetlb pages for various reasons. However, a Guest OS would still allocate
(and pin) buffers that are backed by regular 4k sized pages. In order to
map these buffers and create dma-bufs for them on the Host, we first need
to find the hugetlb pages where the buffer allocations are located and then
determine the offsets of individual chunks (within those pages) and use
this information to eventually populate a scatterlist.

Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
were passed to the Host kernel and Qemu was launched with these
relevant options:
qemu-system-x86_64 -m 4096m....
-device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
-display gtk,gl=on
-object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
-machine memory-backend=mem1

Replacing -display gtk,gl=on with -display gtk,gl=off above would
exercise the mmap handler.

v2: Updated get_sg_table() to manually populate the scatterlist for both
    huge page and non-huge-page cases.
v3: s/offsets/subpgoff/g
    s/hpoff/mapidx/g

v4: Replaced find_get_page_flags() with __filemap_get_folio() to
    ensure that we only obtain head pages from the mapping

v5: Fix the calculation of mapidx to ensure that it is an order-n
    page multiple

v6: - Split the processing of hugetlb or shmem pages into helpers to
      simplify the code in udmabuf_create() (Christoph)
    - Move the creation of offsets array out of hugetlb context and
      into common code

Cc: David Hildenbrand
Cc: Daniel Vetter
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Acked-by: Mike Kravetz (v2)
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/dma-buf/udmabuf.c | 122 +++++++++++++++++++++++++++++++-------
 1 file changed, 101 insertions(+), 21 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 820c993c8659..274defd3fa3e 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include <linux/hugetlb.h>
 #include
 #include
 #include
@@ -28,6 +29,7 @@ struct udmabuf {
 	struct page **pages;
 	struct sg_table *sg;
 	struct miscdevice *device;
+	pgoff_t *offsets;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -41,6 +43,8 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 		return VM_FAULT_SIGBUS;
 
 	pfn = page_to_pfn(ubuf->pages[pgoff]);
+	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
@@ -90,23 +94,29 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct sg_table *sg;
+	struct scatterlist *sgl;
+	unsigned int i = 0;
 	int ret;
 
 	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
 	if (!sg)
 		return ERR_PTR(-ENOMEM);
-	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
-					0, ubuf->pagecount << PAGE_SHIFT,
-					GFP_KERNEL);
+
+	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
 	if (ret < 0)
-		goto err;
+		goto err_alloc;
+
+	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
+		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]);
+
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
-		goto err;
+		goto err_map;
 	return sg;
 
-err:
+err_map:
 	sg_free_table(sg);
+err_alloc:
 	kfree(sg);
 	return ERR_PTR(ret);
 }
@@ -143,6 +153,7 @@ static void release_udmabuf(struct dma_buf *buf)
 	for (pg = 0; pg < ubuf->pagecount; pg++)
 		put_page(ubuf->pages[pg]);
 
+	kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 }
@@ -196,17 +207,77 @@ static const struct dma_buf_ops udmabuf_ops = {
 #define SEALS_WANTED (F_SEAL_SHRINK)
 #define SEALS_DENIED (F_SEAL_WRITE)
 
+static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd,
+				pgoff_t offset, pgoff_t pgcnt,
+				pgoff_t *pgbuf)
+{
+	struct hstate *hpstate = hstate_file(memfd);
+	pgoff_t mapidx = offset >> huge_page_shift(hpstate);
+	pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
+	pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
+	struct page *hpage = NULL;
+	struct folio *folio;
+	pgoff_t pgidx;
+
+	mapidx <<= huge_page_order(hpstate);
+	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
+		if (!hpage) {
+			folio = __filemap_get_folio(memfd->f_mapping,
+						    mapidx,
+						    FGP_ACCESSED, 0);
+			if (IS_ERR(folio))
+				return PTR_ERR(folio);
+
+			hpage = &folio->page;
+		}
+
+		get_page(hpage);
+		ubuf->pages[*pgbuf] = hpage;
+		ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;
+		(*pgbuf)++;
+		if (++subpgoff == maxsubpgs) {
+			put_page(hpage);
+			hpage = NULL;
+			subpgoff = 0;
+			mapidx += pages_per_huge_page(hpstate);
+		}
+	}
+
+	if (hpage)
+		put_page(hpage);
+
+	return 0;
+}
+
+static int handle_shmem_pages(struct udmabuf *ubuf, struct file *memfd,
+			      pgoff_t offset, pgoff_t pgcnt,
+			      pgoff_t *pgbuf)
+{
+	pgoff_t pgidx, pgoff = offset >> PAGE_SHIFT;
+	struct page *page;
+
+	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
+		page = shmem_read_mapping_page(memfd->f_mapping,
+					       pgoff + pgidx);
+		if (IS_ERR(page))
+			return PTR_ERR(page);
+
+		ubuf->pages[*pgbuf] = page;
+		(*pgbuf)++;
+	}
+
+	return 0;
+}
+
 static long udmabuf_create(struct miscdevice *device,
 			   struct udmabuf_create_list *head,
 			   struct udmabuf_create_item *list)
 {
 	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 	struct file *memfd = NULL;
-	struct address_space *mapping = NULL;
 	struct udmabuf *ubuf;
 	struct dma_buf *buf;
-	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
-	struct page *page;
+	pgoff_t pgcnt, pgbuf = 0, pglimit;
 	int seals, ret = -EINVAL;
 	u32 i, flags;
@@ -234,6 +305,12 @@ static long udmabuf_create(struct miscdevice *device,
 		ret = -ENOMEM;
 		goto err;
 	}
+	ubuf->offsets = kcalloc(ubuf->pagecount, sizeof(*ubuf->offsets),
+				GFP_KERNEL);
+	if (!ubuf->offsets) {
+		ret = -ENOMEM;
+		goto err;
+	}
 
 	pgbuf = 0;
 	for (i = 0; i < head->count; i++) {
@@ -241,8 +318,7 @@ static long udmabuf_create(struct miscdevice *device,
 		memfd = fget(list[i].memfd);
 		if (!memfd)
 			goto err;
-		mapping = memfd->f_mapping;
-		if (!shmem_mapping(mapping))
+		if (!shmem_file(memfd) && !is_file_hugepages(memfd))
 			goto err;
 		seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
 		if (seals == -EINVAL)
@@ -251,16 +327,19 @@ static long udmabuf_create(struct miscdevice *device,
 		if ((seals & SEALS_WANTED) != SEALS_WANTED ||
 		    (seals & SEALS_DENIED) != 0)
 			goto err;
-		pgoff = list[i].offset >> PAGE_SHIFT;
-		pgcnt = list[i].size >> PAGE_SHIFT;
-		for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-			page = shmem_read_mapping_page(mapping, pgoff + pgidx);
-			if (IS_ERR(page)) {
-				ret = PTR_ERR(page);
-				goto err;
-			}
-			ubuf->pages[pgbuf++] = page;
-		}
+
+		pgcnt = list[i].size >> PAGE_SHIFT;
+		if (is_file_hugepages(memfd))
+			ret = handle_hugetlb_pages(ubuf, memfd,
+						   list[i].offset,
+						   pgcnt, &pgbuf);
+		else
+			ret = handle_shmem_pages(ubuf, memfd,
+						 list[i].offset,
+						 pgcnt, &pgbuf);
+		if (ret < 0)
+			goto err;
+
 		fput(memfd);
 		memfd = NULL;
 	}
@@ -287,6 +366,7 @@ static long udmabuf_create(struct miscdevice *device,
 		put_page(ubuf->pages[--pgbuf]);
 	if (memfd)
 		fput(memfd);
+	kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 	return ret;