From patchwork Wed Dec 27 07:38:21 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
X-Patchwork-Id: 13505183
From: Vivek Kasireddy
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, David Hildenbrand, Matthew Wilcox, Daniel Vetter,
    Mike Kravetz, Hugh Dickins, Peter Xu, Jason Gunthorpe, Gerd Hoffmann,
    Dongwon Kim, Junxiao Chang
Subject: [PATCH v9 5/6] udmabuf: Pin the pages using memfd_pin_folios() API (v7)
Date: Tue, 26 Dec 2023 23:38:21 -0800
Message-Id: <20231227073822.390518-6-vivek.kasireddy@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231227073822.390518-1-vivek.kasireddy@intel.com>
References: <20231227073822.390518-1-vivek.kasireddy@intel.com>
Using memfd_pin_folios() will ensure that the pages are pinned correctly
using FOLL_PIN. And, this also ensures that we don't accidentally break
features such as memory hotunplug as it would not allow pinning pages in
the movable zone.

Using this new API also simplifies the code as we no longer have to deal
with extracting individual pages from their mappings or handle shmem and
hugetlb cases separately.

v2: - Adjust to the change in signature of pin_user_pages_fd() by passing
      in file * instead of fd.

v3: - Limit the changes in this patch only to those that are required for
      using pin_user_pages_fd()
    - Slightly improve the commit message

v4: - Adjust to the change in name of the API (memfd_pin_user_pages)

v5: - Adjust to the changes in memfd_pin_folios which now populates a list
      of folios and offsets

v6: - Don't unnecessarily use folio_page() (Matthew)
    - Pass [start, end] and max_folios to memfd_pin_folios()
    - Create another temporary array to hold the folios returned by
      memfd_pin_folios() as we populate ubuf->folios.
    - Unpin the folios only once as memfd_pin_folios pins them once

v7: - Use a list to track the folios that need to be unpinned

Cc: David Hildenbrand
Cc: Matthew Wilcox
Cc: Daniel Vetter
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Signed-off-by: Vivek Kasireddy
---
 drivers/dma-buf/udmabuf.c | 153 +++++++++++++++++++-------------------
 1 file changed, 78 insertions(+), 75 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index a8f3af61f7f2..8086c2b5be5a 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -30,6 +30,12 @@ struct udmabuf {
 	struct sg_table *sg;
 	struct miscdevice *device;
 	pgoff_t *offsets;
+	struct list_head unpin_list;
+};
+
+struct udmabuf_folio {
+	struct folio *folio;
+	struct list_head list;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -153,17 +159,43 @@ static void unmap_udmabuf(struct dma_buf_attachment *at,
 	return put_sg_table(at->dev, sg, direction);
 }
 
+static void unpin_all_folios(struct list_head *unpin_list)
+{
+	struct udmabuf_folio *ubuf_folio;
+
+	while (!list_empty(unpin_list)) {
+		ubuf_folio = list_first_entry(unpin_list,
+					      struct udmabuf_folio, list);
+		unpin_user_page(&ubuf_folio->folio->page);
+
+		list_del(&ubuf_folio->list);
+		kfree(ubuf_folio);
+	}
+}
+
+static int add_to_unpin_list(struct list_head *unpin_list,
+			     struct folio *folio)
+{
+	struct udmabuf_folio *ubuf_folio;
+
+	ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
+	if (!ubuf_folio)
+		return -ENOMEM;
+
+	ubuf_folio->folio = folio;
+	list_add_tail(&ubuf_folio->list, unpin_list);
+	return 0;
+}
+
 static void release_udmabuf(struct dma_buf *buf)
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct device *dev = ubuf->device->this_device;
-	pgoff_t pg;
 
 	if (ubuf->sg)
 		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
 
-	for (pg = 0; pg < ubuf->pagecount; pg++)
-		folio_put(ubuf->folios[pg]);
+	unpin_all_folios(&ubuf->unpin_list);
 	kfree(ubuf->offsets);
 	kfree(ubuf->folios);
 	kfree(ubuf);
@@ -218,64 +250,6 @@ static const struct dma_buf_ops udmabuf_ops = {
 #define SEALS_WANTED (F_SEAL_SHRINK)
 #define SEALS_DENIED (F_SEAL_WRITE)
 
-static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd,
-				pgoff_t offset, pgoff_t pgcnt,
-				pgoff_t *pgbuf)
-{
-	struct hstate *hpstate = hstate_file(memfd);
-	pgoff_t mapidx = offset >> huge_page_shift(hpstate);
-	pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
-	pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
-	struct folio *folio = NULL;
-	pgoff_t pgidx;
-
-	mapidx <<= huge_page_order(hpstate);
-	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-		if (!folio) {
-			folio = __filemap_get_folio(memfd->f_mapping,
-						    mapidx,
-						    FGP_ACCESSED, 0);
-			if (IS_ERR(folio))
-				return PTR_ERR(folio);
-		}
-
-		folio_get(folio);
-		ubuf->folios[*pgbuf] = folio;
-		ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;
-		(*pgbuf)++;
-		if (++subpgoff == maxsubpgs) {
-			folio_put(folio);
-			folio = NULL;
-			subpgoff = 0;
-			mapidx += pages_per_huge_page(hpstate);
-		}
-	}
-
-	if (folio)
-		folio_put(folio);
-
-	return 0;
-}
-
-static int handle_shmem_pages(struct udmabuf *ubuf, struct file *memfd,
-			      pgoff_t offset, pgoff_t pgcnt,
-			      pgoff_t *pgbuf)
-{
-	pgoff_t pgidx, pgoff = offset >> PAGE_SHIFT;
-	struct folio *folio = NULL;
-
-	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-		folio = shmem_read_folio(memfd->f_mapping, pgoff + pgidx);
-		if (IS_ERR(folio))
-			return PTR_ERR(folio);
-
-		ubuf->folios[*pgbuf] = folio;
-		(*pgbuf)++;
-	}
-
-	return 0;
-}
-
 static int check_memfd_seals(struct file *memfd)
 {
 	int seals;
@@ -321,16 +295,19 @@ static long udmabuf_create(struct miscdevice *device,
 			   struct udmabuf_create_list *head,
 			   struct udmabuf_create_item *list)
 {
-	pgoff_t pgcnt, pgbuf = 0, pglimit;
+	pgoff_t pgoff, pgcnt, pglimit, pgbuf = 0;
+	long nr_folios, ret = -EINVAL;
 	struct file *memfd = NULL;
+	struct folio **folios;
 	struct udmabuf *ubuf;
-	int ret = -EINVAL;
-	u32 i, flags;
+	u32 i, j, k, flags;
+	loff_t end;
 
 	ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL);
 	if (!ubuf)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&ubuf->unpin_list);
 	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
 	for (i = 0; i < head->count; i++) {
 		if (!IS_ALIGNED(list[i].offset, PAGE_SIZE))
@@ -366,17 +343,44 @@ static long udmabuf_create(struct miscdevice *device,
 			goto err;
 
 		pgcnt = list[i].size >> PAGE_SHIFT;
-		if (is_file_hugepages(memfd))
-			ret = handle_hugetlb_pages(ubuf, memfd,
-						   list[i].offset,
-						   pgcnt, &pgbuf);
-		else
-			ret = handle_shmem_pages(ubuf, memfd,
-						 list[i].offset,
-						 pgcnt, &pgbuf);
-		if (ret < 0)
+		folios = kmalloc_array(pgcnt, sizeof(*folios), GFP_KERNEL);
+		if (!folios) {
+			ret = -ENOMEM;
 			goto err;
+		}
+
+		end = list[i].offset + (pgcnt << PAGE_SHIFT) - 1;
+		ret = memfd_pin_folios(memfd, list[i].offset, end,
+				       folios, pgcnt, &pgoff);
+		if (ret < 0) {
+			kfree(folios);
+			goto err;
+		}
+
+		nr_folios = ret;
+		pgoff >>= PAGE_SHIFT;
+		for (j = 0, k = 0; j < pgcnt; j++) {
+			ubuf->folios[pgbuf] = folios[k];
+			ubuf->offsets[pgbuf] = pgoff << PAGE_SHIFT;
+
+			if (j == 0 || ubuf->folios[pgbuf-1] != folios[k]) {
+				ret = add_to_unpin_list(&ubuf->unpin_list,
+							folios[k]);
+				if (ret < 0) {
+					kfree(folios);
+					goto err;
+				}
+			}
+
+			pgbuf++;
+			if (++pgoff == folio_nr_pages(folios[k])) {
+				pgoff = 0;
+				if (++k == nr_folios)
+					break;
+			}
+		}
+
+		kfree(folios);
 		fput(memfd);
 	}
 
@@ -388,10 +392,9 @@ static long udmabuf_create(struct miscdevice *device,
 	return ret;
 
 err:
-	while (pgbuf > 0)
-		folio_put(ubuf->folios[--pgbuf]);
 	if (memfd)
 		fput(memfd);
+	unpin_all_folios(&ubuf->unpin_list);
 	kfree(ubuf->offsets);
 	kfree(ubuf->folios);
 	kfree(ubuf);