From patchwork Tue Jan 14 08:08:00 2025
X-Patchwork-Submitter: "Kasireddy, Vivek"
X-Patchwork-Id: 13938513
From: Vivek Kasireddy
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, syzbot+a504cb5bae4fe117ba94@syzkaller.appspotmail.com,
 Steve Sistare, Muchun Song, David Hildenbrand, Andrew Morton
Subject: [PATCH v2 1/2] mm/memfd: reserve hugetlb folios before allocation
Date: Tue, 14 Jan 2025 00:08:00 -0800
Message-ID: <20250114080927.2616684-2-vivek.kasireddy@intel.com>
In-Reply-To: <20250114080927.2616684-1-vivek.kasireddy@intel.com>
References: <20250114080927.2616684-1-vivek.kasireddy@intel.com>

There are cases when we try to pin a folio but discover that it has
not been faulted-in. So, we try to allocate it in memfd_alloc_folio()
but there is a chance that we might encounter a crash/failure
(VM_BUG_ON(!h->resv_huge_pages)) if there are no active reservations
at that instant. This issue was reported by syzbot:

kernel BUG at mm/hugetlb.c:2403!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5315 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller-00161-g63676eefb7a0 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:alloc_hugetlb_folio_reserve+0xbc/0xc0 mm/hugetlb.c:2403
Code: 1f eb 05 e8 56 18 a0 ff 48 c7 c7 40 56 61 8e e8 ba 21 cc 09 4c 89 f0 5b 41 5c 41 5e 41 5f 5d c3 cc cc cc cc e8 35 18 a0 ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f
RSP: 0018:ffffc9000d3d77f8 EFLAGS: 00010087
RAX: ffffffff81ff6beb RBX: 0000000000000000 RCX: 0000000000100000
RDX: ffffc9000e51a000 RSI: 00000000000003ec RDI: 00000000000003ed
RBP: 1ffffffff34810d9 R08: ffffffff81ff6ba3 R09: 1ffffd4000093005
R10: dffffc0000000000 R11: fffff94000093006 R12: dffffc0000000000
R13: dffffc0000000000 R14: ffffea0000498000 R15: ffffffff9a4086c8
FS: 00007f77ac12e6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f77ab54b170 CR3: 0000000040b70000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 memfd_alloc_folio+0x1bd/0x370 mm/memfd.c:88
 memfd_pin_folios+0xf10/0x1570 mm/gup.c:3750
 udmabuf_pin_folios drivers/dma-buf/udmabuf.c:346 [inline]
 udmabuf_create+0x70e/0x10c0 drivers/dma-buf/udmabuf.c:443
 udmabuf_ioctl_create drivers/dma-buf/udmabuf.c:495 [inline]
 udmabuf_ioctl+0x301/0x4e0 drivers/dma-buf/udmabuf.c:526
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:906 [inline]
 __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Therefore, to avoid this situation and fix this issue, we just need to
make a reservation (by calling hugetlb_reserve_pages()) before we try
to allocate the folio.
This will ensure that we are properly doing region/subpool accounting
associated with our allocation.

While at it, move subpool_inode() into hugetlb header and also replace
the VM_BUG_ON() with WARN_ON_ONCE() as there is no need to crash the
system in this scenario and instead we could just warn and fail the
allocation.

Fixes: 26a8ea80929c ("mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak")
Reported-by: syzbot+a504cb5bae4fe117ba94@syzkaller.appspotmail.com
Signed-off-by: Vivek Kasireddy
Cc: Steve Sistare
Cc: Muchun Song
Cc: David Hildenbrand
Cc: Andrew Morton
---
 include/linux/hugetlb.h |  5 +++++
 mm/hugetlb.c            | 14 ++++++--------
 mm/memfd.c              | 14 +++++++++++---
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ae4fe8615bb6..38c580548564 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -712,6 +712,11 @@ extern unsigned int default_hstate_idx;
 
 #define default_hstate (hstates[default_hstate_idx])
 
+static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
+{
+	return HUGETLBFS_SB(inode->i_sb)->spool;
+}
+
 static inline struct hugepage_subpool *hugetlb_folio_subpool(struct folio *folio)
 {
 	return folio->_hugetlb_subpool;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c498874a7170..ef948f56b864 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -251,11 +251,6 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
 	return ret;
 }
 
-static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
-{
-	return HUGETLBFS_SB(inode->i_sb)->spool;
-}
-
 static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
 {
 	return subpool_inode(file_inode(vma->vm_file));
@@ -2397,12 +2392,15 @@ struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
 	struct folio *folio;
 
 	spin_lock_irq(&hugetlb_lock);
+	if (WARN_ON_ONCE(!h->resv_huge_pages)) {
+		spin_unlock_irq(&hugetlb_lock);
+		return NULL;
+	}
+
 	folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask, preferred_nid,
 					       nmask);
-	if (folio) {
-		VM_BUG_ON(!h->resv_huge_pages);
+	if (folio)
 		h->resv_huge_pages--;
-	}
 
 	spin_unlock_irq(&hugetlb_lock);
 	return folio;
diff --git a/mm/memfd.c b/mm/memfd.c
index 35a370d75c9a..0d128c44fb78 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -70,7 +70,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 #ifdef CONFIG_HUGETLB_PAGE
 	struct folio *folio;
 	gfp_t gfp_mask;
-	int err;
+	int err = -ENOMEM;
 
 	if (is_file_hugepages(memfd)) {
 		/*
@@ -79,12 +79,16 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 		 * alloc from. Also, the folio will be pinned for an indefinite
 		 * amount of time, so it is not expected to be migrated away.
 		 */
+		struct inode *inode = file_inode(memfd);
 		struct hstate *h = hstate_file(memfd);
 
 		gfp_mask = htlb_alloc_mask(h);
 		gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
 		idx >>= huge_page_order(h);
 
+		if (!hugetlb_reserve_pages(inode, idx, idx + 1, NULL, 0))
+			return ERR_PTR(err);
+
 		folio = alloc_hugetlb_folio_reserve(h,
 						    numa_node_id(),
 						    NULL,
@@ -95,12 +99,16 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 							idx);
 			if (err) {
 				folio_put(folio);
-				return ERR_PTR(err);
+				goto err;
 			}
+
+			hugetlb_set_folio_subpool(folio, subpool_inode(inode));
 			folio_unlock(folio);
 			return folio;
 		}
-		return ERR_PTR(-ENOMEM);
+err:
+		hugetlb_unreserve_pages(inode, idx, idx + 1, 0);
+		return ERR_PTR(err);
 	}
 #endif
 	return shmem_read_folio(memfd->f_mapping, idx);
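
Illustration only, not part of the patch: a minimal userspace sketch of
the ordering this change enforces (reserve first, allocate second, give
the reservation back if no folio was dequeued). It assumes a toy pool
with just two counters. Every toy_* name and the dequeue_ok parameter
are hypothetical, and the accounting is deliberately much simpler than
hugetlb's reserve map and subpool accounting, so it should not be read
as a description of the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct hstate: only two counters. */
struct toy_pool {
	long free_pages;	/* pages that can still be dequeued */
	long resv_pages;	/* pages promised but not yet handed out */
};

/* Loosely models hugetlb_reserve_pages(): true on success. */
static bool toy_reserve(struct toy_pool *p)
{
	if (p->free_pages - p->resv_pages < 1)
		return false;
	p->resv_pages++;
	return true;
}

/* Loosely models hugetlb_unreserve_pages(): drop an unused reservation. */
static void toy_unreserve(struct toy_pool *p)
{
	assert(p->resv_pages > 0);
	p->resv_pages--;
}

/*
 * Loosely models alloc_hugetlb_folio_reserve(): with no reservation,
 * warn and fail instead of crashing. dequeue_ok == false simulates the
 * dequeue step finding no suitable page.
 */
static bool toy_alloc_reserved(struct toy_pool *p, bool dequeue_ok)
{
	if (p->resv_pages == 0) {
		fprintf(stderr, "toy: allocation attempted with no reservation\n");
		return false;
	}
	if (!dequeue_ok)
		return false;
	p->free_pages--;
	p->resv_pages--;	/* the reservation is consumed by the page */
	return true;
}

/* Reserve, then allocate; undo the reservation if nothing was allocated. */
static int toy_memfd_alloc(struct toy_pool *p, bool dequeue_ok)
{
	if (!toy_reserve(p))
		return -ENOMEM;
	if (toy_alloc_reserved(p, dequeue_ok))
		return 0;
	toy_unreserve(p);
	return -ENOMEM;
}
```

Calling toy_alloc_reserved() directly on a pool with no reservation
corresponds to the path syzbot hit: in this model it only prints a
warning and returns false. Going through toy_memfd_alloc() leaves both
counters unchanged whenever it returns -ENOMEM.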