From patchwork Wed May 24 21:36:19 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 13254550
From: Luis Chamberlain
To: david@redhat.com, tglx@linutronix.de, hch@lst.de,
	patches@lists.linux.dev, linux-modules@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, pmladek@suse.com,
	petr.pavlu@suse.com, prarit@redhat.com,
	torvalds@linux-foundation.org, lennart@poettering.net
Cc: gregkh@linuxfoundation.org, rafael@kernel.org, song@kernel.org,
	lucas.de.marchi@gmail.com, lucas.demarchi@intel.com,
	christophe.leroy@csgroup.eu, peterz@infradead.org, rppt@kernel.org,
	dave@stgolabs.net, willy@infradead.org, vbabka@suse.cz,
	mhocko@suse.com, dave.hansen@linux.intel.com,
	colin.i.king@gmail.com, jim.cromie@gmail.com,
	catalin.marinas@arm.com, jbaron@akamai.com,
	rick.p.edgecombe@intel.com, yujie.liu@intel.com, mcgrof@kernel.org
Subject: [PATCH 1/2] fs/kernel_read_file: add support for duplicate detection
Date: Wed, 24 May 2023 14:36:19 -0700
Message-Id: <20230524213620.3509138-2-mcgrof@kernel.org>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20230524213620.3509138-1-mcgrof@kernel.org>
References: <20230524213620.3509138-1-mcgrof@kernel.org>

Add support for a new call which detects duplicate requests for each
inode passed. This lets users avoid incurring a whole vmalloc() for
duplicate inodes with kernel_read_file().

Avoiding duplicates in-kernel is desirable since changing existing
userspace or kernel users to account for these duplicates would
otherwise be difficult, and the amount of memory wasted on duplicates
with vmalloc is observed to be high.

This is currently disabled because no user exists yet which wants it
enabled. Kernel code which needs this enabled should select the new
CONFIG_KREAD_UNIQ, otherwise the API falls back to the existing
kernel_read_file_from_fd().

If we later need some code to enable CONFIG_KREAD_UNIQ and some not,
the feature can be enabled per enum kernel_read_file_id id. For now
this should cover the main future use case with modules and make it
easy to disable / enable this feature with just one future kconfig
option.

Unlike with kernel_read_file_from_fd(), users of the new API call
kread_uniq_fd(), keep track of the inode number internally, and call
kread_uniq_fd_free() once the inode is no longer used.

Signed-off-by: Luis Chamberlain
---
 fs/Kconfig                       |   3 +
 fs/kernel_read_file.c            | 124 +++++++++++++++++++++++++++++++
 include/linux/kernel_read_file.h |  14 ++++
 3 files changed, 141 insertions(+)

diff --git a/fs/Kconfig b/fs/Kconfig
index cc07a0cd3172..0a78657b00d5 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -18,6 +18,9 @@ config VALIDATE_FS_PARSER
 config FS_IOMAP
 	bool
 
+config KREAD_UNIQ
+	bool
+
 # old blockdev_direct_IO implementation.  Use iomap for new code instead
 config LEGACY_DIRECT_IO
 	bool
diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index 5d826274570c..a066e2f239e8 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -1,9 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#define pr_fmt(fmt) "kread: " fmt
+
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /**
  * kernel_read_file() - read file contents into a kernel buffer
@@ -187,3 +190,124 @@ ssize_t kernel_read_file_from_fd(int fd, loff_t offset, void **buf,
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kernel_read_file_from_fd);
+
+#ifdef CONFIG_KREAD_UNIQ
+static DEFINE_MUTEX(kread_dup_mutex);
+static LIST_HEAD(kread_dup_reqs);
+
+struct kread_dup_req {
+	struct list_head list;
+	unsigned long i_ino;
+};
+
+static struct kread_dup_req *kread_dup_request_lookup(unsigned long i_ino)
+{
+	struct kread_dup_req *kread_req;
+
+	list_for_each_entry_rcu(kread_req, &kread_dup_reqs, list,
+				lockdep_is_held(&kread_dup_mutex)) {
+		if (kread_req->i_ino == i_ino)
+			return kread_req;
+	}
+
+	return NULL;
+}
+
+static struct kread_dup_req *kread_dup_request_new(char *name, unsigned long i_ino)
+{
+	struct kread_dup_req *kread_req, *new_kread_req;
+
+	/*
+	 * Pre-allocate the entry in case we have to use it later
+	 * to avoid contention with the mutex.
+	 */
+	new_kread_req = kzalloc(sizeof(*new_kread_req), GFP_KERNEL);
+	if (!new_kread_req)
+		return NULL;
+
+	new_kread_req->i_ino = i_ino;
+
+	kread_req = kread_dup_request_lookup(i_ino);
+	if (!kread_req) {
+		/*
+		 * There was no duplicate, just add the request so we can
+		 * keep tabs on duplicates later.
+		 */
+		pr_debug("accepted request for i_ino %lu for: %s\n", i_ino, name);
+		return new_kread_req;
+	}
+
+	/* We are dealing with a duplicate request now */
+
+	kfree(new_kread_req);
+
+	pr_debug("duplicate request on i_ino %lu for: %s\n", i_ino, name);
+
+	return NULL;
+}
+
+ssize_t kread_uniq_fd(int fd, loff_t offset, void **buf, unsigned long *i_ino,
+		      size_t buf_size, size_t *file_size, enum kernel_read_file_id id)
+{
+	struct fd f = fdget(fd);
+	ssize_t ret = -EBADF;
+	char *name, *path;
+	struct kread_dup_req *req;
+
+	if (!f.file || !(f.file->f_mode & FMODE_READ))
+		goto out;
+
+	path = kzalloc(PATH_MAX, GFP_KERNEL);
+	if (!path)
+		return -ENOMEM;
+
+	name = file_path(f.file, path, PATH_MAX);
+	if (IS_ERR(name)) {
+		ret = PTR_ERR(name);
+		goto out_mem;
+	}
+
+	*i_ino = file_inode(f.file)->i_ino;
+
+	mutex_lock(&kread_dup_mutex);
+	req = kread_dup_request_new(name, *i_ino);
+	if (!req) {
+		mutex_unlock(&kread_dup_mutex);
+		ret = -EBUSY;
+		goto out_mem;
+	}
+
+	ret = kernel_read_file(f.file, offset, buf, buf_size, file_size, id);
+	if (ret < 0)
+		kfree(req);
+	else
+		list_add_rcu(&req->list, &kread_dup_reqs);
+	mutex_unlock(&kread_dup_mutex);
+out_mem:
+	kfree(path);
+out:
+	fdput(f);
+	return ret;
+}
+
+void kread_uniq_fd_free(unsigned long i_ino)
+{
+	struct kread_dup_req *kread_req;
+
+	if (!i_ino)
+		return;
+
+	mutex_lock(&kread_dup_mutex);
+
+	kread_req = kread_dup_request_lookup(i_ino);
+	if (!kread_req)
+		goto out;
+
+	list_del_rcu(&kread_req->list);
+	synchronize_rcu();
+
+out:
+	mutex_unlock(&kread_dup_mutex);
+	kfree(kread_req);
+}
+#endif /* CONFIG_KREAD_UNIQ */
diff --git a/include/linux/kernel_read_file.h b/include/linux/kernel_read_file.h
index 90451e2e12bd..884985b0dc88 100644
--- a/include/linux/kernel_read_file.h
+++ b/include/linux/kernel_read_file.h
@@ -51,5 +51,19 @@ ssize_t kernel_read_file_from_fd(int fd, loff_t offset,
 				 void **buf, size_t buf_size,
 				 size_t *file_size,
 				 enum kernel_read_file_id id);
+#ifdef CONFIG_KREAD_UNIQ
+ssize_t kread_uniq_fd(int fd, loff_t offset, void **buf, unsigned long *i_ino,
+		      size_t buf_size, size_t *file_size, enum kernel_read_file_id id);
+void kread_uniq_fd_free(unsigned long i_ino);
+#else
+static inline ssize_t kread_uniq_fd(int fd, loff_t offset, void **buf, unsigned long *i_ino,
+		size_t buf_size, size_t *file_size, enum kernel_read_file_id id)
+{
+	return kernel_read_file_from_fd(fd, offset, buf, buf_size, file_size, id);
+}
+static inline void kread_uniq_fd_free(unsigned long i_ino)
+{
+}
+#endif /* CONFIG_KREAD_UNIQ */
 
 #endif /* _LINUX_KERNEL_READ_FILE_H */

From patchwork Wed May 24 21:36:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 13254548
From: Luis Chamberlain
To: david@redhat.com, tglx@linutronix.de, hch@lst.de,
	patches@lists.linux.dev, linux-modules@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, pmladek@suse.com,
	petr.pavlu@suse.com, prarit@redhat.com,
	torvalds@linux-foundation.org, lennart@poettering.net
Cc: gregkh@linuxfoundation.org, rafael@kernel.org, song@kernel.org,
	lucas.de.marchi@gmail.com, lucas.demarchi@intel.com,
	christophe.leroy@csgroup.eu, peterz@infradead.org, rppt@kernel.org,
	dave@stgolabs.net, willy@infradead.org, vbabka@suse.cz,
	mhocko@suse.com, dave.hansen@linux.intel.com,
	colin.i.king@gmail.com, jim.cromie@gmail.com,
	catalin.marinas@arm.com, jbaron@akamai.com,
	rick.p.edgecombe@intel.com, yujie.liu@intel.com, mcgrof@kernel.org
Subject: [PATCH 2/2] module: add support to avoid duplicates early on load
Date: Wed, 24 May 2023 14:36:20 -0700
Message-Id: <20230524213620.3509138-3-mcgrof@kernel.org>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20230524213620.3509138-1-mcgrof@kernel.org>
References: <20230524213620.3509138-1-mcgrof@kernel.org>

Add support to use the new kread_uniq_fd() to avoid duplicate kernel
reads on modules.
At the cost of about 945 bytes of kernel size, enabling this on a
255 CPU x86_64 qemu guest saves about 1.8 GiB of memory during boot
which would otherwise be allocated only to be freed, and reduces boot
time by about 11 seconds.

Userspace loads modules through finit_module(), which in turn can use
vmalloc space up to 3 times:

  a) The kernel_read_file() call
  b) Optional module decompression
  c) Our final copy of the module

Commit 064f4536d139 ("module: avoid allocation if module is already
present and ready") shows a graph of the amount of vmalloc space
observed to be allocated and then freed for duplicate module requests
which end up in the trash bin. Since the waste grows linearly with the
number of CPUs, eventually this will bite us and you end up not being
able to boot. That commit put in a stop gap for c), but to avoid the
vmalloc() space wasted on a) and b) we need to detect duplicates
earlier.

We could just have userspace fix this, but as reviewed at LSFMM 2023
this year in Vancouver, fixing this in userspace can be complex and we
also can't know when userspace is fixed. Fixing this in the kernel
turned out to be easy using the inode, and with a simple kconfig option
we can let users / distros decide if this full stop gap is worth
enabling.

With this enabled I'm now able to see 0 bytes wasted on vmalloc space
due to duplicates.

Before:

  # sudo systemd-analyze
  Startup finished in 41.653s (kernel) + 44.305s (userspace) = 1min 25.958s
  graphical.target reached after 44.178s in userspace.

  # grep "Virtual mem wasted bytes" /sys/kernel/debug/modules/stats
  Virtual mem wasted bytes 1949006968

So about 1.8 GiB of vmalloc space wasted during boot.

After:

  # sudo systemd-analyze
  Startup finished in 29.883s (kernel) + 45.817s (userspace) = 1min 15.700s
  graphical.target reached after 45.682s in userspace.

  # grep "Virtual mem wasted bytes" /sys/kernel/debug/modules/stats
  Virtual mem wasted bytes 0

Suggested-by: Lennart Poettering
Signed-off-by: Luis Chamberlain
Tested-by: Luis Chamberlain
Tested-by: Luis Chamberlain
Tested-by: Dan Williams
---
 include/linux/module.h   |  1 +
 kernel/module/Kconfig    | 20 ++++++++++++++++++++
 kernel/module/internal.h |  1 +
 kernel/module/main.c     | 19 ++++++++++++-------
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 9e56763dff81..afc44df96278 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -557,6 +557,7 @@ struct module {
 	unsigned int printk_index_size;
 	struct pi_entry **printk_index_start;
 #endif
+	unsigned long i_ino;
 
 #ifdef CONFIG_MODULE_UNLOAD
 	/* What modules depend on me? */
diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
index 33a2e991f608..85a6c7c5ddc0 100644
--- a/kernel/module/Kconfig
+++ b/kernel/module/Kconfig
@@ -157,6 +157,26 @@ config MODULE_UNLOAD_TAINT_TRACKING
 	  page (see bad_page()), the aforementioned details are also
 	  shown. If unsure, say N.
 
+config MODULE_KREAD_UNIQ
+	bool "Avoid duplicate module kernel reads"
+	select KREAD_UNIQ
+	help
+	  Enable this option to avoid vmalloc() space for duplicate module
+	  requests early before we can even check for the module name. This
+	  is useful to avoid duplicate module requests which userspace or kernel
+	  can issue. On some architectures such as x86_64 there is only 128 MiB
+	  of virtual memory space and since in the worst case we can end up
+	  allocating up to 3 times the module size in vmalloc space, avoiding
+	  duplicates can save virtual memory on boot.
+
+	  Enabling this will increase your kernel size by about 945 bytes, but
+	  can save considerable memory during bootup which would otherwise be
+	  freed and this in turn can help speed up kernel boot time.
+
+	  Disable this if you have enabled CONFIG_MODULE_STATS and have
+	  verified you see no duplicates or virtual memory being freed on
+	  bootup.
+
 config MODVERSIONS
 	bool "Module versioning support"
 	help
diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index dc7b0160c480..7ea7f177f907 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -67,6 +67,7 @@ struct load_info {
 	unsigned int max_pages;
 	unsigned int used_pages;
 #endif
+	unsigned long i_ino;
 	struct {
 		unsigned int sym, str, mod, vers, info, pcpu;
 	} index;
diff --git a/kernel/module/main.c b/kernel/module/main.c
index ea7d0c7f3e60..e35900ee616a 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1283,6 +1283,7 @@ static void free_module(struct module *mod)
 	kfree(mod->args);
 	percpu_modfree(mod);
 
+	kread_uniq_fd_free(mod->i_ino);
 	free_mod_mem(mod);
 }
 
@@ -1964,12 +1965,14 @@ static int copy_module_from_user(const void __user *umod, unsigned long len,
 	return err;
 }
 
-static void free_copy(struct load_info *info, int flags)
+static void free_copy(struct load_info *info, int flags, bool error)
 {
 	if (flags & MODULE_INIT_COMPRESSED_FILE)
 		module_decompress_cleanup(info);
 	else
 		vfree(info->hdr);
+	if (error)
+		kread_uniq_fd_free(info->i_ino);
 }
 
 static int rewrite_section_headers(struct load_info *info, int flags)
@@ -2965,7 +2968,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	}
 
 	/* Get rid of temporary copy. */
-	free_copy(info, flags);
+	free_copy(info, flags, false);
 
 	/* Done! */
 	trace_module_load(mod);
@@ -3023,7 +3026,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	 */
 	if (!module_allocated)
 		mod_stat_bump_becoming(info, flags);
-	free_copy(info, flags);
+	free_copy(info, flags, true);
 	return err;
 }
 
@@ -3068,11 +3071,12 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 		      |MODULE_INIT_COMPRESSED_FILE))
 		return -EINVAL;
 
-	len = kernel_read_file_from_fd(fd, 0, &buf, INT_MAX, NULL,
-				       READING_MODULE);
+	len = kread_uniq_fd(fd, 0, &buf, &info.i_ino, INT_MAX, NULL, READING_MODULE);
 	if (len < 0) {
-		mod_stat_inc(&failed_kreads);
-		mod_stat_add_long(len, &invalid_kread_bytes);
+		if (len != -EBUSY) {
+			mod_stat_inc(&failed_kreads);
+			mod_stat_add_long(len, &invalid_kread_bytes);
+		}
 		return len;
 	}
 
@@ -3082,6 +3086,7 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 		if (err) {
 			mod_stat_inc(&failed_decompress);
 			mod_stat_add_long(len, &invalid_decompress_bytes);
+			kread_uniq_fd_free(info.i_ino);
 			return err;
 		}
 	} else {
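
For readers who want to experiment with the duplicate-detection idea
outside a kernel tree, it can be modeled in plain userspace C. The
sketch below is an illustration only and is not code from this series:
uniq_claim() and uniq_release() are hypothetical names, a pthread mutex
and a singly linked list stand in for the kernel mutex and RCU list,
and the inode number is claimed up front rather than recorded only
after a successful read as kread_uniq_fd() does.

```c
/* Userspace model of inode-keyed duplicate detection (illustrative only). */
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* One entry per inode number that is currently claimed. */
struct dup_req {
	struct dup_req *next;
	unsigned long i_ino;
};

static pthread_mutex_t dup_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct dup_req *dup_reqs;

/* Find the entry for i_ino, or NULL. Caller must hold dup_mutex. */
static struct dup_req *dup_lookup(unsigned long i_ino)
{
	struct dup_req *req;

	for (req = dup_reqs; req; req = req->next)
		if (req->i_ino == i_ino)
			return req;
	return NULL;
}

/*
 * Hypothetical helper: claim an inode number.
 * Returns 0 on success, -EBUSY if it is already claimed and
 * -ENOMEM if the tracking entry cannot be allocated.
 */
int uniq_claim(unsigned long i_ino)
{
	/* Allocate outside the lock so the critical section stays short. */
	struct dup_req *req = calloc(1, sizeof(*req));
	int ret = 0;

	if (!req)
		return -ENOMEM;
	req->i_ino = i_ino;

	pthread_mutex_lock(&dup_mutex);
	if (dup_lookup(i_ino)) {
		ret = -EBUSY;
	} else {
		req->next = dup_reqs;
		dup_reqs = req;
		req = NULL;	/* ownership moved to the list */
	}
	pthread_mutex_unlock(&dup_mutex);

	free(req);	/* frees the unused entry on -EBUSY, no-op otherwise */
	return ret;
}

/* Hypothetical helper: drop a claim; unknown inode numbers are ignored. */
void uniq_release(unsigned long i_ino)
{
	struct dup_req **pp, *req = NULL;

	pthread_mutex_lock(&dup_mutex);
	for (pp = &dup_reqs; *pp; pp = &(*pp)->next) {
		if ((*pp)->i_ino == i_ino) {
			req = *pp;
			*pp = req->next;	/* unlink from the list */
			break;
		}
	}
	pthread_mutex_unlock(&dup_mutex);
	free(req);
}
```

In this model, calling uniq_claim(42) twice in a row returns 0 and then
-EBUSY, and after uniq_release(42) the same inode number can be claimed
again. Distinguishing -ENOMEM from -EBUSY is a choice made for the
sketch so that a caller can tell an allocation failure apart from a
genuine duplicate.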