Message ID | 20230405022702.753323-5-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | From: Luis Chamberlain <mcgrof@kernel.org>; To: david@redhat.com, patches@lists.linux.dev, linux-modules@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, pmladek@suse.com, petr.pavlu@suse.com, prarit@redhat.com, torvalds@linux-foundation.org, gregkh@linuxfoundation.org, rafael@kernel.org; Cc: christophe.leroy@csgroup.eu, tglx@linutronix.de, peterz@infradead.org, song@kernel.org, rppt@kernel.org, dave@stgolabs.net, willy@infradead.org, vbabka@suse.cz, mhocko@suse.com, dave.hansen@linux.intel.com, colin.i.king@gmail.com, jim.cromie@gmail.com, catalin.marinas@arm.com, jbaron@akamai.com, rick.p.edgecombe@intel.com, mcgrof@kernel.org; Subject: [PATCH v2 4/6] module: avoid allocation if module is already present and ready; Date: Tue, 4 Apr 2023 19:27:00 -0700; Message-Id: <20230405022702.753323-5-mcgrof@kernel.org>; In-Reply-To: <20230405022702.753323-1-mcgrof@kernel.org>; References: <20230405022702.753323-1-mcgrof@kernel.org>; List-ID: <linux-mm.kvack.org> |
Series |
module: avoid userspace pressure on unwanted allocations
|
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 8f382580195b..137fd9292dc0 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2797,7 +2797,11 @@ static int early_mod_check(struct load_info *info, int flags)
 	if (err)
 		return err;
 
-	return 0;
+	mutex_lock(&module_mutex);
+	err = module_patient_check_exists(info->mod->name);
+	mutex_unlock(&module_mutex);
+
+	return err;
 }
 
 /*
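As a rough illustration of what this hunk does, here is a minimal userspace sketch of the same check-before-allocate shape. Everything named demo_* is hypothetical and exists only for this example: a pthread mutex stands in for module_mutex and a linked list stands in for the kernel's module list. The sketch does not model module states, so it says nothing about how module_patient_check_exists() treats a module that is still initializing.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct demo_module {
	char name[64];			/* arbitrary length for the sketch */
	struct demo_module *next;
};

static struct demo_module *demo_modules;	/* list of "loaded" modules */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
unsigned long demo_allocations;			/* allocations that reached the publish step */

/* Caller must hold demo_mutex. Returns -EEXIST if name is already listed. */
int demo_check_exists(const char *name)
{
	for (struct demo_module *m = demo_modules; m; m = m->next)
		if (strcmp(m->name, name) == 0)
			return -EEXIST;
	return 0;
}

/* Cheap early check with the same lock/check/unlock shape as the hunk. */
int demo_early_check(const char *name)
{
	int err;

	pthread_mutex_lock(&demo_mutex);
	err = demo_check_exists(name);
	pthread_mutex_unlock(&demo_mutex);
	return err;
}

/* Load path: bail out before allocating, then re-check before publishing. */
int demo_load(const char *name)
{
	struct demo_module *m;
	int err;

	err = demo_early_check(name);
	if (err)
		return err;		/* duplicate request: nothing was allocated */

	m = calloc(1, sizeof(*m));
	if (!m)
		return -ENOMEM;
	snprintf(m->name, sizeof(m->name), "%s", name);

	pthread_mutex_lock(&demo_mutex);
	demo_allocations++;
	err = demo_check_exists(name);	/* a racing loader may have won */
	if (!err) {
		m->next = demo_modules;
		demo_modules = m;
	}
	pthread_mutex_unlock(&demo_mutex);

	if (err)
		free(m);		/* allocated, then thrown away */
	return err;
}
```

Calling demo_load("xfs") twice returns 0 and then -EEXIST, and the second request is rejected before calloc() is reached, so only one allocation is performed. The late re-check is still needed because two loaders can both pass the early check before either one publishes.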
load_module() will allocate a struct module before even checking if the module is already loaded. This can create unnecessary memory pressure since we can easily just check if the module is already present early with the copy of the module information from userspace after we've validated it a bit. This can only be an issue if a system is getting hammered with userspace loading modules.

Note that there are two ways to load modules: one is kernel module auto-loading (in-kernel request_module() calls) and the other is modprobe calls from userspace. The auto-loading is in-kernel, but it pings back to userspace to just call modprobe. We already have a way to restrict the number of concurrent kernel auto-loads in a given time, however that does not stop a system from issuing tons of system calls to load a module and for the races to exist.

Userspace itself *is* supposed to check if a module is present before loading it. But we're observing situations where tons of the same module are in effect being loaded. Some of these are acknowledged as in-kernel bugs, such as the ACPI frequency modules, issues for which we already have fixes merged or are working towards, but we can also help a bit more on the modules side to avoid those dramatic situations. All that is just memory being allocated to then be thrown away.

To avoid memory pressure for such stupid cases, put a stop gap in place for them. We now check for the module being present *before* allocation, and then again right when we are about to add it to the system.

On an 8 vCPU, 8 GiB RAM system using kdevops and testing against selftests kmod.sh -t 0008 I see a saving in the *highest* side of memory consumption of up to ~84 MiB with the Linux kernel selftests kmod test 0008. With the new stress-ng module test I see a 145 MiB difference in max memory consumption with 100 ops.
The stress-ng module ops tests can be pretty pathological and are not realistic; however, they were used to finally successfully reproduce issues which are only reported to happen on systems with over 400 CPUs [0], by just using 100 ops on an 8 vCPU, 8 GiB RAM system.

This can be observed and visualized below. The time it takes to run the test is also not affected.

The kmod tests 0008:

The gnuplot is set to a range from 400000 KiB (390 MiB) to 580000 KiB (566 MiB) given the tests peak around that range.

cat kmod.plot
set term dumb
set output fileout
set yrange [400000:580000]
plot filein with linespoints title "Memory usage (KiB)"

Before:
root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-before.txt
^C
root@kmod ~ # sort -n -r log-0008-before.txt | head -1
528732

So ~516.33 MiB

After:
root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-after.txt
^C
root@kmod ~ # sort -n -r log-0008-after.txt | head -1
442516

So ~432.14 MiB

That's about 84 MiB in savings in the worst case.
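The peak figures above are plain unit arithmetic (1 MiB = 1024 KiB). A small C sketch of that calculation follows; the helper names peak_kib(), kib_to_mib() and savings_mib() are hypothetical, and peak_kib() plays the role of `sort -n -r ... | head -1` on an in-memory array of samples rather than on a log file.

```c
#include <assert.h>
#include <stddef.h>

/* Largest sample in a series of "used memory" readings, in KiB. */
unsigned long peak_kib(const unsigned long *samples, size_t n)
{
	unsigned long max = 0;

	for (size_t i = 0; i < n; i++)
		if (samples[i] > max)
			max = samples[i];
	return max;
}

/* Convert KiB to MiB: 1 MiB = 1024 KiB. */
double kib_to_mib(unsigned long kib)
{
	return (double)kib / 1024.0;
}

/* Difference between two peaks, in MiB; negative if usage went up. */
double savings_mib(unsigned long before_kib, unsigned long after_kib)
{
	return kib_to_mib(before_kib) - kib_to_mib(after_kib);
}
```

With the peaks reported here, savings_mib(528732, 442516) comes to about 84.2 MiB, which matches the ~84 MiB quoted for the kmod 0008 test.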
The graphs:

root@kmod ~ # gnuplot -e "filein='log-0008-before.txt'; fileout='graph-0008-before.txt'" kmod.plot
root@kmod ~ # gnuplot -e "filein='log-0008-after.txt'; fileout='graph-0008-after.txt'" kmod.plot

root@kmod ~ # cat graph-0008-before.txt

[ASCII graph: "Memory usage (KiB)", x axis 0-40 samples, y axis 400000-580000; the samples sit near the 520000 gridline]

root@kmod ~ # cat graph-0008-after.txt

[ASCII graph: "Memory usage (KiB)", x axis 0-40 samples, y axis 400000-580000; the samples sit near the 440000 gridline]

The stress-ng module tests:

This is used to run the test to try to reproduce the vmap issues reported by David:

echo 0 > /proc/sys/vm/oom_dump_tasks
./stress-ng --module 100 --module-name xfs

Prior to this commit:
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > baseline-stress-ng.txt
root@kmod ~ # sort -n -r baseline-stress-ng.txt | head -1
5046456

After this commit:
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > after-stress-ng.txt
root@kmod ~ # sort -n -r after-stress-ng.txt | head -1
4896972

5046456 - 4896972
149484
149484/1024
145.98046875000000000000

So this commit using stress-ng reveals saving about 145 MiB in memory using 100 ops from stress-ng which reproduced the vmap issue reported.
cat kmod.plot
set term dumb
set output fileout
set yrange [4700000:5070000]
plot filein with linespoints title "Memory usage (KiB)"

root@kmod ~ # gnuplot -e "filein='baseline-stress-ng.txt'; fileout='graph-stress-ng-before.txt'" kmod-simple-stress-ng.plot
root@kmod ~ # gnuplot -e "filein='after-stress-ng.txt'; fileout='graph-stress-ng-after.txt'" kmod-simple-stress-ng.plot

root@kmod ~ # cat graph-stress-ng-before.txt

[ASCII graph: "Memory usage (KiB)", x axis 0-40 samples, y axis 4.7e+06-5.05e+06; the samples swing across most of that range, peaking near 5.05e+06]

root@kmod ~ # cat graph-stress-ng-after.txt

[ASCII graph: "Memory usage (KiB)", x axis 0-40 samples, y axis 4.7e+06-5.05e+06; the samples peak at about 4.9e+06, with dips toward 4.7e+06]

[0] https://lkml.kernel.org/r/20221013180518.217405-1-david@redhat.com

Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/module/main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)