From patchwork Mon Aug 14 17:21:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mateusz Guzik X-Patchwork-Id: 13353126 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE5E9C001B0 for ; Mon, 14 Aug 2023 17:21:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 21FE26B0075; Mon, 14 Aug 2023 13:21:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1D0FB8E0002; Mon, 14 Aug 2023 13:21:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0983F8E0001; Mon, 14 Aug 2023 13:21:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id EDAF16B0075 for ; Mon, 14 Aug 2023 13:21:49 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id BE4DBC090E for ; Mon, 14 Aug 2023 17:21:49 +0000 (UTC) X-FDA: 81123377538.07.FB4FB90 Received: from mail-lj1-f175.google.com (mail-lj1-f175.google.com [209.85.208.175]) by imf12.hostedemail.com (Postfix) with ESMTP id DBE9240017 for ; Mon, 14 Aug 2023 17:21:47 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=KoAWn8A5; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf12.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.208.175 as permitted sender) smtp.mailfrom=mjguzik@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1692033708; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:references:dkim-signature; bh=9JC5TFMXUMZWvGoHxDc++vq6ZIXjTrzJoWy+WtuWXDM=; b=cxpLr7n57h+CzFW6xh1viVEZBJR/vwwqz0KFVHXA+1Vz4u1hUcxaoVit89cwBvAkvGIln8 xHssJjo64ayH9vZxB0jxwSBwVXpuQ6uHfjogBRoaRa6jSmioibaCNHmIUqAg4nHdi6Pdvq PrBoe00kjuw+96LTdTeBvvcjCOyqgjI= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=KoAWn8A5; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf12.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.208.175 as permitted sender) smtp.mailfrom=mjguzik@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1692033708; a=rsa-sha256; cv=none; b=Yy0j5Jq3V8sInXPOHqsZZdNG4V42TLNBvYaPWUYbafKBiryThGJ8wrqZrD/dxuvKY5wW4y RoxVj50eky17gg+k/RD8tlyH+WGxgEzxt57uXtxc6bOPRtU5rQrMIGAmnZat0clO+YUhhi JaaDcZGK4v5tmvSHGrQtjU4YdNfQSTo= Received: by mail-lj1-f175.google.com with SMTP id 38308e7fff4ca-2ba1e9b1fa9so70368611fa.3 for ; Mon, 14 Aug 2023 10:21:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692033706; x=1692638506; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=9JC5TFMXUMZWvGoHxDc++vq6ZIXjTrzJoWy+WtuWXDM=; b=KoAWn8A5CWOqyO7GkArV0tVey//pMjPAxQu+o0AWib7dv4xooofRm/gacWDYLbIIzZ wwBgBcaoBfA5RLFMfgLGhpdX4Q94KlvY7DaHd1/HfwrFVfZj0AJ+CbUl1fiD0NZMXhzG 0Si0wJlilkAt5E/DJD0j43Bh9KrLiETiPjxY4qJu2lZX+tFsy5SGFlkAw61S/iYp/JeV 5FezNTBPWkLi8kq5KolGdEVfuzOYb6OmKLnUNcGH/wjF/8htqj1dUHsJTZERX3mutcBv H2QpyrahXO6phpCuAZjVspNVfhl4clKCRpRY/KeglpW0qxm060sbjPNOuUnOFfwxx2J7 9MWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692033706; x=1692638506; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=9JC5TFMXUMZWvGoHxDc++vq6ZIXjTrzJoWy+WtuWXDM=; b=B3B871ABS1UqA5/yHeI/MEhylC5kwQMB1mFib8LHqlygsOjHvFwcIBg6LMEbnMNjfo gCUwtbiEBq0Lf3LEUnQFqMuo2X/64AYDRnYwdberGpj5eCYnAdJ/Szgx5CAYmLayiip5 TSSEvg1aaQajMz+I2fjOnPtJB658MYFELC6BuvEaSrFpWgqXkfObLKKE0+ovW8zSDL4j ZLO/2s3VpuNFs09olH+2TvT5GpPLfDiGo29sIptjz/FP5E8ZoSifCmwn4ApBLn2//sE6 vFZwtPjkXpKIbyfTkaRQB+gA7ysKHCJgc4Hm0iLmbNl8AbpYcnAKVwff5Kn7mKMfuXOT hTYA== X-Gm-Message-State: AOJu0Yy4/rJES2aKjWNsjzVng5q/oHMZNnZ4gH0/RUgLTPn4pAH3O/+0 OOMZFUgNkC8BEH4FKpnT7qw= X-Google-Smtp-Source: AGHT+IEn4S9L2R9YQDurZoa+CF0wHXe3lqfG72WFd/bO8Q6ncD1XA83TUYmlcJcmfZiNMlmBdExT2A== X-Received: by 2002:a2e:8612:0:b0:2b9:dd5d:5d0a with SMTP id a18-20020a2e8612000000b002b9dd5d5d0amr6781528lji.40.1692033705607; Mon, 14 Aug 2023 10:21:45 -0700 (PDT) Received: from f.. (cst-prg-75-195.cust.vodafone.cz. [46.135.75.195]) by smtp.gmail.com with ESMTPSA id sa25-20020a170906edb900b009937dbabbd5sm5855444ejb.220.2023.08.14.10.21.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 14 Aug 2023 10:21:45 -0700 (PDT) From: Mateusz Guzik To: linux-kernel@vger.kernel.org Cc: torvalds@linux-foundation.org, brauner@kernel.org, ebiederm@xmission.com, david@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org, koct9i@gmail.com, oleg@redhat.com, dave@stgolabs.net, Mateusz Guzik Subject: [PATCH v2] kernel/fork: stop playing lockless games for exe_file replacement Date: Mon, 14 Aug 2023 19:21:40 +0200 Message-Id: <20230814172140.1777161-1-mjguzik@gmail.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: DBE9240017 X-Stat-Signature: nrwnqh1bt1arz8wnfj8jktwmy1819irk X-HE-Tag: 1692033707-644111 X-HE-Meta: U2FsdGVkX1+Dp5O0PLUTZyHHlFgFFQwiAJViR7+02avFDSzchuGdeEQaNXSZTUwDdwqLgSrHf3qU6935tB9uIDp7ygwLzNQhHGtgjoPDzuievzAoNk1CuWyMgtTNo9SiRujtT9S0a2UZ1wqF0rEDuw0s9bjwA8/8JLHzVO05voYFMoKhKYXariPA0UGEvoIPc15hPYwjMDvbPiY1/nxlj7nNCiH6ASNR/XK4eXJs1JXH0bWrxqiKdiTypI0EkpPsCGJaQz0XxgZ5J//jT4smrxpQIhhG/aulNcrJwroDBPLalyfvVcshfeIAj041Er7dslE7F1marQacYhY3PlyGhKYAjqvZNoR2rDQbAkk3Avy7zRJy7v7AvfY+S3LH+x8D6rx0dOih7fxaU6z3/qFRBDShyZXraCfciCxv+v+N0avqcnSOp2neefyaLdXPdsouYd8BcM9s1+LjdqUdJoqV/4Lu2+/jI5MdBesbKFGuIhgZVp/EyTejg0TIIwPydbCs60gFAzOdKbpKmU9AaEHDIOLNLLmlZId3TXOA+Yg4KTv5stt6FBdC4M6fIUtltETl1kRTTLjlv4p5DBI7byqfC/I8YLusCGLFAlm/7kXn/IbqaNCc5jqn7YbBuDrIlDe2kL5Q8VU+UeOsHGV3ey5h8F02NEiDf1Uu1gG9YdhMw8fgTENrhlvFpbhtbD19Bhqd3T9XgPP6675p+rNdhfmPk+OySnMdrMlOFPs3I+AWpGgRPVCbDUa952yXktct1OICNiSninHKW5xqn5q7tDCniKD7lo4V2Vb+avCcd2VAFfhwfkBBJCZI5WUFNmY6Az/7w1MoJHXFlhVpC4YL6562WQr28Ddt5nfVxJa2irQzINVi/Y4snQ37ztgcX2j6ffcqPnLw07CpjfIQOl8Zu40zMDdAVhK/Zql6errubDa7pHAUr2sfy/Bjyk0Ue4WTk8UAltWeqGM4wHWFj3Hxo51 vd96jt64 uKZIbfbLaw6NpMWtxsDln1MICjUBG0caLpoSK4Z1Jp9fRe23HipUrVE6yNob7kzdx0x+tQAijI7Z5B8cFBiE+0pZ1ciKdSade2NDpAPRoYjgkfSNlH+fPHnK8oaG5PMq7epaJOZaa0n8Cdso6yPBJLrHs3alu7jHY6hmhFHaJwwAYt9q+qNfQ0c5VMEHvuu+pZqhdya418LzcWUbixBkjVIgzdPnwkFjXjzwlvMU3saHROy7pvLxGBaoDqUKundntqbYA41yg0Bqg57cmS33a9HY02rAIHC/mDEKEAUN4tRRNfTKp9OWZiOSvTaS03fAMutlYsIzQZh+lIortPqcdcpoCOkkEFO7yKltiaWFiTJ1fa6vSTPQIQHxz805T8xLMOybQzwbLQvZeLA4V/g/UhV/kznw3vMr7XgkLTVfV1HKBBNx0fNJUwg2wSWAx1NyDuNedFO+v/zyUaJev5q+6lNYly3vRsGOjW4vMkSA13WMPCISSEY2GHokyzd/c62XDt5dJ4KHh5j7W1SJ4HHQfhXlz7AQ947mHkt6mKwFSFtG0t1MGnKMng2EdJ0KBPwBebPnjvBtIbVXc1U1NN3X9Cc/5LkOG3Mqm8XrLGZWHAI9OUzo= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for exe_file serialization"). While the commit message does not explain *why* the change, I found the original submission [1] which ultimately claims it cleans things up by removing dependency of exe_file on the semaphore. However, fe69d560b5bd ("kernel/fork: always deny write access to current MM exe_file") added a semaphore up/down cycle to synchronize the state of exe_file against fork, defeating the point of the original change. This is on top of semaphore trips already present both in the replacing function and prctl (the only consumer). Normally replacing exe_file does not happen for busy processes, thus write-locking is not an impediment to performance in the intended use case. If someone keeps invoking the routine for a busy processes they are trying to play dirty and that's another reason to avoid any trickery. As such I think the atomic here only adds complexity for no benefit. Just write-lock around the replacement. I also note that replacement races against the mapping check loop as nothing synchronizes actual assignment with with said checks but I am not addressing it in this patch. (Is the loop of any use to begin with?) V2: - fix up comments - tweak commit message Link: https://lore.kernel.org/linux-mm/1424979417.10344.14.camel@stgolabs.net/ [1] Signed-off-by: Mateusz Guzik Acked-by: Oleg Nesterov Acked-by: David Hildenbrand --- fs/exec.c | 4 ++-- kernel/fork.c | 22 +++++++++------------- 2 files changed, 11 insertions(+), 15 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 1a827d55ba94..dc41180d4e70 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1276,8 +1276,8 @@ int begin_new_exec(struct linux_binprm * bprm) /* * Must be called _before_ exec_mmap() as bprm->mm is - * not visible until then. This also enables the update - * to be lockless. + * not visible until then. Doing it here also ensures + * we don't race against replace_mm_exe_file(). */ retval = set_mm_exe_file(bprm->mm, bprm->file); if (retval) diff --git a/kernel/fork.c b/kernel/fork.c index d2e12b6d2b18..7b8b63fb0438 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1396,8 +1396,8 @@ EXPORT_SYMBOL_GPL(mmput_async); * This changes mm's executable file (shown as symlink /proc/[pid]/exe). * * Main users are mmput() and sys_execve(). Callers prevent concurrent - * invocations: in mmput() nobody alive left, in execve task is single - * threaded. + * invocations: in mmput() nobody alive left, in execve it happens before + * the new mm is made visible to anyone. * * Can only fail if new_exe_file != NULL. */ @@ -1432,9 +1432,7 @@ int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file) /** * replace_mm_exe_file - replace a reference to the mm's executable file * - * This changes mm's executable file (shown as symlink /proc/[pid]/exe), - * dealing with concurrent invocation and without grabbing the mmap lock in - * write mode. + * This changes mm's executable file (shown as symlink /proc/[pid]/exe). * * Main user is sys_prctl(PR_SET_MM_MAP/EXE_FILE). */ @@ -1464,22 +1462,20 @@ int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file) return ret; } - /* set the new file, lockless */ ret = deny_write_access(new_exe_file); if (ret) return -EACCES; get_file(new_exe_file); - old_exe_file = xchg(&mm->exe_file, new_exe_file); + /* set the new file */ + mmap_write_lock(mm); + old_exe_file = rcu_dereference_raw(mm->exe_file); + rcu_assign_pointer(mm->exe_file, new_exe_file); + mmap_write_unlock(mm); + if (old_exe_file) { - /* - * Don't race with dup_mmap() getting the file and disallowing - * write access while someone might open the file writable. - */ - mmap_read_lock(mm); allow_write_access(old_exe_file); fput(old_exe_file); - mmap_read_unlock(mm); } return 0; }