From: Mateusz Guzik
Date: Thu, 27 Mar 2025 06:46:27 +0100
Subject: not issuing vma_start_write() in dup_mmap() if the caller is single-threaded
To: Suren Baghdasaryan
Cc: linux-mm, "Liam R. Howlett"

Hi Suren,

I brought this up a long time ago; you agreed the idea works and stated you
would sort it out, but it apparently fell by the wayside (no hard feelings
though :>)

Here is the original:
https://lore.kernel.org/all/20230804214620.btgwhsszsd7rh6nf@f/

The thread is weirdly long and I recommend not opening it without a good
reason; I link it for reference if needed.

At the time there were woes concerning stability of the new locking scheme,
resulting in the CC list being rather excessive. As such I'm starting a new
thread with a modest list, not sure who to add though.

So dup_mmap() holds the caller's mmap_sem for writing and calls
vma_start_write() to protect against fault handling in other threads
using the same mm.

If this is the only thread with this ->mm in place, there are no sibling
threads to worry about, and this can be checked with mm_users == 1.

AFAICS all remote accesses require the mmap_sem to also be held, which
provides exclusion against dup_mmap(), meaning they don't pose a problem
either.

The patch to merely skip locking is a few-liner and I would officially submit
it myself, but for my taste an assert is needed in fault handling to
runtime-test the above invariant. It so happens I really can't be bothered to
figure out how to sort it out and was hoping you would step in. ;)
Alternatively, if you guys don't think the assert is warranted, that's your
business.

As for whether this can save any locking -- yes:

I added a probe (below for reference) with two args: whether we are
single-threaded, and whether vma_start_write() took the down/up cycle, and
ran make -j 20 in the kernel dir.

The lock was taken for every single vma (377913 in total), while all forking
processes were single-threaded. Or to put it differently, all of these were
skippable.
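To make the reasoning concrete, here is a minimal userspace sketch of the
idea, not kernel code. Everything in it is an assumption made up for
illustration: struct mm_model, fork_needs_vma_locks(), dup_mmap_model() and
fault_model() are hypothetical names, and the fork_skipped_locks flag is just
one possible shape for the runtime check, which the message above leaves open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; this is not a kernel type. */
struct mm_model {
	int mm_users;            /* models atomic_read(&mm->mm_users) */
	bool fork_skipped_locks; /* set while a fork runs without per-VMA locks */
};

/* Per-VMA write locking only matters if sibling threads can fault. */
static bool fork_needs_vma_locks(const struct mm_model *oldmm)
{
	return oldmm->mm_users != 1;
}

/* Models the dup_mmap() loop; returns how many per-VMA locks were taken. */
static size_t dup_mmap_model(struct mm_model *oldmm, size_t nr_vmas)
{
	bool lock = fork_needs_vma_locks(oldmm);
	size_t taken = 0;

	oldmm->fork_skipped_locks = !lock;
	for (size_t i = 0; i < nr_vmas; i++) {
		if (lock)
			taken++; /* stands in for vma_start_write(mpnt) */
	}
	oldmm->fork_skipped_locks = false;
	return taken;
}

/* Assumed form of the check: a fault must never overlap a lock-free fork. */
static void fault_model(const struct mm_model *mm)
{
	assert(!mm->fork_skipped_locks);
}
```

With mm_users == 1 the model takes no per-VMA locks at all, which is the
saving described above; with any other value it locks every VMA as today.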
the probe (total hack):

bpftrace -e 'kprobe:dup_probe { @[arg0, arg1] = count(); }'

probe diff:

diff --git a/fs/namei.c b/fs/namei.c
index ecb7b95c2ca3..d6cde76eda81 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5459,3 +5459,7 @@ const struct inode_operations page_symlink_inode_operations = {
 	.get_link = page_get_link,
 };
 EXPORT_SYMBOL(page_symlink_inode_operations);
+
+void dup_probe(int, int);
+void dup_probe(int, int) { }
+EXPORT_SYMBOL(dup_probe);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f80baddacc5..f7b1f0a02f2e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -760,12 +760,12 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_l
  * Exclude concurrent readers under the per-VMA lock until the currently
  * write-locked mmap_lock is dropped or downgraded.
  */
-static inline void vma_start_write(struct vm_area_struct *vma)
+static inline bool vma_start_write(struct vm_area_struct *vma)
 {
 	unsigned int mm_lock_seq;
 
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
-		return;
+		return false;
 
 	down_write(&vma->vm_lock->lock);
 	/*
@@ -776,6 +776,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
 	up_write(&vma->vm_lock->lock);
+	return true;
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f3..0cc56255a339 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -629,6 +629,8 @@ static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
 		pr_warn_once("exe_file_deny_write_access() failed in %s\n", __func__);
 }
 
+void dup_probe(int, int);
+
 #ifdef CONFIG_MMU
 static __latent_entropy int dup_mmap(struct mm_struct *mm,
 					struct mm_struct *oldmm)
@@ -638,9 +640,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	unsigned long charge = 0;
 	LIST_HEAD(uf);
 	VMA_ITERATOR(vmi, mm, 0);
+	bool only_user;
 
 	if (mmap_write_lock_killable(oldmm))
 		return -EINTR;
+	only_user = atomic_read(&oldmm->mm_users) == 1;
 	flush_cache_dup_mm(oldmm);
 	uprobe_dup_mmap(oldmm, mm);
 	/*
@@ -664,8 +668,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	mt_clear_in_rcu(vmi.mas.tree);
 	for_each_vma(vmi, mpnt) {
 		struct file *file;
+		bool locked;
+
+		locked = vma_start_write(mpnt);
+		dup_probe(only_user ? 1 :0, locked ? 1 : 0);
 
-		vma_start_write(mpnt);
 		if (mpnt->vm_flags & VM_DONTCOPY) {
 			retval = vma_iter_clear_gfp(&vmi, mpnt->vm_start,
 						    mpnt->vm_end, GFP_KERNEL);
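
For readers unfamiliar with bpftrace: the one-liner keeps a counter keyed on
the two arguments passed to dup_probe(). A minimal userspace sketch of that
bookkeeping follows; dup_probe_model(), skippable() and the tally array are
made-up names for illustration, not part of the patch or of bpftrace.

```c
#include <assert.h>

/* Hypothetical model of the bpftrace map @[arg0, arg1] = count(). */
static unsigned long tally[2][2];

/* Mirrors the probe call: dup_probe(only_user ? 1 : 0, locked ? 1 : 0). */
static void dup_probe_model(int only_user, int locked)
{
	/* The patch only ever passes 0 or 1 for each argument. */
	assert(only_user == 0 || only_user == 1);
	assert(locked == 0 || locked == 1);
	tally[only_user][locked]++;
}

/* Lock cycles taken while single-threaded, i.e. the ones that could go. */
static unsigned long skippable(void)
{
	return tally[1][1];
}
```

In the make -j 20 run described above, every sample would land in the
(1, 1) bucket, so skippable() would equal the total number of VMAs visited.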