From patchwork Mon Jan 9 20:53:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suren Baghdasaryan <surenb@google.com>
X-Patchwork-Id: 13094278
Date: Mon, 9 Jan 2023 12:53:07 -0800
In-Reply-To: <20230109205336.3665937-1-surenb@google.com>
Mime-Version: 1.0
References: <20230109205336.3665937-1-surenb@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Message-ID: <20230109205336.3665937-13-surenb@google.com>
Subject: [PATCH 12/41] mm: add per-VMA lock and helper functions to control it
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz,
    hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net,
    willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org,
    ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, paulmck@kernel.org,
    luto@kernel.org, songliubraving@fb.com, peterx@redhat.com,
    david@redhat.com, dhowells@redhat.com, hughd@google.com,
    bigeasy@linutronix.de, kent.overstreet@linux.dev,
    punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com,
    rientjes@google.com, axelrasmussen@google.com, joelaf@google.com,
    minchan@google.com, jannh@google.com, shakeelb@google.com,
    tatashin@google.com, edumazet@google.com, gthelen@google.com,
    gurua@google.com, arjunroy@google.com, soheil@google.com,
    hughlynch@google.com, leewalsh@google.com, posk@google.com,
    linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
    linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com
Introduce a per-VMA rw_semaphore to be used during page fault handling
instead of mmap_lock. Because there are cases when multiple VMAs need
to be exclusively locked during VMA tree modifications, instead of the
usual lock/unlock pattern we mark a VMA as locked by taking the per-VMA
lock exclusively and setting vma->vm_lock_seq to the current
mm->mm_lock_seq. When the mmap_write_lock holder is done with all
modifications and drops mmap_lock, it increments mm->mm_lock_seq,
effectively unlocking all VMAs marked as locked.

The VMA lock is placed on a cache line boundary so that its 'count'
field falls into the first cache line while the rest of the fields fall
into the second cache line. This lets the 'count' field be cached with
other frequently accessed fields and used quickly in the uncontended
case, while 'owner' and the other fields used in the contended case do
not invalidate the first cache line while waiting on the lock.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
Illustrative sketches of the locking protocol and its intended
fault-path use follow the diff below.

 include/linux/mm.h        | 80 +++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h  |  8 ++++
 include/linux/mmap_lock.h | 13 +++++++
 kernel/fork.c             |  4 ++
 mm/init-mm.c              |  3 ++
 5 files changed, 108 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f3f196e4d66d..ec2c4c227d51 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -612,6 +612,85 @@ struct vm_operations_struct {
 					  unsigned long addr);
 };
 
+#ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_init_lock(struct vm_area_struct *vma)
+{
+	init_rwsem(&vma->lock);
+	vma->vm_lock_seq = -1;
+}
+
+static inline void vma_write_lock(struct vm_area_struct *vma)
+{
+	int mm_lock_seq;
+
+	mmap_assert_write_locked(vma->vm_mm);
+
+	/*
+	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
+	 * mm->mm_lock_seq can't be concurrently modified.
+	 */
+	mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+	if (vma->vm_lock_seq == mm_lock_seq)
+		return;
+
+	down_write(&vma->lock);
+	vma->vm_lock_seq = mm_lock_seq;
+	up_write(&vma->lock);
+}
+
+/*
+ * Try to read-lock a vma. The function is allowed to occasionally yield false
+ * locked result to avoid performance overhead, in which case we fall back to
+ * using mmap_lock. The function should never yield false unlocked result.
+ */
+static inline bool vma_read_trylock(struct vm_area_struct *vma)
+{
+	/* Check before locking. A race might cause false locked result. */
+	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+		return false;
+
+	if (unlikely(down_read_trylock(&vma->lock) == 0))
+		return false;
+
+	/*
+	 * Overflow might produce false locked result.
+	 * False unlocked result is impossible because we modify and check
+	 * vma->vm_lock_seq under vma->lock protection and mm->mm_lock_seq
+	 * modification invalidates all existing locks.
+	 */
+	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+		up_read(&vma->lock);
+		return false;
+	}
+	return true;
+}
+
+static inline void vma_read_unlock(struct vm_area_struct *vma)
+{
+	up_read(&vma->lock);
+}
+
+static inline void vma_assert_write_locked(struct vm_area_struct *vma)
+{
+	mmap_assert_write_locked(vma->vm_mm);
+	/*
+	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
+	 * mm->mm_lock_seq can't be concurrently modified.
+	 */
+	VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma);
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
+static inline void vma_init_lock(struct vm_area_struct *vma) {}
+static inline void vma_write_lock(struct vm_area_struct *vma) {}
+static inline bool vma_read_trylock(struct vm_area_struct *vma)
+		{ return false; }
+static inline void vma_read_unlock(struct vm_area_struct *vma) {}
+static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
+
+#endif /* CONFIG_PER_VMA_LOCK */
+
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	static const struct vm_operations_struct dummy_vm_ops = {};
@@ -620,6 +699,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
+	vma_init_lock(vma);
 }
 
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d5cdec1314fe..5f7c5ca89931 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -555,6 +555,11 @@ struct vm_area_struct {
 	pgprot_t vm_page_prot;
 	unsigned long vm_flags;		/* Flags, see mm.h. */
 
+#ifdef CONFIG_PER_VMA_LOCK
+	int vm_lock_seq;
+	struct rw_semaphore lock;
+#endif
+
 	/*
 	 * For areas with an address space and backing store,
 	 * linkage into the address_space->i_mmap interval tree.
@@ -680,6 +685,9 @@ struct mm_struct {
 					  * init_mm.mmlist, and are protected
 					  * by mmlist_lock
 					  */
+#ifdef CONFIG_PER_VMA_LOCK
+		int mm_lock_seq;
+#endif
 
 		unsigned long hiwater_rss; /* High-watermark of RSS usage */
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index e49ba91bb1f0..40facd4c398b 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -72,6 +72,17 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm)
 	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_write_unlock_mm(struct mm_struct *mm)
+{
+	mmap_assert_write_locked(mm);
+	/* No races during update due to exclusive mmap_lock being held */
+	WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
+}
+#else
+static inline void vma_write_unlock_mm(struct mm_struct *mm) {}
+#endif
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
@@ -114,12 +125,14 @@ static inline bool mmap_write_trylock(struct mm_struct *mm)
 static inline void mmap_write_unlock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_released(mm, true);
+	vma_write_unlock_mm(mm);
 	up_write(&mm->mmap_lock);
 }
 
 static inline void mmap_write_downgrade(struct mm_struct *mm)
 {
 	__mmap_lock_trace_acquire_returned(mm, false, true);
+	vma_write_unlock_mm(mm);
 	downgrade_write(&mm->mmap_lock);
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 5986817f393c..c026d75108b3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -474,6 +474,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 		 */
 		*new = data_race(*orig);
 		INIT_LIST_HEAD(&new->anon_vma_chain);
+		vma_init_lock(new);
 		dup_anon_vma_name(orig, new);
 	}
 	return new;
@@ -1145,6 +1146,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	seqcount_init(&mm->write_protect_seq);
 	mmap_init_lock(mm);
 	INIT_LIST_HEAD(&mm->mmlist);
+#ifdef CONFIG_PER_VMA_LOCK
+	WRITE_ONCE(mm->mm_lock_seq, 0);
+#endif
 	mm_pgtables_bytes_init(mm);
 	mm->map_count = 0;
 	mm->locked_vm = 0;
diff --git a/mm/init-mm.c b/mm/init-mm.c
index c9327abb771c..33269314e060 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -37,6 +37,9 @@ struct mm_struct init_mm = {
 	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
 	.arg_lock	= __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
+#ifdef CONFIG_PER_VMA_LOCK
+	.mm_lock_seq	= 0,
+#endif
 	.user_ns	= &init_user_ns,
 	.cpu_bitmap	= CPU_BITS_NONE,
 #ifdef CONFIG_IOMMU_SVA
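
To make the commit-message description concrete, here is a minimal
user-space model of the lock_seq protocol, assuming pthread rwlocks as a
stand-in for the kernel rw_semaphore. The struct names and the main()
driver are invented for this sketch; only the control flow mirrors the
patch. It is not kernel code.

/* model_vma_lock.c - build with: cc -pthread model_vma_lock.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct mm {				/* stand-in for struct mm_struct */
	int mm_lock_seq;
};

struct vma {				/* stand-in for struct vm_area_struct */
	struct mm *mm;
	pthread_rwlock_t lock;
	int vm_lock_seq;
};

/* Mark the vma write-locked: record the current mm sequence number. */
static void vma_write_lock(struct vma *v)
{
	int seq = v->mm->mm_lock_seq;	/* caller holds mmap_lock for write */

	if (v->vm_lock_seq == seq)
		return;			/* already marked in this write cycle */
	pthread_rwlock_wrlock(&v->lock);
	v->vm_lock_seq = seq;
	pthread_rwlock_unlock(&v->lock);
}

/* Dropping mmap_lock bumps the sequence, "unlocking" every marked vma. */
static void vma_write_unlock_mm(struct mm *mm)
{
	mm->mm_lock_seq++;
}

static bool vma_read_trylock(struct vma *v)
{
	/* Cheap check first: a marked vma cannot be read-locked. */
	if (v->vm_lock_seq == v->mm->mm_lock_seq)
		return false;
	if (pthread_rwlock_tryrdlock(&v->lock) != 0)
		return false;
	/* Recheck under the lock, mirroring the patch. */
	if (v->vm_lock_seq == v->mm->mm_lock_seq) {
		pthread_rwlock_unlock(&v->lock);
		return false;
	}
	return true;
}

int main(void)
{
	struct mm mm = { .mm_lock_seq = 0 };
	struct vma a = { .mm = &mm, .vm_lock_seq = -1 };
	struct vma b = { .mm = &mm, .vm_lock_seq = -1 };

	pthread_rwlock_init(&a.lock, NULL);
	pthread_rwlock_init(&b.lock, NULL);

	vma_write_lock(&a);		/* mark both vmas locked */
	vma_write_lock(&b);
	printf("while marked:     a=%d b=%d (0 = reader excluded)\n",
	       vma_read_trylock(&a), vma_read_trylock(&b));

	vma_write_unlock_mm(&mm);	/* one increment releases both */
	printf("after unlock-all: a=%d b=%d (1 = read lock taken)\n",
	       vma_read_trylock(&a), vma_read_trylock(&b));
	return 0;
}

The model demonstrates the asymmetry the commit message describes:
write-locking is per-VMA and explicit, while unlocking is a single
mm-wide sequence bump that releases every marked VMA at once.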
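For orientation only: this patch adds the helpers but no callers; the
fault-path hookup arrives later in the series. A hypothetical caller
might look like the fragment below. find_vma() and handle_mm_fault()
are existing kernel functions, but using find_vma() here glosses over
the fact that a VMA cannot safely be looked up without mmap_lock, which
is exactly what later patches in the series solve with an RCU-protected
lookup; the retry label is a placeholder.

/*
 * Hypothetical sketch of a fault-path caller (not in this patch).
 * The calling convention is the point: a failed trylock means
 * "fall back to mmap_lock", never "the vma is safe to use".
 */
vma = find_vma(mm, address);
if (!vma || !vma_read_trylock(vma))
	goto retry_with_mmap_lock;	/* placeholder fallback path */

/* Handle the fault under the per-VMA lock only. */
fault = handle_mm_fault(vma, address, flags, regs);
vma_read_unlock(vma);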