From patchwork Mon Nov 11 20:55:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13871368
Date: Mon, 11 Nov 2024 12:55:04 -0800
In-Reply-To: <20241111205506.3404479-1-surenb@google.com>
Mime-Version: 1.0
References: <20241111205506.3404479-1-surenb@google.com>
X-Mailer: git-send-email 2.47.0.277.g8800431eea-goog
Message-ID: <20241111205506.3404479-3-surenb@google.com>
Subject: [PATCH 2/4] mm: move per-vma lock into vm_area_struct
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
    oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
    peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
    minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
    souravpanda@google.com, pasha.tatashin@soleen.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com
Back when per-vma locks were introduced, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing. Recent investigation [2] revealed that the
regression is limited to a rather old Broadwell microarchitecture and
even there it can be mitigated by disabling adjacent cacheline
prefetching, see [3].

This patchset moves vm_lock back into vm_area_struct, aligning it at
the cacheline boundary and changing the cache to be cacheline-aligned
as well. This causes VMA memory consumption to grow from 160
(vm_area_struct) + 40 (vm_lock) bytes to 256 bytes:

    slabinfo before:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vma_lock         ...     40  102    1 : ...
     vm_area_struct   ...    160   51    2 : ...

    slabinfo after moving vm_lock:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vm_area_struct   ...    256   32    2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64
pages, which is 5.5MB per 100000 VMAs. This memory consumption growth
will be addressed in the patches that follow.
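For reviewers who want to double-check the arithmetic above, here is a
back-of-the-envelope sketch (a standalone userspace program, not part
of the patch; it assumes 4KiB pages and rounds partially filled slabs
up to whole slabs, matching the objperslab/pagesperslab numbers from
the slabinfo output):

    #include <stdio.h>

    int main(void)
    {
            /* before: vm_area_struct (160B, 51 objs per 2-page slab)
             *       + vma_lock       ( 40B, 102 objs per 1-page slab) */
            unsigned long vmas = 1000;
            unsigned long before = ((vmas + 51 - 1) / 51) * 2 +
                                   ((vmas + 102 - 1) / 102) * 1;
            /* after: combined vm_area_struct (256B, 32 objs per 2-page slab) */
            unsigned long after = ((vmas + 32 - 1) / 32) * 2;

            printf("pages per 1000 VMAs: before=%lu, after=%lu\n",
                   before, after);
            printf("extra memory per 100000 VMAs: %.1f MB\n",
                   (double)(after - before) * 100 * 4096 / (1024 * 1024));
            return 0;
    }

This prints before=50, after=64 pages and ~5.5 MB per 100000 VMAs,
consistent with the figures quoted above.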
[1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/T/#m861679f3fe0e22c945d6334b88dc996fef5ea6cc
[2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
[3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h       | 27 ++++++++++++----------
 include/linux/mm_types.h |  6 +++--
 kernel/fork.c            | 50 +++++-----------------------------------
 3 files changed, 25 insertions(+), 58 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 01ce619f3d17..c1c2899464db 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -684,6 +684,11 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_lock_init(struct vma_lock *vm_lock)
+{
+	init_rwsem(&vm_lock->lock);
+}
+
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
@@ -701,7 +706,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
+	if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
 		return false;
 
 	/*
@@ -716,7 +721,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
 	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock->lock);
+		up_read(&vma->vm_lock.lock);
 		return false;
 	}
 	return true;
@@ -729,7 +734,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read_nested(&vma->vm_lock->lock, subclass);
+	down_read_nested(&vma->vm_lock.lock, subclass);
 }
 
 /*
@@ -739,13 +744,13 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
 static inline void vma_start_read_locked(struct vm_area_struct *vma)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read(&vma->vm_lock->lock);
+	down_read(&vma->vm_lock.lock);
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->vm_lock->lock);
+	up_read(&vma->vm_lock.lock);
 	rcu_read_unlock();
 }
 
@@ -774,7 +779,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
-	down_write(&vma->vm_lock->lock);
+	down_write(&vma->vm_lock.lock);
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -782,7 +787,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock->lock);
+	up_write(&vma->vm_lock.lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -794,7 +799,7 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_lock->lock))
+	if (!rwsem_is_locked(&vma->vm_lock.lock))
 		vma_assert_write_locked(vma);
 }
 
@@ -861,10 +866,6 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
 
-/*
- * WARNING: vma_init does not initialize vma->vm_lock.
- * Use vm_area_alloc()/vm_area_free() if vma needs locking.
- */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	memset(vma, 0, sizeof(*vma));
@@ -873,6 +874,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
 	vma_numab_state_init(vma);
+	vma_lock_init(&vma->vm_lock);
+	vma->vm_lock_seq = UINT_MAX;
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 80fef38d9d64..5c4bfdcfac72 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -716,8 +716,6 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	/* Unstable RCU readers are allowed to read this. */
-	struct vma_lock *vm_lock;
 #endif
 
 	/*
@@ -770,6 +768,10 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Unstable RCU readers are allowed to read this. */
+	struct vma_lock vm_lock ____cacheline_aligned_in_smp;
+#endif
 } __randomize_layout;
 
 #ifdef CONFIG_NUMA
diff --git a/kernel/fork.c b/kernel/fork.c
index 0061cf2450ef..9e504105f24f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -436,35 +436,6 @@ static struct kmem_cache *vm_area_cachep;
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static struct kmem_cache *mm_cachep;
 
-#ifdef CONFIG_PER_VMA_LOCK
-
-/* SLAB cache for vm_area_struct.lock */
-static struct kmem_cache *vma_lock_cachep;
-
-static bool vma_lock_alloc(struct vm_area_struct *vma)
-{
-	vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
-	vma->vm_lock_seq = UINT_MAX;
-
-	return true;
-}
-
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	kmem_cache_free(vma_lock_cachep, vma->vm_lock);
-}
-
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return true; }
-static inline void vma_lock_free(struct vm_area_struct *vma) {}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
 struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -474,10 +445,6 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		kmem_cache_free(vm_area_cachep, vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -496,10 +463,8 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	 * will be reinitialized.
 	 */
 	data_race(memcpy(new, orig, sizeof(*new)));
-	if (!vma_lock_alloc(new)) {
-		kmem_cache_free(vm_area_cachep, new);
-		return NULL;
-	}
+	vma_lock_init(&new->vm_lock);
+	new->vm_lock_seq = UINT_MAX;
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
@@ -511,7 +476,6 @@ void __vm_area_free(struct vm_area_struct *vma)
 {
 	vma_numab_state_free(vma);
 	free_anon_vma_name(vma);
-	vma_lock_free(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
@@ -522,7 +486,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 						  vm_rcu);
 
 	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -3168,11 +3132,9 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-
-	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
-#ifdef CONFIG_PER_VMA_LOCK
-	vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
-#endif
+	vm_area_cachep = KMEM_CACHE(vm_area_struct,
+			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
+			SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
 }
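
---
Editor's note, not part of the patch: for readers unfamiliar with the
____cacheline_aligned_in_smp annotation used in the mm_types.h hunk above,
a minimal userspace analogue of the layout idea (hypothetical struct and
field names, assuming 64-byte cachelines) looks like this:

    #include <stdalign.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Userspace stand-in for the kernel's ____cacheline_aligned_in_smp. */
    #define CACHELINE_SIZE 64

    struct vma_like {
            unsigned long vm_start; /* read-mostly fields share a line... */
            unsigned long vm_end;
            /* ...while the frequently written lock word gets its own line,
             * like vm_lock at the tail of vm_area_struct in this patch. */
            alignas(CACHELINE_SIZE) long lock_word;
    };

    int main(void)
    {
            printf("size=%zu, align=%zu, lock offset=%zu\n",
                   sizeof(struct vma_like), alignof(struct vma_like),
                   offsetof(struct vma_like, lock_word));
            return 0;
    }

With the lock on its own cacheline, readers of the neighbouring
read-mostly fields no longer pull in the line the lock is bouncing on,
which mirrors the false-sharing concern discussed in [2] and [3].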