From patchwork Mon Nov 11 20:55:03 2024
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13871367
Date: Mon, 11 Nov 2024 12:55:03 -0800
In-Reply-To: <20241111205506.3404479-1-surenb@google.com>
Message-ID: <20241111205506.3404479-2-surenb@google.com>
Subject: [PATCH 1/4] mm: introduce vma_start_read_locked{_nested} helpers
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
    oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
    peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
    minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
    souravpanda@google.com, pasha.tatashin@soleen.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com
Introduce helper functions which can be used to read-lock a VMA when
holding mmap_lock for read. Replace direct accesses to vma->vm_lock with
these new helpers.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h | 20 ++++++++++++++++++++
 mm/userfaultfd.c   | 14 ++++++--------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fecd47239fa9..01ce619f3d17 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -722,6 +722,26 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	return true;
 }
 
+/*
+ * Use only while holding mmap_read_lock which guarantees that nobody can lock
+ * the vma for write (vma_start_write()) from under us.
+ */
+static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read_nested(&vma->vm_lock->lock, subclass);
+}
+
+/*
+ * Use only while holding mmap_read_lock which guarantees that nobody can lock
+ * the vma for write (vma_start_write()) from under us.
+ */
+static inline void vma_start_read_locked(struct vm_area_struct *vma)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read(&vma->vm_lock->lock);
+}
+
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 60a0be33766f..55019c11b5a8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -86,13 +86,11 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
 	vma = find_vma_and_prepare_anon(mm, address);
 	if (!IS_ERR(vma)) {
 		/*
+		 * While holding mmap_lock we can't fail
 		 * We cannot use vma_start_read() as it may fail due to
-		 * false locked (see comment in vma_start_read()). We
-		 * can avoid that by directly locking vm_lock under
-		 * mmap_lock, which guarantees that nobody can lock the
-		 * vma for write (vma_start_write()) under us.
+		 * false locked (see comment in vma_start_read()).
 		 */
-		down_read(&vma->vm_lock->lock);
+		vma_start_read_locked(vma);
 	}
 	mmap_read_unlock(mm);
@@ -1480,10 +1478,10 @@ static int uffd_move_lock(struct mm_struct *mm,
 		 * See comment in uffd_lock_vma() as to why not using
 		 * vma_start_read() here.
 		 */
-		down_read(&(*dst_vmap)->vm_lock->lock);
+		vma_start_read_locked(*dst_vmap);
 		if (*dst_vmap != *src_vmap)
-			down_read_nested(&(*src_vmap)->vm_lock->lock,
-					 SINGLE_DEPTH_NESTING);
+			vma_start_read_locked_nested(*src_vmap,
+						     SINGLE_DEPTH_NESTING);
 	}
 	mmap_read_unlock(mm);
 	return err;
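[Editorial note, not part of the patch: a minimal, hypothetical sketch of the
calling pattern these helpers enable, modeled on uffd_lock_vma() above.
lock_vma_for_read() is an illustrative name, not an existing function, and
error handling is omitted.]

/* Take the per-VMA read lock while mmap_lock is held, then drop mmap_lock
 * and keep only the VMA lock across the real work. */
static struct vm_area_struct *lock_vma_for_read(struct mm_struct *mm,
						unsigned long address)
{
	struct vm_area_struct *vma;

	mmap_read_lock(mm);
	/* find_vma() returns the first VMA ending above address, or NULL */
	vma = find_vma(mm, address);
	if (vma)
		vma_start_read_locked(vma); /* cannot fail under mmap_lock */
	mmap_read_unlock(mm);

	return vma; /* the caller releases it later with vma_end_read(vma) */
}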
From patchwork Mon Nov 11 20:55:04 2024
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13871368
Date: Mon, 11 Nov 2024 12:55:04 -0800
In-Reply-To: <20241111205506.3404479-1-surenb@google.com>
Message-ID: <20241111205506.3404479-3-surenb@google.com>
Subject: [PATCH 2/4] mm: move per-vma lock into vm_area_struct
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
    oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
    peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
    minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
    souravpanda@google.com, pasha.tatashin@soleen.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com

Back when per-vma locks were introduced, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing. Recent investigation [2] revealed that the
regression is limited to a rather old Broadwell microarchitecture and
even there it can be mitigated by disabling adjacent cacheline
prefetching, see [3].

This patch moves vm_lock back into vm_area_struct, aligning it at the
cacheline boundary and making the vm_area_struct slab cache
cacheline-aligned as well. This causes VMA memory consumption to grow
from 160 (vm_area_struct) + 40 (vm_lock) bytes to 256 bytes:

    slabinfo before:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vma_lock         ...     40         102           1        : ...
     vm_area_struct   ...    160          51           2        : ...

    slabinfo after moving vm_lock:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vm_area_struct   ...    256          32           2        : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages,
which is 5.5MB per 100000 VMAs. This memory consumption growth will be
addressed in the patches that follow.
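[Editorial note, not part of the patch: a back-of-the-envelope check of the
page counts above, derived from the slabinfo numbers and assuming 4KB pages.]

 before: 1000/102 vma_lock objs       * 1 page  ~= 10 pages
       + 1000/51  vm_area_struct objs * 2 pages ~= 40 pages  -> ~50 pages
 after:  1000/32  vm_area_struct objs * 2 pages ~= 63 pages  -> ~64 pages
 delta:  ~14 pages per 1000 VMAs -> ~1400 pages per 100000 VMAs ~= 5.5MB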
[1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/T/#m861679f3fe0e22c945d6334b88dc996fef5ea6cc
[2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
[3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h       | 27 ++++++++++++----------
 include/linux/mm_types.h |  6 +++--
 kernel/fork.c            | 50 +++++-----------------------------------
 3 files changed, 25 insertions(+), 58 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 01ce619f3d17..c1c2899464db 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -684,6 +684,11 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_lock_init(struct vma_lock *vm_lock)
+{
+	init_rwsem(&vm_lock->lock);
+}
+
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
@@ -701,7 +706,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
+	if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
 		return false;
 
 	/*
@@ -716,7 +721,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
 	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock->lock);
+		up_read(&vma->vm_lock.lock);
 		return false;
 	}
 	return true;
@@ -729,7 +734,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read_nested(&vma->vm_lock->lock, subclass);
+	down_read_nested(&vma->vm_lock.lock, subclass);
 }
 
 /*
@@ -739,13 +744,13 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
 static inline void vma_start_read_locked(struct vm_area_struct *vma)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read(&vma->vm_lock->lock);
+	down_read(&vma->vm_lock.lock);
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->vm_lock->lock);
+	up_read(&vma->vm_lock.lock);
 	rcu_read_unlock();
 }
 
@@ -774,7 +779,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
-	down_write(&vma->vm_lock->lock);
+	down_write(&vma->vm_lock.lock);
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -782,7 +787,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock->lock);
+	up_write(&vma->vm_lock.lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -794,7 +799,7 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_lock->lock))
+	if (!rwsem_is_locked(&vma->vm_lock.lock))
 		vma_assert_write_locked(vma);
 }
 
@@ -861,10 +866,6 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
 
-/*
- * WARNING: vma_init does not initialize vma->vm_lock.
- * Use vm_area_alloc()/vm_area_free() if vma needs locking.
- */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	memset(vma, 0, sizeof(*vma));
@@ -873,6 +874,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
 	vma_numab_state_init(vma);
+	vma_lock_init(&vma->vm_lock);
+	vma->vm_lock_seq = UINT_MAX;
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 80fef38d9d64..5c4bfdcfac72 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -716,8 +716,6 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	/* Unstable RCU readers are allowed to read this. */
-	struct vma_lock *vm_lock;
 #endif
 
 	/*
@@ -770,6 +768,10 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Unstable RCU readers are allowed to read this. */
+	struct vma_lock vm_lock ____cacheline_aligned_in_smp;
+#endif
 } __randomize_layout;
 
 #ifdef CONFIG_NUMA
diff --git a/kernel/fork.c b/kernel/fork.c
index 0061cf2450ef..9e504105f24f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -436,35 +436,6 @@ static struct kmem_cache *vm_area_cachep;
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static struct kmem_cache *mm_cachep;
 
-#ifdef CONFIG_PER_VMA_LOCK
-
-/* SLAB cache for vm_area_struct.lock */
-static struct kmem_cache *vma_lock_cachep;
-
-static bool vma_lock_alloc(struct vm_area_struct *vma)
-{
-	vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
-	vma->vm_lock_seq = UINT_MAX;
-
-	return true;
-}
-
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	kmem_cache_free(vma_lock_cachep, vma->vm_lock);
-}
-
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return true; }
-static inline void vma_lock_free(struct vm_area_struct *vma) {}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
 struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -474,10 +445,6 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		kmem_cache_free(vm_area_cachep, vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -496,10 +463,8 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	 * will be reinitialized.
 	 */
 	data_race(memcpy(new, orig, sizeof(*new)));
-	if (!vma_lock_alloc(new)) {
-		kmem_cache_free(vm_area_cachep, new);
-		return NULL;
-	}
+	vma_lock_init(&new->vm_lock);
+	new->vm_lock_seq = UINT_MAX;
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
@@ -511,7 +476,6 @@ void __vm_area_free(struct vm_area_struct *vma)
 {
 	vma_numab_state_free(vma);
 	free_anon_vma_name(vma);
-	vma_lock_free(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
@@ -522,7 +486,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 						  vm_rcu);
 
 	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -3168,11 +3132,9 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-
-	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
-#ifdef CONFIG_PER_VMA_LOCK
-	vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
-#endif
+	vm_area_cachep = KMEM_CACHE(vm_area_struct,
+			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
+			SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
 }
From patchwork Mon Nov 11 20:55:05 2024
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13871370
Date: Mon, 11 Nov 2024 12:55:05 -0800
In-Reply-To: <20241111205506.3404479-1-surenb@google.com>
Message-ID: <20241111205506.3404479-4-surenb@google.com>
Subject: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
    oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
    peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
    minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
    souravpanda@google.com, pasha.tatashin@soleen.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com

rw_semaphore is a sizable structure of 40 bytes and consumes considerable
space for each vm_area_struct. However, vma_lock has two important
specifics which allow replacing rw_semaphore with a simpler structure:

1. Readers never wait. They try to take the vma_lock and fall back to
   mmap_lock if that fails.
2. Only one writer at a time will ever try to write-lock a vma_lock,
   because writers first take mmap_lock in write mode.

Because of these requirements, full rw_semaphore functionality is not
needed and we can replace rw_semaphore with an atomic variable. When a
reader takes the read lock, it increments the atomic, unless the top two
bits are set, indicating that a writer is present. When a writer takes
the write lock, it sets the VMA_LOCK_WR_LOCKED bit if there are no
readers, or the VMA_LOCK_WR_WAIT bit if readers are holding the lock,
and puts itself onto the newly introduced mm.vma_writer_wait waitqueue.
Since all writers take mmap_lock in write mode first, there can be only
one writer at a time. The last reader to release the lock will signal
the writer to wake up.

atomic_t might overflow if there are many competing readers, therefore
vma_start_read() implements an overflow check and, if that occurs, it
exits with a failure to lock. vma_start_read_locked{_nested} may cause
an overflow but it is later handled by __vma_end_read().
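[Editorial note, not part of the patch: a summary of the vm_lock.count
encoding described above, restating the comment this patch adds to
struct vma_lock.]

 bit 31 = VMA_LOCK_WR_LOCKED, bit 30 = VMA_LOCK_WR_WAIT, bits 29..0 = reader count

 count == 0                          unlocked (VMA_LOCK_UNLOCKED)
 reader count N > 0, top bits clear  read-locked by N readers
 VMA_LOCK_WR_WAIT set                writer waiting for readers to drain
 VMA_LOCK_WR_LOCKED set              write-locked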
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h        | 142 ++++++++++++++++++++++++++++++++++----
 include/linux/mm_types.h  |  18 ++++-
 include/linux/mmap_lock.h |   3 +
 kernel/fork.c             |   2 +-
 mm/init-mm.c              |   2 +
 5 files changed, 151 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c1c2899464db..27c0e9ba81c4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -686,7 +686,41 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #ifdef CONFIG_PER_VMA_LOCK
 static inline void vma_lock_init(struct vma_lock *vm_lock)
 {
-	init_rwsem(&vm_lock->lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	static struct lock_class_key lockdep_key;
+
+	lockdep_init_map(&vm_lock->dep_map, "vm_lock", &lockdep_key, 0);
+#endif
+	atomic_set(&vm_lock->count, VMA_LOCK_UNLOCKED);
+}
+
+static inline unsigned int vma_lock_reader_count(unsigned int counter)
+{
+	return counter & VMA_LOCK_RD_COUNT_MASK;
+}
+
+static inline void __vma_end_read(struct mm_struct *mm, struct vm_area_struct *vma)
+{
+	unsigned int count, prev, new;
+
+	count = (unsigned int)atomic_read(&vma->vm_lock.count);
+	for (;;) {
+		if (unlikely(vma_lock_reader_count(count) == 0)) {
+			/*
+			 * Overflow was possible in vma_start_read_locked().
+			 * When detected, wrap around preserving writer bits.
+			 */
+			new = count | ~(VMA_LOCK_WR_LOCKED | VMA_LOCK_WR_WAIT);
+		} else
+			new = count - 1;
+		prev = atomic_cmpxchg(&vma->vm_lock.count, count, new);
+		if (prev == count)
+			break;
+		count = prev;
+	}
+	rwsem_release(&vma->vm_lock.dep_map, _RET_IP_);
+	if (vma_lock_reader_count(new) == 0 && (new & VMA_LOCK_WR_WAIT))
+		wake_up(&mm->vma_writer_wait);
 }
 
 /*
@@ -696,6 +730,9 @@ static inline void vma_lock_init(struct vma_lock *vm_lock)
  */
 static inline bool vma_start_read(struct vm_area_struct *vma)
 {
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned int count, prev, new;
+
 	/*
 	 * Check before locking. A race might cause false locked result.
 	 * We can use READ_ONCE() for the mm_lock_seq here, and don't need
 	 * we don't rely on for anything - the mm_lock_seq read against which we
 	 * need ordering is below.
 	 */
-	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
+	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
-		return false;
+	rwsem_acquire_read(&vma->vm_lock.dep_map, 0, 0, _RET_IP_);
+	count = (unsigned int)atomic_read(&vma->vm_lock.count);
+	for (;;) {
+		/* Is the VMA write-locked or is a writer waiting? */
+		if (count & (VMA_LOCK_WR_LOCKED | VMA_LOCK_WR_WAIT)) {
+			rwsem_release(&vma->vm_lock.dep_map, _RET_IP_);
+			return false;
+		}
+
+		new = count + 1;
+		/* If atomic_t overflows, fail to lock. */
+		if (new & (VMA_LOCK_WR_LOCKED | VMA_LOCK_WR_WAIT)) {
+			rwsem_release(&vma->vm_lock.dep_map, _RET_IP_);
+			return false;
+		}
+
+		/*
+		 * Atomic RMW will provide implicit mb on success to pair with smp_wmb in
+		 * vma_start_write, on failure we retry.
+		 */
+		prev = atomic_cmpxchg(&vma->vm_lock.count, count, new);
+		if (prev == count)
+			break;
+		count = prev;
+	}
+	lock_acquired(&vma->vm_lock.dep_map, _RET_IP_);
 
 	/*
 	 * Overflow might produce false locked result.
@@ -720,8 +781,8 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * after it has been unlocked.
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
-	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock.lock);
+	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
+		__vma_end_read(mm, vma);
 		return false;
 	}
 	return true;
@@ -733,8 +794,30 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
  */
 static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
-	mmap_assert_locked(vma->vm_mm);
-	down_read_nested(&vma->vm_lock.lock, subclass);
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned int count, prev, new;
+
+	mmap_assert_locked(mm);
+
+	rwsem_acquire_read(&vma->vm_lock.dep_map, subclass, 0, _RET_IP_);
+	count = (unsigned int)atomic_read(&vma->vm_lock.count);
+	for (;;) {
+		/* We are holding mmap_lock, no active or waiting writers are possible. */
+		VM_BUG_ON_VMA(count & (VMA_LOCK_WR_LOCKED | VMA_LOCK_WR_WAIT), vma);
+		new = count + 1;
+		/* Unlikely, but if atomic_t overflows, wrap around to 0. */
+		if (WARN_ON(new & (VMA_LOCK_WR_LOCKED | VMA_LOCK_WR_WAIT)))
+			new = 0;
+		/*
+		 * Atomic RMW will provide implicit mb on success to pair with smp_wmb in
+		 * vma_start_write, on failure we retry.
+		 */
+		prev = atomic_cmpxchg(&vma->vm_lock.count, count, new);
+		if (prev == count)
+			break;
+		count = prev;
+	}
+	lock_acquired(&vma->vm_lock.dep_map, _RET_IP_);
 }
 
 /*
@@ -743,14 +826,15 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
  */
 static inline void vma_start_read_locked(struct vm_area_struct *vma)
 {
-	mmap_assert_locked(vma->vm_mm);
-	down_read(&vma->vm_lock.lock);
+	vma_start_read_locked_nested(vma, 0);
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
+	struct mm_struct *mm = vma->vm_mm;
+
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->vm_lock.lock);
+	__vma_end_read(mm, vma);
 	rcu_read_unlock();
 }
 
@@ -774,12 +858,34 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_l
  */
 static inline void vma_start_write(struct vm_area_struct *vma)
 {
+	unsigned int count, prev, new;
 	unsigned int mm_lock_seq;
 
+	might_sleep();
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
-	down_write(&vma->vm_lock.lock);
+	rwsem_acquire(&vma->vm_lock.dep_map, 0, 0, _RET_IP_);
+	count = (unsigned int)atomic_read(&vma->vm_lock.count);
+	for (;;) {
+		if (vma_lock_reader_count(count) > 0)
+			new = count | VMA_LOCK_WR_WAIT;
+		else
+			new = count | VMA_LOCK_WR_LOCKED;
+		prev = atomic_cmpxchg(&vma->vm_lock.count, count, new);
+		if (prev == count)
+			break;
+		count = prev;
+	}
+	if (new & VMA_LOCK_WR_WAIT) {
+		lock_contended(&vma->vm_lock.dep_map, _RET_IP_);
+		wait_event(vma->vm_mm->vma_writer_wait,
+			   atomic_cmpxchg(&vma->vm_lock.count, VMA_LOCK_WR_WAIT,
					  VMA_LOCK_WR_LOCKED) == VMA_LOCK_WR_WAIT);
+
+	}
+	lock_acquired(&vma->vm_lock.dep_map, _RET_IP_);
+
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -787,7 +893,10 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock.lock);
+	/* Write barrier to ensure vm_lock_seq change is visible before count */
+	smp_wmb();
+	rwsem_release(&vma->vm_lock.dep_map, _RET_IP_);
+	atomic_set(&vma->vm_lock.count, VMA_LOCK_UNLOCKED);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -797,9 +906,14 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 	VM_BUG_ON_VMA(!__is_vma_write_locked(vma, &mm_lock_seq), vma);
 }
 
+static inline bool is_vma_read_locked(struct vm_area_struct *vma)
+{
+	return vma_lock_reader_count((unsigned int)atomic_read(&vma->vm_lock.count)) > 0;
+}
+
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_lock.lock))
+	if (!is_vma_read_locked(vma))
 		vma_assert_write_locked(vma);
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5c4bfdcfac72..789bccc05520 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -615,8 +615,23 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
 }
 #endif
 
+#define VMA_LOCK_UNLOCKED	0
+#define VMA_LOCK_WR_LOCKED	(1 << 31)
+#define VMA_LOCK_WR_WAIT	(1 << 30)
+
+#define VMA_LOCK_RD_COUNT_MASK	(VMA_LOCK_WR_WAIT - 1)
+
 struct vma_lock {
-	struct rw_semaphore lock;
+	/*
+	 * count & VMA_LOCK_RD_COUNT_MASK > 0	==> read-locked with 'count' number of readers
+	 * count & VMA_LOCK_WR_LOCKED != 0	==> write-locked
+	 * count & VMA_LOCK_WR_WAIT != 0	==> writer is waiting
+	 * count = 0				==> unlocked
+	 */
+	atomic_t count;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map dep_map;
+#endif
 };
 
 struct vma_numab_state {
@@ -883,6 +898,7 @@ struct mm_struct {
 						 * by mmlist_lock
 						 */
 #ifdef CONFIG_PER_VMA_LOCK
+		struct wait_queue_head vma_writer_wait;
 		/*
 		 * This field has lock-like semantics, meaning it is sometimes
 		 * accessed with ACQUIRE/RELEASE semantics.
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 58dde2e35f7e..769ab97fff3e 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -121,6 +121,9 @@ static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
 	mm_lock_seqcount_init(mm);
+#ifdef CONFIG_PER_VMA_LOCK
+	init_waitqueue_head(&mm->vma_writer_wait);
+#endif
 }
 
 static inline void mmap_write_lock(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index 9e504105f24f..726050c557e2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -486,7 +486,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 						  vm_rcu);
 
 	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
+	VM_BUG_ON_VMA(is_vma_read_locked(vma), vma);
 	__vm_area_free(vma);
 }
 #endif
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 6af3ad675930..db058873ba18 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -40,6 +40,8 @@ struct mm_struct init_mm = {
 	.arg_lock	=  __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
 #ifdef CONFIG_PER_VMA_LOCK
+	.vma_writer_wait =
+		__WAIT_QUEUE_HEAD_INITIALIZER(init_mm.vma_writer_wait),
 	.mm_lock_seq	= SEQCNT_ZERO(init_mm.mm_lock_seq),
 #endif
 	.user_ns	= &init_user_ns,
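[Editorial note, not part of the patch: an illustrative sequence showing how
the pieces above interact when a writer contends with readers.]

 1. Reader A: vma_start_read()   count 0 -> 1
 2. Writer:   vma_start_write()  readers present, so count becomes 1 | VMA_LOCK_WR_WAIT
                                 and the writer sleeps on mm->vma_writer_wait
 3. Reader B: vma_start_read()   sees VMA_LOCK_WR_WAIT set, fails, falls back to mmap_lock
 4. Reader A: vma_end_read()     count drops to VMA_LOCK_WR_WAIT; the last reader wakes the writer
 5. Writer:   wait_event()       cmpxchg VMA_LOCK_WR_WAIT -> VMA_LOCK_WR_LOCKED succeeds;
                                 after vm_lock_seq is updated, count is reset to VMA_LOCK_UNLOCKED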
From patchwork Mon Nov 11 20:55:06 2024
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13871371
Date: Mon, 11 Nov 2024 12:55:06 -0800
In-Reply-To: <20241111205506.3404479-1-surenb@google.com>
Message-ID: <20241111205506.3404479-5-surenb@google.com>
Subject: [PATCH 4/4] mm: move lesser used vma_area_struct members into the last cacheline
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
    oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
    peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
    minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
    souravpanda@google.com, pasha.tatashin@soleen.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com

Move several vm_area_struct members which are rarely or never used during
page fault handling into the last cacheline to better pack vm_area_struct.
As a result vm_area_struct will fit into 3 cachelines as opposed to
4 cachelines before this change.
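[Editorial note, not part of the patch: the layout below is pahole output;
on a kernel built with debug info it can be regenerated with a command
along these lines.]

 $ pahole -C vm_area_struct vmlinux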
New vm_area_struct layout:

struct vm_area_struct {
	union {
		struct {
			long unsigned int vm_start;              /*     0     8 */
			long unsigned int vm_end;                /*     8     8 */
		};                                               /*     0    16 */
		struct callback_head vm_rcu;                     /*     0    16 */
	} __attribute__((__aligned__(8)));                       /*     0    16 */
	struct mm_struct *         vm_mm;                        /*    16     8 */
	pgprot_t                   vm_page_prot;                 /*    24     8 */
	union {
		const vm_flags_t   vm_flags;                     /*    32     8 */
		vm_flags_t         __vm_flags;                   /*    32     8 */
	};                                                       /*    32     8 */
	bool                       detached;                     /*    40     1 */

	/* XXX 3 bytes hole, try to pack */

	unsigned int               vm_lock_seq;                  /*    44     4 */
	struct list_head           anon_vma_chain;               /*    48    16 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct anon_vma *          anon_vma;                     /*    64     8 */
	const struct vm_operations_struct * vm_ops;              /*    72     8 */
	long unsigned int          vm_pgoff;                     /*    80     8 */
	struct file *              vm_file;                      /*    88     8 */
	void *                     vm_private_data;              /*    96     8 */
	atomic_long_t              swap_readahead_info;          /*   104     8 */
	struct mempolicy *         vm_policy;                    /*   112     8 */

	/* XXX 8 bytes hole, try to pack */

	/* --- cacheline 2 boundary (128 bytes) --- */
	struct vma_lock            vm_lock __attribute__((__aligned__(64))); /*   128     4 */

	/* XXX 4 bytes hole, try to pack */

	struct {
		struct rb_node     rb __attribute__((__aligned__(8))); /*   136    24 */
		long unsigned int  rb_subtree_last;              /*   160     8 */
	} __attribute__((__aligned__(8))) shared;                /*   136    32 */
	struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;           /*   168     0 */

	/* size: 192, cachelines: 3, members: 17 */
	/* sum members: 153, holes: 3, sum holes: 15 */
	/* padding: 24 */
	/* forced alignments: 3, forced holes: 2, sum forced holes: 12 */
} __attribute__((__aligned__(64)));

Memory consumption per 1000 VMAs becomes 48 pages:

    slabinfo after vm_area_struct changes:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vm_area_struct   ...    192          42           2        : ...

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm_types.h | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 789bccc05520..c3755b680911 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -733,16 +733,6 @@ struct vm_area_struct {
 	unsigned int vm_lock_seq;
 #endif
 
-	/*
-	 * For areas with an address space and backing store,
-	 * linkage into the address_space->i_mmap interval tree.
-	 *
-	 */
-	struct {
-		struct rb_node rb;
-		unsigned long rb_subtree_last;
-	} shared;
-
 	/*
 	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
 	 * list, after a COW of one of the file pages.  A MAP_SHARED vma
@@ -762,14 +752,6 @@ struct vm_area_struct {
 	struct file * vm_file;		/* File we map to (can be NULL). */
 	void * vm_private_data;		/* was vm_pte (shared mem) */
 
-#ifdef CONFIG_ANON_VMA_NAME
-	/*
-	 * For private and shared anonymous mappings, a pointer to a null
-	 * terminated string containing the name given to the vma, or NULL if
-	 * unnamed. Serialized by mmap_lock. Use anon_vma_name to access.
-	 */
-	struct anon_vma_name *anon_name;
-#endif
 #ifdef CONFIG_SWAP
 	atomic_long_t swap_readahead_info;
 #endif
@@ -782,11 +764,28 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA_BALANCING
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
-	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 #ifdef CONFIG_PER_VMA_LOCK
 	/* Unstable RCU readers are allowed to read this. */
 	struct vma_lock vm_lock ____cacheline_aligned_in_smp;
 #endif
+	/*
+	 * For areas with an address space and backing store,
+	 * linkage into the address_space->i_mmap interval tree.
+	 *
+	 */
+	struct {
+		struct rb_node rb;
+		unsigned long rb_subtree_last;
+	} shared;
+#ifdef CONFIG_ANON_VMA_NAME
+	/*
+	 * For private and shared anonymous mappings, a pointer to a null
+	 * terminated string containing the name given to the vma, or NULL if
+	 * unnamed. Serialized by mmap_lock. Use anon_vma_name to access.
+	 */
+	struct anon_vma_name *anon_name;
+#endif
+	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 } __randomize_layout;
 
 #ifdef CONFIG_NUMA