From patchwork Mon Aug 29 21:25:03 2022
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 12958453
Date: Mon, 29 Aug 2022 21:25:03 +0000
Message-ID: <20220829212531.3184856-1-surenb@google.com>
Subject: [RFC PATCH 00/28] per-VMA locks proposal
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz,
    hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net,
    willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org,
    ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, paulmck@kernel.org,
    riel@surriel.com, luto@kernel.org, songliubraving@fb.com,
    peterx@redhat.com, david@redhat.com, dhowells@redhat.com,
    hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev,
    rientjes@google.com, axelrasmussen@google.com, joelaf@google.com,
    minchan@google.com, surenb@google.com, kernel-team@android.com,
    linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
    linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
    linux-kernel@vger.kernel.org

This is a proof of concept for the per-VMA locks idea that was discussed
during the SPF [1] discussion at LSF/MM this year [2], which concluded with
the suggestion that “a reader/writer semaphore could be put into the VMA
itself; that would have the effect of using the VMA as a sort of range lock.
There would still be contention at the VMA level, but it would be an
improvement.” This patchset implements the suggested approach. When handling
page faults, we look up the VMA that contains the faulting page under RCU
protection and try to acquire its lock. If that fails, we fall back to using
mmap_lock, similar to how SPF handled this situation.

One notable way the implementation deviates from the proposal is in how VMAs
are marked as locked. During some mm updates, multiple VMAs need to stay
locked until the end of the update (e.g. vma_merge, split_vma, etc.), and
tracking all the locked VMAs, avoiding recursive locks, and other
complications would make the code more complex. Therefore we provide a way
to "mark" VMAs as locked and then unmark all locked VMAs at once. This is
done using two sequence numbers: one in the vm_area_struct and one in the
mm_struct. A VMA is considered locked when these sequence numbers are equal.
To mark a VMA as locked, we set the sequence number in its vm_area_struct
equal to the sequence number in the mm_struct; to unlock all VMAs, we
increment the mm_struct's sequence number. This gives us an efficient way to
track locked VMAs and to drop the locks on all of them at the end of an
update.

The patchset implements per-VMA locking only for anonymous pages which are
not in swap. If the initial proposal is considered acceptable, support for
swapped and file-backed page faults will be added.

Performance benchmarks show benefits similar to, although slightly smaller
than, those of the SPF patchset (roughly 75% of SPF's gains). Still, with
its lower complexity, this approach might be more desirable.
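To make the marking scheme concrete, the sketch below shows one way the two
sequence numbers could fit together. This is illustrative only: the field and
helper names (vm_lock_seq, mm_lock_seq, vma->lock, vma_mark_locked(),
vma_mark_unlocked_all()) are assumptions and may not match the identifiers
used in the actual patches.

/* Hypothetical per-VMA lock state (names assumed for illustration). */
struct vm_area_struct {
	/* ... existing fields ... */
	int vm_lock_seq;		/* == mm->mm_lock_seq means "marked locked" */
	struct rw_semaphore lock;	/* the per-VMA reader/writer semaphore */
};

/* Mark one VMA as locked; caller must hold mmap_lock for writing. */
static inline void vma_mark_locked(struct vm_area_struct *vma)
{
	mmap_assert_write_locked(vma->vm_mm);
	if (vma->vm_lock_seq == vma->vm_mm->mm_lock_seq)
		return;	/* already marked during this update */
	/* Wait for page faults holding the VMA read lock to drain. */
	down_write(&vma->lock);
	vma->vm_lock_seq = vma->vm_mm->mm_lock_seq;
	up_write(&vma->lock);
}

/* Unmark every marked VMA at once by bumping the mm-wide number. */
static inline void vma_mark_unlocked_all(struct mm_struct *mm)
{
	mmap_assert_write_locked(mm);
	mm->mm_lock_seq++;
}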
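On the fault side, the lookup-and-trylock step could then look roughly like
the following. __find_vma and find_and_lock_anon_vma are introduced by
patches in this series, but the body shown here is a sketch built on the
assumed fields above, not the actual implementation.

/*
 * Find the anonymous VMA covering addr and take its read lock without
 * touching mmap_lock. Returning NULL makes the caller fall back to the
 * usual mmap_lock path.
 */
static struct vm_area_struct *find_and_lock_anon_vma(struct mm_struct *mm,
						     unsigned long addr)
{
	struct vm_area_struct *vma;

	rcu_read_lock();
	vma = __find_vma(mm, addr);	/* RCU-safe lookup, no mmap_lock */
	if (!vma || vma->vm_start > addr || !vma_is_anonymous(vma) ||
	    !down_read_trylock(&vma->lock)) {
		rcu_read_unlock();
		return NULL;
	}
	/* A concurrent update marked this VMA locked; back off. */
	if (vma->vm_lock_seq == READ_ONCE(mm->mm_lock_seq)) {
		up_read(&vma->lock);
		rcu_read_unlock();
		return NULL;
	}
	/* The read lock now pins the VMA; freeing is RCU-deferred. */
	rcu_read_unlock();
	return vma;
}

The fault handler would then presumably run with FAULT_FLAG_VMA_LOCK set and
drop the read lock once the fault is handled.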
The patchset applies cleanly over 6.0-rc3. The tree for testing is posted
at [3].

[1] https://lore.kernel.org/all/20220128131006.67712-1-michel@lespinasse.org/
[2] https://lwn.net/Articles/893906/
[3] https://github.com/surenbaghdasaryan/linux/tree/per_vma_lock_rfc

Laurent Dufour (2):
  powerpc/mm: try VMA lock-based page fault handling first
  powerpc/mm: define ARCH_SUPPORTS_PER_VMA_LOCK

Michel Lespinasse (1):
  mm: rcu safe VMA freeing

Suren Baghdasaryan (25):
  mm: introduce CONFIG_PER_VMA_LOCK
  mm: introduce __find_vma to be used without mmap_lock protection
  mm: move mmap_lock assert function definitions
  mm: add per-VMA lock and helper functions to control it
  mm: mark VMA as locked whenever vma->vm_flags are modified
  kernel/fork: mark VMAs as locked before copying pages during fork
  mm/khugepaged: mark VMA as locked while collapsing a hugepage
  mm/mempolicy: mark VMA as locked when changing protection policy
  mm/mmap: mark VMAs as locked in vma_adjust
  mm/mmap: mark VMAs as locked before merging or splitting them
  mm/mremap: mark VMA as locked while remapping it to a new address range
  mm: conditionally mark VMA as locked in free_pgtables and unmap_page_range
  mm: mark VMAs as locked before isolating them
  mm/mmap: mark adjacent VMAs as locked if they can grow into unmapped area
  kernel/fork: assert no VMA readers during its destruction
  mm/mmap: prevent pagefault handler from racing with mmu_notifier registration
  mm: add FAULT_FLAG_VMA_LOCK flag
  mm: disallow do_swap_page to handle page faults under VMA lock
  mm: introduce per-VMA lock statistics
  mm: introduce find_and_lock_anon_vma to be used from arch-specific code
  x86/mm: try VMA lock-based page fault handling first
  x86/mm: define ARCH_SUPPORTS_PER_VMA_LOCK
  arm64/mm: try VMA lock-based page fault handling first
  arm64/mm: define ARCH_SUPPORTS_PER_VMA_LOCK
  kernel/fork: throttle call_rcu() calls in vm_area_free

 arch/arm64/Kconfig                     |   1 +
 arch/arm64/mm/fault.c                  |  36 +++++++++
 arch/powerpc/mm/fault.c                |  41 ++++++++++
 arch/powerpc/platforms/powernv/Kconfig |   1 +
 arch/powerpc/platforms/pseries/Kconfig |   1 +
 arch/x86/Kconfig                       |   1 +
 arch/x86/mm/fault.c                    |  36 +++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c  |   4 +-
 fs/proc/task_mmu.c                     |   1 +
 fs/userfaultfd.c                       |   6 ++
 include/linux/mm.h                     | 104 ++++++++++++++++++++++++-
 include/linux/mm_types.h               |  33 ++++++--
 include/linux/mmap_lock.h              |  37 ++++++---
 include/linux/vm_event_item.h          |   6 ++
 include/linux/vmstat.h                 |   6 ++
 kernel/fork.c                          |  75 +++++++++++++++++-
 mm/Kconfig                             |  13 ++++
 mm/Kconfig.debug                       |   8 ++
 mm/init-mm.c                           |   6 ++
 mm/internal.h                          |   4 +-
 mm/khugepaged.c                        |   1 +
 mm/madvise.c                           |   1 +
 mm/memory.c                            |  82 ++++++++++++++++---
 mm/mempolicy.c                         |   6 +-
 mm/mlock.c                             |   2 +
 mm/mmap.c                              |  60 ++++++++++----
 mm/mprotect.c                          |   1 +
 mm/mremap.c                            |   1 +
 mm/nommu.c                             |   2 +
 mm/oom_kill.c                          |   3 +-
 mm/vmstat.c                            |   6 ++
 31 files changed, 531 insertions(+), 54 deletions(-)