From patchwork Sun Feb 6 21:34:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 12736736 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3266C433F5 for ; Sun, 6 Feb 2022 21:34:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7A44A6B0072; Sun, 6 Feb 2022 16:34:56 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 753A56B0073; Sun, 6 Feb 2022 16:34:56 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5F5CE6B0074; Sun, 6 Feb 2022 16:34:56 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0219.hostedemail.com [216.40.44.219]) by kanga.kvack.org (Postfix) with ESMTP id 526B46B0072 for ; Sun, 6 Feb 2022 16:34:56 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 07FEA8249980 for ; Sun, 6 Feb 2022 21:34:56 +0000 (UTC) X-FDA: 79113660192.20.DC5D506 Received: from mail-qk1-f177.google.com (mail-qk1-f177.google.com [209.85.222.177]) by imf06.hostedemail.com (Postfix) with ESMTP id 9B408180005 for ; Sun, 6 Feb 2022 21:34:55 +0000 (UTC) Received: by mail-qk1-f177.google.com with SMTP id j24so9506307qkk.10 for ; Sun, 06 Feb 2022 13:34:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:in-reply-to:message-id:references :mime-version; bh=aN9jBUAYtXHCGaaeDEIY7w9yB+GjJ5YgDtHsb77P658=; b=WpVzCKTSSVi6GoYJEZT6vLbf1sd7YZmUJrQ6su8wcqoyJi5mMuPwdKm+TX3wj+cPOF Onecd/s03fPVi3QTwLUozKzP+43eWRkt6eQGfAw+Eav1oQ8wH7lbwdDtAp6zNMltTqX1 daklbwZttoGb5ecV4vugBaGIyH1JP3g/07jFiOz+xLSBztZiZrBo5yTKBH2WfqzC6dcQ Xdd7czT4qmQsV224vt9lD1L4jsep2PPVpmCW+69lyEWmIeKHt4Kpg4nFyr7jQ3QQqc7C bbC842ieUWUA5hDtgLSyrj9rpWkBbIbq3cfLoiCemQ3Fee3xqz3hn/yId77xktb0Jh6K I14Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:mime-version; bh=aN9jBUAYtXHCGaaeDEIY7w9yB+GjJ5YgDtHsb77P658=; b=UrOKyTFti+GFHPjvNbwmMWPvqRj6UIN+LSTcFA3sKGQ4CRT3PLsuLe/TGXvEQJ2//Y fwJppFQzfdTC0rhEefYUDVhHpNFoenmdae27XxwJVf0QYjyP7QlAaHTZ8rs9lIOlQlGx qEce4uLz4b4oLCo8exH30N9oybjyonY26En/aeR4XL+UrcuDao6rATOO7qSSBfpKjF2V i+Vhmj6Yi2jxJaH7jkk9utyEcwnkBVZYHur5EbSpRspfeOovbwXLj27GmU5TrHGgBUTT l3+eb2Udm9iCGnXTkcOhFKyYx7UKyyQS+UX4PisnntiL25i6r1eyCSshiPI6XVfuGgu9 nVyg== X-Gm-Message-State: AOAM530K+IJo6V6p1a1iv8uMZLo6f4fWGb4QzEGfgHwmjWDkX9hNZI51 oioR0pJY768HsAv5G0kjsuJFeQ== X-Google-Smtp-Source: ABdhPJxujpl+3AC442NQ22lFrdKNnJHbmm/WntCsFSP6F26ynHKhbaiKOWngjy9Dx/FenVb41XuHdg== X-Received: by 2002:a37:658b:: with SMTP id z133mr4932257qkb.119.1644183294804; Sun, 06 Feb 2022 13:34:54 -0800 (PST) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id br30sm4544545qkb.67.2022.02.06.13.34.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 06 Feb 2022 13:34:54 -0800 (PST) Date: Sun, 6 Feb 2022 13:34:51 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@ripple.anvils To: Andrew Morton cc: Michal Hocko , Vlastimil Babka , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Alistair Popple , Johannes Weiner , Rik van Riel , Suren Baghdasaryan , Yu Zhao , Greg Thelen , Shakeel Butt , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 03/13] mm/munlock: delete munlock_vma_pages_all(), allow oomreap In-Reply-To: <8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com> Message-ID: <8dddb3d4-361-da5-538-3f3ae1b326b@google.com> References: <8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com> MIME-Version: 1.0 X-Stat-Signature: z9ybwjk1u4jfcsswk99tbn58x3zqfdim X-Rspam-User: nil Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=WpVzCKTS; spf=pass (imf06.hostedemail.com: domain of hughd@google.com designates 209.85.222.177 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 9B408180005 X-HE-Tag: 1644183295-278943 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: munlock_vma_pages_range() will still be required, when munlocking but not munmapping a set of pages; but when unmapping a pte, the mlock count will be maintained in much the same way as it will be maintained when mapping in the pte. Which removes the need for munlock_vma_pages_all() on mlocked vmas when munmapping or exiting: eliminating the catastrophic contention on i_mmap_rwsem, and the need for page lock on the pages. There is still a need to update locked_vm accounting according to the munmapped vmas when munmapping: do that in detach_vmas_to_be_unmapped(). exit_mmap() does not need locked_vm updates, so delete unlock_range(). And wasn't I the one who forbade the OOM reaper to attack mlocked vmas, because of the uncertainty in blocking on all those page locks? No fear of that now, so permit the OOM reaper on mlocked vmas. Signed-off-by: Hugh Dickins Acked-by: Vlastimil Babka --- mm/internal.h | 16 ++-------------- mm/madvise.c | 5 +++++ mm/mlock.c | 4 ++-- mm/mmap.c | 32 ++------------------------------ mm/oom_kill.c | 2 +- 5 files changed, 12 insertions(+), 47 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index e48c486d5ddf..f235aa92e564 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -71,11 +71,6 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma, unsigned long floor, unsigned long ceiling); void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte); -static inline bool can_madv_lru_vma(struct vm_area_struct *vma) -{ - return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)); -} - struct zap_details; void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma, @@ -398,12 +393,8 @@ extern long populate_vma_page_range(struct vm_area_struct *vma, extern long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start, unsigned long end, bool write, int *locked); -extern void munlock_vma_pages_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end); -static inline void munlock_vma_pages_all(struct vm_area_struct *vma) -{ - munlock_vma_pages_range(vma, vma->vm_start, vma->vm_end); -} +extern int mlock_future_check(struct mm_struct *mm, unsigned long flags, + unsigned long len); /* * must be called with vma's mmap_lock held for read or write, and page locked. @@ -411,9 +402,6 @@ static inline void munlock_vma_pages_all(struct vm_area_struct *vma) extern void mlock_vma_page(struct page *page); extern void munlock_vma_page(struct page *page); -extern int mlock_future_check(struct mm_struct *mm, unsigned long flags, - unsigned long len); - /* * Clear the page's PageMlocked(). This can be useful in a situation where * we want to unconditionally remove a page from the pagecache -- e.g., diff --git a/mm/madvise.c b/mm/madvise.c index 5604064df464..ae35d72627ef 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -530,6 +530,11 @@ static void madvise_cold_page_range(struct mmu_gather *tlb, tlb_end_vma(tlb, vma); } +static inline bool can_madv_lru_vma(struct vm_area_struct *vma) +{ + return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)); +} + static long madvise_cold(struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start_addr, unsigned long end_addr) diff --git a/mm/mlock.c b/mm/mlock.c index 544c18ce2c58..d148da934fe9 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -137,8 +137,8 @@ void munlock_vma_page(struct page *page) * Returns with VM_LOCKED cleared. Callers must be prepared to * deal with this. */ -void munlock_vma_pages_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end) +static void munlock_vma_pages_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end) { /* Reimplementation to follow in later commit */ } diff --git a/mm/mmap.c b/mm/mmap.c index 1e8fdb0b51ed..64b5985b5295 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2674,6 +2674,8 @@ detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma, vma->vm_prev = NULL; do { vma_rb_erase(vma, &mm->mm_rb); + if (vma->vm_flags & VM_LOCKED) + mm->locked_vm -= vma_pages(vma); mm->map_count--; tail_vma = vma; vma = vma->vm_next; @@ -2778,22 +2780,6 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma, return __split_vma(mm, vma, addr, new_below); } -static inline void -unlock_range(struct vm_area_struct *start, unsigned long limit) -{ - struct mm_struct *mm = start->vm_mm; - struct vm_area_struct *tmp = start; - - while (tmp && tmp->vm_start < limit) { - if (tmp->vm_flags & VM_LOCKED) { - mm->locked_vm -= vma_pages(tmp); - munlock_vma_pages_all(tmp); - } - - tmp = tmp->vm_next; - } -} - /* Munmap is split into 2 main parts -- this part which finds * what needs doing, and the areas themselves, which do the * work. This now handles partial unmappings. @@ -2874,12 +2860,6 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len, return error; } - /* - * unlock any mlock()ed ranges before detaching vmas - */ - if (mm->locked_vm) - unlock_range(vma, end); - /* Detach vmas from rbtree */ if (!detach_vmas_to_be_unmapped(mm, vma, prev, end)) downgrade = false; @@ -3147,20 +3127,12 @@ void exit_mmap(struct mm_struct *mm) * Nothing can be holding mm->mmap_lock here and the above call * to mmu_notifier_release(mm) ensures mmu notifier callbacks in * __oom_reap_task_mm() will not block. - * - * This needs to be done before calling unlock_range(), - * which clears VM_LOCKED, otherwise the oom reaper cannot - * reliably test it. */ (void)__oom_reap_task_mm(mm); - set_bit(MMF_OOM_SKIP, &mm->flags); } mmap_write_lock(mm); - if (mm->locked_vm) - unlock_range(mm->mmap, ULONG_MAX); - arch_exit_mmap(mm); vma = mm->mmap; diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 832fb330376e..6b875acabd1e 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -526,7 +526,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm) set_bit(MMF_UNSTABLE, &mm->flags); for (vma = mm->mmap ; vma; vma = vma->vm_next) { - if (!can_madv_lru_vma(vma)) + if (vma->vm_flags & (VM_HUGETLB|VM_PFNMAP)) continue; /*