From patchwork Thu May 17 11:06:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 10406389 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id DC865602C2 for ; Thu, 17 May 2018 11:07:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C812828A69 for ; Thu, 17 May 2018 11:07:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B931628A07; Thu, 17 May 2018 11:07:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6953528A07 for ; Thu, 17 May 2018 11:07:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CBAAF6B04B0; Thu, 17 May 2018 07:07:14 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C43CD6B04B1; Thu, 17 May 2018 07:07:14 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B098B6B04B2; Thu, 17 May 2018 07:07:14 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-wr0-f197.google.com (mail-wr0-f197.google.com [209.85.128.197]) by kanga.kvack.org (Postfix) with ESMTP id 3EDA36B04B0 for ; Thu, 17 May 2018 07:07:14 -0400 (EDT) Received: by mail-wr0-f197.google.com with SMTP id r23-v6so2861725wrc.2 for ; Thu, 17 May 2018 04:07:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:in-reply-to:references:message-id; bh=4mzpgughF1INlQZEGjG6Xa2zD2ghpQIRIg42oeHTh0g=; b=YQUBvgMOUdmP5Mtqn7obPm9dVI5Dt4zP0XKnuU2UrhCsiQupya0n7Ob3mSDAcfKFGZ j3IvHwNW22IE1jDmt43PvFrLn249POFKmm9eA1lFMpg7pShpQXLrcItRsMHgsoeaXpKi fHYBAqEbhTxSBI39a0fbjMxZ/KAv6GUcwJ7qGEL9DpIHBbSejelXpDzrPuNqMH3CBgVN 5TTmtPCZy/E4Z9pI2Sr59rZ2KjtUulvkDgmwgyI+RTSCjbX7QNiFm2PGo0zCZ7MHZT+y 5aRnMkMP+7NLOamk8OTa+ywQH8Q89I6ESLrqAtuyZxJKAWhiteteWqoKgGmWgEBssnmP cZ4w== X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 148.163.158.5 is neither permitted nor denied by best guess record for domain of ldufour@linux.vnet.ibm.com) smtp.mailfrom=ldufour@linux.vnet.ibm.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com X-Gm-Message-State: ALKqPwen9LcAq2TYGbhF3Gohk6tED0nhp8PEh0QPj9Xsx3itqAD8McqW KJ+r82zrKJakTtrg+UCYYOXNv49yLazGgP/HhJ8e7qqMRNOCTGxFgepWcOprb4cYyDb9Z0wZ5YJ cId08IZAW+vFWJZy0DzoadCoRaW+2pt/9O7eJD3qmsd622r6xF2pTUI9w/RI/GiE= X-Received: by 2002:a50:f0c7:: with SMTP id a7-v6mr6505619edm.90.1526555233812; Thu, 17 May 2018 04:07:13 -0700 (PDT) X-Google-Smtp-Source: AB8JxZqIlQMIYMRe0BlD9GM4hCOTCUNz3TnDP42Kq3GbkDsPDsgXVCOrTWHbY0f0+Wow9aSsoQP3 X-Received: by 2002:a50:f0c7:: with SMTP id a7-v6mr6505531edm.90.1526555232613; Thu, 17 May 2018 04:07:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526555232; cv=none; d=google.com; s=arc-20160816; b=t702FhmwPMUcCpTkOZAQXi/Aw1ZJgTYVz0G7Jy7XIJ+Gq39sFZFuZGIpz3CyQYxjpO QharOyui3+l8J4oxuejW0ITUh3JSnmGB7Mt/sDBJIsBMg0NfASoP0AIBcD8pWzwXyoGF nQkU21aMycuan2CNw5dvPKLlzSJDHBJZmnuQnj3w5fsJ41CkkMJLnzxNUFHdi6cParH9 xNT+cF80/t8kmTVOfBxvyF44MCnu4W+P+2jvXakUzpvkLRT1TDVOypRwHUy9Zqm/xvRC /74Mgja9LDlC/PB6GwoezwRXMnRubugPwtpsKYmVmB2/fe4eB94l6qP5Zm5uVbwWIMrY 1NUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=message-id:references:in-reply-to:date:subject:cc:to:from :arc-authentication-results; bh=4mzpgughF1INlQZEGjG6Xa2zD2ghpQIRIg42oeHTh0g=; b=owgcTgL6302lR0IQDzSAY1GgjN+NebVAUzQYFDwmmDiPW9JTjgV38rVIH5qapsx8bl QPGA2RK+7hZbrEzjVPDI1KFbnH+ELqUoFIufjOU8Ot/IHTyjt2CXJZf433oFS0vLUIvV fKtAeKJO3iNXxR/9GCNpp+Z/uPo7jdmZafby/tJfD9cNvdC7iC8kBIR1nwWGT0w/UmXP NzohJGtu+iu25ByfoRuUu9RGRFmWlnGz+nobaFCXNHAqd0LFbNF0p5UtbVegsWZZrncN ELEZ7H5kaIyEwbFZLg+1hLIh4s88FUZX6KEXdYpzsEifMDdsLyLqdDA78W629hqYuYV5 9tWw== ARC-Authentication-Results: i=1; mx.google.com; spf=neutral (google.com: 148.163.158.5 is neither permitted nor denied by best guess record for domain of ldufour@linux.vnet.ibm.com) smtp.mailfrom=ldufour@linux.vnet.ibm.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com. [148.163.158.5]) by mx.google.com with ESMTPS id g31-v6si753299edd.382.2018.05.17.04.07.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 May 2018 04:07:12 -0700 (PDT) Received-SPF: neutral (google.com: 148.163.158.5 is neither permitted nor denied by best guess record for domain of ldufour@linux.vnet.ibm.com) client-ip=148.163.158.5; Authentication-Results: mx.google.com; spf=neutral (google.com: 148.163.158.5 is neither permitted nor denied by best guess record for domain of ldufour@linux.vnet.ibm.com) smtp.mailfrom=ldufour@linux.vnet.ibm.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w4HB5563127111 for ; Thu, 17 May 2018 07:07:11 -0400 Received: from e06smtp11.uk.ibm.com (e06smtp11.uk.ibm.com [195.75.94.107]) by mx0b-001b2d01.pphosted.com with ESMTP id 2j17v0t18m-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 17 May 2018 07:07:11 -0400 Received: from localhost by e06smtp11.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 17 May 2018 12:07:08 +0100 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp11.uk.ibm.com (192.168.101.141) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Thu, 17 May 2018 12:06:58 +0100 Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com [9.149.105.58]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w4HB6wVX12714466; Thu, 17 May 2018 11:06:58 GMT Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3A7924C050; Thu, 17 May 2018 11:58:47 +0100 (BST) Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id F2FCF4C044; Thu, 17 May 2018 11:58:45 +0100 (BST) Received: from nimbus.lab.toulouse-stg.fr.ibm.com (unknown [9.101.4.33]) by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 17 May 2018 11:58:45 +0100 (BST) From: Laurent Dufour To: akpm@linux-foundation.org, mhocko@kernel.org, peterz@infradead.org, kirill@shutemov.name, ak@linux.intel.com, dave@stgolabs.net, jack@suse.cz, Matthew Wilcox , khandual@linux.vnet.ibm.com, aneesh.kumar@linux.vnet.ibm.com, benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner , Ingo Molnar , hpa@zytor.com, Will Deacon , Sergey Senozhatsky , sergey.senozhatsky.work@gmail.com, Andrea Arcangeli , Alexei Starovoitov , kemi.wang@intel.com, Daniel Jordan , David Rientjes , Jerome Glisse , Ganesh Mahendran , Minchan Kim , Punit Agrawal , vinayak menon , Yang Shi Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, haren@linux.vnet.ibm.com, npiggin@gmail.com, bsingharora@gmail.com, paulmck@linux.vnet.ibm.com, Tim Chen , linuxppc-dev@lists.ozlabs.org, x86@kernel.org Subject: [PATCH v11 10/26] mm: protect VMA modifications using VMA sequence count Date: Thu, 17 May 2018 13:06:17 +0200 X-Mailer: git-send-email 2.7.4 In-Reply-To: <1526555193-7242-1-git-send-email-ldufour@linux.vnet.ibm.com> References: <1526555193-7242-1-git-send-email-ldufour@linux.vnet.ibm.com> X-TM-AS-GCONF: 00 x-cbid: 18051711-0040-0000-0000-0000045A6DC7 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18051711-0041-0000-0000-000020FEA53F Message-Id: <1526555193-7242-11-git-send-email-ldufour@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-05-17_05:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1805170100 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP The VMA sequence count has been introduced to allow fast detection of VMA modification when running a page fault handler without holding the mmap_sem. This patch provides protection against the VMA modification done in : - madvise() - mpol_rebind_policy() - vma_replace_policy() - change_prot_numa() - mlock(), munlock() - mprotect() - mmap_region() - collapse_huge_page() - userfaultd registering services In addition, VMA fields which will be read during the speculative fault path needs to be written using WRITE_ONCE to prevent write to be split and intermediate values to be pushed to other CPUs. Signed-off-by: Laurent Dufour Signed-off-by: Laurent Dufour --- fs/proc/task_mmu.c | 5 ++++- fs/userfaultfd.c | 17 +++++++++++++---- mm/khugepaged.c | 3 +++ mm/madvise.c | 6 +++++- mm/mempolicy.c | 51 ++++++++++++++++++++++++++++++++++----------------- mm/mlock.c | 13 ++++++++----- mm/mmap.c | 22 +++++++++++++--------- mm/mprotect.c | 4 +++- mm/swap_state.c | 8 ++++++-- 9 files changed, 89 insertions(+), 40 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 597969db9e90..7247d6d5afba 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1137,8 +1137,11 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, goto out_mm; } for (vma = mm->mmap; vma; vma = vma->vm_next) { - vma->vm_flags &= ~VM_SOFTDIRTY; + vm_write_begin(vma); + WRITE_ONCE(vma->vm_flags, + vma->vm_flags & ~VM_SOFTDIRTY); vma_set_page_prot(vma); + vm_write_end(vma); } downgrade_write(&mm->mmap_sem); break; diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index cec550c8468f..b8212ba17695 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -659,8 +659,11 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs) octx = vma->vm_userfaultfd_ctx.ctx; if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) { + vm_write_begin(vma); vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING); + WRITE_ONCE(vma->vm_flags, + vma->vm_flags & ~(VM_UFFD_WP | VM_UFFD_MISSING)); + vm_write_end(vma); return 0; } @@ -885,8 +888,10 @@ static int userfaultfd_release(struct inode *inode, struct file *file) vma = prev; else prev = vma; - vma->vm_flags = new_flags; + vm_write_begin(vma); + WRITE_ONCE(vma->vm_flags, new_flags); vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; + vm_write_end(vma); } up_write(&mm->mmap_sem); mmput(mm); @@ -1434,8 +1439,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, * the next vma was merged into the current one and * the current one has not been updated yet. */ - vma->vm_flags = new_flags; + vm_write_begin(vma); + WRITE_ONCE(vma->vm_flags, new_flags); vma->vm_userfaultfd_ctx.ctx = ctx; + vm_write_end(vma); skip: prev = vma; @@ -1592,8 +1599,10 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, * the next vma was merged into the current one and * the current one has not been updated yet. */ - vma->vm_flags = new_flags; + vm_write_begin(vma); + WRITE_ONCE(vma->vm_flags, new_flags); vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; + vm_write_end(vma); skip: prev = vma; diff --git a/mm/khugepaged.c b/mm/khugepaged.c index d7b2a4bf8671..0b28af4b950d 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1011,6 +1011,7 @@ static void collapse_huge_page(struct mm_struct *mm, if (mm_find_pmd(mm, address) != pmd) goto out; + vm_write_begin(vma); anon_vma_lock_write(vma->anon_vma); pte = pte_offset_map(pmd, address); @@ -1046,6 +1047,7 @@ static void collapse_huge_page(struct mm_struct *mm, pmd_populate(mm, pmd, pmd_pgtable(_pmd)); spin_unlock(pmd_ptl); anon_vma_unlock_write(vma->anon_vma); + vm_write_end(vma); result = SCAN_FAIL; goto out; } @@ -1080,6 +1082,7 @@ static void collapse_huge_page(struct mm_struct *mm, set_pmd_at(mm, address, pmd, _pmd); update_mmu_cache_pmd(vma, address, pmd); spin_unlock(pmd_ptl); + vm_write_end(vma); *hpage = NULL; diff --git a/mm/madvise.c b/mm/madvise.c index 4d3c922ea1a1..e328f7ab5942 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -184,7 +184,9 @@ static long madvise_behavior(struct vm_area_struct *vma, /* * vm_flags is protected by the mmap_sem held in write mode. */ - vma->vm_flags = new_flags; + vm_write_begin(vma); + WRITE_ONCE(vma->vm_flags, new_flags); + vm_write_end(vma); out: return error; } @@ -450,9 +452,11 @@ static void madvise_free_page_range(struct mmu_gather *tlb, .private = tlb, }; + vm_write_begin(vma); tlb_start_vma(tlb, vma); walk_page_range(addr, end, &free_walk); tlb_end_vma(tlb, vma); + vm_write_end(vma); } static int madvise_free_single_vma(struct vm_area_struct *vma, diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 9ac49ef17b4e..898d325c9fea 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -380,8 +380,11 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new) struct vm_area_struct *vma; down_write(&mm->mmap_sem); - for (vma = mm->mmap; vma; vma = vma->vm_next) + for (vma = mm->mmap; vma; vma = vma->vm_next) { + vm_write_begin(vma); mpol_rebind_policy(vma->vm_policy, new); + vm_write_end(vma); + } up_write(&mm->mmap_sem); } @@ -554,9 +557,11 @@ unsigned long change_prot_numa(struct vm_area_struct *vma, { int nr_updated; + vm_write_begin(vma); nr_updated = change_protection(vma, addr, end, PAGE_NONE, 0, 1); if (nr_updated) count_vm_numa_events(NUMA_PTE_UPDATES, nr_updated); + vm_write_end(vma); return nr_updated; } @@ -657,6 +662,7 @@ static int vma_replace_policy(struct vm_area_struct *vma, if (IS_ERR(new)) return PTR_ERR(new); + vm_write_begin(vma); if (vma->vm_ops && vma->vm_ops->set_policy) { err = vma->vm_ops->set_policy(vma, new); if (err) @@ -664,11 +670,17 @@ static int vma_replace_policy(struct vm_area_struct *vma, } old = vma->vm_policy; - vma->vm_policy = new; /* protected by mmap_sem */ + /* + * The speculative page fault handler accesses this field without + * hodling the mmap_sem. + */ + WRITE_ONCE(vma->vm_policy, new); + vm_write_end(vma); mpol_put(old); return 0; err_out: + vm_write_end(vma); mpol_put(new); return err; } @@ -1614,23 +1626,28 @@ COMPAT_SYSCALL_DEFINE4(migrate_pages, compat_pid_t, pid, struct mempolicy *__get_vma_policy(struct vm_area_struct *vma, unsigned long addr) { - struct mempolicy *pol = NULL; + struct mempolicy *pol; - if (vma) { - if (vma->vm_ops && vma->vm_ops->get_policy) { - pol = vma->vm_ops->get_policy(vma, addr); - } else if (vma->vm_policy) { - pol = vma->vm_policy; + if (!vma) + return NULL; - /* - * shmem_alloc_page() passes MPOL_F_SHARED policy with - * a pseudo vma whose vma->vm_ops=NULL. Take a reference - * count on these policies which will be dropped by - * mpol_cond_put() later - */ - if (mpol_needs_cond_ref(pol)) - mpol_get(pol); - } + if (vma->vm_ops && vma->vm_ops->get_policy) + return vma->vm_ops->get_policy(vma, addr); + + /* + * This could be called without holding the mmap_sem in the + * speculative page fault handler's path. + */ + pol = READ_ONCE(vma->vm_policy); + if (pol) { + /* + * shmem_alloc_page() passes MPOL_F_SHARED policy with + * a pseudo vma whose vma->vm_ops=NULL. Take a reference + * count on these policies which will be dropped by + * mpol_cond_put() later + */ + if (mpol_needs_cond_ref(pol)) + mpol_get(pol); } return pol; diff --git a/mm/mlock.c b/mm/mlock.c index 74e5a6547c3d..c40285c94ced 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -445,7 +445,9 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec, void munlock_vma_pages_range(struct vm_area_struct *vma, unsigned long start, unsigned long end) { - vma->vm_flags &= VM_LOCKED_CLEAR_MASK; + vm_write_begin(vma); + WRITE_ONCE(vma->vm_flags, vma->vm_flags & VM_LOCKED_CLEAR_MASK); + vm_write_end(vma); while (start < end) { struct page *page; @@ -568,10 +570,11 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev, * It's okay if try_to_unmap_one unmaps a page just after we * set VM_LOCKED, populate_vma_page_range will bring it back. */ - - if (lock) - vma->vm_flags = newflags; - else + if (lock) { + vm_write_begin(vma); + WRITE_ONCE(vma->vm_flags, newflags); + vm_write_end(vma); + } else munlock_vma_pages_range(vma, start, end); out: diff --git a/mm/mmap.c b/mm/mmap.c index eeafd0bc8b36..add13b4e1d8d 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -852,17 +852,18 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, } if (start != vma->vm_start) { - vma->vm_start = start; + WRITE_ONCE(vma->vm_start, start); start_changed = true; } if (end != vma->vm_end) { - vma->vm_end = end; + WRITE_ONCE(vma->vm_end, end); end_changed = true; } - vma->vm_pgoff = pgoff; + WRITE_ONCE(vma->vm_pgoff, pgoff); if (adjust_next) { - next->vm_start += adjust_next << PAGE_SHIFT; - next->vm_pgoff += adjust_next; + WRITE_ONCE(next->vm_start, + next->vm_start + (adjust_next << PAGE_SHIFT)); + WRITE_ONCE(next->vm_pgoff, next->vm_pgoff + adjust_next); } if (root) { @@ -1793,13 +1794,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr, out: perf_event_mmap(vma); + vm_write_begin(vma); vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT); if (vm_flags & VM_LOCKED) { if (!((vm_flags & VM_SPECIAL) || is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm))) mm->locked_vm += (len >> PAGE_SHIFT); else - vma->vm_flags &= VM_LOCKED_CLEAR_MASK; + WRITE_ONCE(vma->vm_flags, + vma->vm_flags & VM_LOCKED_CLEAR_MASK); } if (file) @@ -1812,9 +1815,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr, * then new mapped in-place (which must be aimed as * a completely new data area). */ - vma->vm_flags |= VM_SOFTDIRTY; + WRITE_ONCE(vma->vm_flags, vma->vm_flags | VM_SOFTDIRTY); vma_set_page_prot(vma); + vm_write_end(vma); return addr; @@ -2443,8 +2447,8 @@ int expand_downwards(struct vm_area_struct *vma, mm->locked_vm += grow; vm_stat_account(mm, vma->vm_flags, grow); anon_vma_interval_tree_pre_update_vma(vma); - vma->vm_start = address; - vma->vm_pgoff -= grow; + WRITE_ONCE(vma->vm_start, address); + WRITE_ONCE(vma->vm_pgoff, vma->vm_pgoff - grow); anon_vma_interval_tree_post_update_vma(vma); vma_gap_update(vma); spin_unlock(&mm->page_table_lock); diff --git a/mm/mprotect.c b/mm/mprotect.c index 625608bc8962..83594cc68062 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -375,12 +375,14 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, * vm_flags and vm_page_prot are protected by the mmap_sem * held in write mode. */ - vma->vm_flags = newflags; + vm_write_begin(vma); + WRITE_ONCE(vma->vm_flags, newflags); dirty_accountable = vma_wants_writenotify(vma, vma->vm_page_prot); vma_set_page_prot(vma); change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable, 0); + vm_write_end(vma); /* * Private VM_LOCKED VMA becoming writable: trigger COW to avoid major diff --git a/mm/swap_state.c b/mm/swap_state.c index c6b3eab73fde..2ee7198df281 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -572,6 +572,10 @@ static unsigned long swapin_nr_pages(unsigned long offset) * the readahead. * * Caller must hold down_read on the vma->vm_mm if vmf->vma is not NULL. + * This is needed to ensure the VMA will not be freed in our back. In the case + * of the speculative page fault handler, this cannot happen, even if we don't + * hold the mmap_sem. Callees are assumed to take care of reading VMA's fields + * using READ_ONCE() to read consistent values. */ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, struct vm_fault *vmf) @@ -665,9 +669,9 @@ static inline void swap_ra_clamp_pfn(struct vm_area_struct *vma, unsigned long *start, unsigned long *end) { - *start = max3(lpfn, PFN_DOWN(vma->vm_start), + *start = max3(lpfn, PFN_DOWN(READ_ONCE(vma->vm_start)), PFN_DOWN(faddr & PMD_MASK)); - *end = min3(rpfn, PFN_DOWN(vma->vm_end), + *end = min3(rpfn, PFN_DOWN(READ_ONCE(vma->vm_end)), PFN_DOWN((faddr & PMD_MASK) + PMD_SIZE)); }