From patchwork Thu May 17 11:06:16 2018
X-Patchwork-Submitter: Laurent Dufour <ldufour@linux.vnet.ibm.com>
X-Patchwork-Id: 10406387
From: Laurent Dufour <ldufour@linux.vnet.ibm.com>
To: akpm@linux-foundation.org, mhocko@kernel.org, peterz@infradead.org,
    kirill@shutemov.name, ak@linux.intel.com, dave@stgolabs.net, jack@suse.cz,
    Matthew Wilcox, khandual@linux.vnet.ibm.com,
    aneesh.kumar@linux.vnet.ibm.com, benh@kernel.crashing.org,
    mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner, Ingo Molnar,
    hpa@zytor.com, Will Deacon, Sergey Senozhatsky,
    sergey.senozhatsky.work@gmail.com, Andrea Arcangeli, Alexei Starovoitov,
    kemi.wang@intel.com, Daniel Jordan, David Rientjes, Jerome Glisse,
    Ganesh Mahendran, Minchan Kim, Punit Agrawal, vinayak menon, Yang Shi
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    haren@linux.vnet.ibm.com, npiggin@gmail.com, bsingharora@gmail.com,
    paulmck@linux.vnet.ibm.com, Tim Chen, linuxppc-dev@lists.ozlabs.org,
    x86@kernel.org
Subject: [PATCH v11 09/26] mm: VMA sequence count
Date: Thu, 17 May 2018 13:06:16 +0200
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1526555193-7242-1-git-send-email-ldufour@linux.vnet.ibm.com>
References: <1526555193-7242-1-git-send-email-ldufour@linux.vnet.ibm.com>
Message-Id: <1526555193-7242-10-git-send-email-ldufour@linux.vnet.ibm.com>

From: Peter Zijlstra <peterz@infradead.org>

Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
counts so that we can easily test whether a VMA has changed.

The calls to vm_write_begin/end() in unmap_page_range() are used to
detect when a VMA is being unmapped, so that a new page fault should not
be satisfied for this VMA. If the seqcount hasn't changed by the time the
page tables are locked, we are safe to satisfy the page fault.

The flip side is that we cannot distinguish between a vma_adjust() and an
unmap_page_range() -- with the former we could have re-checked the vma
bounds against the faulting address.
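As an illustration of the check described above (this sketch is not part
of the patch; the helper name and control flow are made up, and the real
series does this in vma_has_changed() and handle_speculative_fault()),
the reader side would look roughly like:

	/*
	 * Illustrative reader-side pattern: snapshot vm_sequence before
	 * walking the page tables, re-check it once the PTE lock is held.
	 */
	static bool speculative_fault_check(struct vm_area_struct *vma)
	{
		unsigned int seq;

		seq = raw_read_seqcount(&vma->vm_sequence);
		if (seq & 1)
			return false;	/* a writer is in progress, bail out */

		/* ... walk the page tables and take the PTE lock ... */

		/*
		 * Pairs with vm_write_begin()/vm_write_end(): if the count
		 * moved, the VMA was modified under us and the fault must be
		 * retried under mmap_sem.
		 */
		return !read_seqcount_retry(&vma->vm_sequence, seq);
	}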
The VMA's sequence counter is also used to detect changes to various VMA
fields used during page fault handling, such as:
 - vm_start, vm_end
 - vm_pgoff
 - vm_flags, vm_page_prot
 - anon_vma
 - vm_policy

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Port to 4.12 kernel]
[Build depends on CONFIG_SPECULATIVE_PAGE_FAULT]
[Introduce vm_write_* inline functions depending on
 CONFIG_SPECULATIVE_PAGE_FAULT]
[Fix lock dependency between mapping->i_mmap_rwsem and vma->vm_sequence by
 using vm_raw_write* functions]
[Fix a lock dependency warning in mmap_region() when entering the error
 path]
[Move sequence initialisation to INIT_VMA()]
[Review the patch description about unmap_page_range()]
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 include/linux/mm.h       | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h |  3 +++
 mm/memory.c              |  2 ++
 mm/mmap.c                | 31 +++++++++++++++++++++++++++++++
 4 files changed, 80 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 35ecb983ff36..18acfdeee759 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1306,6 +1306,9 @@ struct zap_details {
 static inline void INIT_VMA(struct vm_area_struct *vma)
 {
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	seqcount_init(&vma->vm_sequence);
+#endif
 }
 
 struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
@@ -1428,6 +1431,47 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,
 	unmap_mapping_range(mapping, holebegin, holelen, 0);
 }
 
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+static inline void vm_write_begin(struct vm_area_struct *vma)
+{
+	write_seqcount_begin(&vma->vm_sequence);
+}
+static inline void vm_write_begin_nested(struct vm_area_struct *vma,
+					 int subclass)
+{
+	write_seqcount_begin_nested(&vma->vm_sequence, subclass);
+}
+static inline void vm_write_end(struct vm_area_struct *vma)
+{
+	write_seqcount_end(&vma->vm_sequence);
+}
+static inline void vm_raw_write_begin(struct vm_area_struct *vma)
+{
+	raw_write_seqcount_begin(&vma->vm_sequence);
+}
+static inline void vm_raw_write_end(struct vm_area_struct *vma)
+{
+	raw_write_seqcount_end(&vma->vm_sequence);
+}
+#else
+static inline void vm_write_begin(struct vm_area_struct *vma)
+{
+}
+static inline void vm_write_begin_nested(struct vm_area_struct *vma,
+					 int subclass)
+{
+}
+static inline void vm_write_end(struct vm_area_struct *vma)
+{
+}
+static inline void vm_raw_write_begin(struct vm_area_struct *vma)
+{
+}
+static inline void vm_raw_write_end(struct vm_area_struct *vma)
+{
+}
+#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
+
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
 		void *buf, int len, unsigned int gup_flags);
 extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 54f1e05ecf3e..fb5962308183 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -335,6 +335,9 @@ struct vm_area_struct {
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	seqcount_t vm_sequence;
+#endif
 } __randomize_layout;
 
 struct core_thread {
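For context, a sketch of the expected writer-side usage of the helpers
introduced above (illustrative only, not from this patch; the function
name and the exact fields touched are assumptions), with mmap_sem held
for writing:

	/*
	 * Any path that modifies fields sampled by the fault handler
	 * brackets the update with vm_write_begin()/vm_write_end().
	 */
	static void example_update_vma(struct vm_area_struct *vma,
				       unsigned long vm_flags)
	{
		vm_write_begin(vma);		/* vm_sequence becomes odd */
		vma->vm_flags = vm_flags;
		vma->vm_page_prot = vm_get_page_prot(vm_flags);
		vm_write_end(vma);		/* vm_sequence even again */
	}

The pairing is what matters: while the counter is odd a concurrent
speculative reader bails out immediately, and a reader that sampled the
counter before the update notices it moved when it re-checks.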
diff --git a/mm/memory.c b/mm/memory.c
index 75163c145c76..551a1916da5d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1499,6 +1499,7 @@ void unmap_page_range(struct mmu_gather *tlb,
 	unsigned long next;
 
 	BUG_ON(addr >= end);
+	vm_write_begin(vma);
 	tlb_start_vma(tlb, vma);
 	pgd = pgd_offset(vma->vm_mm, addr);
 	do {
@@ -1508,6 +1509,7 @@ void unmap_page_range(struct mmu_gather *tlb,
 		next = zap_p4d_range(tlb, vma, pgd, addr, next, details);
 	} while (pgd++, addr = next, addr != end);
 	tlb_end_vma(tlb, vma);
+	vm_write_end(vma);
 }

diff --git a/mm/mmap.c b/mm/mmap.c
index ceb1c2c1b46b..eeafd0bc8b36 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -701,6 +701,30 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	long adjust_next = 0;
 	int remove_next = 0;
 
+	/*
+	 * Why use the vm_raw_write*() functions here? To avoid a lockdep
+	 * warning.
+	 *
+	 * Lockdep complains about a theoretical lock dependency involving
+	 * 3 locks:
+	 *   mapping->i_mmap_rwsem --> vma->vm_sequence --> fs_reclaim
+	 *
+	 * Here are the major paths leading to this dependency:
+	 *  1. __vma_adjust()           mmap_sem -> vm_sequence -> i_mmap_rwsem
+	 *  2. move_vma()               mmap_sem -> vm_sequence -> fs_reclaim
+	 *  3. __alloc_pages_nodemask() fs_reclaim -> i_mmap_rwsem
+	 *  4. unmap_mapping_range()    i_mmap_rwsem -> vm_sequence
+	 *
+	 * There is no easy way to break this chain, especially because in
+	 * unmap_mapping_range() the i_mmap_rwsem is grabbed while the
+	 * impacted VMAs are not yet known.
+	 * However, the way vm_sequence is used guarantees that we will never
+	 * block on it: we only check its value and never wait for it to
+	 * move, see vma_has_changed() and handle_speculative_fault().
+	 */
+	vm_raw_write_begin(vma);
+	if (next)
+		vm_raw_write_begin(next);
+
 	if (next && !insert) {
 		struct vm_area_struct *exporter = NULL, *importer = NULL;
 
@@ -911,6 +935,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 			anon_vma_merge(vma, next);
 		mm->map_count--;
 		mpol_put(vma_policy(next));
+		vm_raw_write_end(next);
 		kmem_cache_free(vm_area_cachep, next);
 		/*
 		 * In mprotect's case 6 (see comments on vma_merge),
@@ -925,6 +950,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 			 * "vma->vm_next" gap must be updated.
 			 */
 			next = vma->vm_next;
+			if (next)
+				vm_raw_write_begin(next);
 		} else {
 			/*
 			 * For the scope of the comment "next" and
@@ -971,6 +998,10 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	if (insert && file)
 		uprobe_mmap(insert);
 
+	if (next && next != vma)
+		vm_raw_write_end(next);
+	vm_raw_write_end(vma);
+
 	validate_mm(mm);
 
 	return 0;
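A closing note on the raw helpers used in __vma_adjust(): the reason
vm_raw_write_begin/end() silence the lockdep report is that the raw
seqcount primitives skip lockdep's dep_map tracking while keeping the
same counter and barrier semantics. Paraphrased and simplified from
<linux/seqlock.h> of this era (the lockdep-enabled variant):

	static inline void raw_write_seqcount_begin(seqcount_t *s)
	{
		s->sequence++;	/* counter goes odd: writer active */
		smp_wmb();	/* order the increment before the updates */
	}

	static inline void write_seqcount_begin_nested(seqcount_t *s,
						       int subclass)
	{
		raw_write_seqcount_begin(s);
		/* lockdep bookkeeping: the part the vm_raw_* helpers avoid */
		seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
	}

Since readers of vm_sequence only compare values and never spin inside a
lock that lockdep can see, skipping the dep_map acquisition here drops
the false-positive chain without weakening the actual synchronisation.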