From patchwork Fri Nov 8 09:38:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11234421 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4E9A41515 for ; Fri, 8 Nov 2019 09:38:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 26266214DA for ; Fri, 8 Nov 2019 09:38:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 26266214DA Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B28B46B000D; Fri, 8 Nov 2019 04:38:28 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id ADB8A6B000E; Fri, 8 Nov 2019 04:38:28 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9A0EF6B0010; Fri, 8 Nov 2019 04:38:28 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0054.hostedemail.com [216.40.44.54]) by kanga.kvack.org (Postfix) with ESMTP id 826806B000D for ; Fri, 8 Nov 2019 04:38:28 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with SMTP id 23915180AD806 for ; Fri, 8 Nov 2019 09:38:28 +0000 (UTC) X-FDA: 76132609896.08.front66_84180fb9a761f X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:stable@vger.kernel.org::linux-kernel@vger.kernel.org:akaher@vmware.com:will.deacon@arm.com:punit.agrawal@arm.com:steve.capper@arm.com:kirill.shutemov@linux.intel.com:aneesh.kumar@linux.vnet.ibm.com:catalin.marinas@arm.com:n-horiguchi@ah.jp.nec.com:mark.rutland@arm.com:hillf.zj@alibaba-inc.com:mhocko@suse.com:mike.kravetz@oracle.com:akpm@linux-foundation.org:torvalds@linux-foundation.org:vbabka@suse.cz,RULES_HIT:30012:30054:30064:30070,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: front66_84180fb9a761f X-Filterd-Recvd-Size: 4274 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf38.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Nov 2019 09:38:27 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0B306AE8A; Fri, 8 Nov 2019 09:38:24 +0000 (UTC) From: Vlastimil Babka To: stable@vger.kernel.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ajay Kaher , Will Deacon , Punit Agrawal , Steve Capper , "Kirill A . Shutemov" , "Aneesh Kumar K . V" , Catalin Marinas , Naoya Horiguchi , Mark Rutland , Hillf Danton , Michal Hocko , Mike Kravetz , Andrew Morton , Linus Torvalds , Vlastimil Babka Subject: [PATCH STABLE 4.4 1/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages Date: Fri, 8 Nov 2019 10:38:07 +0100 Message-Id: <20191108093814.16032-2-vbabka@suse.cz> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191108093814.16032-1-vbabka@suse.cz> References: <20191108093814.16032-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Will Deacon commit a3e328556d41bb61c55f9dfcc62d6a826ea97b85 upstream. When operating on hugepages with DEBUG_VM enabled, the GUP code checks the compound head for each tail page prior to calling page_cache_add_speculative. This is broken, because on the fast-GUP path (where we don't hold any page table locks) we can be racing with a concurrent invocation of split_huge_page_to_list. split_huge_page_to_list deals with this race by using page_ref_freeze to freeze the page and force concurrent GUPs to fail whilst the component pages are modified. This modification includes clearing the compound_head field for the tail pages, so checking this prior to a successful call to page_cache_add_speculative can lead to false positives: In fact, page_cache_add_speculative *already* has this check once the page refcount has been successfully updated, so we can simply remove the broken calls to VM_BUG_ON_PAGE. Link: http://lkml.kernel.org/r/20170522133604.11392-2-punit.agrawal@arm.com Signed-off-by: Will Deacon Signed-off-by: Punit Agrawal Acked-by: Steve Capper Acked-by: Kirill A. Shutemov Cc: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Naoya Horiguchi Cc: Mark Rutland Cc: Hillf Danton Cc: Michal Hocko Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Vlastimil Babka --- mm/gup.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 2cd3b31e3666..6f9088cb8ebe 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1134,7 +1134,6 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); tail = page; do { - VM_BUG_ON_PAGE(compound_head(page) != head, page); pages[*nr] = page; (*nr)++; page++; @@ -1181,7 +1180,6 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); tail = page; do { - VM_BUG_ON_PAGE(compound_head(page) != head, page); pages[*nr] = page; (*nr)++; page++; @@ -1224,7 +1222,6 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); tail = page; do { - VM_BUG_ON_PAGE(compound_head(page) != head, page); pages[*nr] = page; (*nr)++; page++; From patchwork Fri Nov 8 09:38:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11234475 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 906A613BD for ; Fri, 8 Nov 2019 09:53:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6734321D6C for ; Fri, 8 Nov 2019 09:53:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6734321D6C Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 63BA56B000D; Fri, 8 Nov 2019 04:53:49 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5EAFC6B026A; Fri, 8 Nov 2019 04:53:49 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 500D66B026B; Fri, 8 Nov 2019 04:53:49 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0039.hostedemail.com [216.40.44.39]) by kanga.kvack.org (Postfix) with ESMTP id 37F8A6B000D for ; Fri, 8 Nov 2019 04:53:49 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with SMTP id EC43A824999B for ; Fri, 8 Nov 2019 09:53:48 +0000 (UTC) X-FDA: 76132648536.15.price71_78931ed4b233f X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:stable@vger.kernel.org::linux-kernel@vger.kernel.org:akaher@vmware.com:punit.agrawal@arm.com:steve.capper@arm.com:mhocko@suse.com:kirill.shutemov@linux.intel.com:aneesh.kumar@linux.vnet.ibm.com:catalin.marinas@arm.com:will.deacon@arm.com:n-horiguchi@ah.jp.nec.com:mark.rutland@arm.com:hillf.zj@alibaba-inc.com:mike.kravetz@oracle.com:akpm@linux-foundation.org:torvalds@linux-foundation.org:vbabka@suse.cz,RULES_HIT:30036:30054:30064,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:3,LUA_SUMMARY:none X-HE-Tag: price71_78931ed4b233f X-Filterd-Recvd-Size: 5266 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf31.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Nov 2019 09:53:48 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0B21BAE7F; Fri, 8 Nov 2019 09:38:24 +0000 (UTC) From: Vlastimil Babka To: stable@vger.kernel.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ajay Kaher , Punit Agrawal , Steve Capper , Michal Hocko , "Kirill A. Shutemov" , "Aneesh Kumar K . V" , Catalin Marinas , Will Deacon , Naoya Horiguchi , Mark Rutland , Hillf Danton , Mike Kravetz , Andrew Morton , Linus Torvalds , Vlastimil Babka Subject: [PATCH STABLE 4.4 2/8] mm, gup: ensure real head page is ref-counted when using hugepages Date: Fri, 8 Nov 2019 10:38:08 +0100 Message-Id: <20191108093814.16032-3-vbabka@suse.cz> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191108093814.16032-1-vbabka@suse.cz> References: <20191108093814.16032-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Punit Agrawal commit d63206ee32b6e64b0e12d46e5d6004afd9913713 upstream. When speculatively taking references to a hugepage using page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the page returned by pmd_page() is the head page. Although normally true, this assumption doesn't hold when the hugepage comprises of successive page table entries such as when using contiguous bit on arm64 at PTE or PMD levels. This can be addressed by ensuring that the page passed to page_cache_add_speculative() is the real head or by de-referencing the head page within the function. We take the first approach to keep the usage pattern aligned with page_cache_get_speculative() where users already pass the appropriate page, i.e., the de-referenced head. Apply the same logic to fix gup_huge_[pud|pgd]() as well. [punit.agrawal@arm.com: fix arm64 ltp failure] Link: http://lkml.kernel.org/r/20170619170145.25577-5-punit.agrawal@arm.com Link: http://lkml.kernel.org/r/20170522133604.11392-3-punit.agrawal@arm.com Signed-off-by: Punit Agrawal Acked-by: Steve Capper Cc: Michal Hocko Cc: "Kirill A. Shutemov" Cc: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Will Deacon Cc: Naoya Horiguchi Cc: Mark Rutland Cc: Hillf Danton Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Vlastimil Babka --- mm/gup.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 6f9088cb8ebe..71e9d0093a35 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1130,8 +1130,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, return 0; refs = 0; - head = pmd_page(orig); - page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); + page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); tail = page; do { pages[*nr] = page; @@ -1140,6 +1139,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); + head = compound_head(pmd_page(orig)); if (!page_cache_add_speculative(head, refs)) { *nr -= refs; return 0; @@ -1176,8 +1176,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, return 0; refs = 0; - head = pud_page(orig); - page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); tail = page; do { pages[*nr] = page; @@ -1186,6 +1185,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); + head = compound_head(pud_page(orig)); if (!page_cache_add_speculative(head, refs)) { *nr -= refs; return 0; @@ -1218,8 +1218,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, return 0; refs = 0; - head = pgd_page(orig); - page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); + page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); tail = page; do { pages[*nr] = page; @@ -1228,6 +1227,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); + head = compound_head(pgd_page(orig)); if (!page_cache_add_speculative(head, refs)) { *nr -= refs; return 0; From patchwork Fri Nov 8 09:38:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11234417 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF38E14E5 for ; Fri, 8 Nov 2019 09:38:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BD01121D82 for ; Fri, 8 Nov 2019 09:38:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BD01121D82 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 128276B000A; Fri, 8 Nov 2019 04:38:28 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 02B816B000D; Fri, 8 Nov 2019 04:38:27 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D99CB6B000E; Fri, 8 Nov 2019 04:38:27 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0115.hostedemail.com [216.40.44.115]) by kanga.kvack.org (Postfix) with ESMTP id B81776B000C for ; Fri, 8 Nov 2019 04:38:27 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with SMTP id 76ADD8249980 for ; Fri, 8 Nov 2019 09:38:27 +0000 (UTC) X-FDA: 76132609854.22.balls06_83edcdec21119 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:stable@vger.kernel.org::linux-kernel@vger.kernel.org:akaher@vmware.com:torvalds@linux-foundation.org:willy@infradead.org:jannh@google.com:stable@kernel.org:vbabka@suse.cz,RULES_HIT:30034:30054:30070,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:193,LUA_SUMMARY:none X-HE-Tag: balls06_83edcdec21119 X-Filterd-Recvd-Size: 4172 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf17.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Nov 2019 09:38:26 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 56883AC79; Fri, 8 Nov 2019 09:38:24 +0000 (UTC) From: Vlastimil Babka To: stable@vger.kernel.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ajay Kaher , Linus Torvalds , Matthew Wilcox , Jann Horn , stable@kernel.org, Vlastimil Babka Subject: [PATCH STABLE 4.4 3/8] mm: make page ref count overflow check tighter and more explicit Date: Fri, 8 Nov 2019 10:38:09 +0100 Message-Id: <20191108093814.16032-4-vbabka@suse.cz> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191108093814.16032-1-vbabka@suse.cz> References: <20191108093814.16032-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Linus Torvalds commit f958d7b528b1b40c44cfda5eabe2d82760d868c3 upstream. [ 4.4 backport: page_ref_count() doesn't exist, introduce it to reduce churn. Change also two similar checks in mm/internal.h ] We have a VM_BUG_ON() to check that the page reference count doesn't underflow (or get close to overflow) by checking the sign of the count. That's all fine, but we actually want to allow people to use a "get page ref unless it's already very high" helper function, and we want that one to use the sign of the page ref (without triggering this VM_BUG_ON). Change the VM_BUG_ON to only check for small underflows (or _very_ close to overflowing), and ignore overflows which have strayed into negative territory. Acked-by: Matthew Wilcox Cc: Jann Horn Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Vlastimil Babka --- include/linux/mm.h | 11 ++++++++++- mm/internal.h | 5 +++-- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index ed653ba47c46..997edfcb0a30 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -488,6 +488,15 @@ static inline void get_huge_page_tail(struct page *page) extern bool __get_page_tail(struct page *page); +static inline int page_ref_count(struct page *page) +{ + return atomic_read(&page->_count); +} + +/* 127: arbitrary random number, small enough to assemble well */ +#define page_ref_zero_or_close_to_overflow(page) \ + ((unsigned int) page_ref_count(page) + 127u <= 127u) + static inline void get_page(struct page *page) { if (unlikely(PageTail(page))) @@ -497,7 +506,7 @@ static inline void get_page(struct page *page) * Getting a normal page or the head of a compound page * requires to already have an elevated page->_count. */ - VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page); + VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page); atomic_inc(&page->_count); } diff --git a/mm/internal.h b/mm/internal.h index f63f4393d633..a6639c72780a 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -81,7 +81,8 @@ static inline void __get_page_tail_foll(struct page *page, * speculative page access (like in * page_cache_get_speculative()) on tail pages. */ - VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page); + VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(compound_head(page)), + page); if (get_page_head) atomic_inc(&compound_head(page)->_count); get_huge_page_tail(page); @@ -106,7 +107,7 @@ static inline void get_page_foll(struct page *page) * Getting a normal page or the head of a compound page * requires to already have an elevated page->_count. */ - VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page); + VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page); atomic_inc(&page->_count); } } From patchwork Fri Nov 8 09:38:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11234409 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0D9CC14E5 for ; Fri, 8 Nov 2019 09:38:28 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D7A18222CE for ; Fri, 8 Nov 2019 09:38:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D7A18222CE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 005826B0005; Fri, 8 Nov 2019 04:38:27 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id EF8446B0006; Fri, 8 Nov 2019 04:38:26 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E0E356B0007; Fri, 8 Nov 2019 04:38:26 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0160.hostedemail.com [216.40.44.160]) by kanga.kvack.org (Postfix) with ESMTP id CB4106B0005 for ; Fri, 8 Nov 2019 04:38:26 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with SMTP id 80145181AEF1D for ; Fri, 8 Nov 2019 09:38:26 +0000 (UTC) X-FDA: 76132609812.23.word34_83dba08b87505 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:stable@vger.kernel.org::linux-kernel@vger.kernel.org:akaher@vmware.com:torvalds@linux-foundation.org:willy@infradead.org:jannh@google.com:stable@kernel.org:vbabka@suse.cz,RULES_HIT:30012:30054:30070,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.14.6.2 64.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: word34_83dba08b87505 X-Filterd-Recvd-Size: 3525 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Nov 2019 09:38:25 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 56815AC67; Fri, 8 Nov 2019 09:38:24 +0000 (UTC) From: Vlastimil Babka To: stable@vger.kernel.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ajay Kaher , Linus Torvalds , Matthew Wilcox , Jann Horn , stable@kernel.org, Vlastimil Babka Subject: [PATCH STABLE 4.4 4/8] mm: add 'try_get_page()' helper function Date: Fri, 8 Nov 2019 10:38:10 +0100 Message-Id: <20191108093814.16032-5-vbabka@suse.cz> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191108093814.16032-1-vbabka@suse.cz> References: <20191108093814.16032-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Linus Torvalds commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 upstream. [ 4.4 backport: get_page() is more complicated due to special handling of tail pages via __get_page_tail(). But in all cases, eventually the compound head page's refcount is incremented. So try_get_page() just checks compound head's refcount for overflow and then simply calls get_page(). ] This is the same as the traditional 'get_page()' function, but instead of unconditionally incrementing the reference count of the page, it only does so if the count was "safe". It returns whether the reference count was incremented (and is marked __must_check, since the caller obviously has to be aware of it). Also like 'get_page()', you can't use this function unless you already had a reference to the page. The intent is that you can use this exactly like get_page(), but in situations where you want to limit the maximum reference count. The code currently does an unconditional WARN_ON_ONCE() if we ever hit the reference count issues (either zero or negative), as a notification that the conditional non-increment actually happened. NOTE! The count access for the "safety" check is inherently racy, but that doesn't matter since the buffer we use is basically half the range of the reference count (ie we look at the sign of the count). Acked-by: Matthew Wilcox Cc: Jann Horn Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Vlastimil Babka --- include/linux/mm.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 997edfcb0a30..78358aeb7732 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -510,6 +510,21 @@ static inline void get_page(struct page *page) atomic_inc(&page->_count); } +static inline __must_check bool try_get_page(struct page *page) +{ + struct page *head = compound_head(page); + + /* + * get_page() increases always head page's refcount, either directly or + * via __get_page_tail() for tail page, so we check that + */ + if (WARN_ON_ONCE(page_ref_count(head) <= 0)) + return false; + + get_page(page); + return true; +} + static inline struct page *virt_to_head_page(const void *x) { struct page *page = virt_to_page(x); From patchwork Fri Nov 8 09:38:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11234415 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D3F031515 for ; Fri, 8 Nov 2019 09:38:35 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 90312214DA for ; Fri, 8 Nov 2019 09:38:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 90312214DA Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E35C56B0008; Fri, 8 Nov 2019 04:38:27 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id DE6456B000A; Fri, 8 Nov 2019 04:38:27 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BC6046B000D; Fri, 8 Nov 2019 04:38:27 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0147.hostedemail.com [216.40.44.147]) by kanga.kvack.org (Postfix) with ESMTP id 9B8BB6B000A for ; Fri, 8 Nov 2019 04:38:27 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with SMTP id 57B3ABBD1 for ; Fri, 8 Nov 2019 09:38:27 +0000 (UTC) X-FDA: 76132609854.06.guide83_83f08491c5a4f X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:stable@vger.kernel.org::linux-kernel@vger.kernel.org:akaher@vmware.com:torvalds@linux-foundation.org:jannh@google.com:willy@infradead.org:stable@kernel.org:vbabka@suse.cz,RULES_HIT:30003:30054:30070,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:197,LUA_SUMMARY:none X-HE-Tag: guide83_83f08491c5a4f X-Filterd-Recvd-Size: 10420 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Nov 2019 09:38:26 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 5B32AAD77; Fri, 8 Nov 2019 09:38:24 +0000 (UTC) From: Vlastimil Babka To: stable@vger.kernel.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ajay Kaher , Linus Torvalds , Jann Horn , Matthew Wilcox , stable@kernel.org, Vlastimil Babka Subject: [PATCH STABLE 4.4 5/8] mm: prevent get_user_pages() from overflowing page refcount Date: Fri, 8 Nov 2019 10:38:11 +0100 Message-Id: <20191108093814.16032-6-vbabka@suse.cz> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191108093814.16032-1-vbabka@suse.cz> References: <20191108093814.16032-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Linus Torvalds commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream. [ 4.4 backport: there's get_page_foll(), so add try_get_page()-like checks in there, enabled by a new parameter, which is false where upstream patch doesn't replace get_page() with try_get_page() (the THP and hugetlb callers). In gup_pte_range(), we don't expect tail pages, so just check page ref count instead of try_get_compound_head() Also patch arch-specific variants of gup.c for x86 and s390, leaving mips, sh, sparc alone ] If the page refcount wraps around past zero, it will be freed while there are still four billion references to it. One of the possible avenues for an attacker to try to make this happen is by doing direct IO on a page multiple times. This patch makes get_user_pages() refuse to take a new page reference if there are already more than two billion references to the page. Reported-by: Jann Horn Acked-by: Matthew Wilcox Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Vlastimil Babka --- arch/s390/mm/gup.c | 6 ++++-- arch/x86/mm/gup.c | 9 ++++++++- mm/gup.c | 39 +++++++++++++++++++++++++++++++-------- mm/huge_memory.c | 2 +- mm/hugetlb.c | 18 ++++++++++++++++-- mm/internal.h | 12 +++++++++--- 6 files changed, 69 insertions(+), 17 deletions(-) diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c index 7ad41be8b373..bdaa5f7b652c 100644 --- a/arch/s390/mm/gup.c +++ b/arch/s390/mm/gup.c @@ -37,7 +37,8 @@ static inline int gup_pte_range(pmd_t *pmdp, pmd_t pmd, unsigned long addr, return 0; VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); - if (!page_cache_get_speculative(page)) + if (unlikely(WARN_ON_ONCE(page_ref_count(page) < 0) + || !page_cache_get_speculative(page))) return 0; if (unlikely(pte_val(pte) != pte_val(*ptep))) { put_page(page); @@ -76,7 +77,8 @@ static inline int gup_huge_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); - if (!page_cache_add_speculative(head, refs)) { + if (unlikely(WARN_ON_ONCE(page_ref_count(head) < 0) + || !page_cache_add_speculative(head, refs))) { *nr -= refs; return 0; } diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c index 7d2542ad346a..6612d532e42e 100644 --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c @@ -95,7 +95,10 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr, } VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); - get_page(page); + if (unlikely(!try_get_page(page))) { + pte_unmap(ptep); + return 0; + } SetPageReferenced(page); pages[*nr] = page; (*nr)++; @@ -132,6 +135,8 @@ static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr, refs = 0; head = pmd_page(pmd); + if (WARN_ON_ONCE(page_ref_count(head) <= 0)) + return 0; page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); do { VM_BUG_ON_PAGE(compound_head(page) != head, page); @@ -208,6 +213,8 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr, refs = 0; head = pud_page(pud); + if (WARN_ON_ONCE(page_ref_count(head) <= 0)) + return 0; page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); do { VM_BUG_ON_PAGE(compound_head(page) != head, page); diff --git a/mm/gup.c b/mm/gup.c index 71e9d0093a35..fc8e2dca99fc 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -127,7 +127,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, } if (flags & FOLL_GET) - get_page_foll(page); + if (!get_page_foll(page, true)) { + page = ERR_PTR(-ENOMEM); + goto out; + } if (flags & FOLL_TOUCH) { if ((flags & FOLL_WRITE) && !pte_dirty(pte) && !PageDirty(page)) @@ -289,7 +292,10 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, goto unmap; *page = pte_page(*pte); } - get_page(*page); + if (unlikely(!try_get_page(*page))) { + ret = -ENOMEM; + goto unmap; + } out: ret = 0; unmap: @@ -1053,6 +1059,20 @@ struct page *get_dump_page(unsigned long addr) */ #ifdef CONFIG_HAVE_GENERIC_RCU_GUP +/* + * Return the compund head page with ref appropriately incremented, + * or NULL if that failed. + */ +static inline struct page *try_get_compound_head(struct page *page, int refs) +{ + struct page *head = compound_head(page); + if (WARN_ON_ONCE(page_ref_count(head) < 0)) + return NULL; + if (unlikely(!page_cache_add_speculative(head, refs))) + return NULL; + return head; +} + #ifdef __HAVE_ARCH_PTE_SPECIAL static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, int write, struct page **pages, int *nr) @@ -1083,6 +1103,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); + if (WARN_ON_ONCE(page_ref_count(page) < 0)) + goto pte_unmap; + if (!page_cache_get_speculative(page)) goto pte_unmap; @@ -1139,8 +1162,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); - head = compound_head(pmd_page(orig)); - if (!page_cache_add_speculative(head, refs)) { + head = try_get_compound_head(pmd_page(orig), refs); + if (!head) { *nr -= refs; return 0; } @@ -1185,8 +1208,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); - head = compound_head(pud_page(orig)); - if (!page_cache_add_speculative(head, refs)) { + head = try_get_compound_head(pud_page(orig), refs); + if (!head) { *nr -= refs; return 0; } @@ -1227,8 +1250,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); - head = compound_head(pgd_page(orig)); - if (!page_cache_add_speculative(head, refs)) { + head = try_get_compound_head(pgd_page(orig), refs); + if (!head) { *nr -= refs; return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 465786cd6490..6087277981a6 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1322,7 +1322,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT; VM_BUG_ON_PAGE(!PageCompound(page), page); if (flags & FOLL_GET) - get_page_foll(page); + get_page_foll(page, false); out: return page; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index fd932e7a25dd..b4a8a18fa3a5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3886,6 +3886,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long vaddr = *position; unsigned long remainder = *nr_pages; struct hstate *h = hstate_vma(vma); + int err = -EFAULT; while (vaddr < vma->vm_end && remainder) { pte_t *pte; @@ -3957,10 +3958,23 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT; page = pte_page(huge_ptep_get(pte)); + + /* + * Instead of doing 'try_get_page()' below in the same_page + * loop, just check the count once here. + */ + if (unlikely(page_count(page) <= 0)) { + if (pages) { + spin_unlock(ptl); + remainder = 0; + err = -ENOMEM; + break; + } + } same_page: if (pages) { pages[i] = mem_map_offset(page, pfn_offset); - get_page_foll(pages[i]); + get_page_foll(pages[i], false); } if (vmas) @@ -3983,7 +3997,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, *nr_pages = remainder; *position = vaddr; - return i ? i : -EFAULT; + return i ? i : err; } unsigned long hugetlb_change_protection(struct vm_area_struct *vma, diff --git a/mm/internal.h b/mm/internal.h index a6639c72780a..b52041969d06 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -93,23 +93,29 @@ static inline void __get_page_tail_foll(struct page *page, * follow_page() and it must be called while holding the proper PT * lock while the pte (or pmd_trans_huge) is still mapping the page. */ -static inline void get_page_foll(struct page *page) +static inline bool get_page_foll(struct page *page, bool check) { - if (unlikely(PageTail(page))) + if (unlikely(PageTail(page))) { /* * This is safe only because * __split_huge_page_refcount() can't run under * get_page_foll() because we hold the proper PT lock. */ + if (check && WARN_ON_ONCE( + page_ref_count(compound_head(page)) <= 0)) + return false; __get_page_tail_foll(page, true); - else { + } else { /* * Getting a normal page or the head of a compound page * requires to already have an elevated page->_count. */ VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page); + if (check && WARN_ON_ONCE(page_ref_count(page) <= 0)) + return false; atomic_inc(&page->_count); } + return true; } extern unsigned long highest_memmap_pfn; From patchwork Fri Nov 8 09:38:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11234419 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 925761515 for ; Fri, 8 Nov 2019 09:38:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 69B62214DA for ; Fri, 8 Nov 2019 09:38:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 69B62214DA Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 55CB16B000C; Fri, 8 Nov 2019 04:38:28 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 50E4A6B000D; Fri, 8 Nov 2019 04:38:28 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 44A666B000E; Fri, 8 Nov 2019 04:38:28 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0026.hostedemail.com [216.40.44.26]) by kanga.kvack.org (Postfix) with ESMTP id 2712C6B000C for ; Fri, 8 Nov 2019 04:38:28 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with SMTP id CCBE2181AEF1D for ; Fri, 8 Nov 2019 09:38:27 +0000 (UTC) X-FDA: 76132609854.03.low13_8410cec285f53 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:stable@vger.kernel.org::linux-kernel@vger.kernel.org:akaher@vmware.com:mszeredi@redhat.com:viro@zeniv.linux.org.uk:vbabka@suse.cz,RULES_HIT:30054:30079,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:26,LUA_SUMMARY:none X-HE-Tag: low13_8410cec285f53 X-Filterd-Recvd-Size: 3359 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Nov 2019 09:38:27 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id B2AA8AE68; Fri, 8 Nov 2019 09:38:24 +0000 (UTC) From: Vlastimil Babka To: stable@vger.kernel.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ajay Kaher , Miklos Szeredi , Al Viro , Vlastimil Babka Subject: [PATCH STABLE 4.4 6/8] pipe: add pipe_buf_get() helper Date: Fri, 8 Nov 2019 10:38:12 +0100 Message-Id: <20191108093814.16032-7-vbabka@suse.cz> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191108093814.16032-1-vbabka@suse.cz> References: <20191108093814.16032-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Miklos Szeredi commit 7bf2d1df80822ec056363627e2014990f068f7aa upstream. Signed-off-by: Miklos Szeredi Signed-off-by: Al Viro Signed-off-by: Vlastimil Babka --- fs/fuse/dev.c | 2 +- fs/splice.c | 4 ++-- include/linux/pipe_fs_i.h | 11 +++++++++++ 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index f5d2d2340b44..36a5df92eb9c 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2052,7 +2052,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1); pipe->nrbufs--; } else { - ibuf->ops->get(pipe, ibuf); + pipe_buf_get(pipe, ibuf); *obuf = *ibuf; obuf->flags &= ~PIPE_BUF_FLAG_GIFT; obuf->len = rem; diff --git a/fs/splice.c b/fs/splice.c index 8398974e1538..fde126369966 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1876,7 +1876,7 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, * Get a reference to this pipe buffer, * so we can copy the contents over. */ - ibuf->ops->get(ipipe, ibuf); + pipe_buf_get(ipipe, ibuf); *obuf = *ibuf; /* @@ -1948,7 +1948,7 @@ static int link_pipe(struct pipe_inode_info *ipipe, * Get a reference to this pipe buffer, * so we can copy the contents over. */ - ibuf->ops->get(ipipe, ibuf); + pipe_buf_get(ipipe, ibuf); obuf = opipe->bufs + nbuf; *obuf = *ibuf; diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 24f5470d3944..10876f3cb3da 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -115,6 +115,17 @@ struct pipe_buf_operations { void (*get)(struct pipe_inode_info *, struct pipe_buffer *); }; +/** + * pipe_buf_get - get a reference to a pipe_buffer + * @pipe: the pipe that the buffer belongs to + * @buf: the buffer to get a reference to + */ +static inline void pipe_buf_get(struct pipe_inode_info *pipe, + struct pipe_buffer *buf) +{ + buf->ops->get(pipe, buf); +} + /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual memory allocation, whereas PIPE_BUF makes atomicity guarantees. */ #define PIPE_SIZE PAGE_SIZE From patchwork Fri Nov 8 09:38:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11234411 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8032B14E5 for ; Fri, 8 Nov 2019 09:38:30 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4A8C9218AE for ; Fri, 8 Nov 2019 09:38:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4A8C9218AE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 472346B0006; Fri, 8 Nov 2019 04:38:27 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3FC9D6B0007; Fri, 8 Nov 2019 04:38:27 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2C3126B0008; Fri, 8 Nov 2019 04:38:27 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 0E50B6B0007 for ; Fri, 8 Nov 2019 04:38:27 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with SMTP id BAEA7180AD806 for ; Fri, 8 Nov 2019 09:38:26 +0000 (UTC) X-FDA: 76132609812.15.stem32_83e2003958508 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:stable@vger.kernel.org::linux-kernel@vger.kernel.org:akaher@vmware.com:willy@infradead.org:jannh@google.com:stable@kernel.org:torvalds@linux-foundation.org:vbabka@suse.cz,RULES_HIT:30054:30070:30079,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: stem32_83e2003958508 X-Filterd-Recvd-Size: 6649 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Nov 2019 09:38:26 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 6A198AE4B; Fri, 8 Nov 2019 09:38:24 +0000 (UTC) From: Vlastimil Babka To: stable@vger.kernel.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ajay Kaher , Matthew Wilcox , Jann Horn , stable@kernel.org, Linus Torvalds , Vlastimil Babka Subject: [PATCH STABLE 4.4 7/8] fs: prevent page refcount overflow in pipe_buf_get Date: Fri, 8 Nov 2019 10:38:13 +0100 Message-Id: <20191108093814.16032-8-vbabka@suse.cz> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191108093814.16032-1-vbabka@suse.cz> References: <20191108093814.16032-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Matthew Wilcox commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream. Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn Signed-off-by: Matthew Wilcox Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Vlastimil Babka --- fs/fuse/dev.c | 12 ++++++------ fs/pipe.c | 4 ++-- fs/splice.c | 12 ++++++++++-- include/linux/pipe_fs_i.h | 10 ++++++---- kernel/trace/trace.c | 6 +++++- 5 files changed, 29 insertions(+), 15 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 36a5df92eb9c..16891f5364af 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2031,10 +2031,8 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, rem += pipe->bufs[(pipe->curbuf + idx) & (pipe->buffers - 1)].len; ret = -EINVAL; - if (rem < len) { - pipe_unlock(pipe); - goto out; - } + if (rem < len) + goto out_free; rem = len; while (rem) { @@ -2052,7 +2050,9 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1); pipe->nrbufs--; } else { - pipe_buf_get(pipe, ibuf); + if (!pipe_buf_get(pipe, ibuf)) + goto out_free; + *obuf = *ibuf; obuf->flags &= ~PIPE_BUF_FLAG_GIFT; obuf->len = rem; @@ -2075,13 +2075,13 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, ret = fuse_dev_do_write(fud, &cs, len); pipe_lock(pipe); +out_free: for (idx = 0; idx < nbuf; idx++) { struct pipe_buffer *buf = &bufs[idx]; buf->ops->release(pipe, buf); } pipe_unlock(pipe); -out: kfree(bufs); return ret; } diff --git a/fs/pipe.c b/fs/pipe.c index 1e7263bb837a..6534470a6c19 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -178,9 +178,9 @@ EXPORT_SYMBOL(generic_pipe_buf_steal); * in the tee() system call, when we duplicate the buffers in one * pipe into another. */ -void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) +bool generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { - page_cache_get(buf->page); + return try_get_page(buf->page); } EXPORT_SYMBOL(generic_pipe_buf_get); diff --git a/fs/splice.c b/fs/splice.c index fde126369966..57ccc583a172 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1876,7 +1876,11 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, * Get a reference to this pipe buffer, * so we can copy the contents over. */ - pipe_buf_get(ipipe, ibuf); + if (!pipe_buf_get(ipipe, ibuf)) { + if (ret == 0) + ret = -EFAULT; + break; + } *obuf = *ibuf; /* @@ -1948,7 +1952,11 @@ static int link_pipe(struct pipe_inode_info *ipipe, * Get a reference to this pipe buffer, * so we can copy the contents over. */ - pipe_buf_get(ipipe, ibuf); + if (!pipe_buf_get(ipipe, ibuf)) { + if (ret == 0) + ret = -EFAULT; + break; + } obuf = opipe->bufs + nbuf; *obuf = *ibuf; diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 10876f3cb3da..0b28b65c12fb 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -112,18 +112,20 @@ struct pipe_buf_operations { /* * Get a reference to the pipe buffer. */ - void (*get)(struct pipe_inode_info *, struct pipe_buffer *); + bool (*get)(struct pipe_inode_info *, struct pipe_buffer *); }; /** * pipe_buf_get - get a reference to a pipe_buffer * @pipe: the pipe that the buffer belongs to * @buf: the buffer to get a reference to + * + * Return: %true if the reference was successfully obtained. */ -static inline void pipe_buf_get(struct pipe_inode_info *pipe, +static inline __must_check bool pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { - buf->ops->get(pipe, buf); + return buf->ops->get(pipe, buf); } /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual @@ -148,7 +150,7 @@ struct pipe_inode_info *alloc_pipe_info(void); void free_pipe_info(struct pipe_inode_info *); /* Generic pipe buffer ops functions */ -void generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *); +bool generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *); int generic_pipe_buf_confirm(struct pipe_inode_info *, struct pipe_buffer *); int generic_pipe_buf_steal(struct pipe_inode_info *, struct pipe_buffer *); void generic_pipe_buf_release(struct pipe_inode_info *, struct pipe_buffer *); diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index c6e4e3e7f685..32cc4ea93ad6 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5748,12 +5748,16 @@ static void buffer_pipe_buf_release(struct pipe_inode_info *pipe, buf->private = 0; } -static void buffer_pipe_buf_get(struct pipe_inode_info *pipe, +static bool buffer_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { struct buffer_ref *ref = (struct buffer_ref *)buf->private; + if (ref->ref > INT_MAX/2) + return false; + ref->ref++; + return true; } /* Pipe buffer operations for a buffer. */ From patchwork Fri Nov 8 09:38:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11234413 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 224E81515 for ; Fri, 8 Nov 2019 09:38:33 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id ED738222C5 for ; Fri, 8 Nov 2019 09:38:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ED738222C5 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 81CD66B0007; Fri, 8 Nov 2019 04:38:27 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7CE236B0008; Fri, 8 Nov 2019 04:38:27 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6BDB76B000C; Fri, 8 Nov 2019 04:38:27 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0136.hostedemail.com [216.40.44.136]) by kanga.kvack.org (Postfix) with ESMTP id 535C36B0008 for ; Fri, 8 Nov 2019 04:38:27 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with SMTP id 0F6F5BBC8 for ; Fri, 8 Nov 2019 09:38:27 +0000 (UTC) X-FDA: 76132609854.14.shape30_83ee7115f2526 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,:stable@vger.kernel.org::linux-kernel@vger.kernel.org:akaher@vmware.com:vbabka@suse.cz:osalvador@suse.de:tglx@linutronix.de:mingo@redhat.com:peterz@infradead.org:jgross@suse.com:kirill.shutemov@linux.intel.com:vkuznets@redhat.com:torvalds@linux-foundation.org:bp@alien8.de:dave.hansen@linux.intel.com:luto@kernel.org,RULES_HIT:30012:30045:30054:30091,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:29,LUA_SUMMARY:none X-HE-Tag: shape30_83ee7115f2526 X-Filterd-Recvd-Size: 5031 Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by imf15.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Nov 2019 09:38:26 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 99ECAAE61; Fri, 8 Nov 2019 09:38:24 +0000 (UTC) From: Vlastimil Babka To: stable@vger.kernel.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ajay Kaher , Vlastimil Babka , Oscar Salvador , Thomas Gleixner , Ingo Molnar , Peter Zijlstra , Juergen Gross , "Kirill A . Shutemov" , Vitaly Kuznetsov , Linus Torvalds , Borislav Petkov , Dave Hansen , Andy Lutomirski Subject: [PATCH STABLE 4.4 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest Date: Fri, 8 Nov 2019 10:38:14 +0100 Message-Id: <20191108093814.16032-9-vbabka@suse.cz> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191108093814.16032-1-vbabka@suse.cz> References: <20191108093814.16032-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The x86 version of get_user_pages_fast() relies on disabled interrupts to synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then releases the page. As TLB flush is done synchronously via IPI disabling interrupts blocks the page release, and get_page(), which assumes existing reference on page, is thus safe. However when TLB flush is done by a hypercall, e.g. in a Xen PV guest, there is no blocking thanks to disabled interrupts, and get_page() can succeed on a page that was already freed or even reused. We have recently seen this happen with our 4.4 and 4.12 based kernels, with userspace (java) that exits a thread, where mm_release() performs a futex_wake() on tsk->clear_child_tid, and another thread in parallel unmaps the page where tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code immediately releases the page again, while it's already on a freelist. Symptoms include a bad page state warning, general protection faults acessing a poisoned list prev/next pointer in the freelist, or free page pcplists of two cpus joined together in a single list. Oscar has also reproduced this scenario, with a patch inserting delays before the get_page() to make the race window larger. Fix this by removing the dependency on TLB flush interrupts the same way as the generic get_user_pages_fast() code by using page_cache_add_speculative() and revalidating the PTE contents after pinning the page. Mainline is safe since 4.13 where the x86 gup code was removed in favor of the common code. Accessing the page table itself safely also relies on disabled interrupts and TLB flush IPIs that don't happen with hypercalls, which was acknowledged in commit 9e52fc2b50de ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)"). That commit with follups should also be backported for full safety, although our reproducer didn't hit a problem without that backport. Reproduced-by: Oscar Salvador Signed-off-by: Vlastimil Babka Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juergen Gross Cc: Kirill A. Shutemov Cc: Vitaly Kuznetsov Cc: Linus Torvalds Cc: Borislav Petkov Cc: Dave Hansen Cc: Andy Lutomirski Signed-off-by: Vlastimil Babka --- arch/x86/mm/gup.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c index 6612d532e42e..6379a4883c0a 100644 --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c @@ -9,6 +9,7 @@ #include #include #include +#include #include @@ -95,10 +96,23 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr, } VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); - if (unlikely(!try_get_page(page))) { + + if (WARN_ON_ONCE(page_ref_count(page) < 0)) { + pte_unmap(ptep); + return 0; + } + + if (!page_cache_get_speculative(page)) { pte_unmap(ptep); return 0; } + + if (unlikely(pte_val(pte) != pte_val(*ptep))) { + put_page(page); + pte_unmap(ptep); + return 0; + } + SetPageReferenced(page); pages[*nr] = page; (*nr)++;