From patchwork Wed Oct 9 00:44:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ajay Kaher X-Patchwork-Id: 11179887 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1B915139A for ; Tue, 8 Oct 2019 16:43:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E650B21721 for ; Tue, 8 Oct 2019 16:43:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E650B21721 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=vmware.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 29BBD8E0006; Tue, 8 Oct 2019 12:43:47 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 272E38E0003; Tue, 8 Oct 2019 12:43:47 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 189A68E0006; Tue, 8 Oct 2019 12:43:47 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0177.hostedemail.com [216.40.44.177]) by kanga.kvack.org (Postfix) with ESMTP id ECD288E0003 for ; Tue, 8 Oct 2019 12:43:46 -0400 (EDT) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with SMTP id 93E4B2C22 for ; Tue, 8 Oct 2019 16:43:46 +0000 (UTC) X-FDA: 76021188852.01.yard36_3cd3f9f1f737 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,akaher@vmware.com,:gregkh@linuxfoundation.org:torvalds@linux-foundation.org:punit.agrawal@arm.com:akpm@linux-foundation.org:kirill.shutemov@linux.intel.com:willy@infradead.org:will.deacon@arm.com:mszeredi@redhat.com:stable@vger.kernel.org::linux-kernel@vger.kernel.org:srivatsab@vmware.com:srivatsa@csail.mit.edu:amakhalov@vmware.com:srinidhir@vmware.com:bvikas@vmware.com:anishs@vmware.com:vsirnapalli@vmware.com:srostedt@vmware.com:akaher@vmware.com:jannh@google.com:stable@kernel.org,RULES_HIT:30012:30034:30054:30070,0,RBL:208.91.0.190:@vmware.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: yard36_3cd3f9f1f737 X-Filterd-Recvd-Size: 4806 Received: from EX13-EDG-OU-002.vmware.com (ex13-edg-ou-002.vmware.com [208.91.0.190]) by imf48.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Oct 2019 16:43:46 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-002.vmware.com (10.113.208.156) with Microsoft SMTP Server id 15.0.1156.6; Tue, 8 Oct 2019 09:43:42 -0700 Received: from akaher-lnx-dev.eng.vmware.com (unknown [10.110.19.203]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id 932EB40AF8; Tue, 8 Oct 2019 09:43:39 -0700 (PDT) From: Ajay Kaher To: CC: , , , , , , , , , , , , , , , , , , , Jann Horn , Subject: [PATCH v2 1/8] mm: make page ref count overflow check tighter and more explicit Date: Wed, 9 Oct 2019 06:14:16 +0530 Message-ID: <1570581863-12090-2-git-send-email-akaher@vmware.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1570581863-12090-1-git-send-email-akaher@vmware.com> References: <1570581863-12090-1-git-send-email-akaher@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-002.vmware.com: akaher@vmware.com does not designate permitted sender hosts) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Linus Torvalds commit f958d7b528b1b40c44cfda5eabe2d82760d868c3 upsteam. We have a VM_BUG_ON() to check that the page reference count doesn't underflow (or get close to overflow) by checking the sign of the count. That's all fine, but we actually want to allow people to use a "get page ref unless it's already very high" helper function, and we want that one to use the sign of the page ref (without triggering this VM_BUG_ON). Change the VM_BUG_ON to only check for small underflows (or _very_ close to overflowing), and ignore overflows which have strayed into negative territory. Acked-by: Matthew Wilcox Cc: Jann Horn Cc: stable@kernel.org Signed-off-by: Linus Torvalds [ 4.4.y backport notes: Ajay: Open-coded atomic refcount access due to missing page_ref_count() helper in 4.4.y Srivatsa: Added overflow check to get_page_foll() and related code. ] Signed-off-by: Srivatsa S. Bhat (VMware) Signed-off-by: Ajay Kaher --- include/linux/mm.h | 6 +++++- mm/internal.h | 5 +++-- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index ed653ba..701088e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -488,6 +488,10 @@ static inline void get_huge_page_tail(struct page *page) extern bool __get_page_tail(struct page *page); +/* 127: arbitrary random number, small enough to assemble well */ +#define page_ref_zero_or_close_to_overflow(page) \ + ((unsigned int) atomic_read(&page->_count) + 127u <= 127u) + static inline void get_page(struct page *page) { if (unlikely(PageTail(page))) @@ -497,7 +501,7 @@ static inline void get_page(struct page *page) * Getting a normal page or the head of a compound page * requires to already have an elevated page->_count. */ - VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page); + VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page); atomic_inc(&page->_count); } diff --git a/mm/internal.h b/mm/internal.h index f63f439..67015e5 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -81,7 +81,8 @@ static inline void __get_page_tail_foll(struct page *page, * speculative page access (like in * page_cache_get_speculative()) on tail pages. */ - VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page); + VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(compound_head(page)), + page); if (get_page_head) atomic_inc(&compound_head(page)->_count); get_huge_page_tail(page); @@ -106,7 +107,7 @@ static inline void get_page_foll(struct page *page) * Getting a normal page or the head of a compound page * requires to already have an elevated page->_count. */ - VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page); + VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page); atomic_inc(&page->_count); } } From patchwork Wed Oct 9 00:44:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ajay Kaher X-Patchwork-Id: 11179889 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D94B14DB for ; Tue, 8 Oct 2019 16:43:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1412B21721 for ; Tue, 8 Oct 2019 16:43:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1412B21721 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=vmware.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 496DB8E0007; Tue, 8 Oct 2019 12:43:57 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 446B28E0003; Tue, 8 Oct 2019 12:43:57 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 384888E0007; Tue, 8 Oct 2019 12:43:57 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0008.hostedemail.com [216.40.44.8]) by kanga.kvack.org (Postfix) with ESMTP id 169608E0003 for ; Tue, 8 Oct 2019 12:43:57 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with SMTP id B1DB3824CA30 for ; Tue, 8 Oct 2019 16:43:56 +0000 (UTC) X-FDA: 76021189272.12.trees43_53f153fcda27 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,akaher@vmware.com,:gregkh@linuxfoundation.org:torvalds@linux-foundation.org:punit.agrawal@arm.com:akpm@linux-foundation.org:kirill.shutemov@linux.intel.com:willy@infradead.org:will.deacon@arm.com:mszeredi@redhat.com:stable@vger.kernel.org::linux-kernel@vger.kernel.org:srivatsab@vmware.com:srivatsa@csail.mit.edu:amakhalov@vmware.com:srinidhir@vmware.com:bvikas@vmware.com:anishs@vmware.com:vsirnapalli@vmware.com:srostedt@vmware.com:akaher@vmware.com:jannh@google.com:stable@kernel.org,RULES_HIT:30012:30054:30070,0,RBL:208.91.0.189:@vmware.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:34,LUA_SUMMARY:none X-HE-Tag: trees43_53f153fcda27 X-Filterd-Recvd-Size: 5026 Received: from EX13-EDG-OU-001.vmware.com (ex13-edg-ou-001.vmware.com [208.91.0.189]) by imf46.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Oct 2019 16:43:55 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-001.vmware.com (10.113.208.155) with Microsoft SMTP Server id 15.0.1156.6; Tue, 8 Oct 2019 09:43:52 -0700 Received: from akaher-lnx-dev.eng.vmware.com (unknown [10.110.19.203]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id 9474E40C48; Tue, 8 Oct 2019 09:43:49 -0700 (PDT) From: Ajay Kaher To: CC: , , , , , , , , , , , , , , , , , , , Jann Horn , Subject: [PATCH v2 2/8] mm: add 'try_get_page()' helper function Date: Wed, 9 Oct 2019 06:14:17 +0530 Message-ID: <1570581863-12090-3-git-send-email-akaher@vmware.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1570581863-12090-1-git-send-email-akaher@vmware.com> References: <1570581863-12090-1-git-send-email-akaher@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-001.vmware.com: akaher@vmware.com does not designate permitted sender hosts) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Linus Torvalds commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 upsteam. This is the same as the traditional 'get_page()' function, but instead of unconditionally incrementing the reference count of the page, it only does so if the count was "safe". It returns whether the reference count was incremented (and is marked __must_check, since the caller obviously has to be aware of it). Also like 'get_page()', you can't use this function unless you already had a reference to the page. The intent is that you can use this exactly like get_page(), but in situations where you want to limit the maximum reference count. The code currently does an unconditional WARN_ON_ONCE() if we ever hit the reference count issues (either zero or negative), as a notification that the conditional non-increment actually happened. NOTE! The count access for the "safety" check is inherently racy, but that doesn't matter since the buffer we use is basically half the range of the reference count (ie we look at the sign of the count). Acked-by: Matthew Wilcox Cc: Jann Horn Cc: stable@kernel.org Signed-off-by: Linus Torvalds [ 4.4.y backport notes: Srivatsa: - Adapted try_get_page() to match the get_page() implementation in 4.4.y, except for the refcount check. - Added try_get_page_foll() which will be needed in a subsequent patch. ] Signed-off-by: Srivatsa S. Bhat (VMware) Signed-off-by: Ajay Kaher --- include/linux/mm.h | 12 ++++++++++++ mm/internal.h | 23 +++++++++++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 701088e..52edaf1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -505,6 +505,18 @@ static inline void get_page(struct page *page) atomic_inc(&page->_count); } +static inline __must_check bool try_get_page(struct page *page) +{ + if (unlikely(PageTail(page))) + if (likely(__get_page_tail(page))) + return true; + + if (WARN_ON_ONCE(atomic_read(&page->_count) <= 0)) + return false; + atomic_inc(&page->_count); + return true; +} + static inline struct page *virt_to_head_page(const void *x) { struct page *page = virt_to_page(x); diff --git a/mm/internal.h b/mm/internal.h index 67015e5..d83afc9 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -112,6 +112,29 @@ static inline void get_page_foll(struct page *page) } } +static inline __must_check bool try_get_page_foll(struct page *page) +{ + if (unlikely(PageTail(page))) { + if (WARN_ON_ONCE(atomic_read(&compound_head(page)->_count) <= 0)) + return false; + /* + * This is safe only because + * __split_huge_page_refcount() can't run under + * get_page_foll() because we hold the proper PT lock. + */ + __get_page_tail_foll(page, true); + } else { + /* + * Getting a normal page or the head of a compound page + * requires to already have an elevated page->_count. + */ + if (WARN_ON_ONCE(atomic_read(&page->_count) <= 0)) + return false; + atomic_inc(&page->_count); + } + return true; +} + extern unsigned long highest_memmap_pfn; /* From patchwork Wed Oct 9 00:44:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ajay Kaher X-Patchwork-Id: 11179891 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9AC63139A for ; Tue, 8 Oct 2019 16:44:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 71779217D7 for ; Tue, 8 Oct 2019 16:44:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 71779217D7 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=vmware.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A15808E0008; Tue, 8 Oct 2019 12:44:08 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9C54C8E0003; Tue, 8 Oct 2019 12:44:08 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 902508E0008; Tue, 8 Oct 2019 12:44:08 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0139.hostedemail.com [216.40.44.139]) by kanga.kvack.org (Postfix) with ESMTP id 6F7CF8E0003 for ; Tue, 8 Oct 2019 12:44:08 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with SMTP id 255BB181AC9B4 for ; Tue, 8 Oct 2019 16:44:08 +0000 (UTC) X-FDA: 76021189776.14.light04_6e8343c51d0a X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,akaher@vmware.com,:gregkh@linuxfoundation.org:torvalds@linux-foundation.org:punit.agrawal@arm.com:akpm@linux-foundation.org:kirill.shutemov@linux.intel.com:willy@infradead.org:will.deacon@arm.com:mszeredi@redhat.com:stable@vger.kernel.org::linux-kernel@vger.kernel.org:srivatsab@vmware.com:srivatsa@csail.mit.edu:amakhalov@vmware.com:srinidhir@vmware.com:bvikas@vmware.com:anishs@vmware.com:vsirnapalli@vmware.com:srostedt@vmware.com:akaher@vmware.com:aarcange@redhat.com:hughd@google.com:dave.hansen@intel.com:mgorman@suse.de:riel@redhat.com:n-horiguchi@ah.jp.nec.com:steve.capper@linaro.org:hannes@cmpxchg.org:mhocko@suse.cz:cl@linux.com:rientjes@google.com,RULES_HIT:30054:30064,0,RBL:208.91.0.190:@vmware.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: light04_6e8343c51d0a X-Filterd-Recvd-Size: 4670 Received: from EX13-EDG-OU-002.vmware.com (ex13-edg-ou-002.vmware.com [208.91.0.190]) by imf15.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Oct 2019 16:44:07 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-002.vmware.com (10.113.208.156) with Microsoft SMTP Server id 15.0.1156.6; Tue, 8 Oct 2019 09:44:04 -0700 Received: from akaher-lnx-dev.eng.vmware.com (unknown [10.110.19.203]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id 1302A40AF8; Tue, 8 Oct 2019 09:43:58 -0700 (PDT) From: Ajay Kaher To: CC: , , , , , , , , , , , , , , , , , , , Andrea Arcangeli , Hugh Dickins , Dave Hansen , Mel Gorman , Rik van Riel , Naoya Horiguchi , Steve Capper , Johannes Weiner , Michal Hocko , Christoph Lameter , David Rientjes Subject: [PATCH v2 3/8] mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton Date: Wed, 9 Oct 2019 06:14:18 +0530 Message-ID: <1570581863-12090-4-git-send-email-akaher@vmware.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1570581863-12090-1-git-send-email-akaher@vmware.com> References: <1570581863-12090-1-git-send-email-akaher@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-002.vmware.com: akaher@vmware.com does not designate permitted sender hosts) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: "Kirill A. Shutemov" commit 7aef4172c7957d7e65fc172be4c99becaef855d4 upstream. With new refcounting we are going to see THP tail pages mapped with PTE. Generic fast GUP rely on page_cache_get_speculative() to obtain reference on page. page_cache_get_speculative() always fails on tail pages, because ->_count on tail pages is always zero. Let's handle tail pages in gup_pte_range(). New split_huge_page() will rely on migration entries to freeze page's counts. Recheck PTE value after page_cache_get_speculative() on head page should be enough to serialize against split. Signed-off-by: Kirill A. Shutemov Tested-by: Sasha Levin Tested-by: Aneesh Kumar K.V Acked-by: Jerome Marchand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Dave Hansen Cc: Mel Gorman Cc: Rik van Riel Cc: Naoya Horiguchi Cc: Steve Capper Cc: Johannes Weiner Cc: Michal Hocko Cc: Christoph Lameter Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ajay Kaher --- mm/gup.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 2cd3b31..45c544b 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1070,7 +1070,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, * for an example see gup_get_pte in arch/x86/mm/gup.c */ pte_t pte = READ_ONCE(*ptep); - struct page *page; + struct page *head, *page; /* * Similar to the PMD case below, NUMA hinting must take slow @@ -1082,15 +1082,17 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); + head = compound_head(page); - if (!page_cache_get_speculative(page)) + if (!page_cache_get_speculative(head)) goto pte_unmap; if (unlikely(pte_val(pte) != pte_val(*ptep))) { - put_page(page); + put_page(head); goto pte_unmap; } + VM_BUG_ON_PAGE(compound_head(page) != head, page); pages[*nr] = page; (*nr)++; From patchwork Wed Oct 9 00:44:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ajay Kaher X-Patchwork-Id: 11179893 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BC152139A for ; Tue, 8 Oct 2019 16:44:18 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9319E21721 for ; Tue, 8 Oct 2019 16:44:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9319E21721 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=vmware.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BC2118E0009; Tue, 8 Oct 2019 12:44:17 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B72568E0003; Tue, 8 Oct 2019 12:44:17 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AB0D48E0009; Tue, 8 Oct 2019 12:44:17 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0191.hostedemail.com [216.40.44.191]) by kanga.kvack.org (Postfix) with ESMTP id 889F18E0003 for ; Tue, 8 Oct 2019 12:44:17 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with SMTP id 422934837 for ; Tue, 8 Oct 2019 16:44:17 +0000 (UTC) X-FDA: 76021190154.19.fruit26_840e0a052b09 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,akaher@vmware.com,:gregkh@linuxfoundation.org:torvalds@linux-foundation.org:punit.agrawal@arm.com:akpm@linux-foundation.org:kirill.shutemov@linux.intel.com:willy@infradead.org:will.deacon@arm.com:mszeredi@redhat.com:stable@vger.kernel.org::linux-kernel@vger.kernel.org:srivatsab@vmware.com:srivatsa@csail.mit.edu:amakhalov@vmware.com:srinidhir@vmware.com:bvikas@vmware.com:anishs@vmware.com:vsirnapalli@vmware.com:srostedt@vmware.com:akaher@vmware.com:aneesh.kumar@linux.vnet.ibm.com:catalin.marinas@arm.com:n-horiguchi@ah.jp.nec.com:mark.rutland@arm.com:hillf.zj@alibaba-inc.com:mhocko@suse.com:mike.kravetz@oracle.com,RULES_HIT:30012:30054:30064:30070,0,RBL:208.91.0.189:@vmware.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: fruit26_840e0a052b09 X-Filterd-Recvd-Size: 4782 Received: from EX13-EDG-OU-001.vmware.com (ex13-edg-ou-001.vmware.com [208.91.0.189]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Oct 2019 16:44:16 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-001.vmware.com (10.113.208.155) with Microsoft SMTP Server id 15.0.1156.6; Tue, 8 Oct 2019 09:44:13 -0700 Received: from akaher-lnx-dev.eng.vmware.com (unknown [10.110.19.203]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id 1580B40C00; Tue, 8 Oct 2019 09:44:08 -0700 (PDT) From: Ajay Kaher To: CC: , , , , , , , , , , , , , , , , , , , "Aneesh Kumar K . V" , Catalin Marinas , Naoya Horiguchi , Mark Rutland , Hillf Danton , Michal Hocko , Mike Kravetz Subject: [PATCH v2 4/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages Date: Wed, 9 Oct 2019 06:14:19 +0530 Message-ID: <1570581863-12090-5-git-send-email-akaher@vmware.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1570581863-12090-1-git-send-email-akaher@vmware.com> References: <1570581863-12090-1-git-send-email-akaher@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-001.vmware.com: akaher@vmware.com does not designate permitted sender hosts) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Will Deacon commit a3e328556d41bb61c55f9dfcc62d6a826ea97b85 upstream. When operating on hugepages with DEBUG_VM enabled, the GUP code checks the compound head for each tail page prior to calling page_cache_add_speculative. This is broken, because on the fast-GUP path (where we don't hold any page table locks) we can be racing with a concurrent invocation of split_huge_page_to_list. split_huge_page_to_list deals with this race by using page_ref_freeze to freeze the page and force concurrent GUPs to fail whilst the component pages are modified. This modification includes clearing the compound_head field for the tail pages, so checking this prior to a successful call to page_cache_add_speculative can lead to false positives: In fact, page_cache_add_speculative *already* has this check once the page refcount has been successfully updated, so we can simply remove the broken calls to VM_BUG_ON_PAGE. Link: http://lkml.kernel.org/r/20170522133604.11392-2-punit.agrawal@arm.com Signed-off-by: Will Deacon Signed-off-by: Punit Agrawal Acked-by: Steve Capper Acked-by: Kirill A. Shutemov Cc: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Naoya Horiguchi Cc: Mark Rutland Cc: Hillf Danton Cc: Michal Hocko Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Srivatsa S. Bhat (VMware) Signed-off-by: Ajay Kaher --- mm/gup.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 45c544b..6e7cfaa 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1136,7 +1136,6 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); tail = page; do { - VM_BUG_ON_PAGE(compound_head(page) != head, page); pages[*nr] = page; (*nr)++; page++; @@ -1183,7 +1182,6 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); tail = page; do { - VM_BUG_ON_PAGE(compound_head(page) != head, page); pages[*nr] = page; (*nr)++; page++; @@ -1226,7 +1224,6 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); tail = page; do { - VM_BUG_ON_PAGE(compound_head(page) != head, page); pages[*nr] = page; (*nr)++; page++; From patchwork Wed Oct 9 00:44:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ajay Kaher X-Patchwork-Id: 11179895 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BAFCC139A for ; Tue, 8 Oct 2019 16:44:27 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8890D21721 for ; Tue, 8 Oct 2019 16:44:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8890D21721 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=vmware.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id AB0488E000A; Tue, 8 Oct 2019 12:44:26 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A62588E0003; Tue, 8 Oct 2019 12:44:26 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 977648E000A; Tue, 8 Oct 2019 12:44:26 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0188.hostedemail.com [216.40.44.188]) by kanga.kvack.org (Postfix) with ESMTP id 75C9D8E0003 for ; Tue, 8 Oct 2019 12:44:26 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with SMTP id 22F4152C5 for ; Tue, 8 Oct 2019 16:44:26 +0000 (UTC) X-FDA: 76021190532.21.rate45_98682083622b X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,akaher@vmware.com,:gregkh@linuxfoundation.org:torvalds@linux-foundation.org:punit.agrawal@arm.com:akpm@linux-foundation.org:kirill.shutemov@linux.intel.com:willy@infradead.org:will.deacon@arm.com:mszeredi@redhat.com:stable@vger.kernel.org::linux-kernel@vger.kernel.org:srivatsab@vmware.com:srivatsa@csail.mit.edu:amakhalov@vmware.com:srinidhir@vmware.com:bvikas@vmware.com:anishs@vmware.com:vsirnapalli@vmware.com:srostedt@vmware.com:akaher@vmware.com:mhocko@suse.com:aneesh.kumar@linux.vnet.ibm.com:catalin.marinas@arm.com:n-horiguchi@ah.jp.nec.com:mark.rutland@arm.com:hillf.zj@alibaba-inc.com:mike.kravetz@oracle.com,RULES_HIT:30036:30054:30064,0,RBL:208.91.0.190:@vmware.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:28,LUA_SUMMARY:none X-HE-Tag: rate45_98682083622b X-Filterd-Recvd-Size: 5773 Received: from EX13-EDG-OU-002.vmware.com (ex13-edg-ou-002.vmware.com [208.91.0.190]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Oct 2019 16:44:25 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-002.vmware.com (10.113.208.156) with Microsoft SMTP Server id 15.0.1156.6; Tue, 8 Oct 2019 09:44:22 -0700 Received: from akaher-lnx-dev.eng.vmware.com (unknown [10.110.19.203]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id F139340AF8; Tue, 8 Oct 2019 09:44:17 -0700 (PDT) From: Ajay Kaher To: CC: , , , , , , , , , , , , , , , , , , , Michal Hocko , "Aneesh Kumar K . V" , Catalin Marinas , Naoya Horiguchi , Mark Rutland , Hillf Danton , Mike Kravetz Subject: [PATCH v2 5/8] mm, gup: ensure real head page is ref-counted when using hugepages Date: Wed, 9 Oct 2019 06:14:20 +0530 Message-ID: <1570581863-12090-6-git-send-email-akaher@vmware.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1570581863-12090-1-git-send-email-akaher@vmware.com> References: <1570581863-12090-1-git-send-email-akaher@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-002.vmware.com: akaher@vmware.com does not designate permitted sender hosts) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Punit Agrawal commit d63206ee32b6e64b0e12d46e5d6004afd9913713 upstream. When speculatively taking references to a hugepage using page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the page returned by pmd_page() is the head page. Although normally true, this assumption doesn't hold when the hugepage comprises of successive page table entries such as when using contiguous bit on arm64 at PTE or PMD levels. This can be addressed by ensuring that the page passed to page_cache_add_speculative() is the real head or by de-referencing the head page within the function. We take the first approach to keep the usage pattern aligned with page_cache_get_speculative() where users already pass the appropriate page, i.e., the de-referenced head. Apply the same logic to fix gup_huge_[pud|pgd]() as well. [punit.agrawal@arm.com: fix arm64 ltp failure] Link: http://lkml.kernel.org/r/20170619170145.25577-5-punit.agrawal@arm.com Link: http://lkml.kernel.org/r/20170522133604.11392-3-punit.agrawal@arm.com Signed-off-by: Punit Agrawal Acked-by: Steve Capper Cc: Michal Hocko Cc: "Kirill A. Shutemov" Cc: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Will Deacon Cc: Naoya Horiguchi Cc: Mark Rutland Cc: Hillf Danton Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ajay Kaher Reviewed-by: Srivatsa S. Bhat (VMware) --- mm/gup.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 6e7cfaa..fae4d1e 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1132,8 +1132,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, return 0; refs = 0; - head = pmd_page(orig); - page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); + page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); tail = page; do { pages[*nr] = page; @@ -1142,6 +1141,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); + head = compound_head(pmd_page(orig)); if (!page_cache_add_speculative(head, refs)) { *nr -= refs; return 0; @@ -1178,8 +1178,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, return 0; refs = 0; - head = pud_page(orig); - page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); tail = page; do { pages[*nr] = page; @@ -1188,6 +1187,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); + head = compound_head(pud_page(orig)); if (!page_cache_add_speculative(head, refs)) { *nr -= refs; return 0; @@ -1220,8 +1220,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, return 0; refs = 0; - head = pgd_page(orig); - page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); + page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); tail = page; do { pages[*nr] = page; @@ -1230,6 +1229,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); + head = compound_head(pgd_page(orig)); if (!page_cache_add_speculative(head, refs)) { *nr -= refs; return 0; From patchwork Wed Oct 9 00:44:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ajay Kaher X-Patchwork-Id: 11179897 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3CDEB14DB for ; Tue, 8 Oct 2019 16:44:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 067FB21871 for ; Tue, 8 Oct 2019 16:44:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 067FB21871 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=vmware.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 224D08E0005; Tue, 8 Oct 2019 12:44:38 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1D5728E0003; Tue, 8 Oct 2019 12:44:38 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0EABC8E0005; Tue, 8 Oct 2019 12:44:38 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0120.hostedemail.com [216.40.44.120]) by kanga.kvack.org (Postfix) with ESMTP id E04D28E0003 for ; Tue, 8 Oct 2019 12:44:37 -0400 (EDT) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with SMTP id 58970824CA30 for ; Tue, 8 Oct 2019 16:44:37 +0000 (UTC) X-FDA: 76021190994.01.waste95_b2bcba3bb62d X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,akaher@vmware.com,:gregkh@linuxfoundation.org:torvalds@linux-foundation.org:punit.agrawal@arm.com:akpm@linux-foundation.org:kirill.shutemov@linux.intel.com:willy@infradead.org:will.deacon@arm.com:mszeredi@redhat.com:stable@vger.kernel.org::linux-kernel@vger.kernel.org:srivatsab@vmware.com:srivatsa@csail.mit.edu:amakhalov@vmware.com:srinidhir@vmware.com:bvikas@vmware.com:anishs@vmware.com:vsirnapalli@vmware.com:srostedt@vmware.com:akaher@vmware.com:stable@kernel.org,RULES_HIT:30003:30054:30070,0,RBL:208.91.0.189:@vmware.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: waste95_b2bcba3bb62d X-Filterd-Recvd-Size: 6886 Received: from EX13-EDG-OU-001.vmware.com (ex13-edg-ou-001.vmware.com [208.91.0.189]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Oct 2019 16:44:36 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-001.vmware.com (10.113.208.155) with Microsoft SMTP Server id 15.0.1156.6; Tue, 8 Oct 2019 09:44:33 -0700 Received: from akaher-lnx-dev.eng.vmware.com (unknown [10.110.19.203]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id 7FBA140AF8; Tue, 8 Oct 2019 09:44:30 -0700 (PDT) From: Ajay Kaher To: CC: , , , , , , , , , , , , , , , , , , , Subject: [PATCH v2 6/8] mm: prevent get_user_pages() from overflowing page refcount Date: Wed, 9 Oct 2019 06:14:21 +0530 Message-ID: <1570581863-12090-7-git-send-email-akaher@vmware.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1570581863-12090-1-git-send-email-akaher@vmware.com> References: <1570581863-12090-1-git-send-email-akaher@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-001.vmware.com: akaher@vmware.com does not designate permitted sender hosts) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Linus Torvalds commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream. If the page refcount wraps around past zero, it will be freed while there are still four billion references to it. One of the possible avenues for an attacker to try to make this happen is by doing direct IO on a page multiple times. This patch makes get_user_pages() refuse to take a new page reference if there are already more than two billion references to the page. Reported-by: Jann Horn Acked-by: Matthew Wilcox Cc: stable@kernel.org Signed-off-by: Linus Torvalds [ 4.4.y backport notes: Ajay: Added local variable 'err' with-in follow_hugetlb_page() from 2be7cfed995e, to resolve compilation error Srivatsa: Replaced call to get_page_foll() with try_get_page_foll() ] Signed-off-by: Srivatsa S. Bhat (VMware) Signed-off-by: Ajay Kaher --- mm/gup.c | 43 ++++++++++++++++++++++++++++++++----------- mm/hugetlb.c | 16 +++++++++++++++- 2 files changed, 47 insertions(+), 12 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index fae4d1e..171b460 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -126,8 +126,12 @@ retry: } } - if (flags & FOLL_GET) - get_page_foll(page); + if (flags & FOLL_GET) { + if (unlikely(!try_get_page_foll(page))) { + page = ERR_PTR(-ENOMEM); + goto out; + } + } if (flags & FOLL_TOUCH) { if ((flags & FOLL_WRITE) && !pte_dirty(pte) && !PageDirty(page)) @@ -289,7 +293,10 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, goto unmap; *page = pte_page(*pte); } - get_page(*page); + if (unlikely(!try_get_page(*page))) { + ret = -ENOMEM; + goto unmap; + } out: ret = 0; unmap: @@ -1053,6 +1060,20 @@ struct page *get_dump_page(unsigned long addr) */ #ifdef CONFIG_HAVE_GENERIC_RCU_GUP +/* + * Return the compund head page with ref appropriately incremented, + * or NULL if that failed. + */ +static inline struct page *try_get_compound_head(struct page *page, int refs) +{ + struct page *head = compound_head(page); + if (WARN_ON_ONCE(atomic_read(&head->_count) < 0)) + return NULL; + if (unlikely(!page_cache_add_speculative(head, refs))) + return NULL; + return head; +} + #ifdef __HAVE_ARCH_PTE_SPECIAL static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, int write, struct page **pages, int *nr) @@ -1082,9 +1103,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); - head = compound_head(page); - if (!page_cache_get_speculative(head)) + head = try_get_compound_head(page, 1); + if (!head) goto pte_unmap; if (unlikely(pte_val(pte) != pte_val(*ptep))) { @@ -1141,8 +1162,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); - head = compound_head(pmd_page(orig)); - if (!page_cache_add_speculative(head, refs)) { + head = try_get_compound_head(pmd_page(orig), refs); + if (!head) { *nr -= refs; return 0; } @@ -1187,8 +1208,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); - head = compound_head(pud_page(orig)); - if (!page_cache_add_speculative(head, refs)) { + head = try_get_compound_head(pud_page(orig), refs); + if (!head) { *nr -= refs; return 0; } @@ -1229,8 +1250,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); - head = compound_head(pgd_page(orig)); - if (!page_cache_add_speculative(head, refs)) { + head = try_get_compound_head(pgd_page(orig), refs); + if (!head) { *nr -= refs; return 0; } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index fd932e7..3a1501e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3886,6 +3886,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long vaddr = *position; unsigned long remainder = *nr_pages; struct hstate *h = hstate_vma(vma); + int err = -EFAULT; while (vaddr < vma->vm_end && remainder) { pte_t *pte; @@ -3957,6 +3958,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT; page = pte_page(huge_ptep_get(pte)); + + /* + * Instead of doing 'try_get_page_foll()' below in the same_page + * loop, just check the count once here. + */ + if (unlikely(page_count(page) <= 0)) { + if (pages) { + spin_unlock(ptl); + remainder = 0; + err = -ENOMEM; + break; + } + } same_page: if (pages) { pages[i] = mem_map_offset(page, pfn_offset); @@ -3983,7 +3997,7 @@ same_page: *nr_pages = remainder; *position = vaddr; - return i ? i : -EFAULT; + return i ? i : err; } unsigned long hugetlb_change_protection(struct vm_area_struct *vma, From patchwork Wed Oct 9 00:44:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ajay Kaher X-Patchwork-Id: 11179899 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0F12139A for ; Tue, 8 Oct 2019 16:44:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9865A21721 for ; Tue, 8 Oct 2019 16:44:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9865A21721 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=vmware.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C0F068E000B; Tue, 8 Oct 2019 12:44:46 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id BBFE48E0003; Tue, 8 Oct 2019 12:44:46 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AD6538E000B; Tue, 8 Oct 2019 12:44:46 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0052.hostedemail.com [216.40.44.52]) by kanga.kvack.org (Postfix) with ESMTP id 8C5F88E0003 for ; Tue, 8 Oct 2019 12:44:46 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with SMTP id 3B2B0180AD803 for ; Tue, 8 Oct 2019 16:44:46 +0000 (UTC) X-FDA: 76021191372.24.board85_c7a822e1235a X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,akaher@vmware.com,:gregkh@linuxfoundation.org:torvalds@linux-foundation.org:punit.agrawal@arm.com:akpm@linux-foundation.org:kirill.shutemov@linux.intel.com:willy@infradead.org:will.deacon@arm.com:mszeredi@redhat.com:stable@vger.kernel.org::linux-kernel@vger.kernel.org:srivatsab@vmware.com:srivatsa@csail.mit.edu:amakhalov@vmware.com:srinidhir@vmware.com:bvikas@vmware.com:anishs@vmware.com:vsirnapalli@vmware.com:srostedt@vmware.com:akaher@vmware.com:viro@zeniv.linux.org.uk,RULES_HIT:30054:30079,0,RBL:208.91.0.190:@vmware.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:26,LUA_SUMMARY:none X-HE-Tag: board85_c7a822e1235a X-Filterd-Recvd-Size: 4025 Received: from EX13-EDG-OU-002.vmware.com (ex13-edg-ou-002.vmware.com [208.91.0.190]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Oct 2019 16:44:45 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-002.vmware.com (10.113.208.156) with Microsoft SMTP Server id 15.0.1156.6; Tue, 8 Oct 2019 09:44:42 -0700 Received: from akaher-lnx-dev.eng.vmware.com (unknown [10.110.19.203]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id 6C20640C00; Tue, 8 Oct 2019 09:44:39 -0700 (PDT) From: Ajay Kaher To: CC: , , , , , , , , , , , , , , , , , , , Al Viro Subject: [PATCH v2 7/8] pipe: add pipe_buf_get() helper Date: Wed, 9 Oct 2019 06:14:22 +0530 Message-ID: <1570581863-12090-8-git-send-email-akaher@vmware.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1570581863-12090-1-git-send-email-akaher@vmware.com> References: <1570581863-12090-1-git-send-email-akaher@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-002.vmware.com: akaher@vmware.com does not designate permitted sender hosts) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Miklos Szeredi commit 7bf2d1df80822ec056363627e2014990f068f7aa upstream. Signed-off-by: Miklos Szeredi Signed-off-by: Al Viro Signed-off-by: Ajay Kaher Reviewed-by: Srivatsa S. Bhat (VMware) --- fs/fuse/dev.c | 2 +- fs/splice.c | 4 ++-- include/linux/pipe_fs_i.h | 11 +++++++++++ 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index f5d2d23..36a5df9 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2052,7 +2052,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1); pipe->nrbufs--; } else { - ibuf->ops->get(pipe, ibuf); + pipe_buf_get(pipe, ibuf); *obuf = *ibuf; obuf->flags &= ~PIPE_BUF_FLAG_GIFT; obuf->len = rem; diff --git a/fs/splice.c b/fs/splice.c index 8398974..fde1263 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1876,7 +1876,7 @@ retry: * Get a reference to this pipe buffer, * so we can copy the contents over. */ - ibuf->ops->get(ipipe, ibuf); + pipe_buf_get(ipipe, ibuf); *obuf = *ibuf; /* @@ -1948,7 +1948,7 @@ static int link_pipe(struct pipe_inode_info *ipipe, * Get a reference to this pipe buffer, * so we can copy the contents over. */ - ibuf->ops->get(ipipe, ibuf); + pipe_buf_get(ipipe, ibuf); obuf = opipe->bufs + nbuf; *obuf = *ibuf; diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 24f5470..10876f3 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -115,6 +115,17 @@ struct pipe_buf_operations { void (*get)(struct pipe_inode_info *, struct pipe_buffer *); }; +/** + * pipe_buf_get - get a reference to a pipe_buffer + * @pipe: the pipe that the buffer belongs to + * @buf: the buffer to get a reference to + */ +static inline void pipe_buf_get(struct pipe_inode_info *pipe, + struct pipe_buffer *buf) +{ + buf->ops->get(pipe, buf); +} + /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual memory allocation, whereas PIPE_BUF makes atomicity guarantees. */ #define PIPE_SIZE PAGE_SIZE From patchwork Wed Oct 9 00:44:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ajay Kaher X-Patchwork-Id: 11179901 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E1CA14DB for ; Tue, 8 Oct 2019 16:44:57 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EEA9A217D7 for ; Tue, 8 Oct 2019 16:44:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EEA9A217D7 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=vmware.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1AFC28E0006; Tue, 8 Oct 2019 12:44:56 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 186F08E0003; Tue, 8 Oct 2019 12:44:56 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 09D348E0006; Tue, 8 Oct 2019 12:44:56 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id DD2AD8E0003 for ; Tue, 8 Oct 2019 12:44:55 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with SMTP id 840456894 for ; Tue, 8 Oct 2019 16:44:55 +0000 (UTC) X-FDA: 76021191750.09.pear78_dc280544b15e X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,akaher@vmware.com,:gregkh@linuxfoundation.org:torvalds@linux-foundation.org:punit.agrawal@arm.com:akpm@linux-foundation.org:kirill.shutemov@linux.intel.com:willy@infradead.org:will.deacon@arm.com:mszeredi@redhat.com:stable@vger.kernel.org::linux-kernel@vger.kernel.org:srivatsab@vmware.com:srivatsa@csail.mit.edu:amakhalov@vmware.com:srinidhir@vmware.com:bvikas@vmware.com:anishs@vmware.com:vsirnapalli@vmware.com:srostedt@vmware.com:akaher@vmware.com:stable@kernel.org,RULES_HIT:30054:30070:30079,0,RBL:208.91.0.189:@vmware.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:349,LUA_SUMMARY:none X-HE-Tag: pear78_dc280544b15e X-Filterd-Recvd-Size: 7473 Received: from EX13-EDG-OU-001.vmware.com (ex13-edg-ou-001.vmware.com [208.91.0.189]) by imf35.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Oct 2019 16:44:54 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-001.vmware.com (10.113.208.155) with Microsoft SMTP Server id 15.0.1156.6; Tue, 8 Oct 2019 09:44:51 -0700 Received: from akaher-lnx-dev.eng.vmware.com (unknown [10.110.19.203]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id 49F7240C00; Tue, 8 Oct 2019 09:44:48 -0700 (PDT) From: Ajay Kaher To: CC: , , , , , , , , , , , , , , , , , , , Subject: [PATCH v2 8/8] fs: prevent page refcount overflow in pipe_buf_get Date: Wed, 9 Oct 2019 06:14:23 +0530 Message-ID: <1570581863-12090-9-git-send-email-akaher@vmware.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1570581863-12090-1-git-send-email-akaher@vmware.com> References: <1570581863-12090-1-git-send-email-akaher@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-001.vmware.com: akaher@vmware.com does not designate permitted sender hosts) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Matthew Wilcox commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream. Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn Signed-off-by: Matthew Wilcox Cc: stable@kernel.org Signed-off-by: Linus Torvalds [ 4.4.y backport notes: Regarding the change in generic_pipe_buf_get(), note that page_cache_get() is the same as get_page(). See mainline commit 09cbfeaf1a5a6 "mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros" for context. ] Signed-off-by: Ajay Kaher Reviewed-by: Srivatsa S. Bhat (VMware) --- fs/fuse/dev.c | 12 ++++++------ fs/pipe.c | 4 ++-- fs/splice.c | 12 ++++++++++-- include/linux/pipe_fs_i.h | 10 ++++++---- kernel/trace/trace.c | 6 +++++- 5 files changed, 29 insertions(+), 15 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 36a5df9..16891f5 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2031,10 +2031,8 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, rem += pipe->bufs[(pipe->curbuf + idx) & (pipe->buffers - 1)].len; ret = -EINVAL; - if (rem < len) { - pipe_unlock(pipe); - goto out; - } + if (rem < len) + goto out_free; rem = len; while (rem) { @@ -2052,7 +2050,9 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1); pipe->nrbufs--; } else { - pipe_buf_get(pipe, ibuf); + if (!pipe_buf_get(pipe, ibuf)) + goto out_free; + *obuf = *ibuf; obuf->flags &= ~PIPE_BUF_FLAG_GIFT; obuf->len = rem; @@ -2075,13 +2075,13 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, ret = fuse_dev_do_write(fud, &cs, len); pipe_lock(pipe); +out_free: for (idx = 0; idx < nbuf; idx++) { struct pipe_buffer *buf = &bufs[idx]; buf->ops->release(pipe, buf); } pipe_unlock(pipe); -out: kfree(bufs); return ret; } diff --git a/fs/pipe.c b/fs/pipe.c index 1e7263b..6534470 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -178,9 +178,9 @@ EXPORT_SYMBOL(generic_pipe_buf_steal); * in the tee() system call, when we duplicate the buffers in one * pipe into another. */ -void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) +bool generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { - page_cache_get(buf->page); + return try_get_page(buf->page); } EXPORT_SYMBOL(generic_pipe_buf_get); diff --git a/fs/splice.c b/fs/splice.c index fde1263..57ccc58 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1876,7 +1876,11 @@ retry: * Get a reference to this pipe buffer, * so we can copy the contents over. */ - pipe_buf_get(ipipe, ibuf); + if (!pipe_buf_get(ipipe, ibuf)) { + if (ret == 0) + ret = -EFAULT; + break; + } *obuf = *ibuf; /* @@ -1948,7 +1952,11 @@ static int link_pipe(struct pipe_inode_info *ipipe, * Get a reference to this pipe buffer, * so we can copy the contents over. */ - pipe_buf_get(ipipe, ibuf); + if (!pipe_buf_get(ipipe, ibuf)) { + if (ret == 0) + ret = -EFAULT; + break; + } obuf = opipe->bufs + nbuf; *obuf = *ibuf; diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 10876f3..0b28b65 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -112,18 +112,20 @@ struct pipe_buf_operations { /* * Get a reference to the pipe buffer. */ - void (*get)(struct pipe_inode_info *, struct pipe_buffer *); + bool (*get)(struct pipe_inode_info *, struct pipe_buffer *); }; /** * pipe_buf_get - get a reference to a pipe_buffer * @pipe: the pipe that the buffer belongs to * @buf: the buffer to get a reference to + * + * Return: %true if the reference was successfully obtained. */ -static inline void pipe_buf_get(struct pipe_inode_info *pipe, +static inline __must_check bool pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { - buf->ops->get(pipe, buf); + return buf->ops->get(pipe, buf); } /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual @@ -148,7 +150,7 @@ struct pipe_inode_info *alloc_pipe_info(void); void free_pipe_info(struct pipe_inode_info *); /* Generic pipe buffer ops functions */ -void generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *); +bool generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *); int generic_pipe_buf_confirm(struct pipe_inode_info *, struct pipe_buffer *); int generic_pipe_buf_steal(struct pipe_inode_info *, struct pipe_buffer *); void generic_pipe_buf_release(struct pipe_inode_info *, struct pipe_buffer *); diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index ae00e68..7fe8d04 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5731,12 +5731,16 @@ static void buffer_pipe_buf_release(struct pipe_inode_info *pipe, buf->private = 0; } -static void buffer_pipe_buf_get(struct pipe_inode_info *pipe, +static bool buffer_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { struct buffer_ref *ref = (struct buffer_ref *)buf->private; + if (ref->ref > INT_MAX/2) + return false; + ref->ref++; + return true; } /* Pipe buffer operations for a buffer. */