From patchwork Thu May 12 15:40:55 2016
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 9083531
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Hugh Dickins, Andrea Arcangeli, Andrew Morton
Cc: Dave Hansen, Vlastimil Babka, Christoph Lameter,
    Naoya Horiguchi, Jerome Marchand, Yang Shi, Sasha Levin,
    Andres Lagar-Cavilla, Ning Qu, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv8 15/32] thp, mlock: do not mlock PTE-mapped file huge pages
Date: Thu, 12 May 2016 18:40:55 +0300
Message-Id: <1463067672-134698-16-git-send-email-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1463067672-134698-1-git-send-email-kirill.shutemov@linux.intel.com>
References: <1463067672-134698-1-git-send-email-kirill.shutemov@linux.intel.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

As with anon THP, we mlock file huge pages only if we can prove that the
page is not mapped with PTEs. This way we avoid leaking mlocked pages
into non-mlocked VMAs on split.

We rely on PageDoubleMap() under lock_page() to check whether the page
may be PTE-mapped: PG_double_map is set by page_add_file_rmap() when the
page is mapped with PTEs.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
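
The locking rule above can be condensed into a small sketch (a
hypothetical helper for illustration only, not part of the patch; a
second lifecycle sketch follows the diff):

	/*
	 * Illustration only: why a locked page that is !PageDoubleMap()
	 * is provably not PTE-mapped. PG_double_map is only set by
	 * page_add_file_rmap(), which requires the page lock, so the
	 * test below cannot race with a new PTE mapping while we hold
	 * the lock.
	 */
	static bool file_thp_mlockable(struct page *page)
	{
		VM_BUG_ON_PAGE(!PageLocked(page), page);
		return page->mapping && !PageDoubleMap(page);
	}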
---
 include/linux/page-flags.h | 13 ++++++++++++-
 mm/huge_memory.c           | 27 ++++++++++++++++++++-------
 mm/mmap.c                  |  6 ++++++
 mm/page_alloc.c            |  2 ++
 mm/rmap.c                  | 16 ++++++++++++++--
 5 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f4ed4f1b0c77..517707ae8cd1 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -544,6 +544,17 @@ static inline int PageDoubleMap(struct page *page)
 	return PageHead(page) && test_bit(PG_double_map, &page[1].flags);
 }
 
+static inline void SetPageDoubleMap(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageHead(page), page);
+	set_bit(PG_double_map, &page[1].flags);
+}
+
+static inline void ClearPageDoubleMap(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageHead(page), page);
+	clear_bit(PG_double_map, &page[1].flags);
+}
 static inline int TestSetPageDoubleMap(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageHead(page), page);
@@ -560,7 +571,7 @@ static inline int TestClearPageDoubleMap(struct page *page)
 TESTPAGEFLAG_FALSE(TransHuge)
 TESTPAGEFLAG_FALSE(TransCompound)
 TESTPAGEFLAG_FALSE(TransTail)
-TESTPAGEFLAG_FALSE(DoubleMap)
+PAGEFLAG_FALSE(DoubleMap)
 	TESTSETFLAG_FALSE(DoubleMap)
 	TESTCLEARFLAG_FALSE(DoubleMap)
 #endif
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5daad812d71e..001687987777 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1439,6 +1439,8 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 		 * We don't mlock() pte-mapped THPs. This way we can avoid
 		 * leaking mlocked pages into non-VM_LOCKED VMAs.
 		 *
+		 * For anon THP:
+		 *
 		 * In most cases the pmd is the only mapping of the page as we
 		 * break COW for the mlock() -- see gup_flags |= FOLL_WRITE for
 		 * writable private mappings in populate_vma_page_range().
@@ -1446,15 +1448,26 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 		 * The only scenario when we have the page shared here is if we
 		 * mlocking read-only mapping shared over fork(). We skip
 		 * mlocking such pages.
+		 *
+		 * For file THP:
+		 *
+		 * We can expect PageDoubleMap() to be stable under page lock:
+		 * for file pages we set it in page_add_file_rmap(), which
+		 * requires page to be locked.
 		 */
-		if (compound_mapcount(page) == 1 && !PageDoubleMap(page) &&
-				page->mapping && trylock_page(page)) {
-			lru_add_drain();
-			if (page->mapping)
-				mlock_vma_page(page);
-			unlock_page(page);
-		}
+
+		if (PageAnon(page) && compound_mapcount(page) != 1)
+			goto skip_mlock;
+		if (PageDoubleMap(page) || !page->mapping)
+			goto skip_mlock;
+		if (!trylock_page(page))
+			goto skip_mlock;
+		lru_add_drain();
+		if (page->mapping && !PageDoubleMap(page))
+			mlock_vma_page(page);
+		unlock_page(page);
 	}
+skip_mlock:
 	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 	if (flags & FOLL_GET)
diff --git a/mm/mmap.c b/mm/mmap.c
index df3976af9302..0a68cf0b37e7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2584,6 +2584,12 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 		/* drop PG_Mlocked flag for over-mapped range */
 		for (tmp = vma; tmp->vm_start >= start + size;
 				tmp = tmp->vm_next) {
+			/*
+			 * Split pmd and munlock page on the border
+			 * of the range.
+			 */
+			vma_adjust_trans_huge(tmp, start, start + size, 0);
+
 			munlock_vma_pages_range(tmp,
 					max(tmp->vm_start, start),
 					min(tmp->vm_end, start + size));
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59de90d5d3a3..4243b400b331 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1036,6 +1036,8 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
 	if (PageAnon(page))
 		page->mapping = NULL;
+	if (compound)
+		ClearPageDoubleMap(page);
 	bad += free_pages_check(page);
 	for (i = 1; i < (1 << order); i++) {
 		if (compound)
diff --git a/mm/rmap.c b/mm/rmap.c
index 76d8c9269ac6..f67e765869f3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1285,6 +1285,12 @@ void page_add_file_rmap(struct page *page, bool compound)
 		if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
 			goto out;
 	} else {
+		if (PageTransCompound(page)) {
+			VM_BUG_ON_PAGE(!PageLocked(page), page);
+			SetPageDoubleMap(compound_head(page));
+			if (PageMlocked(page))
+				clear_page_mlock(compound_head(page));
+		}
 		if (!atomic_inc_and_test(&page->_mapcount))
 			goto out;
 	}
@@ -1458,8 +1464,14 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	 */
 	if (!(flags & TTU_IGNORE_MLOCK)) {
 		if (vma->vm_flags & VM_LOCKED) {
-			/* Holding pte lock, we do *not* need mmap_sem here */
-			mlock_vma_page(page);
+			/* PTE-mapped THP are never mlocked */
+			if (!PageTransCompound(page)) {
+				/*
+				 * Holding pte lock, we do *not* need
+				 * mmap_sem here
+				 */
+				mlock_vma_page(page);
+			}
 			ret = SWAP_MLOCK;
 			goto out_unmap;
 		}
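
For completeness, a condensed lifecycle sketch (again a hypothetical
helper for illustration only, not part of the patch; it strings together
the hunks above, assuming a locked file THP):

	/* Illustration only: first PTE mapping of a file THP. */
	static void map_file_thp_pte(struct page *tail)
	{
		struct page *head = compound_head(tail);

		/* page_add_file_rmap() asserts the page lock before it
		 * touches PG_double_map, so the caller must hold it. */
		VM_BUG_ON_PAGE(!PageLocked(head), head);

		/*
		 * !compound: account a PTE mapping. This sets
		 * PG_double_map on the head (the bit lives in
		 * page[1].flags, see SetPageDoubleMap() above) and clears
		 * PG_mlocked, so a later split cannot leak mlocked pages
		 * into non-mlocked VMAs. The flag is finally dropped by
		 * ClearPageDoubleMap() in free_pages_prepare().
		 */
		page_add_file_rmap(tail, false);
	}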