From patchwork Wed Nov 6 05:16:30 2019
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 11229301
Date: Tue, 05 Nov 2019 21:16:30 -0800
From: akpm@linux-foundation.org
To: aarcange@redhat.com, akpm@linux-foundation.org, gavin.dg@linux.alibaba.com, hughd@google.com, kirill.shutemov@linux.intel.com, linux-mm@kvack.org, mm-commits@vger.kernel.org, stable@vger.kernel.org, torvalds@linux-foundation.org, willy@infradead.org, yang.shi@linux.alibaba.com
Subject: [patch 04/17] mm: thp: handle page cache THP correctly in PageTransCompoundMap
Message-ID: <20191106051630.mVkfeUoJk%akpm@linux-foundation.org>

From: Yang Shi
Subject: mm: thp: handle page cache THP correctly in PageTransCompoundMap

We have a use case that runs tmpfs as the QEMU memory backend, and we
would like to take advantage of THP as well.  But our tests show the EPT
is not PMD mapped even though the underlying THPs are PMD mapped on the
host.  The number shown by /sys/kernel/debug/kvm/largepages is much
smaller than the number of PMD-mapped shmem pages, as below:

7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
Size:            4194304 kB
[snip]
AnonHugePages:         0 kB
ShmemPmdMapped:   579584 kB
[snip]
Locked:                0 kB

cat /sys/kernel/debug/kvm/largepages
12

And some benchmarks do worse than with anonymous THPs.

By digging into the code we figured out that commit 127393fbe597 ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks whether
there is a single PTE mapping on the page for anonymous THP when setting
up the EPT map.  But the _mapcount < 0 check doesn't fit page cache THP,
since every subpage of a page cache THP gets its _mapcount incremented
once the THP is PMD mapped, so PageTransCompoundMap() always returns
false for page cache THP.  This prevents KVM from setting up a
PMD-mapped EPT entry, so we need to handle page cache THP correctly.

However, when a page cache THP's PMD gets split, the kernel just removes
the map instead of setting up a PTE map the way anonymous THP does.
Also, before KVM calls get_user_pages() the subpages may get PTE mapped
even though the page is still a THP, since the page cache THP may be
mapped by other processes in the meantime.  So determine whether the THP
has been PTE mapped by comparing the subpage's _mapcount with the THP's
compound_mapcount.  Although this may report some false negatives (PTE
mapped by other processes), making the check fully accurate does not
look trivial.

With this fix /sys/kernel/debug/kvm/largepages shows a reasonable number
of pages PMD mapped by EPT, as below:

7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
Size:            4194304 kB
[snip]
AnonHugePages:         0 kB
ShmemPmdMapped:   557056 kB
[snip]
Locked:                0 kB

cat /sys/kernel/debug/kvm/largepages
271

And the benchmarks perform the same as with anonymous THPs.

[yang.shi@linux.alibaba.com: v4]
  Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: dd78fedde4b9 ("rmap: support file thp")
Signed-off-by: Yang Shi
Reported-by: Gang Deng
Tested-by: Gang Deng
Suggested-by: Hugh Dickins
Acked-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Matthew Wilcox
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton
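To make the accounting concrete, here is a minimal standalone sketch of
the raw counter arithmetic this fix relies on.  The helper name and the
scenario values are illustrative assumptions, not code from this patch;
both counters are the raw atomic values, which start at -1:

/*
 * Sketch only: one PMD mapping, no PTE mappings.
 *
 *                      anon THP    file THP
 * compound_mapcount        0           0
 * subpage _mapcount       -1           0
 *
 * PMD-mapping an anon THP bumps only compound_mapcount, so the old
 * "_mapcount < 0" test works there.  PMD-mapping a file THP bumps every
 * subpage's _mapcount in step with compound_mapcount, so the PMD-only
 * test becomes an equality check; an extra PTE map on a subpage pushes
 * its _mapcount above compound_mapcount.
 */
static inline int thp_pmd_only_mapped(int subpage_mapcount,
				      int compound_mapcount, int anon)
{
	if (anon)
		return subpage_mapcount < 0;
	return subpage_mapcount == compound_mapcount;
}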
---

 include/linux/mm.h         |    5 -----
 include/linux/mm_types.h   |    5 +++++
 include/linux/page-flags.h |   20 ++++++++++++++++++--
 3 files changed, 23 insertions(+), 7 deletions(-)

--- a/include/linux/mm.h~mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap
+++ a/include/linux/mm.h
@@ -695,11 +695,6 @@ static inline void *kvcalloc(size_t n, s
 
 extern void kvfree(const void *addr);
 
-static inline atomic_t *compound_mapcount_ptr(struct page *page)
-{
-	return &page[1].compound_mapcount;
-}
-
 static inline int compound_mapcount(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
--- a/include/linux/mm_types.h~mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap
+++ a/include/linux/mm_types.h
@@ -221,6 +221,11 @@ struct page {
 #endif
 } _struct_page_alignment;
 
+static inline atomic_t *compound_mapcount_ptr(struct page *page)
+{
+	return &page[1].compound_mapcount;
+}
+
 /*
  * Used for sizing the vmemmap region on some architectures
  */
--- a/include/linux/page-flags.h~mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap
+++ a/include/linux/page-flags.h
@@ -622,12 +622,28 @@ static inline int PageTransCompound(stru
  *
  * Unlike PageTransCompound, this is safe to be called only while
  * split_huge_pmd() cannot run from under us, like if protected by the
- * MMU notifier, otherwise it may result in page->_mapcount < 0 false
+ * MMU notifier, otherwise it may result in page->_mapcount check false
  * positives.
+ *
+ * We have to treat page cache THP differently since every subpage of it
+ * would get _mapcount inc'ed once it is PMD mapped. But, it may be PTE
+ * mapped in the current process so comparing subpage's _mapcount to
+ * compound_mapcount to filter out PTE mapped case.
  */
 static inline int PageTransCompoundMap(struct page *page)
 {
-	return PageTransCompound(page) && atomic_read(&page->_mapcount) < 0;
+	struct page *head;
+
+	if (!PageTransCompound(page))
+		return 0;
+
+	if (PageAnon(page))
+		return atomic_read(&page->_mapcount) < 0;
+
+	head = compound_head(page);
+	/* File THP is PMD mapped and not PTE mapped */
+	return atomic_read(&page->_mapcount) ==
+			atomic_read(compound_mapcount_ptr(head));
 }
 
 /*
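For context, the consumer of PageTransCompoundMap() is KVM's huge-page
adjustment when it installs EPT entries (the check introduced by commit
127393fbe597).  A simplified sketch of that kind of call site follows;
the function and level names are illustrative assumptions, not the real
kvm/mmu.c code:

/*
 * Sketch only: promote the guest mapping to a huge EPT entry when the
 * backing page is a THP that is still PMD-only mapped.  With the fix
 * above this now succeeds for tmpfs/page cache THPs as well.
 */
static void adjust_to_huge_mapping(struct page *page, int *level)
{
	if (PageTransCompoundMap(page))
		*level = 2;	/* install a 2MB mapping instead of 4kB */
}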