From patchwork Mon Jul 29 21:09:32 2019
X-Patchwork-Submitter: William Kucharski
X-Patchwork-Id: 11064535

From: William Kucharski
To: ceph-devel@vger.kernel.org, linux-afs@lists.infradead.org,
    linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, netdev@vger.kernel.org, Chris Mason,
    "David S. Miller", David Sterba, Josef Bacik
Cc: Dave Hansen, Song Liu, Bob Kasten, Mike Kravetz, William Kucharski,
    Chad Mynhier, "Kirill A. Shutemov", Johannes Weiner, Matthew Wilcox,
    Dave Airlie, Vlastimil Babka, Keith Busch, Ralph Campbell,
    Steve Capper, Dave Chinner, Sean Christopherson, Hugh Dickins,
    Ilya Dryomov, Alexander Duyck, Thomas Gleixner, Jérôme Glisse,
    Amir Goldstein, Jason Gunthorpe, Michal Hocko, Jann Horn,
    David Howells, John Hubbard, Souptick Joarder, john.hubbard@gmail.com,
    Jan Kara, Andrey Konovalov, Arun KS, "Aneesh Kumar K.V", Jeff Layton,
    Yangtao Li, Andrew Morton, Robin Murphy, Mike Rapoport,
    David Rientjes, Andrey Ryabinin, Yafang Shao, Huang Shijie, Yang Shi,
    Miklos Szeredi, Pavel Tatashin, Kirill Tkhai, Sage Weil, Ira Weiny,
    Dan Williams, "Darrick J. Wong", Gao Xiang, Bartlomiej Zolnierkiewicz,
    Ross Zwisler, kbuild test robot
Subject: [PATCH v2 1/2] mm: Allow the page cache to allocate large pages
Date: Mon, 29 Jul 2019 15:09:32 -0600
Message-Id: <20190729210933.18674-2-william.kucharski@oracle.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190729210933.18674-1-william.kucharski@oracle.com>
References: <20190729210933.18674-1-william.kucharski@oracle.com>

Add an order field to __page_cache_alloc() to allow the allocation of
page cache entries backed by large memory pages.
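As an illustration of the new calling convention (a hedged sketch, not
part of the diff below; it mirrors how the follow-on patch uses the
argument):

	/* Sketch only: "mapping" is assumed to be an address_space. */
	struct page *page;

	/* Order 0 preserves today's behavior: one PAGESIZE page. */
	page = __page_cache_alloc(mapping_gfp_mask(mapping), 0);

	/*
	 * A nonzero order requests a higher-order allocation; a caller
	 * wanting a compound page would also pass __GFP_COMP, as the
	 * THP fault path in patch 2/2 does.
	 */
	page = __page_cache_alloc(mapping_gfp_mask(mapping) | __GFP_COMP |
				  __GFP_NORETRY, HPAGE_PMD_ORDER);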
Signed-off-by: Matthew Wilcox
Signed-off-by: William Kucharski
Reported-by: kbuild test robot
---
 fs/afs/dir.c            |  2 +-
 fs/btrfs/compression.c  |  2 +-
 fs/cachefiles/rdwr.c    |  4 ++--
 fs/ceph/addr.c          |  2 +-
 fs/ceph/file.c          |  2 +-
 include/linux/pagemap.h | 13 +++++++++----
 mm/filemap.c            | 25 +++++++++++++------------
 mm/readahead.c          |  2 +-
 net/ceph/pagelist.c     |  4 ++--
 net/ceph/pagevec.c      |  2 +-
 10 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index e640d67274be..0a392214f71e 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -274,7 +274,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 			afs_stat_v(dvnode, n_inval);
 
 		ret = -ENOMEM;
-		req->pages[i] = __page_cache_alloc(gfp);
+		req->pages[i] = __page_cache_alloc(gfp, 0);
 		if (!req->pages[i])
 			goto error;
 		ret = add_to_page_cache_lru(req->pages[i],
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 60c47b417a4b..5280e7477b7e 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -466,7 +466,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		}
 
 		page = __page_cache_alloc(mapping_gfp_constraint(mapping,
-								 ~__GFP_FS));
+								 ~__GFP_FS), 0);
 		if (!page)
 			break;
 
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 44a3ce1e4ce4..11d30212745f 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -259,7 +259,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 			goto backing_page_already_present;
 
 		if (!newpage) {
-			newpage = __page_cache_alloc(cachefiles_gfp);
+			newpage = __page_cache_alloc(cachefiles_gfp, 0);
 			if (!newpage)
 				goto nomem_monitor;
 		}
@@ -495,7 +495,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 			goto backing_page_already_present;
 
 		if (!newpage) {
-			newpage = __page_cache_alloc(cachefiles_gfp);
+			newpage = __page_cache_alloc(cachefiles_gfp, 0);
 			if (!newpage)
 				goto nomem;
 		}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index e078cc55b989..bcb41fbee533 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1707,7 +1707,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 			if (len > PAGE_SIZE)
 				len = PAGE_SIZE;
 		} else {
-			page = __page_cache_alloc(GFP_NOFS);
+			page = __page_cache_alloc(GFP_NOFS, 0);
 			if (!page) {
 				err = -ENOMEM;
 				goto out;
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 685a03cc4b77..ae58d7c31aa4 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1305,7 +1305,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		struct page *page = NULL;
 		loff_t i_size;
 		if (retry_op == READ_INLINE) {
-			page = __page_cache_alloc(GFP_KERNEL);
+			page = __page_cache_alloc(GFP_KERNEL, 0);
 			if (!page)
 				return -ENOMEM;
 		}
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c7552459a15f..e9004e3cb6a3 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -208,17 +208,17 @@ static inline int page_cache_add_speculative(struct page *page, int count)
 }
 
 #ifdef CONFIG_NUMA
-extern struct page *__page_cache_alloc(gfp_t gfp);
+extern struct page *__page_cache_alloc(gfp_t gfp, unsigned int order);
 #else
-static inline struct page *__page_cache_alloc(gfp_t gfp)
+static inline struct page *__page_cache_alloc(gfp_t gfp, unsigned int order)
 {
-	return alloc_pages(gfp, 0);
+	return alloc_pages(gfp, order);
 }
 #endif
 
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
-	return __page_cache_alloc(mapping_gfp_mask(x));
+	return __page_cache_alloc(mapping_gfp_mask(x), 0);
 }
 
 static inline gfp_t readahead_gfp_mask(struct address_space *x)
@@ -240,6 +240,11 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 #define FGP_NOFS		0x00000010
 #define FGP_NOWAIT		0x00000020
 #define FGP_FOR_MMAP		0x00000040
+/* If you add more flags, increment FGP_ORDER_SHIFT */
+#define FGP_ORDER_SHIFT		7
+#define FGP_PMD		((PMD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)
+#define FGP_PUD		((PUD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)
+#define fgp_get_order(fgp)	((fgp) >> FGP_ORDER_SHIFT)
 
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		int fgp_flags, gfp_t cache_gfp_mask);
diff --git a/mm/filemap.c b/mm/filemap.c
index d0cf700bf201..a96092243fc4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -954,7 +954,7 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
 
 #ifdef CONFIG_NUMA
-struct page *__page_cache_alloc(gfp_t gfp)
+struct page *__page_cache_alloc(gfp_t gfp, unsigned int order)
 {
 	int n;
 	struct page *page;
@@ -964,12 +964,12 @@ struct page *__page_cache_alloc(gfp_t gfp)
 		do {
 			cpuset_mems_cookie = read_mems_allowed_begin();
 			n = cpuset_mem_spread_node();
-			page = __alloc_pages_node(n, gfp, 0);
+			page = __alloc_pages_node(n, gfp, order);
 		} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
 		return page;
 	}
-	return alloc_pages(gfp, 0);
+	return alloc_pages(gfp, order);
 }
 EXPORT_SYMBOL(__page_cache_alloc);
 #endif
@@ -1597,12 +1597,12 @@ EXPORT_SYMBOL(find_lock_entry);
  * pagecache_get_page - find and get a page reference
  * @mapping: the address_space to search
  * @offset: the page index
- * @fgp_flags: PCG flags
+ * @fgp_flags: FGP flags
  * @gfp_mask: gfp mask to use for the page cache data page allocation
  *
  * Looks up the page cache slot at @mapping & @offset.
  *
- * PCG flags modify how the page is returned.
+ * FGP flags modify how the page is returned.
  *
  * @fgp_flags can be:
 *
@@ -1615,6 +1615,7 @@ EXPORT_SYMBOL(find_lock_entry);
  * - FGP_FOR_MMAP: Similar to FGP_CREAT, only we want to allow the caller to do
  *   its own locking dance if the page is already in cache, or unlock the page
  *   before returning if we had to add the page to pagecache.
+ * - FGP_PMD: If FGP_CREAT is specified, attempt to allocate a PMD-sized page.
  *
  * If FGP_LOCK or FGP_CREAT are specified then the function may sleep even
  * if the GFP flags specified for FGP_CREAT are atomic.
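As a concrete check of the FGP order encoding added above: with 4 KiB
pages on x86-64, PAGE_SHIFT is 12 and PMD_SHIFT is 21 (architecture
values assumed for this example), so FGP_PMD packs order 9 into bits 7
and up of the flags word. A standalone userspace sketch of the same
arithmetic:

	#include <stdio.h>

	/* Assumed x86-64 values; the kernel macros are per-arch. */
	#define PAGE_SHIFT		12
	#define PMD_SHIFT		21
	#define FGP_ORDER_SHIFT		7
	#define FGP_PMD		((PMD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)
	#define fgp_get_order(fgp)	((fgp) >> FGP_ORDER_SHIFT)

	int main(void)
	{
		/*
		 * (21 - 12) << 7 == 9 << 7 == 0x480; decoding recovers
		 * order 9, i.e. a 2 MiB PMD page of 512 4 KiB pages.
		 */
		printf("FGP_PMD = %#x, order = %d\n",
		       FGP_PMD, fgp_get_order(FGP_PMD));
		return 0;
	}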
@@ -1660,12 +1661,13 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 no_page:
 	if (!page && (fgp_flags & FGP_CREAT)) {
 		int err;
-		if ((fgp_flags & FGP_WRITE) && mapping_cap_account_dirty(mapping))
+		if ((fgp_flags & FGP_WRITE) &&
+		    mapping_cap_account_dirty(mapping))
 			gfp_mask |= __GFP_WRITE;
 		if (fgp_flags & FGP_NOFS)
 			gfp_mask &= ~__GFP_FS;
 
-		page = __page_cache_alloc(gfp_mask);
+		page = __page_cache_alloc(gfp_mask, fgp_get_order(fgp_flags));
 		if (!page)
 			return NULL;
 
@@ -2802,15 +2804,14 @@ static struct page *wait_on_page_read(struct page *page)
 static struct page *do_read_cache_page(struct address_space *mapping,
 				pgoff_t index,
 				int (*filler)(void *, struct page *),
-				void *data,
-				gfp_t gfp)
+				void *data, unsigned int order, gfp_t gfp)
 {
 	struct page *page;
 	int err;
 repeat:
 	page = find_get_page(mapping, index);
 	if (!page) {
-		page = __page_cache_alloc(gfp);
+		page = __page_cache_alloc(gfp, order);
 		if (!page)
 			return ERR_PTR(-ENOMEM);
 		err = add_to_page_cache_lru(page, mapping, index, gfp);
@@ -2917,7 +2918,7 @@ struct page *read_cache_page(struct address_space *mapping,
 				int (*filler)(void *, struct page *),
 				void *data)
 {
-	return do_read_cache_page(mapping, index, filler, data,
+	return do_read_cache_page(mapping, index, filler, data, 0,
 			mapping_gfp_mask(mapping));
 }
 EXPORT_SYMBOL(read_cache_page);
@@ -2939,7 +2940,7 @@ struct page *read_cache_page_gfp(struct address_space *mapping,
 				pgoff_t index,
 				gfp_t gfp)
 {
-	return do_read_cache_page(mapping, index, NULL, NULL, gfp);
+	return do_read_cache_page(mapping, index, NULL, NULL, 0, gfp);
 }
 EXPORT_SYMBOL(read_cache_page_gfp);
 
diff --git a/mm/readahead.c b/mm/readahead.c
index 2fe72cd29b47..954760a612ea 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -193,7 +193,7 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
 			continue;
 		}
 
-		page = __page_cache_alloc(gfp_mask);
+		page = __page_cache_alloc(gfp_mask, 0);
 		if (!page)
 			break;
 		page->index = page_offset;
diff --git a/net/ceph/pagelist.c b/net/ceph/pagelist.c
index 65e34f78b05d..0c3face908dc 100644
--- a/net/ceph/pagelist.c
+++ b/net/ceph/pagelist.c
@@ -56,7 +56,7 @@ static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
 	struct page *page;
 
 	if (!pl->num_pages_free) {
-		page = __page_cache_alloc(GFP_NOFS);
+		page = __page_cache_alloc(GFP_NOFS, 0);
 	} else {
 		page = list_first_entry(&pl->free_list, struct page, lru);
 		list_del(&page->lru);
@@ -107,7 +107,7 @@ int ceph_pagelist_reserve(struct ceph_pagelist *pl, size_t space)
 	space = (space + PAGE_SIZE - 1) >> PAGE_SHIFT;   /* conv to num pages */
 
 	while (space > pl->num_pages_free) {
-		struct page *page = __page_cache_alloc(GFP_NOFS);
+		struct page *page = __page_cache_alloc(GFP_NOFS, 0);
 		if (!page)
 			return -ENOMEM;
 		list_add_tail(&page->lru, &pl->free_list);
diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 64305e7056a1..1d07e639216d 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -45,7 +45,7 @@ struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
 	for (i = 0; i < num_pages; i++) {
-		pages[i] = __page_cache_alloc(flags);
+		pages[i] = __page_cache_alloc(flags, 0);
 		if (pages[i] == NULL) {
 			ceph_release_page_vector(pages, i);
 			return ERR_PTR(-ENOMEM);
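To show how the pieces of patch 1/2 compose, here is a hedged sketch of
a hypothetical caller (no such caller is added by this series): FGP_PMD
is passed through pagecache_get_page(), which now forwards
fgp_get_order(fgp_flags) to __page_cache_alloc().

	/* Hypothetical caller, for illustration only. */
	struct page *page;
	pgoff_t hindex = round_down(index, HPAGE_PMD_NR); /* PMD-aligned */

	page = pagecache_get_page(mapping, hindex,
				  FGP_CREAT | FGP_LOCK | FGP_PMD,
				  mapping_gfp_mask(mapping) | __GFP_COMP);
	if (!page)
		return -ENOMEM;	/* or retry with a PAGESIZE lookup */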
From patchwork Mon Jul 29 21:09:33 2019
X-Patchwork-Submitter: William Kucharski
X-Patchwork-Id: 11064539

From: William Kucharski
To: ceph-devel@vger.kernel.org, linux-afs@lists.infradead.org,
    linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, netdev@vger.kernel.org, Chris Mason,
    "David S. Miller", David Sterba, Josef Bacik
Cc: Dave Hansen, Song Liu, Bob Kasten, Mike Kravetz, William Kucharski,
    Chad Mynhier,
    "Kirill A. Shutemov", Johannes Weiner, Matthew Wilcox, Dave Airlie,
    Vlastimil Babka, Keith Busch, Ralph Campbell, Steve Capper,
    Dave Chinner, Sean Christopherson, Hugh Dickins, Ilya Dryomov,
    Alexander Duyck, Thomas Gleixner, Jérôme Glisse, Amir Goldstein,
    Jason Gunthorpe, Michal Hocko, Jann Horn, David Howells,
    John Hubbard, Souptick Joarder, john.hubbard@gmail.com, Jan Kara,
    Andrey Konovalov, Arun KS, "Aneesh Kumar K.V", Jeff Layton,
    Yangtao Li, Andrew Morton, Robin Murphy, Mike Rapoport,
    David Rientjes, Andrey Ryabinin, Yafang Shao, Huang Shijie, Yang Shi,
    Miklos Szeredi, Pavel Tatashin, Kirill Tkhai, Sage Weil, Ira Weiny,
    Dan Williams, "Darrick J. Wong", Gao Xiang, Bartlomiej Zolnierkiewicz,
    Ross Zwisler
Subject: [PATCH v2 2/2] mm,thp: Add experimental config option RO_EXEC_FILEMAP_HUGE_FAULT_THP
Date: Mon, 29 Jul 2019 15:09:33 -0600
Message-Id: <20190729210933.18674-3-william.kucharski@oracle.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190729210933.18674-1-william.kucharski@oracle.com>
References: <20190729210933.18674-1-william.kucharski@oracle.com>

Add filemap_huge_fault() to attempt to satisfy page faults on
memory-mapped, read-only text pages using THP when possible.

Signed-off-by: William Kucharski
---
 include/linux/huge_mm.h |  16 ++-
 include/linux/mm.h      |   6 +
 mm/Kconfig              |  15 ++
 mm/filemap.c            | 299 +++++++++++++++++++++++++++++++++++++++-
 mm/huge_memory.c        |   3 +
 mm/mmap.c               |  36 ++++-
 mm/rmap.c               |   8 ++
 7 files changed, 373 insertions(+), 10 deletions(-)
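For context, filemap_huge_fault() below is reached through the vma
operations vector, but no filesystem is converted by this series. A
participating filesystem would be expected to wire it up roughly as in
this hedged sketch ("example_file_vm_ops" is illustrative, not from the
patch):

	static const struct vm_operations_struct example_file_vm_ops = {
		/* PAGESIZE path; also the target after VM_FAULT_FALLBACK */
		.fault		= filemap_fault,
		/* PMD-sized path added by this patch */
		.huge_fault	= filemap_huge_fault,
		.map_pages	= filemap_map_pages,
	};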
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 45ede62aa85b..34723f7e75d0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -79,13 +79,15 @@ extern struct kobj_attribute shmem_enabled_attr;
 #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)

[...]

+			page->index != hindex || (!PageUptodate(page)) ||
+			(!PageLocked(page))))
+			return false;
+
+		break;
+	}
+
+	xas_set(xasp, hindex);
+	*pagep = page;
+	return true;
+}
+
+/**
+ * filemap_huge_fault - read in file data for page fault handling to THP
+ * @vmf: struct vm_fault containing details of the fault
+ * @pe_size: large page size to map, currently this must be PE_SIZE_PMD
+ *
+ * filemap_huge_fault() is invoked via the vma operations vector for a
+ * mapped memory region to read in file data to a transparent huge page
+ * during a page fault.
+ *
+ * If for any reason we can't allocate a THP, map it, or add it to the page
+ * cache, VM_FAULT_FALLBACK will be returned, which will cause the fault
+ * handler to try mapping the page using a PAGESIZE page, usually via
+ * filemap_fault() if so specified in the vma operations vector.
+ *
+ * Returns either VM_FAULT_FALLBACK or the result of calling alloc_set_pte()
+ * to map the new THP.
+ *
+ * NOTE: This routine depends upon the file system's readpage routine as
+ * specified in the address space operations vector to recognize when it
+ * is being passed a large page and to read the appropriate amount of data
+ * in full and without polluting the page cache for the large page itself
+ * with PAGESIZE pages to perform a buffered read or to pollute what
+ * would be the page cache space for any succeeding pages with PAGESIZE
+ * pages due to readahead.
+ *
+ * It is VITAL that this routine not be enabled without such filesystem
+ * support. As there is no way to determine how many bytes were read by
+ * the readpage() operation, if only a PAGESIZE page is read, this routine
+ * will map the THP containing only the first PAGESIZE bytes of file data
+ * to satisfy the fault, which is never the result desired.
+ */
+vm_fault_t filemap_huge_fault(struct vm_fault *vmf,
+		enum page_entry_size pe_size)
+{
+	struct file *filp = vmf->vma->vm_file;
+	struct address_space *mapping = filp->f_mapping;
+	struct vm_area_struct *vma = vmf->vma;
+
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
+	pgoff_t hindex = round_down(vmf->pgoff, HPAGE_PMD_NR);
+	pgoff_t hindex_max = hindex + HPAGE_PMD_NR;
+
+	struct page *cached_page, *hugepage;
+	struct page *new_page = NULL;
+
+	vm_fault_t ret = VM_FAULT_FALLBACK;
+	int error;
+
+	XA_STATE_ORDER(xas, &mapping->i_pages, hindex, HPAGE_PMD_ORDER);
+
+	/*
+	 * Return VM_FAULT_FALLBACK if:
+	 *
+	 *	+ pe_size != PE_SIZE_PMD
+	 *	+ FAULT_FLAG_WRITE is set in vmf->flags
+	 *	+ vma isn't aligned to allow a PMD mapping
+	 *	+ PMD would extend beyond the end of the vma
+	 */
+	if (pe_size != PE_SIZE_PMD || (vmf->flags & FAULT_FLAG_WRITE) ||
+	    (haddr < vma->vm_start ||
+	    (haddr + HPAGE_PMD_SIZE > vma->vm_end)))
+		return ret;
+
+	xas_lock_irq(&xas);
+
+retry_xas_locked:
+	if (!filemap_huge_check_pagecache_usable(&xas, &cached_page, hindex,
+	    hindex_max)) {
+		/* found a conflicting entry in the page cache, so fallback */
+		goto unlock;
+	} else if (cached_page) {
+		/* found a valid cached page, so map it */
+		hugepage = cached_page;
+		goto map_huge;
+	}
+
+	xas_unlock_irq(&xas);
+
+	/* allocate huge THP page in VMA */
+	new_page = __page_cache_alloc(vmf->gfp_mask | __GFP_COMP |
+			__GFP_NOWARN | __GFP_NORETRY, HPAGE_PMD_ORDER);
+
+	if (unlikely(!new_page))
+		return ret;
+
+	if (unlikely(!(PageCompound(new_page)))) {
+		put_page(new_page);
+		return ret;
+	}
+
+	prep_transhuge_page(new_page);
+	new_page->index = hindex;
+	new_page->mapping = mapping;
+
+	__SetPageLocked(new_page);
+
+	/*
+	 * The readpage() operation below is expected to fill the large
+	 * page with data without polluting the page cache with
+	 * PAGESIZE entries due to a buffered read and/or readahead().
+	 *
+	 * A filesystem's vm_operations_struct huge_fault field should
+	 * never point to this routine without such a capability, and
+	 * without it a call to this routine would eventually just
+	 * fall through to the normal fault op anyway.
+	 */
+	error = mapping->a_ops->readpage(vmf->vma->vm_file, new_page);
+
+	if (unlikely(error)) {
+		put_page(new_page);
+		return ret;
+	}
+
+	/* XXX - use wait_on_page_locked_killable() instead? */
+	wait_on_page_locked(new_page);
+
+	if (!PageUptodate(new_page)) {
+		/* EIO */
+		new_page->mapping = NULL;
+		put_page(new_page);
+		return ret;
+	}
+
+	do {
+		xas_lock_irq(&xas);
+		xas_set(&xas, hindex);
+		xas_create_range(&xas);
+
+		if (!(xas_error(&xas)))
+			break;
+
+		if (!xas_nomem(&xas, GFP_KERNEL)) {
+			if (new_page) {
+				new_page->mapping = NULL;
+				put_page(new_page);
+			}
+
+			goto unlock;
+		}
+
+		xas_unlock_irq(&xas);
+	} while (1);
+
+	/*
+	 * Double check that an entry did not sneak into the page cache while
+	 * creating Xarray entries for the new page.
+	 */
+	if (!filemap_huge_check_pagecache_usable(&xas, &cached_page, hindex,
+	    hindex_max)) {
+		/*
+		 * An unusable entry was found, so delete the newly allocated
+		 * page and fallback.
+		 */
+		new_page->mapping = NULL;
+		put_page(new_page);
+		goto unlock;
+	} else if (cached_page) {
+		/*
+		 * A valid large page was found in the page cache, so free the
+		 * newly allocated page and map the cached page instead.
+		 */
+		new_page->mapping = NULL;
+		put_page(new_page);
+		new_page = NULL;
+		hugepage = cached_page;
+		goto map_huge;
+	}
+
+	__SetPageLocked(new_page);
+
+	/* did it get truncated? */
+	if (unlikely(new_page->mapping != mapping)) {
+		unlock_page(new_page);
+		put_page(new_page);
+		goto retry_xas_locked;
+	}
+
+	hugepage = new_page;
+
+map_huge:
+	/* map hugepage at the PMD level */
+	ret = alloc_set_pte(vmf, NULL, hugepage);
+
+	VM_BUG_ON_PAGE((!(pmd_trans_huge(*vmf->pmd))), hugepage);
+
+	if (likely(!(ret & VM_FAULT_ERROR))) {
+		/*
+		 * The alloc_set_pte() succeeded without error, so
+		 * add the page to the page cache if it is new, and
+		 * increment page statistics accordingly.
+		 */
+		if (new_page) {
+			unsigned long nr;
+
+			xas_set(&xas, hindex);
+
+			for (nr = 0; nr < HPAGE_PMD_NR; nr++) {
+#ifndef COMPOUND_PAGES_HEAD_ONLY
+				xas_store(&xas, new_page + nr);
+#else
+				xas_store(&xas, new_page);
+#endif
+				xas_next(&xas);
+			}
+
+			count_vm_event(THP_FILE_ALLOC);
+			__inc_node_page_state(new_page, NR_SHMEM_THPS);
+			__mod_node_page_state(page_pgdat(new_page),
+				NR_FILE_PAGES, HPAGE_PMD_NR);
+			__mod_node_page_state(page_pgdat(new_page),
+				NR_SHMEM, HPAGE_PMD_NR);
+		}
+
+		vmf->address = haddr;
+		vmf->page = hugepage;
+
+		page_ref_add(hugepage, HPAGE_PMD_NR);
+		count_vm_event(THP_FILE_MAPPED);
+	} else if (new_page) {
+		/* there was an error mapping the new page, so release it */
+		new_page->mapping = NULL;
+		put_page(new_page);
+	}
+
+unlock:
+	xas_unlock_irq(&xas);
+	return ret;
+}
+EXPORT_SYMBOL(filemap_huge_fault);
+#endif
+
 void filemap_map_pages(struct vm_fault *vmf,
 		pgoff_t start_pgoff, pgoff_t end_pgoff)
 {
@@ -2924,7 +3218,8 @@ struct page *read_cache_page(struct address_space *mapping,
 EXPORT_SYMBOL(read_cache_page);
 
 /**
- * read_cache_page_gfp - read into page cache, using specified page allocation flags.
+ * read_cache_page_gfp - read into page cache, using specified page allocation
+ *			 flags.
 * @mapping: the page's address_space
 * @index: the page index
 * @gfp: the page allocator flags to use if allocating
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1334ede667a8..26d74466d1f7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -543,8 +543,11 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 
 	if (addr)
 		goto out;
+
+#ifndef CONFIG_RO_EXEC_FILEMAP_HUGE_FAULT_THP
 	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
 		goto out;
+#endif
 
 	addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
 	if (addr)
diff --git a/mm/mmap.c b/mm/mmap.c
index 7e8c3e8ae75f..96ff80d2a8fb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1391,6 +1391,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	struct mm_struct *mm = current->mm;
 	int pkey = 0;
 
+#ifdef CONFIG_RO_EXEC_FILEMAP_HUGE_FAULT_THP
+	unsigned long vm_maywrite = VM_MAYWRITE;
+#endif
+
 	*populate = 0;
 
 	if (!len)
@@ -1429,7 +1433,33 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	/* Obtain the address to map to. we verify (or select) it and ensure
 	 * that it represents a valid section of the address space.
 	 */
-	addr = get_unmapped_area(file, addr, len, pgoff, flags);
+
+#ifdef CONFIG_RO_EXEC_FILEMAP_HUGE_FAULT_THP
+	/*
+	 * If THP is enabled, it's a read-only executable that is
+	 * MAP_PRIVATE mapped, the length is larger than a PMD page,
+	 * it's not a MAP_FIXED mapping, and the passed address (if any)
+	 * is properly aligned for a PMD page, attempt to get an
+	 * appropriate address at which to map a PMD-sized THP page;
+	 * otherwise call the normal routine.
+	 */
+	if ((prot & PROT_READ) && (prot & PROT_EXEC) &&
+	    (!(prot & PROT_WRITE)) && (flags & MAP_PRIVATE) &&
+	    (!(flags & MAP_FIXED)) && len >= HPAGE_PMD_SIZE &&
+	    (!(addr & HPAGE_PMD_OFFSET))) {
+		addr = thp_get_unmapped_area(file, addr, len, pgoff, flags);
+
+		if (addr && (!(addr & HPAGE_PMD_OFFSET)))
+			vm_maywrite = 0;
+		else
+			addr = get_unmapped_area(file, addr, len, pgoff, flags);
+	} else {
+#endif
+		addr = get_unmapped_area(file, addr, len, pgoff, flags);
+#ifdef CONFIG_RO_EXEC_FILEMAP_HUGE_FAULT_THP
+	}
+#endif
+
 	if (offset_in_page(addr))
 		return addr;
 
@@ -1451,7 +1481,11 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	 * of the memory object, so we don't do any here.
 	 */
 	vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
+#ifdef CONFIG_RO_EXEC_FILEMAP_HUGE_FAULT_THP
+			mm->def_flags | VM_MAYREAD | vm_maywrite | VM_MAYEXEC;
+#else
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+#endif
 
 	if (flags & MAP_LOCKED)
 		if (!can_do_mlock())
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..503612d3b52b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1192,7 +1192,11 @@ void page_add_file_rmap(struct page *page, bool compound)
 		}
 		if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
 			goto out;
+
+#ifndef CONFIG_RO_EXEC_FILEMAP_HUGE_FAULT_THP
 		VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
+#endif
+
 		__inc_node_page_state(page, NR_SHMEM_PMDMAPPED);
 	} else {
 		if (PageTransCompound(page) && page_mapping(page)) {
@@ -1232,7 +1236,11 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 		}
 		if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
 			goto out;
+
+#ifndef CONFIG_RO_EXEC_FILEMAP_HUGE_FAULT_THP
 		VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
+#endif
+
 		__dec_node_page_state(page, NR_SHMEM_PMDMAPPED);
 	} else {
 		if (!atomic_add_negative(-1, &page->_mapcount))
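To make the do_mmap() gating above easier to follow, here is the same
predicate restated as a standalone hedged sketch. It assumes (the
defining huge_mm.h hunk did not survive in this archive) that
HPAGE_PMD_OFFSET is the PMD alignment mask, i.e. HPAGE_PMD_SIZE - 1;
the function name is illustrative, not from the patch:

	/* Assumption: HPAGE_PMD_OFFSET == (HPAGE_PMD_SIZE - 1). */
	static bool wants_ro_exec_thp_mapping(unsigned long prot,
					      unsigned long flags,
					      unsigned long addr,
					      unsigned long len)
	{
		return (prot & PROT_READ) && (prot & PROT_EXEC) &&
		       !(prot & PROT_WRITE) &&		/* read-only text */
		       (flags & MAP_PRIVATE) &&
		       !(flags & MAP_FIXED) &&
		       len >= HPAGE_PMD_SIZE &&		/* covers >= one PMD */
		       !(addr & HPAGE_PMD_OFFSET);	/* hint PMD-aligned */
	}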