From patchwork Tue Jun 25 11:44:11 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710920
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com,
 djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, yang@os.amperecomputing.com,
 linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org,
 hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com,
 cl@os.amperecomputing.com, linux-xfs@vger.kernel.org,
 kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes
Date: Tue, 25 Jun 2024 11:44:11 +0000
Message-ID: <20240625114420.719014-2-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>
References: <20240625114420.719014-1-kernel@pankajraghav.com>

From: "Matthew Wilcox (Oracle)"

We need filesystems to be able to communicate acceptable folio sizes
to the pagecache for a variety of uses (e.g. large block sizes).
Support a range of folio sizes between order-0 and order-31.

Signed-off-by: Matthew Wilcox (Oracle)
Co-developed-by: Pankaj Raghav
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Darrick J. Wong
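The two orders are packed as 5-bit fields into mapping->flags. For
illustration only, here is a standalone C sketch of the same
encode/decode round trip; the constants mirror the patch, while the
local `flags` variable stands in for mapping->flags and the min/max
values are assumed examples (16k min folio on 4k pages):

#include <assert.h>
#include <stdio.h>

/* Mirrors the patch: bits 16-20 hold the min order, 21-25 the max. */
#define AS_FOLIO_ORDER_BITS	5
#define AS_FOLIO_ORDER_MIN	16
#define AS_FOLIO_ORDER_MAX	(AS_FOLIO_ORDER_MIN + AS_FOLIO_ORDER_BITS)
#define AS_FOLIO_ORDER_MASK	((1u << AS_FOLIO_ORDER_BITS) - 1)
#define AS_FOLIO_ORDER_MIN_MASK	(AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MIN)
#define AS_FOLIO_ORDER_MAX_MASK	(AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MAX)

int main(void)
{
	unsigned long flags = 0;
	unsigned int min = 2, max = 8;	/* assumed example values */

	/* Clear both order fields, then set them. */
	flags &= ~((unsigned long)AS_FOLIO_ORDER_MIN_MASK |
		   (unsigned long)AS_FOLIO_ORDER_MAX_MASK);
	flags |= (unsigned long)min << AS_FOLIO_ORDER_MIN;
	flags |= (unsigned long)max << AS_FOLIO_ORDER_MAX;

	/* Decode exactly as mapping_min/max_folio_order() would. */
	assert(((flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN) == min);
	assert(((flags & AS_FOLIO_ORDER_MAX_MASK) >> AS_FOLIO_ORDER_MAX) == max);

	printf("min order %u (%u pages), max order %u (%u pages)\n",
	       min, 1u << min, max, 1u << max);
	return 0;
}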
---
 include/linux/pagemap.h | 86 ++++++++++++++++++++++++++++++++++-------
 mm/filemap.c            |  6 +--
 mm/readahead.c          |  4 +-
 3 files changed, 77 insertions(+), 19 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4b71d581091f..0c51154cdb57 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -204,14 +204,21 @@ enum mapping_flags {
 	AS_EXITING = 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
-	AS_LARGE_FOLIO_SUPPORT = 6,
-	AS_RELEASE_ALWAYS,	/* Call ->release_folio(), even if no private data */
-	AS_STABLE_WRITES,	/* must wait for writeback before modifying
+	AS_RELEASE_ALWAYS = 6,	/* Call ->release_folio(), even if no private data */
+	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying
 				   folio contents */
-	AS_UNMOVABLE,		/* The mapping cannot be moved, ever */
-	AS_INACCESSIBLE,	/* Do not attempt direct R/W access to the mapping */
+	AS_UNMOVABLE = 8,	/* The mapping cannot be moved, ever */
+	AS_INACCESSIBLE = 9,	/* Do not attempt direct R/W access to the mapping */
+	/* Bits 16-25 are used for FOLIO_ORDER */
+	AS_FOLIO_ORDER_BITS = 5,
+	AS_FOLIO_ORDER_MIN = 16,
+	AS_FOLIO_ORDER_MAX = AS_FOLIO_ORDER_MIN + AS_FOLIO_ORDER_BITS,
 };

+#define AS_FOLIO_ORDER_MASK	((1u << AS_FOLIO_ORDER_BITS) - 1)
+#define AS_FOLIO_ORDER_MIN_MASK	(AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MIN)
+#define AS_FOLIO_ORDER_MAX_MASK	(AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MAX)
+
 /**
  * mapping_set_error - record a writeback error in the address_space
  * @mapping: the mapping in which an error should be set
@@ -360,9 +367,49 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 #define MAX_PAGECACHE_ORDER	8
 #endif

+/*
+ * mapping_set_folio_order_range() - Set the orders supported by a file.
+ * @mapping: The address space of the file.
+ * @min: Minimum folio order (between 0-MAX_PAGECACHE_ORDER inclusive).
+ * @max: Maximum folio order (between @min-MAX_PAGECACHE_ORDER inclusive).
+ *
+ * The filesystem should call this function in its inode constructor to
+ * indicate which base size (min) and maximum size (max) of folio the VFS
+ * can use to cache the contents of the file.  This should only be used
+ * if the filesystem needs special handling of folio sizes (ie there is
+ * something the core cannot know).
+ * Do not tune it based on, eg, i_size.
+ *
+ * Context: This should not be called while the inode is active as it
+ * is non-atomic.
+ */
+static inline void mapping_set_folio_order_range(struct address_space *mapping,
+						 unsigned int min,
+						 unsigned int max)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return;
+
+	if (min > MAX_PAGECACHE_ORDER)
+		min = MAX_PAGECACHE_ORDER;
+	if (max > MAX_PAGECACHE_ORDER)
+		max = MAX_PAGECACHE_ORDER;
+	if (max < min)
+		max = min;
+
+	mapping->flags = (mapping->flags & ~AS_FOLIO_ORDER_MASK) |
+		(min << AS_FOLIO_ORDER_MIN) | (max << AS_FOLIO_ORDER_MAX);
+}
+
+static inline void mapping_set_folio_min_order(struct address_space *mapping,
+					       unsigned int min)
+{
+	mapping_set_folio_order_range(mapping, min, MAX_PAGECACHE_ORDER);
+}
+
 /**
  * mapping_set_large_folios() - Indicate the file supports large folios.
- * @mapping: The file.
+ * @mapping: The address space of the file.
  *
  * The filesystem should call this function in its inode constructor to
  * indicate that the VFS can use large folios to cache the contents of
  *
@@ -373,7 +420,23 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
  */
 static inline void mapping_set_large_folios(struct address_space *mapping)
 {
-	__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	mapping_set_folio_order_range(mapping, 0, MAX_PAGECACHE_ORDER);
+}
+
+static inline
+unsigned int mapping_max_folio_order(const struct address_space *mapping)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 0;
+	return (mapping->flags & AS_FOLIO_ORDER_MAX_MASK) >> AS_FOLIO_ORDER_MAX;
+}
+
+static inline
+unsigned int mapping_min_folio_order(const struct address_space *mapping)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 0;
+	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
 }

 /*
@@ -386,16 +449,13 @@ static inline bool mapping_large_folio_support(struct address_space *mapping)
 	VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
 			"Anonymous mapping always supports large folio");

-	return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	return mapping_max_folio_order(mapping) > 0;
 }

 /* Return the maximum folio size for this pagecache mapping, in bytes. */
-static inline size_t mapping_max_folio_size(struct address_space *mapping)
+static inline size_t mapping_max_folio_size(const struct address_space *mapping)
 {
-	if (mapping_large_folio_support(mapping))
-		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
-	return PAGE_SIZE;
+	return PAGE_SIZE << mapping_max_folio_order(mapping);
 }

 static inline int filemap_nr_thps(struct address_space *mapping)
diff --git a/mm/filemap.c b/mm/filemap.c
index 0b8c732bb643..d617c9afca51 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1933,10 +1933,8 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
 			fgp_flags |= FGP_LOCK;

-		if (!mapping_large_folio_support(mapping))
-			order = 0;
-		if (order > MAX_PAGECACHE_ORDER)
-			order = MAX_PAGECACHE_ORDER;
+		if (order > mapping_max_folio_order(mapping))
+			order = mapping_max_folio_order(mapping);
 		/* If we're not aligned, allocate a smaller folio */
 		if (index & ((1UL << order) - 1))
 			order = __ffs(index);
diff --git a/mm/readahead.c b/mm/readahead.c
index c1b23989d9ca..66058ae02f2e 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -503,9 +503,9 @@ void page_cache_ra_order(struct readahead_control *ractl,

 	limit = min(limit, index + ra->size - 1);

-	if (new_order < MAX_PAGECACHE_ORDER) {
+	if (new_order < mapping_max_folio_order(mapping)) {
 		new_order += 2;
-		new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
+		new_order = min(mapping_max_folio_order(mapping), new_order);
 		new_order = min_t(unsigned int, new_order, ilog2(ra->size));
 	}
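To make the __filemap_get_folio() hunk above concrete: the requested
order is first clamped to the mapping's maximum, then reduced when the
index is not naturally aligned. A standalone sketch of that arithmetic,
with assumed example values and a local stand-in for the kernel's
__ffs():

#include <stdio.h>

/* Local stand-in for the kernel's __ffs(): index of the lowest set bit. */
static unsigned int lowest_set_bit(unsigned long x)
{
	return (unsigned int)__builtin_ctzl(x);
}

int main(void)
{
	unsigned long index = 36;	/* 0b100100: aligned only to order 2 */
	unsigned int order = 4;		/* caller asked for a 16-page folio */
	unsigned int max_order = 3;	/* assumed mapping_max_folio_order() */

	if (order > max_order)			/* clamp to the mapping's max */
		order = max_order;
	if (index & ((1UL << order) - 1))	/* index not 2^order aligned, */
		order = lowest_set_bit(index);	/* shrink to index alignment */

	/* Prints "allocating order-2 folio at index 36". */
	printf("allocating order-%u folio at index %lu\n", order, index);
	return 0;
}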
From patchwork Tue Jun 25 11:44:12 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710921
From: "Pankaj Raghav (Samsung)"
Subject: [PATCH v8 02/10] filemap: allocate mapping_min_order folios in the page cache
Date: Tue, 25 Jun 2024 11:44:12 +0000
Message-ID: <20240625114420.719014-3-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>

From: Pankaj Raghav

filemap_create_folio() and do_read_cache_folio() always allocated
folios of order 0. __filemap_get_folio() tried to allocate higher order
folios when fgp_flags had a higher order hint set, but fell back to an
order-0 folio if the higher order memory allocation failed.

Supporting mapping_min_order implies that we guarantee that each folio
in the page cache has at least an order of mapping_min_order. When
adding new folios to the page cache we must also ensure that the index
used is aligned to mapping_min_order, as the page cache requires the
index to be aligned to the order of the folio.

Co-developed-by: Luis Chamberlain
Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Darrick J. Wong
Reviewed-by: Matthew Wilcox (Oracle)
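The alignment requirement described above is a plain round-down to a
power-of-two multiple. A standalone sketch of what the new
mapping_align_index() computes; the min order of 2 is an assumed
example value:

#include <stdio.h>

/* round_down() for a power-of-two n, as in the kernel. */
#define round_down(x, n)	((x) & ~((n) - 1UL))

int main(void)
{
	unsigned long min_order = 2;			/* assumed example */
	unsigned long min_nrpages = 1UL << min_order;	/* 4 pages */

	/* A folio's index must be a multiple of its page count. */
	for (unsigned long index = 0; index < 12; index++)
		printf("index %2lu -> aligned %2lu\n",
		       index, round_down(index, min_nrpages));
	return 0;
}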
---
 include/linux/pagemap.h | 23 ++++++++++++++++++++++-
 mm/filemap.c            | 24 ++++++++++++++++--------
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0c51154cdb57..7f1355abd8a2 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -439,6 +439,27 @@ unsigned int mapping_min_folio_order(const struct address_space *mapping)
 	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
 }

+static inline
+unsigned long mapping_min_folio_nrpages(struct address_space *mapping)
+{
+	return 1UL << mapping_min_folio_order(mapping);
+}
+
+/**
+ * mapping_align_index() - Align index based on the min
+ * folio order of the page cache.
+ * @mapping: The address_space.
+ *
+ * The index of a folio must be naturally aligned.  If you are adding a
+ * new folio to the page cache and need to know what index to give it,
+ * call this function.
+ */
+static inline pgoff_t mapping_align_index(struct address_space *mapping,
+					  pgoff_t index)
+{
+	return round_down(index, mapping_min_folio_nrpages(mapping));
+}
+
 /*
  * Large folio support currently depends on THP.  These dependencies are
  * being worked on but are not yet fixed.
@@ -1165,7 +1186,7 @@ static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
 void folio_wait_bit(struct folio *folio, int bit_nr);
 int folio_wait_bit_killable(struct folio *folio, int bit_nr);

-/* 
+/*
  * Wait for a folio to be unlocked.
  *
  * This must be called with the caller "holding" the folio,
diff --git a/mm/filemap.c b/mm/filemap.c
index d617c9afca51..8eafbd4a4d0c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -859,6 +859,8 @@ noinline int __filemap_add_folio(struct address_space *mapping,

 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_swapbacked(folio), folio);
+	VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
+			folio);
 	mapping_set_update(&xas, mapping);

 	VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
@@ -1919,8 +1921,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		folio_wait_stable(folio);
 no_page:
 	if (!folio && (fgp_flags & FGP_CREAT)) {
-		unsigned order = FGF_GET_ORDER(fgp_flags);
+		unsigned int min_order = mapping_min_folio_order(mapping);
+		unsigned int order = max(min_order, FGF_GET_ORDER(fgp_flags));
 		int err;
+		index = mapping_align_index(mapping, index);

 		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
 			gfp |= __GFP_WRITE;
@@ -1943,7 +1947,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			gfp_t alloc_gfp = gfp;

 			err = -ENOMEM;
-			if (order > 0)
+			if (order > min_order)
 				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
 			folio = filemap_alloc_folio(alloc_gfp, order);
 			if (!folio)
@@ -1958,7 +1962,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 				break;
 			folio_put(folio);
 			folio = NULL;
-		} while (order-- > 0);
+		} while (order-- > min_order);

 		if (err == -EEXIST)
 			goto repeat;
@@ -2451,13 +2455,15 @@ static int filemap_update_page(struct kiocb *iocb,
 }

 static int filemap_create_folio(struct file *file,
-		struct address_space *mapping, pgoff_t index,
+		struct address_space *mapping, loff_t pos,
 		struct folio_batch *fbatch)
 {
 	struct folio *folio;
 	int error;
+	unsigned int min_order = mapping_min_folio_order(mapping);
+	pgoff_t index;

-	folio = filemap_alloc_folio(mapping_gfp_mask(mapping), 0);
+	folio = filemap_alloc_folio(mapping_gfp_mask(mapping), min_order);
 	if (!folio)
 		return -ENOMEM;

@@ -2475,6 +2481,7 @@ static int filemap_create_folio(struct file *file,
 	 * well to keep locking rules simple.
 	 */
 	filemap_invalidate_lock_shared(mapping);
+	index = (pos >> (PAGE_SHIFT + min_order)) << min_order;
 	error = filemap_add_folio(mapping, folio, index,
 			mapping_gfp_constraint(mapping, GFP_KERNEL));
 	if (error == -EEXIST)
@@ -2535,8 +2542,7 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	if (!folio_batch_count(fbatch)) {
 		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
 			return -EAGAIN;
-		err = filemap_create_folio(filp, mapping,
-				iocb->ki_pos >> PAGE_SHIFT, fbatch);
+		err = filemap_create_folio(filp, mapping, iocb->ki_pos, fbatch);
 		if (err == AOP_TRUNCATED_PAGE)
 			goto retry;
 		return err;
@@ -3752,9 +3758,11 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
 repeat:
 	folio = filemap_get_folio(mapping, index);
 	if (IS_ERR(folio)) {
-		folio = filemap_alloc_folio(gfp, 0);
+		folio = filemap_alloc_folio(gfp,
+					    mapping_min_folio_order(mapping));
 		if (!folio)
 			return ERR_PTR(-ENOMEM);
+		index = mapping_align_index(mapping, index);
 		err = filemap_add_folio(mapping, folio, index, gfp);
 		if (unlikely(err)) {
 			folio_put(folio);
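The index computation added to filemap_create_folio() above performs
the byte-to-index conversion and the min-order round-down in one shift
pair. A standalone sketch with assumed values (PAGE_SHIFT fixed at 12
here, i.e. 4k pages):

#include <stdio.h>

#define PAGE_SHIFT 12	/* assumed 4k pages for this sketch */

int main(void)
{
	unsigned int min_order = 2;	/* min folio = 4 pages = 16k */
	unsigned long long pos = 0x6234;	/* byte offset in the file */

	/* Plain conversion: byte position -> page index (unaligned). */
	unsigned long index = (unsigned long)(pos >> PAGE_SHIFT);	/* 6 */

	/* The patch's version: convert and round down in one go. */
	unsigned long aligned =
		(unsigned long)((pos >> (PAGE_SHIFT + min_order)) << min_order); /* 4 */

	printf("pos %#llx: page index %lu, min-order-aligned index %lu\n",
	       pos, index, aligned);
	return 0;
}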
From patchwork Tue Jun 25 11:44:13 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710922
From: "Pankaj Raghav (Samsung)"
Subject: [PATCH v8 03/10] readahead: allocate folios with mapping_min_order in readahead
Date: Tue, 25 Jun 2024 11:44:13 +0000
Message-ID: <20240625114420.719014-4-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>
From: Pankaj Raghav

page_cache_ra_unbounded() was allocating single pages (order-0 folios)
if no folio was found at an index. Allocate mapping_min_order folios,
as we need to guarantee the minimum order if it is set.

While we are at it, rework the loop in page_cache_ra_unbounded() to
advance by the number of pages in a folio instead of just one page at
a time.

page_cache_ra_order() tries to allocate folios for the page cache with
a higher order if the index aligns with that order. Modify it so that
the order does not go below the mapping_min_order requirement of the
page cache. This function will do the right thing even if the new_order
passed is less than the mapping_min_order.

When adding new folios to the page cache we must also ensure that the
index used is aligned to mapping_min_order, as the page cache requires
the index to be aligned to the order of the folio.

readahead_expand() is called from readahead aops to extend the range of
the readahead, so this function can assume ractl->_index is aligned to
min_order.

Signed-off-by: Pankaj Raghav
Co-developed-by: Hannes Reinecke
Signed-off-by: Hannes Reinecke
Acked-by: Darrick J. Wong
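The rounding of the readahead mark described above can be checked with
a standalone sketch; all values below are assumed for illustration:

#include <stdio.h>

#define round_up(x, n)	((((x) + (n) - 1) / (n)) * (n))

int main(void)
{
	unsigned long index = 100;	/* already aligned to min_nrpages */
	unsigned long nr_to_read = 32, lookahead_size = 10;
	unsigned long min_nrpages = 4;	/* min folio order 2 */

	/*
	 * Without a minimum order, the readahead flag would be set at
	 * loop offset nr_to_read - lookahead_size = 22.  Since the loop
	 * now advances min_nrpages at a time, round that offset up so
	 * the (i == mark) comparison can still hit it.
	 */
	unsigned long ra_folio_index =
		round_up(index + nr_to_read - lookahead_size, min_nrpages);
	unsigned long mark = ra_folio_index - index;

	/* Prints offset 24, folio index 124. */
	printf("set readahead flag at loop offset %lu (folio index %lu)\n",
	       mark, ra_folio_index);
	return 0;
}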
---
 mm/readahead.c | 81 +++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 63 insertions(+), 18 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 66058ae02f2e..2acfd6447d7b 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -206,9 +206,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		unsigned long nr_to_read, unsigned long lookahead_size)
 {
 	struct address_space *mapping = ractl->mapping;
-	unsigned long index = readahead_index(ractl);
+	unsigned long ra_folio_index, index = readahead_index(ractl);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	unsigned long i;
+	unsigned long mark, i = 0;
+	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);

 	/*
 	 * Partway through the readahead operation, we will have added
@@ -223,10 +224,26 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	unsigned int nofs = memalloc_nofs_save();

 	filemap_invalidate_lock_shared(mapping);
+	index = mapping_align_index(mapping, index);
+
+	/*
+	 * As iterator `i` is aligned to min_nrpages, round_up the
+	 * difference between nr_to_read and lookahead_size to mark the
+	 * index that only has lookahead or "async_region" to set the
+	 * readahead flag.
+	 */
+	ra_folio_index = round_up(readahead_index(ractl) + nr_to_read - lookahead_size,
+				  min_nrpages);
+	mark = ra_folio_index - index;
+	if (index != readahead_index(ractl)) {
+		nr_to_read += readahead_index(ractl) - index;
+		ractl->_index = index;
+	}
+
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
-	for (i = 0; i < nr_to_read; i++) {
+	while (i < nr_to_read) {
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
 		int ret;

@@ -240,12 +257,13 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			 * not worth getting one just for that.
 			 */
 			read_pages(ractl);
-			ractl->_index++;
-			i = ractl->_index + ractl->_nr_pages - index - 1;
+			ractl->_index += min_nrpages;
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}

-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask,
+					    mapping_min_folio_order(mapping));
 		if (!folio)
 			break;

@@ -255,14 +273,15 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			if (ret == -ENOMEM)
 				break;
 			read_pages(ractl);
-			ractl->_index++;
-			i = ractl->_index + ractl->_nr_pages - index - 1;
+			ractl->_index += min_nrpages;
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
-		if (i == nr_to_read - lookahead_size)
+		if (i == mark)
 			folio_set_readahead(folio);
 		ractl->_workingset |= folio_test_workingset(folio);
-		ractl->_nr_pages++;
+		ractl->_nr_pages += min_nrpages;
+		i += min_nrpages;
 	}

 	/*
@@ -492,13 +511,19 @@ void page_cache_ra_order(struct readahead_control *ractl,
 {
 	struct address_space *mapping = ractl->mapping;
 	pgoff_t index = readahead_index(ractl);
+	unsigned int min_order = mapping_min_folio_order(mapping);
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
 	unsigned int nofs;
 	int err = 0;
 	gfp_t gfp = readahead_gfp_mask(mapping);
+	unsigned int min_ra_size = max(4, mapping_min_folio_nrpages(mapping));

-	if (!mapping_large_folio_support(mapping) || ra->size < 4)
+	/*
+	 * Fallback when size < min_nrpages as each folio should be
+	 * at least min_nrpages anyway.
+	 */
+	if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size)
 		goto fallback;

 	limit = min(limit, index + ra->size - 1);
@@ -507,11 +532,20 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		new_order += 2;
 		new_order = min(mapping_max_folio_order(mapping), new_order);
 		new_order = min_t(unsigned int, new_order, ilog2(ra->size));
+		new_order = max(new_order, min_order);
 	}

 	/* See comment in page_cache_ra_unbounded() */
 	nofs = memalloc_nofs_save();
 	filemap_invalidate_lock_shared(mapping);
+	/*
+	 * If the new_order is greater than min_order and index is
+	 * already aligned to new_order, then this will be noop as index
+	 * aligned to new_order should also be aligned to min_order.
+	 */
+	ractl->_index = mapping_align_index(mapping, index);
+	index = readahead_index(ractl);
+
 	while (index <= limit) {
 		unsigned int order = new_order;

@@ -519,7 +553,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		if (index & ((1UL << order) - 1))
 			order = __ffs(index);
 		/* Don't allocate pages past EOF */
-		while (index + (1UL << order) - 1 > limit)
+		while (order > min_order && index + (1UL << order) - 1 > limit)
 			order--;
 		err = ra_alloc_folio(ractl, index, mark, order, gfp);
 		if (err)
@@ -783,8 +817,15 @@ void readahead_expand(struct readahead_control *ractl,
 	struct file_ra_state *ra = ractl->ra;
 	pgoff_t new_index, new_nr_pages;
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
+	unsigned long min_nrpages = mapping_min_folio_nrpages(mapping);
+	unsigned int min_order = mapping_min_folio_order(mapping);

 	new_index = new_start / PAGE_SIZE;
+	/*
+	 * Readahead code should have aligned the ractl->_index to
+	 * min_nrpages before calling readahead aops.
+	 */
+	VM_BUG_ON(!IS_ALIGNED(ractl->_index, min_nrpages));

 	/* Expand the leading edge downwards */
 	while (ractl->_index > new_index) {
@@ -794,9 +835,11 @@ void readahead_expand(struct readahead_control *ractl,
 		if (folio && !xa_is_value(folio))
 			return; /* Folio apparently present */

-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask, min_order);
 		if (!folio)
 			return;
+
+		index = mapping_align_index(mapping, index);
 		if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) {
 			folio_put(folio);
 			return;
@@ -806,7 +849,7 @@ void readahead_expand(struct readahead_control *ractl,
 			ractl->_workingset = true;
 			psi_memstall_enter(&ractl->_pflags);
 		}
-		ractl->_nr_pages++;
+		ractl->_nr_pages += min_nrpages;
 		ractl->_index = folio->index;
 	}

@@ -821,9 +864,11 @@ void readahead_expand(struct readahead_control *ractl,
 		if (folio && !xa_is_value(folio))
 			return; /* Folio apparently present */

-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask, min_order);
 		if (!folio)
 			return;
+
+		index = mapping_align_index(mapping, index);
 		if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) {
 			folio_put(folio);
 			return;
@@ -833,10 +878,10 @@ void readahead_expand(struct readahead_control *ractl,
 			ractl->_workingset = true;
 			psi_memstall_enter(&ractl->_pflags);
 		}
-		ractl->_nr_pages++;
+		ractl->_nr_pages += min_nrpages;
 		if (ra) {
-			ra->size++;
-			ra->async_size++;
+			ra->size += min_nrpages;
+			ra->async_size += min_nrpages;
 		}
 	}
 }
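The order-selection chain in page_cache_ra_order() after this patch can
be traced standalone; the three limits below are assumed example
values, and the helpers are local stand-ins rather than kernel calls:

#include <stdio.h>

static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }
static unsigned int max_u(unsigned int a, unsigned int b) { return a > b ? a : b; }
static unsigned int ilog2_u(unsigned long x) { return 63 - (unsigned int)__builtin_clzl(x); }

int main(void)
{
	unsigned int new_order = 0;	/* first large-folio readahead pass */
	unsigned int max_order = 8;	/* assumed mapping_max_folio_order() */
	unsigned int min_order = 2;	/* assumed mapping_min_folio_order() */
	unsigned long ra_size = 32;	/* pages in this readahead window */

	if (new_order < max_order) {
		new_order += 2;				/* grow by 2 per pass */
		new_order = min_u(max_order, new_order);
		new_order = min_u(new_order, ilog2_u(ra_size));
		new_order = max_u(new_order, min_order);	/* floor */
	}
	printf("folio order for this window: %u\n", new_order);	/* 2 */
	return 0;
}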
From patchwork Tue Jun 25 11:44:14 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710923
From: "Pankaj Raghav (Samsung)"
Subject: [PATCH v8 04/10] mm: split a folio in minimum folio order chunks
Date: Tue, 25 Jun 2024 11:44:14 +0000
Message-ID: <20240625114420.719014-5-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>

From: Luis Chamberlain

split_folio() and split_folio_to_list() assume order 0; to support min
order for non-anonymous folios, we must expand these to check the folio
mapping order and use that.

Set new_order to be at least the minimum folio order, if one is set, in
split_huge_page_to_list() so that we can maintain the minimum folio
order requirement in the page cache.

Update the debugfs write files used for testing to ensure the order is
respected as well. We simply enforce the min order when a file mapping
is used.

Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
---
There was a discussion about whether truncation of a folio should be
considered a split failure [1]. The new code retains the existing
behaviour of returning a failure if the folio was truncated. I think we
need a separate discussion on whether or not to consider it a failure.
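The rule the patch enforces reduces to "file-backed folios never split
below the mapping's minimum order". A standalone sketch of that
decision, with assumed values (the helper here is illustrative, not a
kernel function):

#include <stdio.h>

/* Target order for a split: never below the mapping's min for files. */
static unsigned int split_target_order(unsigned int requested_order,
				       unsigned int mapping_min_order,
				       int is_anon)
{
	if (is_anon)
		return requested_order;	/* anonymous memory: no minimum */
	return requested_order > mapping_min_order ?
	       requested_order : mapping_min_order;
}

int main(void)
{
	/* A mapping with min order 2 refuses to split below order 2. */
	printf("anon, ask order 0 -> split to order %u\n",
	       split_target_order(0, 2, 1));	/* 0 */
	printf("file, ask order 0 -> split to order %u\n",
	       split_target_order(0, 2, 0));	/* 2 */
	printf("file, ask order 3 -> split to order %u\n",
	       split_target_order(3, 2, 0));	/* 3 */
	return 0;
}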
 include/linux/huge_mm.h | 14 ++++++++---
 mm/huge_memory.c        | 55 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 212cca384d7e..70d80d17c3ff 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -90,6 +90,8 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
 	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))

+#define split_folio(f) split_folio_to_list(f, NULL)
+
 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
 #define HPAGE_PMD_SHIFT PMD_SHIFT
 #define HPAGE_PUD_SHIFT PUD_SHIFT
@@ -320,9 +322,10 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
 bool can_split_folio(struct folio *folio, int *pextra_pins);
 int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		unsigned int new_order);
+int split_folio_to_list(struct folio *folio, struct list_head *list);
 static inline int split_huge_page(struct page *page)
 {
-	return split_huge_page_to_list_to_order(page, NULL, 0);
+	return split_folio(page_folio(page));
 }
 void deferred_split_folio(struct folio *folio);

@@ -487,6 +490,12 @@ static inline int split_huge_page(struct page *page)
 {
 	return 0;
 }
+
+static inline int split_folio_to_list(struct folio *folio, struct list_head *list)
+{
+	return 0;
+}
+
 static inline void deferred_split_folio(struct folio *folio) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
@@ -601,7 +610,4 @@ static inline int split_folio_to_order(struct folio *folio, int new_order)
 	return split_folio_to_list_to_order(folio, NULL, new_order);
 }

-#define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
-#define split_folio(f) split_folio_to_order(f, 0)
-
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0fffaa58a47a..51fda5f9ac90 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3065,6 +3065,9 @@ bool can_split_folio(struct folio *folio, int *pextra_pins)
 * released, or if some unexpected race happened (e.g., anon VMA disappeared,
 * truncation).
 *
+ * Callers should ensure that the order respects the address space mapping
+ * min-order if one is set for non-anonymous folios.
+ *
 * Returns -EINVAL when trying to split to an order that is incompatible
 * with the folio. Splitting to order 0 is compatible with all folios.
 */
@@ -3146,6 +3149,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		mapping = NULL;
 		anon_vma_lock_write(anon_vma);
 	} else {
+		unsigned int min_order;
 		gfp_t gfp;

 		mapping = folio->mapping;
@@ -3156,6 +3160,14 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 			goto out;
 		}

+		min_order = mapping_min_folio_order(folio->mapping);
+		if (new_order < min_order) {
+			VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
+				     min_order);
+			ret = -EINVAL;
+			goto out;
+		}
+
 		gfp = current_gfp_context(mapping_gfp_mask(mapping) &
 							GFP_RECLAIM_MASK);

@@ -3267,6 +3279,21 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	return ret;
 }

+int split_folio_to_list(struct folio *folio, struct list_head *list)
+{
+	unsigned int min_order = 0;
+
+	if (!folio_test_anon(folio)) {
+		if (!folio->mapping) {
+			count_vm_event(THP_SPLIT_PAGE_FAILED);
+			return -EBUSY;
+		}
+		min_order = mapping_min_folio_order(folio->mapping);
+	}
+
+	return split_huge_page_to_list_to_order(&folio->page, list, min_order);
+}
+
 void __folio_undo_large_rmappable(struct folio *folio)
 {
 	struct deferred_split *ds_queue;
@@ -3496,6 +3523,8 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		struct vm_area_struct *vma = vma_lookup(mm, addr);
 		struct page *page;
 		struct folio *folio;
+		struct address_space *mapping;
+		unsigned int target_order = new_order;

 		if (!vma)
 			break;
@@ -3516,7 +3545,13 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		if (!is_transparent_hugepage(folio))
 			goto next;

-		if (new_order >= folio_order(folio))
+		if (!folio_test_anon(folio)) {
+			mapping = folio->mapping;
+			target_order = max(new_order,
+					   mapping_min_folio_order(mapping));
+		}
+
+		if (target_order >= folio_order(folio))
 			goto next;

 		total++;
@@ -3532,9 +3567,13 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		if (!folio_trylock(folio))
 			goto next;

-		if (!split_folio_to_order(folio, new_order))
+		if (!folio_test_anon(folio) && folio->mapping != mapping)
+			goto unlock;
+
+		if (!split_folio_to_order(folio, target_order))
 			split++;

+unlock:
 		folio_unlock(folio);
 next:
 		folio_put(folio);
@@ -3559,6 +3598,7 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
 	pgoff_t index;
 	int nr_pages = 1;
 	unsigned long total = 0, split = 0;
+	unsigned int min_order;

 	file = getname_kernel(file_path);
 	if (IS_ERR(file))
@@ -3572,9 +3612,11 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
 		 file_path, off_start, off_end);

 	mapping = candidate->f_mapping;
+	min_order = mapping_min_folio_order(mapping);

 	for (index = off_start; index < off_end; index += nr_pages) {
 		struct folio *folio = filemap_get_folio(mapping, index);
+		unsigned int target_order = new_order;

 		nr_pages = 1;
 		if (IS_ERR(folio))
@@ -3583,18 +3625,23 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
 		if (!folio_test_large(folio))
 			goto next;

+		target_order = max(new_order, min_order);
 		total++;
 		nr_pages = folio_nr_pages(folio);

-		if (new_order >= folio_order(folio))
+		if (target_order >= folio_order(folio))
 			goto next;

 		if (!folio_trylock(folio))
 			goto next;

-		if (!split_folio_to_order(folio, new_order))
+		if (folio->mapping != mapping)
+			goto unlock;
+
+		if (!split_folio_to_order(folio, target_order))
 			split++;

+unlock:
 		folio_unlock(folio);
 next:
 		folio_put(folio);
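For reference, a sketch of the resulting call chain, paraphrased from
the hunks above (not additional code):

/*
 * split_huge_page(page)
 *   -> split_folio(page_folio(page))          was: ..._to_order(page, NULL, 0)
 *   -> split_folio_to_list(folio, NULL)
 *        min_order = anon ? 0 : mapping_min_folio_order(folio->mapping)
 *   -> split_huge_page_to_list_to_order(&folio->page, NULL, min_order)
 *        returns -EINVAL if asked to split a file-backed folio below
 *        the mapping's min order
 */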
From patchwork Tue Jun 25 11:44:15 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710924
From: "Pankaj Raghav (Samsung)"
Subject: [PATCH v8 05/10] filemap: cap PTE range to be created to allowed zero fill in folio_map_range()
Date: Tue, 25 Jun 2024 11:44:15 +0000
Message-ID: <20240625114420.719014-6-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>

From: Pankaj Raghav

Usually the page cache does not extend beyond the size of the inode;
therefore, no PTEs are created for folios that extend beyond the size.

But with LBS support, we might extend the page cache beyond the size of
the inode, as we need to guarantee folios of minimum order. While doing
a read, do_fault_around() can create PTEs for pages that lie beyond the
EOF, leading to an incorrect error return when accessing a page beyond
the mapped file.
Cap the PTE range to be created for the page cache up to the end of file (EOF) in filemap_map_pages() so that the error codes returned are consistent with POSIX [1] for LBS configurations. generic/749 (currently in the xfstests-dev patches-in-queue branch [0]) has been created to trigger this edge case. This also fixes generic/749 for tmpfs with huge=always on systems with a 4k base page size.

[0] https://lore.kernel.org/all/20240615002935.1033031-3-mcgrof@kernel.org/
[1] (from mmap(2))
    SIGBUS Attempted access to a page of the buffer that lies beyond the end
    of the mapped file. For an explanation of the treatment of the bytes in
    the page that corresponds to the end of a mapped file that is not a
    multiple of the page size, see NOTES.

Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Matthew Wilcox (Oracle)
Reviewed-by: Darrick J. Wong
---
 mm/filemap.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8eafbd4a4d0c..56ff1d936aa8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3612,7 +3612,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	struct vm_area_struct *vma = vmf->vma;
 	struct file *file = vma->vm_file;
 	struct address_space *mapping = file->f_mapping;
-	pgoff_t last_pgoff = start_pgoff;
+	pgoff_t file_end, last_pgoff = start_pgoff;
 	unsigned long addr;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct folio *folio;
@@ -3638,6 +3638,10 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		goto out;
 	}
 
+	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
+	if (end_pgoff > file_end)
+		end_pgoff = file_end;
+
 	folio_type = mm_counter_file(folio);
 	do {
 		unsigned long end;

From patchwork Tue Jun 25 11:44:16 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710925
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, yang@os.amperecomputing.com, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v8 06/10] iomap: fix iomap_dio_zero() for fs bs > system page size
Date: Tue, 25 Jun 2024 11:44:16 +0000
Message-ID: <20240625114420.719014-7-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>
References: <20240625114420.719014-1-kernel@pankajraghav.com>
MIME-Version: 1.0

From: Pankaj Raghav

iomap_dio_zero() will pad a fs block with zeroes if the direct IO size is smaller than the fs block size. iomap_dio_zero() has an implicit assumption that the fs block size is smaller than the page size; this is true for most filesystems at the moment. If the block size is larger than the page size, this will send the contents of the page next to the zero page (as len > PAGE_SIZE) to the underlying block device, causing FS corruption.

iomap is generic infrastructure and it should not make any assumptions about the fs block size and the page size of the system.

Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Dave Chinner
Reviewed-by: Darrick J. Wong
---
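
As an aside (editor's sketch, not part of the patch), the allocation-order arithmetic behind the 64k zero page introduced below can be checked in userspace; get_order() here is a small analogue of the kernel helper of the same name, assuming a 4k base page:

#include <stdio.h>

#define PAGE_SHIFT 12                   /* assumes a 4k base page */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* smallest order such that (PAGE_SIZE << order) >= size */
static int get_order(unsigned long size)
{
	int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

int main(void)
{
	/* ZERO_PAGE(0) covers only PAGE_SIZE bytes; adding len > PAGE_SIZE
	 * from it to a bio would leak whatever memory follows that page,
	 * which is exactly the corruption described above. */
	printf("order for a 64k zero page: %d (%lu pages)\n",
	       get_order(65536), 1UL << get_order(65536));
	return 0;
}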
 fs/iomap/buffered-io.c |  4 ++--
 fs/iomap/direct-io.c   | 30 ++++++++++++++++++++++++++++--
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f420c53d86ac..9a9e94c7ed1d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -2007,10 +2007,10 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
 }
 EXPORT_SYMBOL_GPL(iomap_writepages);
 
-static int __init iomap_init(void)
+static int __init iomap_pagecache_init(void)
 {
 	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
 			   offsetof(struct iomap_ioend, io_bio),
 			   BIOSET_NEED_BVECS);
 }
-fs_initcall(iomap_init);
+fs_initcall(iomap_pagecache_init);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f3b43d223a46..61d09d2364f7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -11,6 +11,7 @@
 #include <linux/iomap.h>
 #include <linux/backing-dev.h>
 #include <linux/uio.h>
+#include <linux/set_memory.h>
 #include <linux/task_io_accounting_ops.h>
 #include "trace.h"
@@ -27,6 +28,13 @@
 #define IOMAP_DIO_WRITE		(1U << 30)
 #define IOMAP_DIO_DIRTY		(1U << 31)
 
+/*
+ * Used for sub block zeroing in iomap_dio_zero()
+ */
+#define ZERO_PAGE_64K_SIZE	(65536)
+#define ZERO_PAGE_64K_ORDER	(get_order(ZERO_PAGE_64K_SIZE))
+static struct page *zero_page_64k;
+
 struct iomap_dio {
 	struct kiocb *iocb;
 	const struct iomap_dio_ops *dops;
@@ -236,9 +244,13 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 		loff_t pos, unsigned len)
 {
 	struct inode *inode = file_inode(dio->iocb->ki_filp);
-	struct page *page = ZERO_PAGE(0);
 	struct bio *bio;
 
+	/*
+	 * Max block size supported is 64k
+	 */
+	WARN_ON_ONCE(len > ZERO_PAGE_64K_SIZE);
+
 	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
 	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
 				  GFP_KERNEL);
@@ -246,7 +258,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	__bio_add_page(bio, page, len, 0);
+	__bio_add_page(bio, zero_page_64k, len, 0);
 	iomap_dio_submit_bio(iter, dio, bio, pos);
 }
 
@@ -753,3 +765,17 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	return iomap_dio_complete(dio);
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
+
+static int __init iomap_dio_init(void)
+{
+	zero_page_64k = alloc_pages(GFP_KERNEL | __GFP_ZERO,
+				    ZERO_PAGE_64K_ORDER);
+
+	if (!zero_page_64k)
+		return -ENOMEM;
+
+	set_memory_ro((unsigned long)page_address(zero_page_64k),
+		      1U << ZERO_PAGE_64K_ORDER);
+	return 0;
+}
+fs_initcall(iomap_dio_init);

From patchwork Tue Jun 25 11:44:17 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710926
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, yang@os.amperecomputing.com, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan, Dave Chinner
Subject: [PATCH v8 07/10] xfs: use kvmalloc for xattr buffers
Date: Tue, 25 Jun 2024 11:44:17 +0000
Message-ID: <20240625114420.719014-8-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>
References: <20240625114420.719014-1-kernel@pankajraghav.com>
MIME-Version: 1.0

From: Dave Chinner

Pankaj Raghav reported that when the filesystem block size is larger than the page size, the xattr code can use kmalloc() for high order allocations. This triggers a useless warning in the allocator, as it is a __GFP_NOFAIL allocation here:

static inline
struct page *rmqueue(struct zone *preferred_zone,
			struct zone *zone, unsigned int order,
			gfp_t gfp_flags, unsigned int alloc_flags,
			int migratetype)
{
	struct page *page;

	/*
	 * We most definitely don't want callers attempting to
	 * allocate greater than order-1 page units with __GFP_NOFAIL.
	 */
>>>>	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
	...

Fix this by changing all these call sites to use kvmalloc(), which will strip the NOFAIL from the kmalloc attempt and, if that fails, will do a __GFP_NOFAIL vmalloc(). This is not an issue that production systems will see, as filesystems with block size > page size cannot be mounted by the kernel; Pankaj is developing this functionality right now.

Reported-by: Pankaj Raghav
Fixes: f078d4ea8276 ("xfs: convert kmem_alloc() to kmalloc()")
Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Reviewed-by: Pankaj Raghav
---
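
As an aside (editor's sketch, not part of the patch), the fallback pattern the message describes looks roughly like this in a userspace analogue; try_contig() and alloc_virt() are hypothetical stand-ins for kmalloc() (physically contiguous, may fail at high orders) and vmalloc() (virtually contiguous fallback):

#include <stdio.h>
#include <stdlib.h>

/* pretend allocations beyond 8 pages of 4k fail, like high-order kmalloc */
static void *try_contig(size_t size)
{
	return size <= 8 * 4096 ? malloc(size) : NULL;
}

static void *alloc_virt(size_t size)
{
	return malloc(size);
}

/* the kvmalloc() shape: fast contiguous attempt, then the fallback */
static void *kvmalloc_like(size_t size)
{
	void *p = try_contig(size);

	return p ? p : alloc_virt(size);
}

int main(void)
{
	void *buf = kvmalloc_like(64 * 1024); /* e.g. a 64k xattr buffer */

	printf("allocated: %p\n", buf);
	free(buf);
	return 0;
}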
 fs/xfs/libxfs/xfs_attr_leaf.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index b9e98950eb3d..09f4cb061a6e 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -1138,10 +1138,7 @@ xfs_attr3_leaf_to_shortform(
 
 	trace_xfs_attr_leaf_to_sf(args);
 
-	tmpbuffer = kmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
-	if (!tmpbuffer)
-		return -ENOMEM;
-
+	tmpbuffer = kvmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
 	memcpy(tmpbuffer, bp->b_addr, args->geo->blksize);
 
 	leaf = (xfs_attr_leafblock_t *)tmpbuffer;
@@ -1205,7 +1202,7 @@ xfs_attr3_leaf_to_shortform(
 	error = 0;
 
 out:
-	kfree(tmpbuffer);
+	kvfree(tmpbuffer);
 	return error;
 }
 
@@ -1613,7 +1610,7 @@ xfs_attr3_leaf_compact(
 
 	trace_xfs_attr_leaf_compact(args);
 
-	tmpbuffer = kmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
+	tmpbuffer = kvmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
 	memcpy(tmpbuffer, bp->b_addr, args->geo->blksize);
 	memset(bp->b_addr, 0, args->geo->blksize);
 	leaf_src = (xfs_attr_leafblock_t *)tmpbuffer;
@@ -1651,7 +1648,7 @@ xfs_attr3_leaf_compact(
 	 */
 	xfs_trans_log_buf(trans, bp, 0, args->geo->blksize - 1);
 
-	kfree(tmpbuffer);
+	kvfree(tmpbuffer);
 }
 
 /*
@@ -2330,7 +2327,7 @@ xfs_attr3_leaf_unbalance(
 		struct xfs_attr_leafblock *tmp_leaf;
 		struct xfs_attr3_icleaf_hdr tmphdr;
 
-		tmp_leaf = kzalloc(state->args->geo->blksize,
+		tmp_leaf = kvzalloc(state->args->geo->blksize,
 				GFP_KERNEL | __GFP_NOFAIL);
 
 		/*
@@ -2371,7 +2368,7 @@ xfs_attr3_leaf_unbalance(
 		}
 		memcpy(save_leaf, tmp_leaf, state->args->geo->blksize);
 		savehdr = tmphdr; /* struct copy */
-		kfree(tmp_leaf);
+		kvfree(tmp_leaf);
 	}
 
 	xfs_attr3_leaf_hdr_to_disk(state->args->geo, save_leaf, &savehdr);

From patchwork Tue Jun 25 11:44:18 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710927
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, yang@os.amperecomputing.com, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v8 08/10] xfs: expose block size in stat
Date: Tue, 25 Jun 2024 11:44:18 +0000
Message-ID: <20240625114420.719014-9-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>
References: <20240625114420.719014-1-kernel@pankajraghav.com>
MIME-Version: 1.0

From: Pankaj Raghav

For block sizes larger than the page size, the unit of efficient IO is the block size, not the page size. Leaving stat() to report PAGE_SIZE as the block size causes test programs like fsx to issue illegal ranges for operations that require block size alignment (e.g. fallocate() insert range). Hence update the preferred IO size to reflect the block size in this case.

This change is based on a patch originally from Dave Chinner [1].

[1] https://lwn.net/ml/linux-fsdevel/20181107063127.3902-16-david@fromorbit.com/

Reviewed-by: Darrick J. Wong
Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Dave Chinner
---
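
As an aside (editor's sketch, not part of the patch), this is how a test program observes the preferred IO size that this patch changes; on an LBS filesystem st_blksize now reports the filesystem block size instead of PAGE_SIZE:

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	struct stat st;

	if (argc < 2 || stat(argv[1], &st) != 0) {
		perror("stat");
		return 1;
	}
	/* preferred IO size; block-aligned operations such as
	 * fallocate(FALLOC_FL_INSERT_RANGE) must honour this */
	printf("st_blksize = %ld\n", (long)st.st_blksize);
	return 0;
}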
 fs/xfs/xfs_iops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a00dcbc77e12..da5c13150315 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -562,7 +562,7 @@ xfs_stat_blksize(
 			return 1U << mp->m_allocsize_log;
 	}
 
-	return PAGE_SIZE;
+	return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
 }
 
 STATIC int

From patchwork Tue Jun 25 11:44:19 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710928
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, yang@os.amperecomputing.com, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v8 09/10] xfs: make the calculation generic in xfs_sb_validate_fsb_count()
Date: Tue, 25 Jun 2024 11:44:19 +0000
Message-ID: <20240625114420.719014-10-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>
References: <20240625114420.719014-1-kernel@pankajraghav.com>
MIME-Version: 1.0

From: Pankaj Raghav
Instead of assuming that PAGE_SHIFT is always greater than or equal to the blocklog, make the calculation generic so that the page cache count can be calculated correctly for LBS.

Reviewed-by: Darrick J. Wong
Signed-off-by: Pankaj Raghav
Reviewed-by: Dave Chinner
---
 fs/xfs/xfs_mount.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 09eef1721ef4..3949f720b535 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -132,11 +132,16 @@ xfs_sb_validate_fsb_count(
 	xfs_sb_t	*sbp,
 	uint64_t	nblocks)
 {
+	uint64_t	max_bytes;
+
 	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
 	ASSERT(sbp->sb_blocklog >= BBSHIFT);
 
+	if (check_shl_overflow(nblocks, sbp->sb_blocklog, &max_bytes))
+		return -EFBIG;
+
 	/* Limited by ULONG_MAX of page cache index */
-	if (nblocks >> (PAGE_SHIFT - sbp->sb_blocklog) > ULONG_MAX)
+	if (max_bytes >> PAGE_SHIFT > ULONG_MAX)
 		return -EFBIG;
 	return 0;
 }
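
As an aside (editor's sketch, not part of the patch), the overflow check above can be modelled in userspace; shl_overflow() is a hypothetical analogue of the kernel's check_shl_overflow():

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* shift v left by shift bits, reporting overflow instead of truncating */
static bool shl_overflow(uint64_t v, unsigned int shift, uint64_t *out)
{
	if (shift >= 64 || v > (UINT64_MAX >> shift))
		return true; /* v << shift does not fit in 64 bits */
	*out = v << shift;
	return false;
}

int main(void)
{
	uint64_t max_bytes;

	/* e.g. nblocks shifted by sb_blocklog (16 for a 64k block size) */
	if (shl_overflow(UINT64_MAX / 2, 16, &max_bytes))
		printf("overflow: filesystem too large\n");
	if (!shl_overflow(1000000, 16, &max_bytes))
		printf("max_bytes = %llu\n", (unsigned long long)max_bytes);
	return 0;
}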

From patchwork Tue Jun 25 11:44:20 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13710929
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, yang@os.amperecomputing.com, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v8 10/10] xfs: enable block size larger than page size support
Date: Tue, 25 Jun 2024 11:44:20 +0000
Message-ID: <20240625114420.719014-11-kernel@pankajraghav.com>
In-Reply-To: <20240625114420.719014-1-kernel@pankajraghav.com>
References: <20240625114420.719014-1-kernel@pankajraghav.com>
MIME-Version: 1.0

From: Pankaj Raghav

The page cache now has the ability to enforce a minimum order when allocating a folio, which is a prerequisite for supporting block sizes larger than the page size.

Reviewed-by: Darrick J. Wong
Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Dave Chinner
---
@hch and @Dave I have retained min_folio_order in the inode struct, as the discussion about moving it to xfs_mount is still open.

 fs/xfs/libxfs/xfs_ialloc.c |  5 +++++
 fs/xfs/libxfs/xfs_shared.h |  3 +++
 fs/xfs/xfs_icache.c        |  6 ++++--
 fs/xfs/xfs_mount.c         |  1 -
 fs/xfs/xfs_super.c         | 18 ++++++++++--------
 5 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 14c81f227c5b..1e76431d75a4 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -3019,6 +3019,11 @@ xfs_ialloc_setup_geometry(
 		igeo->ialloc_align = mp->m_dalign;
 	else
 		igeo->ialloc_align = 0;
+
+	if (mp->m_sb.sb_blocksize > PAGE_SIZE)
+		igeo->min_folio_order = mp->m_sb.sb_blocklog - PAGE_SHIFT;
+	else
+		igeo->min_folio_order = 0;
 }
 
 /* Compute the location of the root directory inode that is laid out by mkfs. */
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 34f104ed372c..e67a1c7cc0b0 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -231,6 +231,9 @@ struct xfs_ino_geometry {
 	/* precomputed value for di_flags2 */
 	uint64_t new_diflags2;
 
+	/* minimum folio order of a page cache allocation */
+	unsigned int min_folio_order;
+
 };
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 088ac200b026..e0f911f326e6 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -88,7 +88,8 @@ xfs_inode_alloc(
 	/* VFS doesn't initialise i_mode! */
 	VFS_I(ip)->i_mode = 0;
-	mapping_set_large_folios(VFS_I(ip)->i_mapping);
+	mapping_set_folio_min_order(VFS_I(ip)->i_mapping,
+				    M_IGEO(mp)->min_folio_order);
 
 	XFS_STATS_INC(mp, vn_active);
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
@@ -325,7 +326,8 @@ xfs_reinit_inode(
 	inode->i_uid = uid;
 	inode->i_gid = gid;
 	inode->i_state = state;
-	mapping_set_large_folios(inode->i_mapping);
+	mapping_set_folio_min_order(inode->i_mapping,
+				    M_IGEO(mp)->min_folio_order);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 3949f720b535..c6933440f806 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -134,7 +134,6 @@ xfs_sb_validate_fsb_count(
 {
 	uint64_t	max_bytes;
 
-	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
 	ASSERT(sbp->sb_blocklog >= BBSHIFT);
 
 	if (check_shl_overflow(nblocks, sbp->sb_blocklog, &max_bytes))
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 27e9f749c4c7..b8a93a8f35ca 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1638,16 +1638,18 @@ xfs_fs_fill_super(
 		goto out_free_sb;
 	}
 
-	/*
-	 * Until this is fixed only page-sized or smaller data blocks work.
-	 */
 	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
-		xfs_warn(mp,
-	"File system with blocksize %d bytes. "
" - "Only pagesize (%ld) or less will currently work.", + if (!xfs_has_crc(mp)) { + xfs_warn(mp, +"V4 Filesystem with blocksize %d bytes. Only pagesize (%ld) or less is supported.", mp->m_sb.sb_blocksize, PAGE_SIZE); - error = -ENOSYS; - goto out_free_sb; + error = -ENOSYS; + goto out_free_sb; + } + + xfs_warn(mp, +"EXPERIMENTAL: V5 Filesystem with Large Block Size (%d bytes) enabled.", + mp->m_sb.sb_blocksize); } /* Ensure this filesystem fits in the page cache limits */