
[v10,01/10] fs: Allow fine-grained control of folio sizes

Message ID 20240715094457.452836-2-kernel@pankajraghav.com (mailing list archive)
State Superseded, archived
Series enable bs > ps in XFS

Commit Message

Pankaj Raghav (Samsung) July 15, 2024, 9:44 a.m. UTC
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

We need filesystems to be able to communicate acceptable folio sizes
to the pagecache for a variety of uses (e.g. large block sizes).
Support a range of folio sizes between order-0 and order-31.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Co-developed-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 include/linux/pagemap.h | 107 +++++++++++++++++++++++++++++++++++-----
 mm/filemap.c            |   6 +--
 mm/readahead.c          |   4 +-
 3 files changed, 98 insertions(+), 19 deletions(-)
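The commit message says a mapping can describe folio orders from 0 to 31. The user-space sketch below shows that encoding. It assumes the constants from the pagemap.h hunk: two 5-bit fields in `mapping->flags`, with the minimum order in bits 16-20 and the maximum order in bits 21-25. `set_orders()`, `min_order()` and `max_order()` are hypothetical helpers, not kernel API. The sketch clears both shifted fields before storing new values, so stale orders and the unrelated low flag bits are handled correctly.

```c
#include <assert.h>

/* Constants mirroring the pagemap.h hunk (assumed copied for illustration). */
#define AS_FOLIO_ORDER_BITS	5
#define AS_FOLIO_ORDER_MIN	16
#define AS_FOLIO_ORDER_MAX	(AS_FOLIO_ORDER_MIN + AS_FOLIO_ORDER_BITS)

#define AS_FOLIO_ORDER_MASK	((1ul << AS_FOLIO_ORDER_BITS) - 1)
#define AS_FOLIO_ORDER_MIN_MASK	(AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MIN)
#define AS_FOLIO_ORDER_MAX_MASK	(AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MAX)

/* Store min/max order in bits 16-20 and 21-25, leaving other flag bits intact. */
static unsigned long set_orders(unsigned long flags, unsigned int min,
				unsigned int max)
{
	/* Clear both shifted fields before OR-ing the new values in. */
	flags &= ~(AS_FOLIO_ORDER_MIN_MASK | AS_FOLIO_ORDER_MAX_MASK);
	return flags | ((unsigned long)min << AS_FOLIO_ORDER_MIN) |
		((unsigned long)max << AS_FOLIO_ORDER_MAX);
}

/* Extract the minimum folio order field. */
static unsigned int min_order(unsigned long flags)
{
	return (flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
}

/* Extract the maximum folio order field. */
static unsigned int max_order(unsigned long flags)
{
	return (flags & AS_FOLIO_ORDER_MAX_MASK) >> AS_FOLIO_ORDER_MAX;
}
```

Five bits per field is what gives the 0-31 range. Storing both bounds in the existing flags word avoids growing `struct address_space`.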

Comments

Matthew Wilcox July 16, 2024, 3:26 p.m. UTC | #1
On Mon, Jul 15, 2024 at 11:44:48AM +0200, Pankaj Raghav (Samsung) wrote:
> +/*
> + * mapping_max_folio_size_supported() - Check the max folio size supported
> + *
> + * The filesystem should call this function at mount time if there is a
> + * requirement on the folio mapping size in the page cache.
> + */
> +static inline size_t mapping_max_folio_size_supported(void)
> +{
> +	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> +		return 1U << (PAGE_SHIFT + MAX_PAGECACHE_ORDER);
> +	return PAGE_SIZE;
> +}

There's no need for this to be part of this patch.  I've removed stuff
from this patch before that's not needed, please stop adding unnecessary
functions.  This would logically be part of patch 10.

> +static inline void mapping_set_folio_order_range(struct address_space *mapping,
> +						 unsigned int min,
> +						 unsigned int max)
> +{
> +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> +		return;
> +
> +	if (min > MAX_PAGECACHE_ORDER) {
> +		VM_WARN_ONCE(1,
> +	"min order > MAX_PAGECACHE_ORDER. Setting min_order to MAX_PAGECACHE_ORDER");
> +		min = MAX_PAGECACHE_ORDER;
> +	}

This is really too much.  It's something that will never happen.  Just
delete the message.

> +	if (max > MAX_PAGECACHE_ORDER) {
> +		VM_WARN_ONCE(1,
> +	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
> +		max = MAX_PAGECACHE_ORDER;

Absolutely not.  If the filesystem declares it can support a block size
of 4TB, then good for it.  We just silently clamp it.
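The following sketch shows the silent clamping Matthew describes: the patch's range checks with the VM_WARN_ONCE() calls dropped. MAX_PAGECACHE_ORDER is fixed at 9 here purely as an assumption, because the real value depends on architecture and config. `clamp_order_range()` is a hypothetical stand-alone helper.

```c
#include <assert.h>

/*
 * Assumed value for illustration; the real MAX_PAGECACHE_ORDER is
 * min(MAX_XAS_ORDER, PREFERRED_MAX_PAGECACHE_ORDER) and varies by config.
 */
#define MAX_PAGECACHE_ORDER 9

struct order_range {
	unsigned int min;
	unsigned int max;
};

/* Silently clamp a requested [min, max] order range, with no warnings. */
static struct order_range clamp_order_range(unsigned int min, unsigned int max)
{
	struct order_range r;

	r.min = min > MAX_PAGECACHE_ORDER ? MAX_PAGECACHE_ORDER : min;
	r.max = max > MAX_PAGECACHE_ORDER ? MAX_PAGECACHE_ORDER : max;
	/* A max below min makes no sense; raise it to min. */
	if (r.max < r.min)
		r.max = r.min;
	return r;
}
```

With 4 KiB pages, a 4TB block size is order 30. The sketch quietly caps that to order 9.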
Pankaj Raghav (Samsung) July 17, 2024, 9:46 a.m. UTC | #2
On Tue, Jul 16, 2024 at 04:26:10PM +0100, Matthew Wilcox wrote:
> On Mon, Jul 15, 2024 at 11:44:48AM +0200, Pankaj Raghav (Samsung) wrote:
> > +/*
> > + * mapping_max_folio_size_supported() - Check the max folio size supported
> > + *
> > + * The filesystem should call this function at mount time if there is a
> > + * requirement on the folio mapping size in the page cache.
> > + */
> > +static inline size_t mapping_max_folio_size_supported(void)
> > +{
> > +	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> > +		return 1U << (PAGE_SHIFT + MAX_PAGECACHE_ORDER);
> > +	return PAGE_SIZE;
> > +}
> 
> There's no need for this to be part of this patch.  I've removed stuff
> from this patch before that's not needed, please stop adding unnecessary
> functions.  This would logically be part of patch 10.

That makes sense. I will move it to the last patch.

> 
> > +static inline void mapping_set_folio_order_range(struct address_space *mapping,
> > +						 unsigned int min,
> > +						 unsigned int max)
> > +{
> > +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> > +		return;
> > +
> > +	if (min > MAX_PAGECACHE_ORDER) {
> > +		VM_WARN_ONCE(1,
> > +	"min order > MAX_PAGECACHE_ORDER. Setting min_order to MAX_PAGECACHE_ORDER");
> > +		min = MAX_PAGECACHE_ORDER;
> > +	}
> 
> This is really too much.  It's something that will never happen.  Just
> delete the message.
> 
> > +	if (max > MAX_PAGECACHE_ORDER) {
> > +		VM_WARN_ONCE(1,
> > +	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
> > +		max = MAX_PAGECACHE_ORDER;
> 
> Absolutely not.  If the filesystem declares it can support a block size
> of 4TB, then good for it.  We just silently clamp it.

Hmm, but you raised the point about clamping in the previous patches[1]
after Ryan pointed out that we should not silently clamp the order.

```
> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
> whatever values are passed in are a hard requirement? So wouldn't want them to
> be silently reduced. (Especially given the recent change to reduce the size of
> MAX_PAGECACHE_ORDER to less than PMD size in some cases).

Hm, yes.  We should probably make this return an errno.  Including
returning an errno for !IS_ENABLED() and min > 0.
```

It was not clear from the conversation in the previous patches that we
decided to just clamp the order (like it was done before).

So let's just stick with how it was done before where we clamp the
values if min and max > MAX_PAGECACHE_ORDER?

[1] https://lore.kernel.org/linux-fsdevel/Zoa9rQbEUam467-q@casper.infradead.org/
Ryan Roberts July 17, 2024, 9:59 a.m. UTC | #3
On 17/07/2024 10:46, Pankaj Raghav (Samsung) wrote:
> On Tue, Jul 16, 2024 at 04:26:10PM +0100, Matthew Wilcox wrote:
>> On Mon, Jul 15, 2024 at 11:44:48AM +0200, Pankaj Raghav (Samsung) wrote:
>>> +/*
>>> + * mapping_max_folio_size_supported() - Check the max folio size supported
>>> + *
>>> + * The filesystem should call this function at mount time if there is a
>>> + * requirement on the folio mapping size in the page cache.
>>> + */
>>> +static inline size_t mapping_max_folio_size_supported(void)
>>> +{
>>> +	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>>> +		return 1U << (PAGE_SHIFT + MAX_PAGECACHE_ORDER);
>>> +	return PAGE_SIZE;
>>> +}
>>
>> There's no need for this to be part of this patch.  I've removed stuff
>> from this patch before that's not needed, please stop adding unnecessary
>> functions.  This would logically be part of patch 10.
> 
> That makes sense. I will move it to the last patch.
> 
>>
>>> +static inline void mapping_set_folio_order_range(struct address_space *mapping,
>>> +						 unsigned int min,
>>> +						 unsigned int max)
>>> +{
>>> +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>>> +		return;
>>> +
>>> +	if (min > MAX_PAGECACHE_ORDER) {
>>> +		VM_WARN_ONCE(1,
>>> +	"min order > MAX_PAGECACHE_ORDER. Setting min_order to MAX_PAGECACHE_ORDER");
>>> +		min = MAX_PAGECACHE_ORDER;
>>> +	}
>>
>> This is really too much.  It's something that will never happen.  Just
>> delete the message.
>>
>>> +	if (max > MAX_PAGECACHE_ORDER) {
>>> +		VM_WARN_ONCE(1,
>>> +	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
>>> +		max = MAX_PAGECACHE_ORDER;
>>
>> Absolutely not.  If the filesystem declares it can support a block size
>> of 4TB, then good for it.  We just silently clamp it.
> 
> Hmm, but you raised the point about clamping in the previous patches[1]
> after Ryan pointed out that we should not silently clamp the order.
> 
> ```
>> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
>> whatever values are passed in are a hard requirement? So wouldn't want them to
>> be silently reduced. (Especially given the recent change to reduce the size of
> >> MAX_PAGECACHE_ORDER to less than PMD size in some cases).
> 
> Hm, yes.  We should probably make this return an errno.  Including
> returning an errno for !IS_ENABLED() and min > 0.
> ```
> 
> It was not clear from the conversation in the previous patches that we
> decided to just clamp the order (like it was done before).
> 
> So let's just stick with how it was done before where we clamp the
> values if min and max > MAX_PAGECACHE_ORDER?
> 
> [1] https://lore.kernel.org/linux-fsdevel/Zoa9rQbEUam467-q@casper.infradead.org/

The way I see it, there are 2 approaches we could take:

1. Implement mapping_max_folio_size_supported(), write a headerdoc for
mapping_set_folio_order_range() that says min must be lte max, max must be lte
mapping_max_folio_size_supported(). Then emit VM_WARN() in
mapping_set_folio_order_range() if the constraints are violated, and clamp to
make it safe (from page cache's perspective). The VM_WARN()s can just be inline
in the if statements to keep them clean. The FS is responsible for checking
mapping_max_folio_size_supported() and ensuring min and max meet requirements.

2. Return an error from mapping_set_folio_order_range() (and the other functions
that set min/max). No need for warning. No state changed if error is returned.
FS can emit warning on error if it wants.
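Option 2 is sketched below. The setter validates its input first and reports failure without changing any state. The name `set_folio_order_range_checked()`, the `struct mapping_orders` stand-in for `mapping->flags`, the choice of -EINVAL, and the MAX_PAGECACHE_ORDER value are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>

#define MAX_PAGECACHE_ORDER 9	/* assumed value for illustration */

/* Hypothetical stand-in for the order fields kept in mapping->flags. */
struct mapping_orders {
	unsigned int min;
	unsigned int max;
};

/*
 * Hypothetical option-2 setter: reject an unsatisfiable range with
 * -EINVAL and leave @m untouched, so the caller decides how to react.
 */
static int set_folio_order_range_checked(struct mapping_orders *m,
					 unsigned int min, unsigned int max)
{
	if (min > max || max > MAX_PAGECACHE_ORDER)
		return -EINVAL;
	m->min = min;
	m->max = max;
	return 0;
}
```

The trade-off relative to option 1 is that every caller, which is per inode here, now needs an error path.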

Personally I prefer option 2, but 1 is definitely less churn.

Thanks,
Ryan
Pankaj Raghav (Samsung) July 17, 2024, 3:12 p.m. UTC | #4
> >>
> >> This is really too much.  It's something that will never happen.  Just
> >> delete the message.
> >>
> >>> +	if (max > MAX_PAGECACHE_ORDER) {
> >>> +		VM_WARN_ONCE(1,
> >>> +	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
> >>> +		max = MAX_PAGECACHE_ORDER;
> >>
> >> Absolutely not.  If the filesystem declares it can support a block size
> >> of 4TB, then good for it.  We just silently clamp it.
> > 
> > Hmm, but you raised the point about clamping in the previous patches[1]
> > after Ryan pointed out that we should not silently clamp the order.
> > 
> > ```
> >> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
> >> whatever values are passed in are a hard requirement? So wouldn't want them to
> >> be silently reduced. (Especially given the recent change to reduce the size of
> > >> MAX_PAGECACHE_ORDER to less than PMD size in some cases).
> > 
> > Hm, yes.  We should probably make this return an errno.  Including
> > returning an errno for !IS_ENABLED() and min > 0.
> > ```
> > 
> > It was not clear from the conversation in the previous patches that we
> > decided to just clamp the order (like it was done before).
> > 
> > So let's just stick with how it was done before where we clamp the
> > values if min and max > MAX_PAGECACHE_ORDER?
> > 
> > [1] https://lore.kernel.org/linux-fsdevel/Zoa9rQbEUam467-q@casper.infradead.org/
> 
> The way I see it, there are 2 approaches we could take:
> 
> 1. Implement mapping_max_folio_size_supported(), write a headerdoc for
> mapping_set_folio_order_range() that says min must be lte max, max must be lte
> mapping_max_folio_size_supported(). Then emit VM_WARN() in
> mapping_set_folio_order_range() if the constraints are violated, and clamp to
> make it safe (from page cache's perspective). The VM_WARN()s can just be inline

Inlining with the `if` is not possible since:
91241681c62a ("include/linux/mmdebug.h: make VM_WARN* non-rvals")

> in the if statements to keep them clean. The FS is responsible for checking
> mapping_max_folio_size_supported() and ensuring min and max meet requirements.

This is roughly what is done here, but IIUC from willy's reply to the
patch, he prefers silent clamping over emitting warnings. I think that's
because we check the constraints at mount time, so it should be safe to
call this, I guess?

> 
> 2. Return an error from mapping_set_folio_order_range() (and the other functions
> that set min/max). No need for warning. No state changed if error is returned.
> FS can emit warning on error if it wants.

I think Chinner was not happy with this approach because this is done
per inode, so we would end up shutting down the filesystem at the first
inode allocation instead of refusing the mount, even though we already
know MAX_PAGECACHE_ORDER during the mount phase.

--
Pankaj
Darrick J. Wong July 17, 2024, 3:25 p.m. UTC | #5
On Wed, Jul 17, 2024 at 03:12:51PM +0000, Pankaj Raghav (Samsung) wrote:
> > >>
> > >> This is really too much.  It's something that will never happen.  Just
> > >> delete the message.
> > >>
> > >>> +	if (max > MAX_PAGECACHE_ORDER) {
> > >>> +		VM_WARN_ONCE(1,
> > >>> +	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
> > >>> +		max = MAX_PAGECACHE_ORDER;
> > >>
> > >> Absolutely not.  If the filesystem declares it can support a block size
> > >> of 4TB, then good for it.  We just silently clamp it.
> > > 
> > > Hmm, but you raised the point about clamping in the previous patches[1]
> > > after Ryan pointed out that we should not silently clamp the order.
> > > 
> > > ```
> > >> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
> > >> whatever values are passed in are a hard requirement? So wouldn't want them to
> > >> be silently reduced. (Especially given the recent change to reduce the size of
> > > >> MAX_PAGECACHE_ORDER to less than PMD size in some cases).
> > > 
> > > Hm, yes.  We should probably make this return an errno.  Including
> > > returning an errno for !IS_ENABLED() and min > 0.
> > > ```
> > > 
> > > It was not clear from the conversation in the previous patches that we
> > > decided to just clamp the order (like it was done before).
> > > 
> > > So let's just stick with how it was done before where we clamp the
> > > values if min and max > MAX_PAGECACHE_ORDER?
> > > 
> > > [1] https://lore.kernel.org/linux-fsdevel/Zoa9rQbEUam467-q@casper.infradead.org/
> > 
> > The way I see it, there are 2 approaches we could take:
> > 
> > 1. Implement mapping_max_folio_size_supported(), write a headerdoc for
> > mapping_set_folio_order_range() that says min must be lte max, max must be lte
> > mapping_max_folio_size_supported(). Then emit VM_WARN() in
> > mapping_set_folio_order_range() if the constraints are violated, and clamp to
> > make it safe (from page cache's perspective). The VM_WARN()s can just be inline
> 
> Inlining with the `if` is not possible since:
> 91241681c62a ("include/linux/mmdebug.h: make VM_WARN* non-rvals")
> 
> > in the if statements to keep them clean. The FS is responsible for checking
> > mapping_max_folio_size_supported() and ensuring min and max meet requirements.
> 
> This is roughly what is done here, but IIUC from willy's reply to the
> patch, he prefers silent clamping over emitting warnings. I think that's
> because we check the constraints at mount time, so it should be safe to
> call this, I guess?

That's my read of the situation, but I'll ask about it at the next thp
meeting if that helps.

> > 
> > 2. Return an error from mapping_set_folio_order_range() (and the other functions
> > that set min/max). No need for warning. No state changed if error is returned.
> > FS can emit warning on error if it wants.
> 
> I think Chinner was not happy with this approach because this is done
> per inode, so we would end up shutting down the filesystem at the first
> inode allocation instead of refusing the mount, even though we already
> know MAX_PAGECACHE_ORDER during the mount phase.

I agree.  Filesystem-wide properties (e.g. fs blocksize) should cause
the mount to fail if the pagecache cannot possibly handle any file
blocks.  Inode-specific properties (e.g. the forcealign+notears write
work John Garry is working on) could error out of open() with -EIO, but
that's a specialty file property.
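Darrick's rule for filesystem-wide properties can be sketched as a mount-time check. The sketch assumes 4 KiB pages and a MAX_PAGECACHE_ORDER of 9, which is the PMD order on x86-64 with 4 KiB pages; other configurations differ. `fs_check_blocksize()` is a hypothetical helper, not XFS code, and -EINVAL is an assumed errno. `max_folio_size_supported()` computes the same value as the patch's mapping_max_folio_size_supported(), with the THP config option turned into a parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed x86-64 defaults for illustration: 4 KiB pages, PMD order 9. */
#define PAGE_SHIFT		12
#define PAGE_SIZE		((size_t)1 << PAGE_SHIFT)
#define MAX_PAGECACHE_ORDER	9

/* Largest folio the page cache can hold; only order 0 without THP. */
static size_t max_folio_size_supported(int thp_enabled)
{
	if (thp_enabled)
		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
	return PAGE_SIZE;
}

/* Hypothetical mount-time check: refuse the mount, not the first inode. */
static int fs_check_blocksize(size_t blocksize, int thp_enabled)
{
	if (blocksize > max_folio_size_supported(thp_enabled))
		return -EINVAL;
	return 0;
}
```

Under these assumptions, a 64 KiB block size mounts with THP enabled, but the same filesystem is refused on a kernel built without THP.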

--D

> --
> Pankaj
>
Ryan Roberts July 17, 2024, 3:26 p.m. UTC | #6
On 17/07/2024 16:12, Pankaj Raghav (Samsung) wrote:
>>>>
>>>> This is really too much.  It's something that will never happen.  Just
>>>> delete the message.
>>>>
>>>>> +	if (max > MAX_PAGECACHE_ORDER) {
>>>>> +		VM_WARN_ONCE(1,
>>>>> +	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
>>>>> +		max = MAX_PAGECACHE_ORDER;
>>>>
>>>> Absolutely not.  If the filesystem declares it can support a block size
>>>> of 4TB, then good for it.  We just silently clamp it.
>>>
>>> Hmm, but you raised the point about clamping in the previous patches[1]
>>> after Ryan pointed out that we should not silently clamp the order.
>>>
>>> ```
>>>> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
>>>> whatever values are passed in are a hard requirement? So wouldn't want them to
>>>> be silently reduced. (Especially given the recent change to reduce the size of
>>>> MAX_PAGECACHE_ORDER to less than PMD size in some cases).
>>>
>>> Hm, yes.  We should probably make this return an errno.  Including
>>> returning an errno for !IS_ENABLED() and min > 0.
>>> ```
>>>
>>> It was not clear from the conversation in the previous patches that we
>>> decided to just clamp the order (like it was done before).
>>>
>>> So let's just stick with how it was done before where we clamp the
>>> values if min and max > MAX_PAGECACHE_ORDER?
>>>
>>> [1] https://lore.kernel.org/linux-fsdevel/Zoa9rQbEUam467-q@casper.infradead.org/
>>
>> The way I see it, there are 2 approaches we could take:
>>
>> 1. Implement mapping_max_folio_size_supported(), write a headerdoc for
>> mapping_set_folio_order_range() that says min must be lte max, max must be lte
>> mapping_max_folio_size_supported(). Then emit VM_WARN() in
>> mapping_set_folio_order_range() if the constraints are violated, and clamp to
>> make it safe (from page cache's perspective). The VM_WARN()s can just be inline
> 
> Inlining with the `if` is not possible since:
> 91241681c62a ("include/linux/mmdebug.h: make VM_WARN* non-rvals")

Ahh my bad. Could use WARN_ON()?
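The kernel's WARN_ON() is a statement expression that evaluates to the tested condition, so it can guard an `if ()`. After the commit Pankaj cites, the VM_WARN*() macros can no longer be used that way. The user-space sketch below shows the same shape. `SKETCH_WARN_ON()` is a hypothetical stand-in that relies on the GNU C statement-expression extension (GCC or Clang), and it prints a line where the real macro would dump a backtrace.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Stand-in for WARN_ON(): prints a message and evaluates to the
 * condition, so it can be used directly inside an if ().
 */
#define SKETCH_WARN_ON(cond) ({					\
	int __ret_warn_on = !!(cond);				\
	if (__ret_warn_on)					\
		fprintf(stderr, "WARNING at %s:%d\n",		\
			__FILE__, __LINE__);			\
	__ret_warn_on;						\
})

#define MAX_PAGECACHE_ORDER 9	/* assumed value for illustration */

/* Warn and clamp in one step, as option 1 suggests. */
static unsigned int clamp_max_order(unsigned int max)
{
	if (SKETCH_WARN_ON(max > MAX_PAGECACHE_ORDER))
		max = MAX_PAGECACHE_ORDER;
	return max;
}
```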

> 
>> in the if statements to keep them clean. The FS is responsible for checking
>> mapping_max_folio_size_supported() and ensuring min and max meet requirements.
> 
> This is roughly what is done here, but IIUC from willy's reply to the
> patch, he prefers silent clamping over emitting warnings. I think that's
> because we check the constraints at mount time, so it should be safe to
> call this, I guess?

I don't want to put words in his mouth, but I thought he was complaining about
the verbosity of the warnings, not their presence.

> 
>>
>> 2. Return an error from mapping_set_folio_order_range() (and the other functions
>> that set min/max). No need for warning. No state changed if error is returned.
>> FS can emit warning on error if it wants.
> 
> I think Chinner was not happy with this approach because this is done
> per inode, so we would end up shutting down the filesystem at the first
> inode allocation instead of refusing the mount, even though we already
> know MAX_PAGECACHE_ORDER during the mount phase.

Ahh that makes sense. Understood.

> 
> --
> Pankaj
Pankaj Raghav (Samsung) July 22, 2024, 2:19 p.m. UTC | #7
@willy:

I want to clarify before sending the next round of patches, as I didn't
get a reply to my previous email.

IIUC your comments properly:

- I will go back to silent clamping in mapping_set_folio_order_range() as
  before and remove the VM_WARN_ONCE() calls.

- I will move mapping_max_folio_size_supported() to patch 10, so that
  filesystems can use it to check the maximum block size that can be
  supported and act accordingly.

--
Pankaj

On Tue, Jul 16, 2024 at 04:26:10PM +0100, Matthew Wilcox wrote:
> On Mon, Jul 15, 2024 at 11:44:48AM +0200, Pankaj Raghav (Samsung) wrote:
> > +/*
> > + * mapping_max_folio_size_supported() - Check the max folio size supported
> > + *
> > + * The filesystem should call this function at mount time if there is a
> > + * requirement on the folio mapping size in the page cache.
> > + */
> > +static inline size_t mapping_max_folio_size_supported(void)
> > +{
> > +	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> > +		return 1U << (PAGE_SHIFT + MAX_PAGECACHE_ORDER);
> > +	return PAGE_SIZE;
> > +}
> 
> There's no need for this to be part of this patch.  I've removed stuff
> from this patch before that's not needed, please stop adding unnecessary
> functions.  This would logically be part of patch 10.
> 
> > +static inline void mapping_set_folio_order_range(struct address_space *mapping,
> > +						 unsigned int min,
> > +						 unsigned int max)
> > +{
> > +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> > +		return;
> > +
> > +	if (min > MAX_PAGECACHE_ORDER) {
> > +		VM_WARN_ONCE(1,
> > +	"min order > MAX_PAGECACHE_ORDER. Setting min_order to MAX_PAGECACHE_ORDER");
> > +		min = MAX_PAGECACHE_ORDER;
> > +	}
> 
> This is really too much.  It's something that will never happen.  Just
> delete the message.
> 
> > +	if (max > MAX_PAGECACHE_ORDER) {
> > +		VM_WARN_ONCE(1,
> > +	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
> > +		max = MAX_PAGECACHE_ORDER;
> 
> Absolutely not.  If the filesystem declares it can support a block size
> of 4TB, then good for it.  We just silently clamp it.
>

Patch

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8026a8a433d36..8d2b5c51461b0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -204,14 +204,21 @@  enum mapping_flags {
 	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
-	AS_LARGE_FOLIO_SUPPORT = 6,
-	AS_RELEASE_ALWAYS,	/* Call ->release_folio(), even if no private data */
-	AS_STABLE_WRITES,	/* must wait for writeback before modifying
+	AS_RELEASE_ALWAYS = 6,	/* Call ->release_folio(), even if no private data */
+	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying
 				   folio contents */
-	AS_UNMOVABLE,		/* The mapping cannot be moved, ever */
-	AS_INACCESSIBLE,	/* Do not attempt direct R/W access to the mapping */
+	AS_UNMOVABLE = 8,	/* The mapping cannot be moved, ever */
+	AS_INACCESSIBLE = 9,	/* Do not attempt direct R/W access to the mapping */
+	/* Bits 16-25 are used for FOLIO_ORDER */
+	AS_FOLIO_ORDER_BITS = 5,
+	AS_FOLIO_ORDER_MIN = 16,
+	AS_FOLIO_ORDER_MAX = AS_FOLIO_ORDER_MIN + AS_FOLIO_ORDER_BITS,
 };
 
+#define AS_FOLIO_ORDER_MASK     ((1u << AS_FOLIO_ORDER_BITS) - 1)
+#define AS_FOLIO_ORDER_MIN_MASK (AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MIN)
+#define AS_FOLIO_ORDER_MAX_MASK (AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MAX)
+
 /**
  * mapping_set_error - record a writeback error in the address_space
  * @mapping: the mapping in which an error should be set
@@ -367,9 +374,70 @@  static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 #define MAX_XAS_ORDER		(XA_CHUNK_SHIFT * 2 - 1)
 #define MAX_PAGECACHE_ORDER	min(MAX_XAS_ORDER, PREFERRED_MAX_PAGECACHE_ORDER)
 
+/*
+ * mapping_max_folio_size_supported() - Check the max folio size supported
+ *
+ * The filesystem should call this function at mount time if there is a
+ * requirement on the folio mapping size in the page cache.
+ */
+static inline size_t mapping_max_folio_size_supported(void)
+{
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 1U << (PAGE_SHIFT + MAX_PAGECACHE_ORDER);
+	return PAGE_SIZE;
+}
+
+/*
+ * mapping_set_folio_order_range() - Set the orders supported by a file.
+ * @mapping: The address space of the file.
+ * @min: Minimum folio order (between 0-MAX_PAGECACHE_ORDER inclusive).
+ * @max: Maximum folio order (between @min-MAX_PAGECACHE_ORDER inclusive).
+ *
+ * The filesystem should call this function in its inode constructor to
+ * indicate which base size (min) and maximum size (max) of folio the VFS
+ * can use to cache the contents of the file.  This should only be used
+ * if the filesystem needs special handling of folio sizes (ie there is
+ * something the core cannot know).
+ * Do not tune it based on, eg, i_size.
+ *
+ * Context: This should not be called while the inode is active as it
+ * is non-atomic.
+ */
+static inline void mapping_set_folio_order_range(struct address_space *mapping,
+						 unsigned int min,
+						 unsigned int max)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return;
+
+	if (min > MAX_PAGECACHE_ORDER) {
+		VM_WARN_ONCE(1,
+	"min order > MAX_PAGECACHE_ORDER. Setting min_order to MAX_PAGECACHE_ORDER");
+		min = MAX_PAGECACHE_ORDER;
+	}
+
+	if (max > MAX_PAGECACHE_ORDER) {
+		VM_WARN_ONCE(1,
+	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
+		max = MAX_PAGECACHE_ORDER;
+	}
+
+	if (max < min)
+		max = min;
+
+	mapping->flags = (mapping->flags & ~(AS_FOLIO_ORDER_MIN_MASK | AS_FOLIO_ORDER_MAX_MASK)) |
+		(min << AS_FOLIO_ORDER_MIN) | (max << AS_FOLIO_ORDER_MAX);
+}
+
+static inline void mapping_set_folio_min_order(struct address_space *mapping,
+					       unsigned int min)
+{
+	mapping_set_folio_order_range(mapping, min, MAX_PAGECACHE_ORDER);
+}
+
 /**
  * mapping_set_large_folios() - Indicate the file supports large folios.
- * @mapping: The file.
+ * @mapping: The address space of the file.
  *
  * The filesystem should call this function in its inode constructor to
  * indicate that the VFS can use large folios to cache the contents of
@@ -380,7 +448,23 @@  static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
  */
 static inline void mapping_set_large_folios(struct address_space *mapping)
 {
-	__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	mapping_set_folio_order_range(mapping, 0, MAX_PAGECACHE_ORDER);
+}
+
+static inline unsigned int
+mapping_max_folio_order(const struct address_space *mapping)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 0;
+	return (mapping->flags & AS_FOLIO_ORDER_MAX_MASK) >> AS_FOLIO_ORDER_MAX;
+}
+
+static inline unsigned int
+mapping_min_folio_order(const struct address_space *mapping)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 0;
+	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
 }
 
 /*
@@ -393,16 +477,13 @@  static inline bool mapping_large_folio_support(struct address_space *mapping)
 	VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
 			"Anonymous mapping always supports large folio");
 
-	return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	return mapping_max_folio_order(mapping) > 0;
 }
 
 /* Return the maximum folio size for this pagecache mapping, in bytes. */
-static inline size_t mapping_max_folio_size(struct address_space *mapping)
+static inline size_t mapping_max_folio_size(const struct address_space *mapping)
 {
-	if (mapping_large_folio_support(mapping))
-		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
-	return PAGE_SIZE;
+	return PAGE_SIZE << mapping_max_folio_order(mapping);
 }
 
 static inline int filemap_nr_thps(struct address_space *mapping)
diff --git a/mm/filemap.c b/mm/filemap.c
index d62150418b910..ad5e4a848070e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1933,10 +1933,8 @@  struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
 			fgp_flags |= FGP_LOCK;
 
-		if (!mapping_large_folio_support(mapping))
-			order = 0;
-		if (order > MAX_PAGECACHE_ORDER)
-			order = MAX_PAGECACHE_ORDER;
+		if (order > mapping_max_folio_order(mapping))
+			order = mapping_max_folio_order(mapping);
 		/* If we're not aligned, allocate a smaller folio */
 		if (index & ((1UL << order) - 1))
 			order = __ffs(index);
diff --git a/mm/readahead.c b/mm/readahead.c
index 517c0be7ce665..3e5239e9e1777 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -449,10 +449,10 @@  void page_cache_ra_order(struct readahead_control *ractl,
 
 	limit = min(limit, index + ra->size - 1);
 
-	if (new_order < MAX_PAGECACHE_ORDER)
+	if (new_order < mapping_max_folio_order(mapping))
 		new_order += 2;
 
-	new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
+	new_order = min(mapping_max_folio_order(mapping), new_order);
 	new_order = min_t(unsigned int, new_order, ilog2(ra->size));
 
 	/* See comment in page_cache_ra_unbounded() */
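The filemap.c and readahead.c hunks both replace the global MAX_PAGECACHE_ORDER cap with the per-mapping maximum. The user-space model below covers both. `pick_folio_order()` and `ra_next_order()` are hypothetical names, `__builtin_ctzl()` stands in for the kernel's `__ffs()`, and a `__builtin_clzl()` expression stands in for `ilog2()`. Both builtins are GCC/Clang extensions.

```c
#include <assert.h>

/*
 * Order selection in __filemap_get_folio() after this patch: cap the
 * requested order at the mapping's maximum, then shrink it so the folio
 * is naturally aligned at @index.
 */
static unsigned int pick_folio_order(unsigned long index, unsigned int order,
				     unsigned int mapping_max_order)
{
	if (order > mapping_max_order)
		order = mapping_max_order;
	/* If we're not aligned, allocate a smaller folio. */
	if (index & ((1UL << order) - 1))
		order = __builtin_ctzl(index);	/* index != 0 here */
	return order;
}

/* Readahead hunk: bump the order by 2, then cap it twice. */
static unsigned int ra_next_order(unsigned int new_order,
				  unsigned int mapping_max_order,
				  unsigned long ra_size)
{
	/* ilog2(ra_size); ra_size must be non-zero. */
	unsigned int ra_order = (unsigned int)(sizeof(long) * 8 - 1) -
				__builtin_clzl(ra_size);

	if (new_order < mapping_max_order)
		new_order += 2;
	if (new_order > mapping_max_order)
		new_order = mapping_max_order;
	if (new_order > ra_order)
		new_order = ra_order;
	return new_order;
}
```

A mapping that never called the new setters keeps zero in bits 21-25. Its maximum order is therefore 0, so the single comparison also covers what the removed `!mapping_large_folio_support()` check used to do.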