mbox series

[RFC,v2,0/3] mm: Properly document tail pages for a folio

Message ID 20230814184411.330496-1-peterx@redhat.com (mailing list archive)
Headers show
Series mm: Properly document tail pages for a folio | expand

Message

Peter Xu Aug. 14, 2023, 6:44 p.m. UTC
rfcv1: https://lore.kernel.org/r/20230810204944.53471-1-peterx@redhat.com

This is rfcv2 of the patch, where I split two small changes out from the
last patch.  Please refer to each patch for details.  The goal of the
series is to document clearly on how the fields in struct folio is reused
over tail pages, and make it clear on what can still be reused as free.

Smoke tested on x86_64 only, kernel-doc should have no change.

Comments welcomed.  Thanks.

Peter Xu (3):
  mm: Add TAIL_MAPPING_REUSED_MAX
  mm: Reorg and declare free spaces in struct folio tails
  mm: Proper document tail pages fields for folio

 include/linux/mm_types.h | 60 ++++++++++++++++++++++++++++++++++++----
 mm/huge_memory.c         |  6 ++--
 2 files changed, 58 insertions(+), 8 deletions(-)

Comments

Matthew Wilcox Aug. 14, 2023, 7:58 p.m. UTC | #1
On Mon, Aug 14, 2023 at 02:44:08PM -0400, Peter Xu wrote:

Look, this is all still too complicated.  And you're trying to make
something better that I'm trying to make disappear.  I'd really rather
you spent your time worrying about making userfaultfd use folios
than faffing with this.

How about this?

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5e74ce4a28cd..873285bb5d45 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -334,11 +334,14 @@ struct folio {
 	/* public: */
 			unsigned char _folio_dtor;
 			unsigned char _folio_order;
+			/* two bytes available here */
 			atomic_t _entire_mapcount;
 			atomic_t _nr_pages_mapped;
 			atomic_t _pincount;
+			/* no more space on 32-bt */
 #ifdef CONFIG_64BIT
 			unsigned int _folio_nr_pages;
+			/* twelve bytes available on 64-bit */
 #endif
 	/* private: the union with struct page is transitional */
 		};
@@ -360,6 +363,7 @@ struct folio {
 			unsigned long _head_2a;
 	/* public: */
 			struct list_head _deferred_list;
+			/* three more words available here */
 	/* private: the union with struct page is transitional */
 		};
 		struct page __page_2;
Peter Xu Aug. 14, 2023, 8:21 p.m. UTC | #2
On Mon, Aug 14, 2023 at 08:58:44PM +0100, Matthew Wilcox wrote:
> On Mon, Aug 14, 2023 at 02:44:08PM -0400, Peter Xu wrote:
> 
> Look, this is all still too complicated.  And you're trying to make
> something better that I'm trying to make disappear.  I'd really rather
> you spent your time worrying about making userfaultfd use folios
> than faffing with this.

I saw that internally some of uffd already start to use folio, while I
don't think the syscall part needs changing yet - the ranged API should
work for folio when it comes, and other than that folio should be hidden
and transparent, afaiu.

Do you mean when large folios can land on anon/shmem we can start to
allocate large folios there for uffd operations?  Or something else?

> 
> How about this?

I still prefer my version, sorry. But I agree this is better than nothing
to guide what's free to use - it's really not obvious to me, and I suppose
true to most people.  Besides..

> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5e74ce4a28cd..873285bb5d45 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -334,11 +334,14 @@ struct folio {
>  	/* public: */
>  			unsigned char _folio_dtor;
>  			unsigned char _folio_order;
> +			/* two bytes available here */
>  			atomic_t _entire_mapcount;
>  			atomic_t _nr_pages_mapped;
>  			atomic_t _pincount;
> +			/* no more space on 32-bt */
>  #ifdef CONFIG_64BIT
>  			unsigned int _folio_nr_pages;
> +			/* twelve bytes available on 64-bit */
>  #endif
>  	/* private: the union with struct page is transitional */
>  		};
> @@ -360,6 +363,7 @@ struct folio {
>  			unsigned long _head_2a;
>  	/* public: */
>  			struct list_head _deferred_list;
> +			/* three more words available here */

.. not really three more words here but 2 for 32 bits and 1 for 64 bits.
In my patch 3 I used "8 bytes free" so it's applicable to both.

I can figure it out in ten seconds now with my documents..

	 * |--------+-------------+-------------------|
	 * |  index | 32 bits     | 64 bits           |
	 * |--------+-------------+-------------------|
	 * |      0 | flags       | flags             |
	 * |      1 | head        | head              |
	 * |      2 | FREE        | FREE              |
	 * |      3 | FREE [1]    | FREE [1]          |
	 * |      4 | FREE        | FREE              |
	 * |      5 | FREE        | private [2]       |
	 * |      6 | mapcnt      | mapcnt+refcnt [3] |
	 * |      7 | refcnt [3]  |                   |
	 * |--------+-------------+-------------------|

Then...

	/* public: WORD 2-3 */
			struct list_head _deferred_list;
        <----- so after this we have WORDs 4/5 free on 32bits and 4 only on
               64 bits, because WORD 5 is used on 64bits

... but I won't be able to if without these documents.  I hope it justifies
that it's still worthwhile.

Thanks,
Randy Dunlap Aug. 14, 2023, 11:01 p.m. UTC | #3
On 8/14/23 12:58, Matthew Wilcox wrote:
> On Mon, Aug 14, 2023 at 02:44:08PM -0400, Peter Xu wrote:
> 
> Look, this is all still too complicated.  And you're trying to make
> something better that I'm trying to make disappear.  I'd really rather
> you spent your time worrying about making userfaultfd use folios
> than faffing with this.
> 
> How about this?
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5e74ce4a28cd..873285bb5d45 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -334,11 +334,14 @@ struct folio {
>   	/* public: */
>   			unsigned char _folio_dtor;
>   			unsigned char _folio_order;
> +			/* two bytes available here */
>   			atomic_t _entire_mapcount;
>   			atomic_t _nr_pages_mapped;
>   			atomic_t _pincount;
> +			/* no more space on 32-bt */

			                    32-bit

>   #ifdef CONFIG_64BIT
>   			unsigned int _folio_nr_pages;
> +			/* twelve bytes available on 64-bit */
>   #endif
>   	/* private: the union with struct page is transitional */
>   		};
> @@ -360,6 +363,7 @@ struct folio {
>   			unsigned long _head_2a;
>   	/* public: */
>   			struct list_head _deferred_list;
> +			/* three more words available here */
>   	/* private: the union with struct page is transitional */
>   		};
>   		struct page __page_2;
Matthew Wilcox Aug. 15, 2023, 3:45 a.m. UTC | #4
On Mon, Aug 14, 2023 at 04:21:55PM -0400, Peter Xu wrote:
> On Mon, Aug 14, 2023 at 08:58:44PM +0100, Matthew Wilcox wrote:
> > On Mon, Aug 14, 2023 at 02:44:08PM -0400, Peter Xu wrote:
> > 
> > Look, this is all still too complicated.  And you're trying to make
> > something better that I'm trying to make disappear.  I'd really rather
> > you spent your time worrying about making userfaultfd use folios
> > than faffing with this.
> 
> I saw that internally some of uffd already start to use folio, while I
> don't think the syscall part needs changing yet - the ranged API should
> work for folio when it comes, and other than that folio should be hidden
> and transparent, afaiu.
> 
> Do you mean when large folios can land on anon/shmem we can start to
> allocate large folios there for uffd operations?  Or something else?

Hm, I thought there were some parts that still needed to be converted.
But I don't see anything obvious right now.

> > @@ -360,6 +363,7 @@ struct folio {
> >  			unsigned long _head_2a;
> >  	/* public: */
> >  			struct list_head _deferred_list;
> > +			/* three more words available here */
> 
> .. not really three more words here but 2 for 32 bits and 1 for 64 bits.
> In my patch 3 I used "8 bytes free" so it's applicable to both.

I always forget about THP_SWAP using tail->private.  That actually needs
to be asserted by the compiler, not just documented.  Something along
these lines.

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 659c7b84726c..3880b3f2e321 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -340,8 +340,11 @@ struct folio {
 			atomic_t _pincount;
 #ifdef CONFIG_64BIT
 			unsigned int _folio_nr_pages;
-#endif
+			/* 4 byte gap here */
 	/* private: the union with struct page is transitional */
+			/* Fix THP_SWAP to not use tail->private */
+			unsigned long _private_1;
+#endif
 		};
 		struct page __page_1;
 	};
@@ -362,6 +365,9 @@ struct folio {
 	/* public: */
 			struct list_head _deferred_list;
 	/* private: the union with struct page is transitional */
+			unsigned long _avail_2a;
+			/* Fix THP_SWAP to not use tail->private */
+			unsigned long _private_2a;
 		};
 		struct page __page_2;
 	};
@@ -386,12 +392,18 @@ FOLIO_MATCH(memcg_data, memcg_data);
 			offsetof(struct page, pg) + sizeof(struct page))
 FOLIO_MATCH(flags, _flags_1);
 FOLIO_MATCH(compound_head, _head_1);
+#ifdef CONFIG_64BIT
+FOLIO_MATCH(private, _private_1);
+#endif
 #undef FOLIO_MATCH
 #define FOLIO_MATCH(pg, fl)						\
 	static_assert(offsetof(struct folio, fl) ==			\
 			offsetof(struct page, pg) + 2 * sizeof(struct page))
 FOLIO_MATCH(flags, _flags_2);
 FOLIO_MATCH(compound_head, _head_2);
+FOLIO_MATCH(flags, _flags_2a);
+FOLIO_MATCH(compound_head, _head_2a);
+FOLIO_MATCH(private, _private_2a);
 #undef FOLIO_MATCH
 
 /*

This is against the patchset I just posted which frees up a word in the
first tail page.
Peter Xu Aug. 15, 2023, 7:37 p.m. UTC | #5
On Tue, Aug 15, 2023 at 04:45:52AM +0100, Matthew Wilcox wrote:
> I always forget about THP_SWAP using tail->private.  That actually needs
> to be asserted by the compiler, not just documented.  Something along
> these lines.
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 659c7b84726c..3880b3f2e321 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -340,8 +340,11 @@ struct folio {
>  			atomic_t _pincount;
>  #ifdef CONFIG_64BIT
>  			unsigned int _folio_nr_pages;
> -#endif
> +			/* 4 byte gap here */
>  	/* private: the union with struct page is transitional */
> +			/* Fix THP_SWAP to not use tail->private */
> +			unsigned long _private_1;
> +#endif
>  		};
>  		struct page __page_1;
>  	};
> @@ -362,6 +365,9 @@ struct folio {
>  	/* public: */
>  			struct list_head _deferred_list;
>  	/* private: the union with struct page is transitional */
> +			unsigned long _avail_2a;
> +			/* Fix THP_SWAP to not use tail->private */
> +			unsigned long _private_2a;
>  		};
>  		struct page __page_2;
>  	};
> @@ -386,12 +392,18 @@ FOLIO_MATCH(memcg_data, memcg_data);
>  			offsetof(struct page, pg) + sizeof(struct page))
>  FOLIO_MATCH(flags, _flags_1);
>  FOLIO_MATCH(compound_head, _head_1);
> +#ifdef CONFIG_64BIT
> +FOLIO_MATCH(private, _private_1);
> +#endif
>  #undef FOLIO_MATCH
>  #define FOLIO_MATCH(pg, fl)						\
>  	static_assert(offsetof(struct folio, fl) ==			\
>  			offsetof(struct page, pg) + 2 * sizeof(struct page))
>  FOLIO_MATCH(flags, _flags_2);
>  FOLIO_MATCH(compound_head, _head_2);
> +FOLIO_MATCH(flags, _flags_2a);
> +FOLIO_MATCH(compound_head, _head_2a);
> +FOLIO_MATCH(private, _private_2a);
>  #undef FOLIO_MATCH
>  
>  /*
> 
> This is against the patchset I just posted which frees up a word in the
> first tail page.

Okay, I assume you meant to suggest leverage FOLIO_MATCH(), which I can
definitely try.  But then I'd hope it covers not only private field but all
the fields that the tail pages reuses; the goal is to document everything
no matter in what form.  I'll see what I can get..  Thanks.
Matthew Wilcox Aug. 15, 2023, 8:16 p.m. UTC | #6
On Tue, Aug 15, 2023 at 03:37:21PM -0400, Peter Xu wrote:
> On Tue, Aug 15, 2023 at 04:45:52AM +0100, Matthew Wilcox wrote:
> > I always forget about THP_SWAP using tail->private.  That actually needs
> > to be asserted by the compiler, not just documented.  Something along
> > these lines.
> > 
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 659c7b84726c..3880b3f2e321 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -340,8 +340,11 @@ struct folio {
> >  			atomic_t _pincount;
> >  #ifdef CONFIG_64BIT
> >  			unsigned int _folio_nr_pages;
> > -#endif
> > +			/* 4 byte gap here */
> >  	/* private: the union with struct page is transitional */
> > +			/* Fix THP_SWAP to not use tail->private */
> > +			unsigned long _private_1;
> > +#endif
> >  		};
> >  		struct page __page_1;
> >  	};
> > @@ -362,6 +365,9 @@ struct folio {
> >  	/* public: */
> >  			struct list_head _deferred_list;
> >  	/* private: the union with struct page is transitional */
> > +			unsigned long _avail_2a;
> > +			/* Fix THP_SWAP to not use tail->private */
> > +			unsigned long _private_2a;
> >  		};
> >  		struct page __page_2;
> >  	};
> > @@ -386,12 +392,18 @@ FOLIO_MATCH(memcg_data, memcg_data);
> >  			offsetof(struct page, pg) + sizeof(struct page))
> >  FOLIO_MATCH(flags, _flags_1);
> >  FOLIO_MATCH(compound_head, _head_1);
> > +#ifdef CONFIG_64BIT
> > +FOLIO_MATCH(private, _private_1);
> > +#endif
> >  #undef FOLIO_MATCH
> >  #define FOLIO_MATCH(pg, fl)						\
> >  	static_assert(offsetof(struct folio, fl) ==			\
> >  			offsetof(struct page, pg) + 2 * sizeof(struct page))
> >  FOLIO_MATCH(flags, _flags_2);
> >  FOLIO_MATCH(compound_head, _head_2);
> > +FOLIO_MATCH(flags, _flags_2a);
> > +FOLIO_MATCH(compound_head, _head_2a);
> > +FOLIO_MATCH(private, _private_2a);
> >  #undef FOLIO_MATCH
> >  
> >  /*
> > 
> > This is against the patchset I just posted which frees up a word in the
> > first tail page.
> 
> Okay, I assume you meant to suggest leverage FOLIO_MATCH(), which I can
> definitely try.  But then I'd hope it covers not only private field but all
> the fields that the tail pages reuses; the goal is to document everything
> no matter in what form.  I'll see what I can get..  Thanks.

No, sometimes there are things which shouldn't be documented because they
don't matter, and when changing code sometimes we forget to change the
documentation, and then people read the documentation which is different
from the code, and they get confused.

It matters that the various 'private' members line up.  It matters
that folio->index matches page->index.  It does not matter what
offset _entire_mapcount is at.  That can be moved around freely and no
documentation needs to be changed.

I don't want you to use FOLIO_MATCH to make any unnecessary assertions.
The only assertion missing is for _private_1 and _private_2a, and that's
why I wrote a patch to add them.
Peter Xu Aug. 15, 2023, 8:39 p.m. UTC | #7
On Tue, Aug 15, 2023 at 09:16:47PM +0100, Matthew Wilcox wrote:
> No, sometimes there are things which shouldn't be documented because they
> don't matter, and when changing code sometimes we forget to change the
> documentation, and then people read the documentation which is different
> from the code, and they get confused.
> 
> It matters that the various 'private' members line up.  It matters
> that folio->index matches page->index.  It does not matter what
> offset _entire_mapcount is at.  That can be moved around freely and no
> documentation needs to be changed.
> 
> I don't want you to use FOLIO_MATCH to make any unnecessary assertions.
> The only assertion missing is for _private_1 and _private_2a, and that's
> why I wrote a patch to add them.

I didn't mean to add assertions for _entire_mapcount (I don't even know
how..), but _mapcount and _refcount to clamp the fields, then all fields
can be clear, just like head/flags clamping the start of fields.

One thing I can understand that you'd like to avoid these "offset" things
is perhaps because you keep that in mind to, one day, have mmdesc replacing
folio so folio doesn't need to match struct page at all some day,
ideally. The order of fields, size of fields, etc. none of them should
matter, when it comes, and we should go toward that.  However my argument
would be that, before that day comes IMHO we need some good documentation
for us to know how the fields look like now, why they worked, and how to
reuse new fields.. when that comes, we can just safely remove these
documentations.

It's just that these 'offset's still matter and matters a lot now, imho,
but it's very confusing when read without some help.

Let me try one more time to see how you think about it on an rfcv3.  If
that still doesn't get any form of ack from you, I'll put this aside.
Matthew Wilcox Aug. 15, 2023, 9:03 p.m. UTC | #8
On Tue, Aug 15, 2023 at 04:39:16PM -0400, Peter Xu wrote:
> On Tue, Aug 15, 2023 at 09:16:47PM +0100, Matthew Wilcox wrote:
> > No, sometimes there are things which shouldn't be documented because they
> > don't matter, and when changing code sometimes we forget to change the
> > documentation, and then people read the documentation which is different
> > from the code, and they get confused.
> > 
> > It matters that the various 'private' members line up.  It matters
> > that folio->index matches page->index.  It does not matter what
> > offset _entire_mapcount is at.  That can be moved around freely and no
> > documentation needs to be changed.
> > 
> > I don't want you to use FOLIO_MATCH to make any unnecessary assertions.
> > The only assertion missing is for _private_1 and _private_2a, and that's
> > why I wrote a patch to add them.
> 
> I didn't mean to add assertions for _entire_mapcount (I don't even know
> how..), but _mapcount and _refcount to clamp the fields, then all fields
> can be clear, just like head/flags clamping the start of fields.

Ah!  mapcount does make sense, yes.  We could just put a
			/* no more space here */
comment in, but an assert works too.

> One thing I can understand that you'd like to avoid these "offset" things
> is perhaps because you keep that in mind to, one day, have mmdesc replacing
> folio so folio doesn't need to match struct page at all some day,
> ideally. The order of fields, size of fields, etc. none of them should
> matter, when it comes, and we should go toward that.  However my argument
> would be that, before that day comes IMHO we need some good documentation
> for us to know how the fields look like now, why they worked, and how to
> reuse new fields.. when that comes, we can just safely remove these
> documentations.
> 
> It's just that these 'offset's still matter and matters a lot now, imho,
> but it's very confusing when read without some help.

No, that's not why I'm opposed to them.  I'm opposed to over-documenting
things, as I just outlined.  Documentation is necessary and good for all
kinds of reasons, but it should be meaningful and not prone to rot.  If
there's a tool that can tell you something, there's no point in
documenting it; that's why I pointed you towards pahole.  I also
like "documentation" which is checked by the compiler, hence the
existence of the FOLIO_MATCH macro which documents that the two
structures line up, and the compiler checks that they do.  FOLIO_MATCH
even caught a bug!

> Let me try one more time to see how you think about it on an rfcv3.  If
> that still doesn't get any form of ack from you, I'll put this aside.

At least we've got to something that I like the idea of ;-)