diff mbox series

[RFC,3/3] fb_defio: do not use deprecated page->mapping, index fields

Message ID 1e452b5b65f15a9a5d0c2ed3f5f812fdd1367603.1736352361.git.lorenzo.stoakes@oracle.com (mailing list archive)
State Awaiting Upstream
Headers show
Series expose mapping wrprotect, fix fb_defio use | expand

Commit Message

Lorenzo Stoakes Jan. 8, 2025, 4:18 p.m. UTC
With the introduction of rmap_wrprotect_file_page() there is no need to use
folio_mkclean() in order to write-protect mappings of frame buffer pages,
and therefore no need to inappropriately set kernel-allocated page->index,
mapping fields to permit this operation.

Instead, store the pointer to the page cache object for the mapped driver
in the fb_deferred_io object, and use the already stored page offset from
the pageref object to look up mappings in order to write-protect them.

This is justified, as for the page objects to store a mapping pointer at
the point of assignment of pages, they must all reference the same
underlying address_space object. Since the life time of the pagerefs is
also the lifetime of the fb_deferred_io object, storing the pointer here
makes snese.

This eliminates the need for all of the logic around setting and
maintaining page->index,mapping which we remove.

This eliminates the use of folio_mkclean() entirely but otherwise should
have no functional change.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 drivers/video/fbdev/core/fb_defio.c | 34 ++++++++++-------------------
 include/linux/fb.h                  |  1 +
 2 files changed, 12 insertions(+), 23 deletions(-)

Comments

Matthew Wilcox Jan. 8, 2025, 5:32 p.m. UTC | #1
On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote:
> @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work)
>  		struct folio *folio = page_folio(pageref->page);
>  
>  		folio_lock(folio);
> -		folio_mkclean(folio);
> +		rmap_wrprotect_file_page(fbdefio->mapping,
> +					 pageref->offset >> PAGE_SHIFT,
> +					 compound_nr(pageref->page),
> +					 page_to_pfn(pageref->page));
>  		folio_unlock(folio);

Why do we need to lock the folio?  (since this isn't necessarily a
folio)  Also, do we need compound_nr() here?  I _think_ for defio,
the number of pages allocated per object are fixed, so this should be
an fbdefio->nr_pages field?

(something that's always troubled me about compound_nr() is that it
returns 1 for tail pages and the number you actually expect for head
pages)
Lorenzo Stoakes Jan. 8, 2025, 7:41 p.m. UTC | #2
On Wed, Jan 08, 2025 at 05:32:54PM +0000, Matthew Wilcox wrote:
> On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote:
> > @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work)
> >  		struct folio *folio = page_folio(pageref->page);
> >
> >  		folio_lock(folio);
> > -		folio_mkclean(folio);
> > +		rmap_wrprotect_file_page(fbdefio->mapping,
> > +					 pageref->offset >> PAGE_SHIFT,
> > +					 compound_nr(pageref->page),
> > +					 page_to_pfn(pageref->page));
> >  		folio_unlock(folio);
>
> Why do we need to lock the folio?  (since this isn't necessarily a
> folio)  Also, do we need compound_nr() here?  I _think_ for defio,
> the number of pages allocated per object are fixed, so this should be
> an fbdefio->nr_pages field?

I'm trying to keep the code as similar as possible to the way it was before,
even if there are questionable parts.

There is a comment about some timing issue around the locks and so there appears
to be an assumption about that.

As to compound_nr(), we're not write protecting everything, just each invidiual
page in the list that needs it, so we only want to do one at a time. I strongly
suspect it's a single base page each time, but for belts + braces I'm doing
compound_nr().

See below, this is wrong, it should just be '1'.

So this is iterating through a list of pagerefs that can be in any random order.

>
> (something that's always troubled me about compound_nr() is that it
> returns 1 for tail pages and the number you actually expect for head
> pages)
>

OK I changed this from '1' to compound_nr() out of an (apparently) abundance of
caution, but I was wrong:

npagerefs = DIV_ROUND_UP(info->fix.smem_len, PAGE_SIZE);

There are page refs for each PAGE_SIZE (i.e. base page size), so there is no way
anything is compound.

Will switch this to 1.
David Hildenbrand Jan. 8, 2025, 8:14 p.m. UTC | #3
On 08.01.25 18:32, Matthew Wilcox wrote:
> On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote:
>> @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work)
>>   		struct folio *folio = page_folio(pageref->page);
>>   
>>   		folio_lock(folio);
>> -		folio_mkclean(folio);
>> +		rmap_wrprotect_file_page(fbdefio->mapping,
>> +					 pageref->offset >> PAGE_SHIFT,
>> +					 compound_nr(pageref->page),
>> +					 page_to_pfn(pageref->page));
>>   		folio_unlock(folio);
> 
> Why do we need to lock the folio?  (since this isn't necessarily a
> folio) 

Can you clarify the "since this isn't necessarily a folio" part ? Do you 
mean in the future, when we split "struct page" and "struct folio"?

Doing an rmap walk on something that won't be a folio is ... sounds odd 
(->wrong :) )
Matthew Wilcox Jan. 8, 2025, 8:54 p.m. UTC | #4
On Wed, Jan 08, 2025 at 09:14:53PM +0100, David Hildenbrand wrote:
> On 08.01.25 18:32, Matthew Wilcox wrote:
> > On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote:
> > > @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work)
> > >   		struct folio *folio = page_folio(pageref->page);
> > >   		folio_lock(folio);
> > > -		folio_mkclean(folio);
> > > +		rmap_wrprotect_file_page(fbdefio->mapping,
> > > +					 pageref->offset >> PAGE_SHIFT,
> > > +					 compound_nr(pageref->page),
> > > +					 page_to_pfn(pageref->page));
> > >   		folio_unlock(folio);
> > 
> > Why do we need to lock the folio?  (since this isn't necessarily a
> > folio)
> 
> Can you clarify the "since this isn't necessarily a folio" part ? Do you
> mean in the future, when we split "struct page" and "struct folio"?

Right.  I need to finish the email that explains where I think we're
going in 2025 ...

> Doing an rmap walk on something that won't be a folio is ... sounds odd
> (->wrong :) )

Not necessarily!  We already do that (since 2022) for DAX (see
6a8e0596f004).  rmap lets you find every place that a given range
of a file is mapped into user address spaces; but that file might be a
device file, and so it's not just pagecache but also (in this case)
fb memory, and whatever else device drivers decide to mmap.
David Hildenbrand Jan. 8, 2025, 9:12 p.m. UTC | #5
On 08.01.25 21:54, Matthew Wilcox wrote:
> On Wed, Jan 08, 2025 at 09:14:53PM +0100, David Hildenbrand wrote:
>> On 08.01.25 18:32, Matthew Wilcox wrote:
>>> On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote:
>>>> @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work)
>>>>    		struct folio *folio = page_folio(pageref->page);
>>>>    		folio_lock(folio);
>>>> -		folio_mkclean(folio);
>>>> +		rmap_wrprotect_file_page(fbdefio->mapping,
>>>> +					 pageref->offset >> PAGE_SHIFT,
>>>> +					 compound_nr(pageref->page),
>>>> +					 page_to_pfn(pageref->page));
>>>>    		folio_unlock(folio);
>>>
>>> Why do we need to lock the folio?  (since this isn't necessarily a
>>> folio)
>>
>> Can you clarify the "since this isn't necessarily a folio" part ? Do you
>> mean in the future, when we split "struct page" and "struct folio"?
> 
> Right.  I need to finish the email that explains where I think we're
> going in 2025 ...
> 
>> Doing an rmap walk on something that won't be a folio is ... sounds odd
>> (->wrong :) )
> 
> Not necessarily!  We already do that (since 2022) for DAX (see
> 6a8e0596f004).  rmap lets you find every place that a given range
> of a file is mapped into user address spaces; but that file might be a
> device file, and so it's not just pagecache but also (in this case)
> fb memory, and whatever else device drivers decide to mmap.

Yes, that part I remember.

I thought we would be passing in a page into rmap_wrprotect_file_page(), 
and was wondering what we would do to "struct page" that won't be a 
folio in there.

Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :)

... should it be "file_range" ? (but we also pass the pfn ... )
Matthew Wilcox Jan. 8, 2025, 9:55 p.m. UTC | #6
On Wed, Jan 08, 2025 at 10:12:36PM +0100, David Hildenbrand wrote:
> On 08.01.25 21:54, Matthew Wilcox wrote:
> > Not necessarily!  We already do that (since 2022) for DAX (see
> > 6a8e0596f004).  rmap lets you find every place that a given range
> > of a file is mapped into user address spaces; but that file might be a
> > device file, and so it's not just pagecache but also (in this case)
> > fb memory, and whatever else device drivers decide to mmap.
> 
> Yes, that part I remember.
> 
> I thought we would be passing in a page into rmap_wrprotect_file_page(), and
> was wondering what we would do to "struct page" that won't be a folio in
> there.
> 
> Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :)
> 
> ... should it be "file_range" ? (but we also pass the pfn ... )

I don't think it's unprecedented for us to identify a page by its pfn.
After all, the acronym stands for "page frame number".  That said, for
the one caller of this, it has the struct page and passes in the result
from page_to_pfn().  So no harm in passing in the struct page directly.

I would not like to see this function called "rmap_wrprotect_file_pfn".
Files don't have pfns, so that's a bad name.
David Hildenbrand Jan. 8, 2025, 10:02 p.m. UTC | #7
On 08.01.25 22:55, Matthew Wilcox wrote:
> On Wed, Jan 08, 2025 at 10:12:36PM +0100, David Hildenbrand wrote:
>> On 08.01.25 21:54, Matthew Wilcox wrote:
>>> Not necessarily!  We already do that (since 2022) for DAX (see
>>> 6a8e0596f004).  rmap lets you find every place that a given range
>>> of a file is mapped into user address spaces; but that file might be a
>>> device file, and so it's not just pagecache but also (in this case)
>>> fb memory, and whatever else device drivers decide to mmap.
>>
>> Yes, that part I remember.
>>
>> I thought we would be passing in a page into rmap_wrprotect_file_page(), and
>> was wondering what we would do to "struct page" that won't be a folio in
>> there.
>>
>> Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :)
>>
>> ... should it be "file_range" ? (but we also pass the pfn ... )
> 
> I don't think it's unprecedented for us to identify a page by its pfn.
> After all, the acronym stands for "page frame number".  That said, for
> the one caller of this, it has the struct page and passes in the result
> from page_to_pfn().  So no harm in passing in the struct page directly.
> 
> I would not like to see this function called "rmap_wrprotect_file_pfn".
> Files don't have pfns, so that's a bad name.

Agreed.

(it's too late in the evening for me to give any good suggestions :) )
Lorenzo Stoakes Jan. 13, 2025, 5:18 p.m. UTC | #8
On Wed, Jan 08, 2025 at 11:02:57PM +0100, David Hildenbrand wrote:
> On 08.01.25 22:55, Matthew Wilcox wrote:
> > On Wed, Jan 08, 2025 at 10:12:36PM +0100, David Hildenbrand wrote:
> > > On 08.01.25 21:54, Matthew Wilcox wrote:
> > > > Not necessarily!  We already do that (since 2022) for DAX (see
> > > > 6a8e0596f004).  rmap lets you find every place that a given range
> > > > of a file is mapped into user address spaces; but that file might be a
> > > > device file, and so it's not just pagecache but also (in this case)
> > > > fb memory, and whatever else device drivers decide to mmap.
> > >
> > > Yes, that part I remember.
> > >
> > > I thought we would be passing in a page into rmap_wrprotect_file_page(), and
> > > was wondering what we would do to "struct page" that won't be a folio in
> > > there.
> > >
> > > Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :)
> > >
> > > ... should it be "file_range" ? (but we also pass the pfn ... )
> >
> > I don't think it's unprecedented for us to identify a page by its pfn.
> > After all, the acronym stands for "page frame number".  That said, for
> > the one caller of this, it has the struct page and passes in the result
> > from page_to_pfn().  So no harm in passing in the struct page directly.
> >
> > I would not like to see this function called "rmap_wrprotect_file_pfn".
> > Files don't have pfns, so that's a bad name.
>
> Agreed.
>
> (it's too late in the evening for me to give any good suggestions :) )

Matthew pinged me on irc with mapping_wrprotect_page() :>)

Am happy to do that, will respin in a bit anyway...

>
> --
> Cheers,
>
> David / dhildenb
>
Lorenzo Stoakes Jan. 13, 2025, 5:48 p.m. UTC | #9
On Wed, Jan 08, 2025 at 10:12:36PM +0100, David Hildenbrand wrote:
> On 08.01.25 21:54, Matthew Wilcox wrote:
> > On Wed, Jan 08, 2025 at 09:14:53PM +0100, David Hildenbrand wrote:
> > > On 08.01.25 18:32, Matthew Wilcox wrote:
> > > > On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote:
> > > > > @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work)
> > > > >    		struct folio *folio = page_folio(pageref->page);
> > > > >    		folio_lock(folio);
> > > > > -		folio_mkclean(folio);
> > > > > +		rmap_wrprotect_file_page(fbdefio->mapping,
> > > > > +					 pageref->offset >> PAGE_SHIFT,
> > > > > +					 compound_nr(pageref->page),
> > > > > +					 page_to_pfn(pageref->page));
> > > > >    		folio_unlock(folio);
> > > >
> > > > Why do we need to lock the folio?  (since this isn't necessarily a
> > > > folio)
> > >
> > > Can you clarify the "since this isn't necessarily a folio" part ? Do you
> > > mean in the future, when we split "struct page" and "struct folio"?
> >
> > Right.  I need to finish the email that explains where I think we're
> > going in 2025 ...
> >
> > > Doing an rmap walk on something that won't be a folio is ... sounds odd
> > > (->wrong :) )
> >
> > Not necessarily!  We already do that (since 2022) for DAX (see
> > 6a8e0596f004).  rmap lets you find every place that a given range
> > of a file is mapped into user address spaces; but that file might be a
> > device file, and so it's not just pagecache but also (in this case)
> > fb memory, and whatever else device drivers decide to mmap.
>
> Yes, that part I remember.
>
> I thought we would be passing in a page into rmap_wrprotect_file_page(), and
> was wondering what we would do to "struct page" that won't be a folio in
> there.

The reason I provide a PFN is that we internally use a PFN for the walk, and
everything else is folio-fied for stuff that isn't necessarily a folio.

However it does seem silly to have to page_to_pfn() a page that we pass in, so I
will update to accept a page and do this bit in the function itself.

>
> Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :)
>
> ... should it be "file_range" ? (but we also pass the pfn ... )
>
> --
> Cheers,
>
> David / dhildenb
>
Lorenzo Stoakes Jan. 13, 2025, 11:01 p.m. UTC | #10
On Wed, Jan 08, 2025 at 07:41:31PM +0000, Lorenzo Stoakes wrote:
> On Wed, Jan 08, 2025 at 05:32:54PM +0000, Matthew Wilcox wrote:
> > On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote:
> > > @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work)
> > >  		struct folio *folio = page_folio(pageref->page);
> > >
> > >  		folio_lock(folio);
> > > -		folio_mkclean(folio);
> > > +		rmap_wrprotect_file_page(fbdefio->mapping,
> > > +					 pageref->offset >> PAGE_SHIFT,
> > > +					 compound_nr(pageref->page),
> > > +					 page_to_pfn(pageref->page));
> > >  		folio_unlock(folio);
> >
> > Why do we need to lock the folio?  (since this isn't necessarily a
> > folio)  Also, do we need compound_nr() here?  I _think_ for defio,
> > the number of pages allocated per object are fixed, so this should be
> > an fbdefio->nr_pages field?
>
> I'm trying to keep the code as similar as possible to the way it was before,
> even if there are questionable parts.
>
> There is a comment about some timing issue around the locks and so there appears
> to be an assumption about that.

Actually, reading through the code, I think the comment is with regards to
page_mkwrite(), so we should be ok, in fb_deferred_io_track_page():

	/*
	 * We want the page to remain locked from ->page_mkwrite until
	 * the PTE is marked dirty to avoid mapping_wrprotect_page()
	 * being called before the PTE is updated, which would leave
	 * the page ignored by defio.
	 * Do this by locking the page here and informing the caller
	 * about it with VM_FAULT_LOCKED.
	 */
	lock_page(pageref->page);

I don't think we need to lock the page (which is managed as kernel memory so
doesn't require it).

So will remove.

>
> As to compound_nr(), we're not write protecting everything, just each invidiual
> page in the list that needs it, so we only want to do one at a time. I strongly
> suspect it's a single base page each time, but for belts + braces I'm doing
> compound_nr().
>
> See below, this is wrong, it should just be '1'.
>
> So this is iterating through a list of pagerefs that can be in any random order.
>
> >
> > (something that's always troubled me about compound_nr() is that it
> > returns 1 for tail pages and the number you actually expect for head
> > pages)
> >
>
> OK I changed this from '1' to compound_nr() out of an (apparently) abundance of
> caution, but I was wrong:
>
> npagerefs = DIV_ROUND_UP(info->fix.smem_len, PAGE_SIZE);
>
> There are page refs for each PAGE_SIZE (i.e. base page size), so there is no way
> anything is compound.
>
> Will switch this to 1.
diff mbox series

Patch

diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 65363df8e81b..186ff902da5f 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -69,14 +69,6 @@  static struct fb_deferred_io_pageref *fb_deferred_io_pageref_lookup(struct fb_in
 	return pageref;
 }
 
-static void fb_deferred_io_pageref_clear(struct fb_deferred_io_pageref *pageref)
-{
-	struct page *page = pageref->page;
-
-	if (page)
-		page->mapping = NULL;
-}
-
 static struct fb_deferred_io_pageref *fb_deferred_io_pageref_get(struct fb_info *info,
 								 unsigned long offset,
 								 struct page *page)
@@ -140,13 +132,10 @@  static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
 	if (!page)
 		return VM_FAULT_SIGBUS;
 
-	if (vmf->vma->vm_file)
-		page->mapping = vmf->vma->vm_file->f_mapping;
-	else
+	if (!vmf->vma->vm_file)
 		printk(KERN_ERR "no mapping available\n");
 
-	BUG_ON(!page->mapping);
-	page->index = vmf->pgoff; /* for folio_mkclean() */
+	BUG_ON(!info->fbdefio->mapping);
 
 	vmf->page = page;
 	return 0;
@@ -194,9 +183,9 @@  static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
 
 	/*
 	 * We want the page to remain locked from ->page_mkwrite until
-	 * the PTE is marked dirty to avoid folio_mkclean() being called
-	 * before the PTE is updated, which would leave the page ignored
-	 * by defio.
+	 * the PTE is marked dirty to avoid rmap_wrprotect_file_page()
+	 * being called before the PTE is updated, which would leave
+	 * the page ignored by defio.
 	 * Do this by locking the page here and informing the caller
 	 * about it with VM_FAULT_LOCKED.
 	 */
@@ -280,7 +269,10 @@  static void fb_deferred_io_work(struct work_struct *work)
 		struct folio *folio = page_folio(pageref->page);
 
 		folio_lock(folio);
-		folio_mkclean(folio);
+		rmap_wrprotect_file_page(fbdefio->mapping,
+					 pageref->offset >> PAGE_SHIFT,
+					 compound_nr(pageref->page),
+					 page_to_pfn(pageref->page));
 		folio_unlock(folio);
 	}
 
@@ -337,6 +329,7 @@  void fb_deferred_io_open(struct fb_info *info,
 {
 	struct fb_deferred_io *fbdefio = info->fbdefio;
 
+	fbdefio->mapping = file->f_mapping;
 	file->f_mapping->a_ops = &fb_deferred_io_aops;
 	fbdefio->open_count++;
 }
@@ -344,13 +337,7 @@  EXPORT_SYMBOL_GPL(fb_deferred_io_open);
 
 static void fb_deferred_io_lastclose(struct fb_info *info)
 {
-	unsigned long i;
-
 	flush_delayed_work(&info->deferred_work);
-
-	/* clear out the mapping that we setup */
-	for (i = 0; i < info->npagerefs; ++i)
-		fb_deferred_io_pageref_clear(&info->pagerefs[i]);
 }
 
 void fb_deferred_io_release(struct fb_info *info)
@@ -370,5 +357,6 @@  void fb_deferred_io_cleanup(struct fb_info *info)
 
 	kvfree(info->pagerefs);
 	mutex_destroy(&fbdefio->lock);
+	fbdefio->mapping = NULL;
 }
 EXPORT_SYMBOL_GPL(fb_deferred_io_cleanup);
diff --git a/include/linux/fb.h b/include/linux/fb.h
index 5ba187e08cf7..cd653862ab99 100644
--- a/include/linux/fb.h
+++ b/include/linux/fb.h
@@ -225,6 +225,7 @@  struct fb_deferred_io {
 	int open_count; /* number of opened files; protected by fb_info lock */
 	struct mutex lock; /* mutex that protects the pageref list */
 	struct list_head pagereflist; /* list of pagerefs for touched pages */
+	struct address_space *mapping; /* page cache object for fb device */
 	/* callback */
 	struct page *(*get_page)(struct fb_info *info, unsigned long offset);
 	void (*deferred_io)(struct fb_info *info, struct list_head *pagelist);