Message ID | 1e452b5b65f15a9a5d0c2ed3f5f812fdd1367603.1736352361.git.lorenzo.stoakes@oracle.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Headers | show |
Series | expose mapping wrprotect, fix fb_defio use | expand |
On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote: > @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work) > struct folio *folio = page_folio(pageref->page); > > folio_lock(folio); > - folio_mkclean(folio); > + rmap_wrprotect_file_page(fbdefio->mapping, > + pageref->offset >> PAGE_SHIFT, > + compound_nr(pageref->page), > + page_to_pfn(pageref->page)); > folio_unlock(folio); Why do we need to lock the folio? (since this isn't necessarily a folio) Also, do we need compound_nr() here? I _think_ for defio, the number of pages allocated per object are fixed, so this should be an fbdefio->nr_pages field? (something that's always troubled me about compound_nr() is that it returns 1 for tail pages and the number you actually expect for head pages)
On Wed, Jan 08, 2025 at 05:32:54PM +0000, Matthew Wilcox wrote: > On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote: > > @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work) > > struct folio *folio = page_folio(pageref->page); > > > > folio_lock(folio); > > - folio_mkclean(folio); > > + rmap_wrprotect_file_page(fbdefio->mapping, > > + pageref->offset >> PAGE_SHIFT, > > + compound_nr(pageref->page), > > + page_to_pfn(pageref->page)); > > folio_unlock(folio); > > Why do we need to lock the folio? (since this isn't necessarily a > folio) Also, do we need compound_nr() here? I _think_ for defio, > the number of pages allocated per object are fixed, so this should be > an fbdefio->nr_pages field? I'm trying to keep the code as similar as possible to the way it was before, even if there are questionable parts. There is a comment about some timing issue around the locks and so there appears to be an assumption about that. As to compound_nr(), we're not write protecting everything, just each invidiual page in the list that needs it, so we only want to do one at a time. I strongly suspect it's a single base page each time, but for belts + braces I'm doing compound_nr(). See below, this is wrong, it should just be '1'. So this is iterating through a list of pagerefs that can be in any random order. > > (something that's always troubled me about compound_nr() is that it > returns 1 for tail pages and the number you actually expect for head > pages) > OK I changed this from '1' to compound_nr() out of an (apparently) abundance of caution, but I was wrong: npagerefs = DIV_ROUND_UP(info->fix.smem_len, PAGE_SIZE); There are page refs for each PAGE_SIZE (i.e. base page size), so there is no way anything is compound. Will switch this to 1.
On 08.01.25 18:32, Matthew Wilcox wrote: > On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote: >> @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work) >> struct folio *folio = page_folio(pageref->page); >> >> folio_lock(folio); >> - folio_mkclean(folio); >> + rmap_wrprotect_file_page(fbdefio->mapping, >> + pageref->offset >> PAGE_SHIFT, >> + compound_nr(pageref->page), >> + page_to_pfn(pageref->page)); >> folio_unlock(folio); > > Why do we need to lock the folio? (since this isn't necessarily a > folio) Can you clarify the "since this isn't necessarily a folio" part ? Do you mean in the future, when we split "struct page" and "struct folio"? Doing an rmap walk on something that won't be a folio is ... sounds odd (->wrong :) )
On Wed, Jan 08, 2025 at 09:14:53PM +0100, David Hildenbrand wrote: > On 08.01.25 18:32, Matthew Wilcox wrote: > > On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote: > > > @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work) > > > struct folio *folio = page_folio(pageref->page); > > > folio_lock(folio); > > > - folio_mkclean(folio); > > > + rmap_wrprotect_file_page(fbdefio->mapping, > > > + pageref->offset >> PAGE_SHIFT, > > > + compound_nr(pageref->page), > > > + page_to_pfn(pageref->page)); > > > folio_unlock(folio); > > > > Why do we need to lock the folio? (since this isn't necessarily a > > folio) > > Can you clarify the "since this isn't necessarily a folio" part ? Do you > mean in the future, when we split "struct page" and "struct folio"? Right. I need to finish the email that explains where I think we're going in 2025 ... > Doing an rmap walk on something that won't be a folio is ... sounds odd > (->wrong :) ) Not necessarily! We already do that (since 2022) for DAX (see 6a8e0596f004). rmap lets you find every place that a given range of a file is mapped into user address spaces; but that file might be a device file, and so it's not just pagecache but also (in this case) fb memory, and whatever else device drivers decide to mmap.
On 08.01.25 21:54, Matthew Wilcox wrote: > On Wed, Jan 08, 2025 at 09:14:53PM +0100, David Hildenbrand wrote: >> On 08.01.25 18:32, Matthew Wilcox wrote: >>> On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote: >>>> @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work) >>>> struct folio *folio = page_folio(pageref->page); >>>> folio_lock(folio); >>>> - folio_mkclean(folio); >>>> + rmap_wrprotect_file_page(fbdefio->mapping, >>>> + pageref->offset >> PAGE_SHIFT, >>>> + compound_nr(pageref->page), >>>> + page_to_pfn(pageref->page)); >>>> folio_unlock(folio); >>> >>> Why do we need to lock the folio? (since this isn't necessarily a >>> folio) >> >> Can you clarify the "since this isn't necessarily a folio" part ? Do you >> mean in the future, when we split "struct page" and "struct folio"? > > Right. I need to finish the email that explains where I think we're > going in 2025 ... > >> Doing an rmap walk on something that won't be a folio is ... sounds odd >> (->wrong :) ) > > Not necessarily! We already do that (since 2022) for DAX (see > 6a8e0596f004). rmap lets you find every place that a given range > of a file is mapped into user address spaces; but that file might be a > device file, and so it's not just pagecache but also (in this case) > fb memory, and whatever else device drivers decide to mmap. Yes, that part I remember. I thought we would be passing in a page into rmap_wrprotect_file_page(), and was wondering what we would do to "struct page" that won't be a folio in there. Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :) ... should it be "file_range" ? (but we also pass the pfn ... )
On Wed, Jan 08, 2025 at 10:12:36PM +0100, David Hildenbrand wrote: > On 08.01.25 21:54, Matthew Wilcox wrote: > > Not necessarily! We already do that (since 2022) for DAX (see > > 6a8e0596f004). rmap lets you find every place that a given range > > of a file is mapped into user address spaces; but that file might be a > > device file, and so it's not just pagecache but also (in this case) > > fb memory, and whatever else device drivers decide to mmap. > > Yes, that part I remember. > > I thought we would be passing in a page into rmap_wrprotect_file_page(), and > was wondering what we would do to "struct page" that won't be a folio in > there. > > Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :) > > ... should it be "file_range" ? (but we also pass the pfn ... ) I don't think it's unprecedented for us to identify a page by its pfn. After all, the acronym stands for "page frame number". That said, for the one caller of this, it has the struct page and passes in the result from page_to_pfn(). So no harm in passing in the struct page directly. I would not like to see this function called "rmap_wrprotect_file_pfn". Files don't have pfns, so that's a bad name.
On 08.01.25 22:55, Matthew Wilcox wrote: > On Wed, Jan 08, 2025 at 10:12:36PM +0100, David Hildenbrand wrote: >> On 08.01.25 21:54, Matthew Wilcox wrote: >>> Not necessarily! We already do that (since 2022) for DAX (see >>> 6a8e0596f004). rmap lets you find every place that a given range >>> of a file is mapped into user address spaces; but that file might be a >>> device file, and so it's not just pagecache but also (in this case) >>> fb memory, and whatever else device drivers decide to mmap. >> >> Yes, that part I remember. >> >> I thought we would be passing in a page into rmap_wrprotect_file_page(), and >> was wondering what we would do to "struct page" that won't be a folio in >> there. >> >> Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :) >> >> ... should it be "file_range" ? (but we also pass the pfn ... ) > > I don't think it's unprecedented for us to identify a page by its pfn. > After all, the acronym stands for "page frame number". That said, for > the one caller of this, it has the struct page and passes in the result > from page_to_pfn(). So no harm in passing in the struct page directly. > > I would not like to see this function called "rmap_wrprotect_file_pfn". > Files don't have pfns, so that's a bad name. Agreed. (it's too late in the evening for me to give any good suggestions :) )
On Wed, Jan 08, 2025 at 11:02:57PM +0100, David Hildenbrand wrote: > On 08.01.25 22:55, Matthew Wilcox wrote: > > On Wed, Jan 08, 2025 at 10:12:36PM +0100, David Hildenbrand wrote: > > > On 08.01.25 21:54, Matthew Wilcox wrote: > > > > Not necessarily! We already do that (since 2022) for DAX (see > > > > 6a8e0596f004). rmap lets you find every place that a given range > > > > of a file is mapped into user address spaces; but that file might be a > > > > device file, and so it's not just pagecache but also (in this case) > > > > fb memory, and whatever else device drivers decide to mmap. > > > > > > Yes, that part I remember. > > > > > > I thought we would be passing in a page into rmap_wrprotect_file_page(), and > > > was wondering what we would do to "struct page" that won't be a folio in > > > there. > > > > > > Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :) > > > > > > ... should it be "file_range" ? (but we also pass the pfn ... ) > > > > I don't think it's unprecedented for us to identify a page by its pfn. > > After all, the acronym stands for "page frame number". That said, for > > the one caller of this, it has the struct page and passes in the result > > from page_to_pfn(). So no harm in passing in the struct page directly. > > > > I would not like to see this function called "rmap_wrprotect_file_pfn". > > Files don't have pfns, so that's a bad name. > > Agreed. > > (it's too late in the evening for me to give any good suggestions :) ) Matthew pinged me on irc with mapping_wrprotect_page() :>) Am happy to do that, will respin in a bit anyway... > > -- > Cheers, > > David / dhildenb >
On Wed, Jan 08, 2025 at 10:12:36PM +0100, David Hildenbrand wrote: > On 08.01.25 21:54, Matthew Wilcox wrote: > > On Wed, Jan 08, 2025 at 09:14:53PM +0100, David Hildenbrand wrote: > > > On 08.01.25 18:32, Matthew Wilcox wrote: > > > > On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote: > > > > > @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work) > > > > > struct folio *folio = page_folio(pageref->page); > > > > > folio_lock(folio); > > > > > - folio_mkclean(folio); > > > > > + rmap_wrprotect_file_page(fbdefio->mapping, > > > > > + pageref->offset >> PAGE_SHIFT, > > > > > + compound_nr(pageref->page), > > > > > + page_to_pfn(pageref->page)); > > > > > folio_unlock(folio); > > > > > > > > Why do we need to lock the folio? (since this isn't necessarily a > > > > folio) > > > > > > Can you clarify the "since this isn't necessarily a folio" part ? Do you > > > mean in the future, when we split "struct page" and "struct folio"? > > > > Right. I need to finish the email that explains where I think we're > > going in 2025 ... > > > > > Doing an rmap walk on something that won't be a folio is ... sounds odd > > > (->wrong :) ) > > > > Not necessarily! We already do that (since 2022) for DAX (see > > 6a8e0596f004). rmap lets you find every place that a given range > > of a file is mapped into user address spaces; but that file might be a > > device file, and so it's not just pagecache but also (in this case) > > fb memory, and whatever else device drivers decide to mmap. > > Yes, that part I remember. > > I thought we would be passing in a page into rmap_wrprotect_file_page(), and > was wondering what we would do to "struct page" that won't be a folio in > there. The reason I provide a PFN is that we internally use a PFN for the walk, and everything else is folio-fied for stuff that isn't necessarily a folio. However it does seem silly to have to page_to_pfn() a page that we pass in, so I will update to accept a page and do this bit in the function itself. > > Probably, because the "_page" in rmap_wrprotect_file_page() is misleading :) > > ... should it be "file_range" ? (but we also pass the pfn ... ) > > -- > Cheers, > > David / dhildenb >
On Wed, Jan 08, 2025 at 07:41:31PM +0000, Lorenzo Stoakes wrote: > On Wed, Jan 08, 2025 at 05:32:54PM +0000, Matthew Wilcox wrote: > > On Wed, Jan 08, 2025 at 04:18:42PM +0000, Lorenzo Stoakes wrote: > > > @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work) > > > struct folio *folio = page_folio(pageref->page); > > > > > > folio_lock(folio); > > > - folio_mkclean(folio); > > > + rmap_wrprotect_file_page(fbdefio->mapping, > > > + pageref->offset >> PAGE_SHIFT, > > > + compound_nr(pageref->page), > > > + page_to_pfn(pageref->page)); > > > folio_unlock(folio); > > > > Why do we need to lock the folio? (since this isn't necessarily a > > folio) Also, do we need compound_nr() here? I _think_ for defio, > > the number of pages allocated per object are fixed, so this should be > > an fbdefio->nr_pages field? > > I'm trying to keep the code as similar as possible to the way it was before, > even if there are questionable parts. > > There is a comment about some timing issue around the locks and so there appears > to be an assumption about that. Actually, reading through the code, I think the comment is with regards to page_mkwrite(), so we should be ok, in fb_deferred_io_track_page(): /* * We want the page to remain locked from ->page_mkwrite until * the PTE is marked dirty to avoid mapping_wrprotect_page() * being called before the PTE is updated, which would leave * the page ignored by defio. * Do this by locking the page here and informing the caller * about it with VM_FAULT_LOCKED. */ lock_page(pageref->page); I don't think we need to lock the page (which is managed as kernel memory so doesn't require it). So will remove. > > As to compound_nr(), we're not write protecting everything, just each invidiual > page in the list that needs it, so we only want to do one at a time. I strongly > suspect it's a single base page each time, but for belts + braces I'm doing > compound_nr(). > > See below, this is wrong, it should just be '1'. > > So this is iterating through a list of pagerefs that can be in any random order. > > > > > (something that's always troubled me about compound_nr() is that it > > returns 1 for tail pages and the number you actually expect for head > > pages) > > > > OK I changed this from '1' to compound_nr() out of an (apparently) abundance of > caution, but I was wrong: > > npagerefs = DIV_ROUND_UP(info->fix.smem_len, PAGE_SIZE); > > There are page refs for each PAGE_SIZE (i.e. base page size), so there is no way > anything is compound. > > Will switch this to 1.
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c index 65363df8e81b..186ff902da5f 100644 --- a/drivers/video/fbdev/core/fb_defio.c +++ b/drivers/video/fbdev/core/fb_defio.c @@ -69,14 +69,6 @@ static struct fb_deferred_io_pageref *fb_deferred_io_pageref_lookup(struct fb_in return pageref; } -static void fb_deferred_io_pageref_clear(struct fb_deferred_io_pageref *pageref) -{ - struct page *page = pageref->page; - - if (page) - page->mapping = NULL; -} - static struct fb_deferred_io_pageref *fb_deferred_io_pageref_get(struct fb_info *info, unsigned long offset, struct page *page) @@ -140,13 +132,10 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf) if (!page) return VM_FAULT_SIGBUS; - if (vmf->vma->vm_file) - page->mapping = vmf->vma->vm_file->f_mapping; - else + if (!vmf->vma->vm_file) printk(KERN_ERR "no mapping available\n"); - BUG_ON(!page->mapping); - page->index = vmf->pgoff; /* for folio_mkclean() */ + BUG_ON(!info->fbdefio->mapping); vmf->page = page; return 0; @@ -194,9 +183,9 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long /* * We want the page to remain locked from ->page_mkwrite until - * the PTE is marked dirty to avoid folio_mkclean() being called - * before the PTE is updated, which would leave the page ignored - * by defio. + * the PTE is marked dirty to avoid rmap_wrprotect_file_page() + * being called before the PTE is updated, which would leave + * the page ignored by defio. * Do this by locking the page here and informing the caller * about it with VM_FAULT_LOCKED. */ @@ -280,7 +269,10 @@ static void fb_deferred_io_work(struct work_struct *work) struct folio *folio = page_folio(pageref->page); folio_lock(folio); - folio_mkclean(folio); + rmap_wrprotect_file_page(fbdefio->mapping, + pageref->offset >> PAGE_SHIFT, + compound_nr(pageref->page), + page_to_pfn(pageref->page)); folio_unlock(folio); } @@ -337,6 +329,7 @@ void fb_deferred_io_open(struct fb_info *info, { struct fb_deferred_io *fbdefio = info->fbdefio; + fbdefio->mapping = file->f_mapping; file->f_mapping->a_ops = &fb_deferred_io_aops; fbdefio->open_count++; } @@ -344,13 +337,7 @@ EXPORT_SYMBOL_GPL(fb_deferred_io_open); static void fb_deferred_io_lastclose(struct fb_info *info) { - unsigned long i; - flush_delayed_work(&info->deferred_work); - - /* clear out the mapping that we setup */ - for (i = 0; i < info->npagerefs; ++i) - fb_deferred_io_pageref_clear(&info->pagerefs[i]); } void fb_deferred_io_release(struct fb_info *info) @@ -370,5 +357,6 @@ void fb_deferred_io_cleanup(struct fb_info *info) kvfree(info->pagerefs); mutex_destroy(&fbdefio->lock); + fbdefio->mapping = NULL; } EXPORT_SYMBOL_GPL(fb_deferred_io_cleanup); diff --git a/include/linux/fb.h b/include/linux/fb.h index 5ba187e08cf7..cd653862ab99 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -225,6 +225,7 @@ struct fb_deferred_io { int open_count; /* number of opened files; protected by fb_info lock */ struct mutex lock; /* mutex that protects the pageref list */ struct list_head pagereflist; /* list of pagerefs for touched pages */ + struct address_space *mapping; /* page cache object for fb device */ /* callback */ struct page *(*get_page)(struct fb_info *info, unsigned long offset); void (*deferred_io)(struct fb_info *info, struct list_head *pagelist);
With the introduction of rmap_wrprotect_file_page() there is no need to use folio_mkclean() in order to write-protect mappings of frame buffer pages, and therefore no need to inappropriately set kernel-allocated page->index, mapping fields to permit this operation. Instead, store the pointer to the page cache object for the mapped driver in the fb_deferred_io object, and use the already stored page offset from the pageref object to look up mappings in order to write-protect them. This is justified, as for the page objects to store a mapping pointer at the point of assignment of pages, they must all reference the same underlying address_space object. Since the life time of the pagerefs is also the lifetime of the fb_deferred_io object, storing the pointer here makes snese. This eliminates the need for all of the logic around setting and maintaining page->index,mapping which we remove. This eliminates the use of folio_mkclean() entirely but otherwise should have no functional change. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> --- drivers/video/fbdev/core/fb_defio.c | 34 ++++++++++------------------- include/linux/fb.h | 1 + 2 files changed, 12 insertions(+), 23 deletions(-)