Message ID | 20230703055850.227169-1-sidhartha.kumar@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/4] mm/memory: convert do_page_mkwrite() to use folios | expand |
On 2023/7/3 13:58, Sidhartha Kumar wrote: > Saves one implicit call to compound_head(); > > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> > --- > mm/memory.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 21fab27272092..098fac2f5efc0 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2932,7 +2932,7 @@ static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma) > static vm_fault_t do_page_mkwrite(struct vm_fault *vmf) > { > vm_fault_t ret; > - struct page *page = vmf->page; > + struct folio *folio = page_folio(vmf->page); > unsigned int old_flags = vmf->flags; > > vmf->flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE; > @@ -2947,14 +2947,14 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf) > if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) > return ret; > if (unlikely(!(ret & VM_FAULT_LOCKED))) { > - lock_page(page); > - if (!page->mapping) { > - unlock_page(page); > + folio_lock(folio); > + if (!folio_mapping(folio)) { Could page->mapping be directly converted to folio->mapping? Thanks, Peng > + folio_unlock(folio); > return 0; /* retry */ > } > ret |= VM_FAULT_LOCKED; > } else > - VM_BUG_ON_PAGE(!PageLocked(page), page); > + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); > return ret; > } >
On Sun, Jul 02, 2023 at 10:58:47PM -0700, Sidhartha Kumar wrote: > @@ -2947,14 +2947,14 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf) > if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) > return ret; > if (unlikely(!(ret & VM_FAULT_LOCKED))) { > - lock_page(page); > - if (!page->mapping) { > - unlock_page(page); > + folio_lock(folio); > + if (!folio_mapping(folio)) { You don't need to call folio_mapping() here. folio->mapping works absolutely fine in this circumstance. In fact, you may have broken a driver with this change. I can elaborate more, but I'm not quite in the mood to do that right now.
On Sun, Jul 02, 2023 at 10:58:47PM -0700, Sidhartha Kumar wrote: > @@ -2947,14 +2947,14 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf) > if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) > return ret; > if (unlikely(!(ret & VM_FAULT_LOCKED))) { > - lock_page(page); > - if (!page->mapping) { > - unlock_page(page); > + folio_lock(folio); > + if (!folio_mapping(folio)) { > + folio_unlock(folio); I promised to explain this better once I had time, and I have time now. folio->mapping is used for a multitude of purposes, unfortunately. Maybe some future work will reduce that, but for now, These Are The Rules. If the folio is marked as being Slab, it's used for something else. The folio does not belong to an address space (nor can it be mapped, so we're not going to see it here, but sometimes we see it in other contexts where we call folio_mapping()). The bottom two bits are used as PAGE_MAPPING_FLAGS. If they're both 0, this folio belongs to a file, and the rest of folio->mapping is a pointer to a struct address_space. Or they're both 0 because the whole thing is NULL. More on that below. If the bottom two bits are 01b, this is an anonymous folio, and folio->mapping is actually a pointer to an anon_vma (which is not the same thing as an anon vma). If the bottom two bits are 10b, this is a Movable page (anon & file memory is also movable, but this is different). The folio->mapping points to a struct movable_operations. If the bottom two bits are 11b, this is a KSM allocation, and folio->mapping points to a struct ksm_stable_node. When we remove a folio from the page cache, we reset folio->mapping to NULL. We often remove folios from the page cache before their refcount drops to zero (the common case is to look up the folio in the page cache, which grabs a reference, remove the folio from the page cache which decrements the refcount, then put the folio which might be the last refcount). So it's entirely possible to see a folio in this function with a NULL mapping; that means it's been removed from the file through a truncate or siimlar, and we need to fail the mkwrite. Userspace is about to get a segfault. If you find all of that confusing, well, I agree, and I'm trying to simplify it. So, with all that background, what's going on here? Part of the "modern" protocol for handling page faults is to lock the folio in vm_ops->page_mkwrite. But we still support (... why?) drivers that haven't been updated. They return 0 on success instead of VM_FAULT_LOCKED. So we take the lock for them, then check that the folio wasn't truncated, and bail out if it looks like it was. If we have a really old-school driver that has allocated a page, mapped it to userspace, and set page->mapping to be, eg, Movable, by calling folio_mapping() instead of folio->mapping, we'll end up seeing NULL instead of a non-NULL value, mistakenly believe it to have been truncated and enter an endless loop. Am I being paranoid here? Maybe! Drivers should have been updated by now. The "modern" way was introduced in 2007 (commit d0217ac04ca6), so it'd be nice to turn this into a WARN_ON_ONCE so drivers fix their code. There are only ~30 implementations of page_mkwrite in the kernel, so it might not take too long to check everything's OK.
diff --git a/mm/memory.c b/mm/memory.c index 21fab27272092..098fac2f5efc0 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2932,7 +2932,7 @@ static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma) static vm_fault_t do_page_mkwrite(struct vm_fault *vmf) { vm_fault_t ret; - struct page *page = vmf->page; + struct folio *folio = page_folio(vmf->page); unsigned int old_flags = vmf->flags; vmf->flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE; @@ -2947,14 +2947,14 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf) if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) return ret; if (unlikely(!(ret & VM_FAULT_LOCKED))) { - lock_page(page); - if (!page->mapping) { - unlock_page(page); + folio_lock(folio); + if (!folio_mapping(folio)) { + folio_unlock(folio); return 0; /* retry */ } ret |= VM_FAULT_LOCKED; } else - VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); return ret; }
Saves one implicit call to compound_head(); Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> --- mm/memory.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)