diff mbox series

[6/8] mm: Convert find_get_entry to return the head page

Message ID 20200819184850.24779-7-willy@infradead.org (mailing list archive)
State New, archived
Headers show
Series Return head pages from find_get_entry and find_lock_entry | expand

Commit Message

Matthew Wilcox Aug. 19, 2020, 6:48 p.m. UTC
There are only three callers remaining of find_get_entry().
find_get_swap_page() is happy to get the head page instead of the subpage.
Add find_subpage() calls to find_lock_entry() and pagecache_get_page()
to avoid auditing all their callers.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h |  4 ++--
 mm/filemap.c            | 13 +++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

Comments

Johannes Weiner Aug. 26, 2020, 3:09 p.m. UTC | #1
On Wed, Aug 19, 2020 at 07:48:48PM +0100, Matthew Wilcox (Oracle) wrote:
> There are only three callers remaining of find_get_entry().
> find_get_swap_page() is happy to get the head page instead of the subpage.
> Add find_subpage() calls to find_lock_entry() and pagecache_get_page()
> to avoid auditing all their callers.

I believe this would cause a subtle bug in memcg charge moving for pte
mapped huge pages. We currently skip over tail pages in the range
(they don't have page->mem_cgroup set) and account for the huge page
once from the headpage. After this change, we would see the headpage
and account for it 512 times (or whatever the number is on non-x86).

But that aside, I don't quite understand the intent.

Before, all these functions simply return the base page at @index,
whether it's a regular page or a tail page.

Afterwards, find_lock_entry(), find_get_page() et al still do, but
find_get_entry() returns headpage at @index & HPAGE_CACHE_INDEX_MASK.

Shouldn't we be consistent about how we handle huge pages when
somebody queries the tree for a given base page index?

[ Wouldn't that mean that e.g. find_get_swap_page() would return tail
  pages for regular files and head pages for shmem files? ]
Matthew Wilcox Aug. 26, 2020, 3:48 p.m. UTC | #2
On Wed, Aug 26, 2020 at 11:09:25AM -0400, Johannes Weiner wrote:
> On Wed, Aug 19, 2020 at 07:48:48PM +0100, Matthew Wilcox (Oracle) wrote:
> > There are only three callers remaining of find_get_entry().
> > find_get_swap_page() is happy to get the head page instead of the subpage.
> > Add find_subpage() calls to find_lock_entry() and pagecache_get_page()
> > to avoid auditing all their callers.
> 
> I believe this would cause a subtle bug in memcg charge moving for pte
> mapped huge pages. We currently skip over tail pages in the range
> (they don't have page->mem_cgroup set) and account for the huge page
> once from the headpage. After this change, we would see the headpage
> and account for it 512 times (or whatever the number is on non-x86).

Hmm ... so if you have the last 511 pages of a huge page mapped, you
actually don't charge for it at all today?

I think you're right that I'd introduce this bug, and so that needs to
be fixed.

> But that aside, I don't quite understand the intent.
> 
> Before, all these functions simply return the base page at @index,
> whether it's a regular page or a tail page.
> 
> Afterwards, find_lock_entry(), find_get_page() et al still do, but
> find_get_entry() returns headpage at @index & HPAGE_CACHE_INDEX_MASK.
> 
> Shouldn't we be consistent about how we handle huge pages when
> somebody queries the tree for a given base page index?
> 
> [ Wouldn't that mean that e.g. find_get_swap_page() would return tail
>   pages for regular files and head pages for shmem files? ]

What I'd _like_ to do is convert all the callers to cope with tail
pages never being returned from all the find_* functions.  That seems
like a lot of disruption.

My intent in this series is to get all the find_*_entr{y,ies}
functions to the point where they don't return tail pages.
Also find_get_pages_tag() because tags are only set on head pages.

This is generally what the callers want anyway.  There's even a hack
in find_get_entries() in current to terminate early on finding a THP
(see commit 71725ed10c40696dc6bdccf8e225815dcef24dba).  If I want
to remove that, I need to do _something_ to not put all the subpages
of a THP into the pagevec.

So the new rule will be that find_*_entry() don't return tail pages but
find_*_page() do.  With the full THP patchset in place, THPs become quite
common, so bugs in this area will surface quickly instead of lingering
for years and only popping out in rare circumstances.
diff mbox series

Patch

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7de11dcd534d..8261c64f592d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -384,8 +384,8 @@  static inline struct page *find_subpage(struct page *head, pgoff_t index)
 	return head + (index & (thp_nr_pages(head) - 1));
 }
 
-struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset);
+struct page *find_get_entry(struct address_space *mapping, pgoff_t index);
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t index);
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
 			  unsigned int nr_entries, struct page **entries,
 			  pgoff_t *indices);
diff --git a/mm/filemap.c b/mm/filemap.c
index 36067cf7f8c5..5a87e1bdddd6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1547,19 +1547,19 @@  EXPORT_SYMBOL(page_cache_prev_miss);
 /**
  * find_get_entry - find and get a page cache entry
  * @mapping: the address_space to search
- * @offset: the page cache index
+ * @index: The page cache index.
  *
  * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, it is returned with an increased refcount.
+ * page cache page, the head page is returned with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * Return: the found page or shadow entry, %NULL if nothing is found.
+ * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
+struct page *find_get_entry(struct address_space *mapping, pgoff_t index)
 {
-	XA_STATE(xas, &mapping->i_pages, offset);
+	XA_STATE(xas, &mapping->i_pages, index);
 	struct page *page;
 
 	rcu_read_lock();
@@ -1587,7 +1587,6 @@  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 		put_page(page);
 		goto repeat;
 	}
-	page = find_subpage(page, offset);
 out:
 	rcu_read_unlock();
 
@@ -1624,6 +1623,7 @@  struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset)
 			put_page(page);
 			goto repeat;
 		}
+		page = find_subpage(page, offset);
 		VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
 	}
 	return page;
@@ -1670,6 +1670,7 @@  struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		page = NULL;
 	if (!page)
 		goto no_page;
+	page = find_subpage(page, index);
 
 	if (fgp_flags & FGP_LOCK) {
 		if (fgp_flags & FGP_NOWAIT) {