[v2,06/28] gup: Fix some contiguous memmap assumptions

Message ID 20220110042406.499429-7-willy@infradead.org (mailing list archive)
State New
Series Convert GUP to folios

Commit Message

Matthew Wilcox (Oracle) Jan. 10, 2022, 4:23 a.m. UTC
Several functions in gup.c assume that a compound page has virtually
contiguous page structs.  This isn't true for SPARSEMEM configs unless
SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
plain pointer arithmetic.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
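
To illustrate why plain pointer arithmetic can go wrong here, the following is a minimal userspace sketch, not kernel code. It assumes a toy memory model in which each section has its own separately allocated memmap, which is roughly the situation with SPARSEMEM when SPARSEMEM_VMEMMAP is not set. Every name prefixed with toy_ or TOY_ is hypothetical, and storing the pfn inside the page struct is a simplification made only for this example. The helper toy_nth_page() mirrors the idea behind nth_page(): convert to a pfn, add the offset, and convert back.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy parameters: two sections of four pages each. */
#define TOY_PAGES_PER_SECTION 4UL
#define TOY_NR_SECTIONS       2UL

/* Simplified stand-in for struct page; the pfn field is an assumption. */
struct toy_page {
	unsigned long pfn;
};

/* One memmap array per section, each allocated separately. */
static struct toy_page *toy_memmap[TOY_NR_SECTIONS];

/* Allocate the per-section arrays and record each page's pfn. */
static int toy_init(void)
{
	for (unsigned long s = 0; s < TOY_NR_SECTIONS; s++) {
		toy_memmap[s] = calloc(TOY_PAGES_PER_SECTION,
				       sizeof(struct toy_page));
		if (!toy_memmap[s])
			return -1;
		for (unsigned long i = 0; i < TOY_PAGES_PER_SECTION; i++)
			toy_memmap[s][i].pfn = s * TOY_PAGES_PER_SECTION + i;
	}
	return 0;
}

/* Look up the page struct for a pfn via its section. */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
	return &toy_memmap[pfn / TOY_PAGES_PER_SECTION]
			  [pfn % TOY_PAGES_PER_SECTION];
}

static unsigned long toy_page_to_pfn(const struct toy_page *page)
{
	return page->pfn;
}

/*
 * Step n pages forward by going through the pfn, so the result is
 * correct even when it lands in a different section's array.
 */
static struct toy_page *toy_nth_page(struct toy_page *page, unsigned long n)
{
	return toy_pfn_to_page(toy_page_to_pfn(page) + n);
}
```

In this model, after toy_init() succeeds, toy_nth_page(toy_pfn_to_page(2), 3) returns the entry for pfn 5 in the second section's array, whereas adding 3 to the pointer for pfn 2 would step past the end of the first section's array. Within a single section the two approaches agree.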

Comments

Christoph Hellwig Jan. 10, 2022, 8:29 a.m. UTC | #1
On Mon, Jan 10, 2022 at 04:23:44AM +0000, Matthew Wilcox (Oracle) wrote:
> Several functions in gup.c assume that a compound page has virtually
> contiguous page structs.  This isn't true for SPARSEMEM configs unless
> SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
> plain pointer arithmetic.

So is this an actual bug that needs a Fixes tag, or do all architectures
that support THP and sparsemem use SPARSEMEM_VMEMMAP?

> +	page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);

Would be nice to fix the indentation for sz - 1 while you're at it.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Matthew Wilcox (Oracle) Jan. 10, 2022, 1:37 p.m. UTC | #2
On Mon, Jan 10, 2022 at 12:29:58AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 10, 2022 at 04:23:44AM +0000, Matthew Wilcox (Oracle) wrote:
> > Several functions in gup.c assume that a compound page has virtually
> > contiguous page structs.  This isn't true for SPARSEMEM configs unless
> > SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
> > plain pointer arithmetic.
> 
> So is this an actual bug that needs a Fixes tag, or do all architectures
> that support THP and sparsemem use SPARSEMEM_VMEMMAP?

As far as I can tell (and I am by no means an expert in this area),
this problem only affects pages of order MAX_ORDER or higher.  That is,
somebody using regular 2MB hugepages on x86 won't see a problem, whether
they're using VMEMMAP or not.  It only starts to become a problem for
1GB hugepages.

Since THPs are (currently) only allocated from the page allocator, it's
never a problem for THPs, only hugetlbfs.  Correcting the places which
can't see a 1GB page is just defense against copy-and-paste programming.
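
To put rough numbers on the order argument above, here is a small sketch of the arithmetic. The constants are assumptions for a typical x86-64 configuration of that era (4KB base pages, 128MB sections, MAX_ORDER of 11) and should be checked against the tree in question; the TOY_ prefix and both helper functions are hypothetical names used only for this example.

```c
#include <assert.h>

/* Assumed x86-64 values; not taken from the patch itself. */
#define TOY_PAGE_SHIFT        12	/* 4KB base pages */
#define TOY_SECTION_SIZE_BITS 27	/* 128MB per memory section */
#define TOY_MAX_ORDER         11	/* page allocator serves orders below this */

/* Smallest order whose size is at least the given number of bytes. */
static unsigned int size_to_order(unsigned long long bytes)
{
	unsigned int order = 0;

	while ((1ULL << (TOY_PAGE_SHIFT + order)) < bytes)
		order++;
	return order;
}

/* Number of sections a naturally aligned page of this order covers. */
static unsigned long long sections_spanned(unsigned int order)
{
	unsigned long long bytes = 1ULL << (TOY_PAGE_SHIFT + order);
	unsigned long long section = 1ULL << TOY_SECTION_SIZE_BITS;

	return (bytes + section - 1) / section;
}
```

Under these assumptions a 2MB page is order 9, which is below MAX_ORDER and fits inside one section, while a 1GB page is order 18 and covers 8 sections, so only the latter can cross a boundary between separately allocated memmaps.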

So I'll defer to Mike -- does this ever affect real systems and thus
warrant a backport?  I know this doesn't affect UEK because we enable
SPARSEMEM_VMEMMAP.

> > +	page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
> 
> Would be nice to fix the indentation for sz - 1 while you're at it.

Done.

Mike Kravetz Jan. 10, 2022, 7:05 p.m. UTC | #3
On 1/10/22 05:37, Matthew Wilcox wrote:
> On Mon, Jan 10, 2022 at 12:29:58AM -0800, Christoph Hellwig wrote:
>> On Mon, Jan 10, 2022 at 04:23:44AM +0000, Matthew Wilcox (Oracle) wrote:
>>> Several functions in gup.c assume that a compound page has virtually
>>> contiguous page structs.  This isn't true for SPARSEMEM configs unless
>>> SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
>>> plain pointer arithmetic.
>>
>> So is this an actual bug that needs a Fixes tag, or do all architectures
>> that support THP and sparsemem use SPARSEMEM_VMEMMAP?
> 
> As far as I can tell (and I am by no means an expert in this area),
> this problem only affects pages of order MAX_ORDER or higher.  That is,
> somebody using regular 2MB hugepages on x86 won't see a problem, whether
> they're using VMEMMAP or not.  It only starts to become a problem for
> 1GB hugepages.
> 
> Since THPs are (currently) only allocated from the page allocator, it's
> never a problem for THPs, only hugetlbfs.  Correcting the places which
> can't see a 1GB page is just defense against copy-and-paste programming.
> 
> So I'll defer to Mike -- does this ever affect real systems and thus
> warrant a backport?  I know this doesn't affect UEK because we enable
> SPARSEMEM_VMEMMAP.

I guess it all depends on your definition of 'real' systems.  I am unaware
of any distros that disable SPARSEMEM_VMEMMAP, but I do not know or have
access to them all.

In arch specific Kconfig files, SPARSEMEM_VMEMMAP is enabled by default
(if sparsemem is enabled).  However, it is 'possible' to configure a kernel
with SPARSEMEM and without SPARSEMEM_VMEMMAP.

This issue came up almost a year ago in this thread:
https://lore.kernel.org/linux-mm/20210217184926.33567-1-mike.kravetz@oracle.com/

In practice, I do not recall ever seeing this outside of debug environments
specifically trying to hit the issue.

John Hubbard Jan. 11, 2022, 1:47 a.m. UTC | #4
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Several functions in gup.c assume that a compound page has virtually
> contiguous page structs.  This isn't true for SPARSEMEM configs unless
> SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
> plain pointer arithmetic.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
Patch

diff --git a/mm/gup.c b/mm/gup.c
index 8a0ea220ced1..9c0a702a4e03 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -235,7 +235,7 @@  static inline struct page *compound_range_next(unsigned long i,
 	struct page *next, *page;
 	unsigned int nr = 1;
 
-	next = start + i;
+	next = nth_page(start, i);
 	page = compound_head(next);
 	if (PageHead(page))
 		nr = min_t(unsigned int,
@@ -2430,8 +2430,8 @@  static int record_subpages(struct page *page, unsigned long addr,
 {
 	int nr;
 
-	for (nr = 0; addr != end; addr += PAGE_SIZE)
-		pages[nr++] = page++;
+	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
+		pages[nr] = nth_page(page, nr);
 
 	return nr;
 }
@@ -2466,7 +2466,7 @@  static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
 	head = pte_page(pte);
-	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
+	page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_grab_compound_head(head, refs, flags);
@@ -2526,7 +2526,7 @@  static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_grab_compound_head(pmd_page(orig), refs, flags);
@@ -2560,7 +2560,7 @@  static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_grab_compound_head(pud_page(orig), refs, flags);
@@ -2589,7 +2589,7 @@  static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 
 	BUILD_BUG_ON(pgd_devmap(orig));
 
-	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
+	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_grab_compound_head(pgd_page(orig), refs, flags);