
[1/1] mm/memory: Fix boundary check for next PFN in folio_pte_batch()

Message ID 20240227070418.62292-1-ioworker0@gmail.com (mailing list archive)
State New
Series [1/1] mm/memory: Fix boundary check for next PFN in folio_pte_batch()

Commit Message

Lance Yang Feb. 27, 2024, 7:04 a.m. UTC
Previously, in folio_pte_batch(), only the upper boundary of the
folio was checked using '>=' for comparison. This led to
incorrect behavior when the next PFN fell below the lower boundary
of the folio, especially in corner cases where the next PFN might
fall into a different folio.

Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 mm/memory.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Feb. 27, 2024, 7:30 a.m. UTC | #1
On 27.02.24 08:04, Lance Yang wrote:
> Previously, in folio_pte_batch(), only the upper boundary of the
> folio was checked using '>=' for comparison. This led to
> incorrect behavior when the next PFN fell below the lower boundary
> of the folio, especially in corner cases where the next PFN might
> fall into a different folio.

Which commit does this fix?

The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is 
already in mm-stable, so we would need a Fixes: tag. Unless Ryan's
changes introduced a problem.

BUT

I don't see what is broken. :)

Can you please give an example/reproducer?

We know that the first PTE maps the folio. By incrementing the PFN using 
pte_next_pfn/pte_advance_pfn, we cannot suddenly get a lower PFN.

So how would pte_advance_pfn(folio_start_pfn + X) suddenly give us a PFN 
lower than folio_start_pfn?
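
To spell that out with a rough sketch (quoting from memory, so the exact
lines may differ slightly): the expected PTE is only ever derived by
advancing forward from the PTE the caller handed us, which by contract
maps the folio:

nr = pte_batch_hint(start_ptep, pte);
/* pte maps the folio, so pte_pfn(pte) >= folio_pfn(folio) */
expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
...
if (!pte_same(pte, expected_pte))
	break;
/* a PTE that matches expected_pte has a PFN strictly above the one we
   started from -- the PFN can only grow, never drop below the folio. */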

Note that we are not really concerned about any kind of 
pte_advance_pfn() overflow that could generate PFN=0. I convinced myself
that that is something we don't have to worry about.


[I also thought about getting rid of the pte_pfn(pte) >= folio_end_pfn 
and instead limiting end_ptep. But that requires more work before the 
loop and feels more like a micro-optimization.]
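
Roughly (an untested sketch, just to illustrate the idea), that would mean
clamping the batch length before the loop so that we can never walk past
the folio, at which point the in-loop upper-bound check becomes redundant:

max_nr = min_t(int, max_nr, folio_end_pfn - pte_pfn(pte));
end_ptep = start_ptep + max_nr;
...
/* the "pte_pfn(pte) >= folio_end_pfn" check in the loop could then go away */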

> 
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>   mm/memory.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 642b4f2be523..e5291d1e8c37 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -986,12 +986,15 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>   		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
>   		bool *any_writable)
>   {
> -	unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
> +	unsigned long folio_start_pfn, folio_end_pfn;
>   	const pte_t *end_ptep = start_ptep + max_nr;
>   	pte_t expected_pte, *ptep;
>   	bool writable;
>   	int nr;
>   
> +	folio_start_pfn = folio_pfn(folio);
> +	folio_end_pfn = folio_start_pfn + folio_nr_pages(folio);
> +
>   	if (any_writable)
>   		*any_writable = false;
>   
> @@ -1015,7 +1018,7 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>   		 * corner cases the next PFN might fall into a different
>   		 * folio.
>   		 */
> -		if (pte_pfn(pte) >= folio_end_pfn)
> +		if (pte_pfn(pte) >= folio_end_pfn || pte_pfn(pte) < folio_start_pfn)
>   			break;
>   
>   		if (any_writable)
Lance Yang Feb. 27, 2024, 8:23 a.m. UTC | #2
Hey David,

Thanks for taking time to review!

On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 27.02.24 08:04, Lance Yang wrote:
> > Previously, in folio_pte_batch(), only the upper boundary of the
> > folio was checked using '>=' for comparison. This led to
> > incorrect behavior when the next PFN fell below the lower boundary
> > of the folio, especially in corner cases where the next PFN might
> > fall into a different folio.
>
> Which commit does this fix?
>
> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
> already in mm-stable, so we would need a Fixes: tag. Unless Ryan's
> changes introduced a problem.
>
> BUT
>
> I don't see what is broken. :)
>
> Can you please give an example/reproducer?

For example1:

PTE0 is present for large folio1.
PTE1 is present for large folio1.
PTE2 is present for large folio1.
PTE3 is present for large folio1.

folio_nr_pages(folio1) is 4.
folio_nr_pages(folio2) is 4.

pte = *start_ptep = PTE0;
max_nr = folio_nr_pages(folio2);

If folio_pfn(folio1) < folio_pfn(folio2),
the return value of folio_pte_batch(folio2, start_ptep, pte, max_nr)
will be 4 (actually it should be 0).

For example2:

PTE0 is present for large folio2.
PTE1 is present for large folio1.
PTE2 is present for large folio1.
PTE3 is present for large folio1.

folio_nr_pages(folio1) is 4.
folio_nr_pages(folio2) is 4.

pte = *start_ptep = PTE0;
max_nr = folio_nr_pages(folio1);

If max_nr = 4, the return value of folio_pte_batch(folio1, start_ptep,
pte, max_nr) will be 1 (actually it should be 0).

folio_pte_batch() will soon be exported, and IMO, these corner cases may need
to be handled.

Thanks,
Lance

>
> We know that the first PTE maps the folio. By incrementing the PFN using
> pte_next_pfn/pte_advance_pfn, we cannot suddenly get a lower PFN.
>
> So how would pte_advance_pfn(folio_start_pfn + X) suddenly give us a PFN
> lower than folio_start_pfn?
>
> Note that we are not really concerned about any kind of
> pte_advance_pfn() overflow that could generate PFN=0. I convinced myself
> that that is something we don't have to worry about.
>
>
> [I also thought about getting rid of the pte_pfn(pte) >= folio_end_pfn
> and instead limiting end_ptep. But that requires more work before the
> loop and feels more like a micro-optimization.]
>
> >
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> >   mm/memory.c | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 642b4f2be523..e5291d1e8c37 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -986,12 +986,15 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
> >               pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
> >               bool *any_writable)
> >   {
> > -     unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
> > +     unsigned long folio_start_pfn, folio_end_pfn;
> >       const pte_t *end_ptep = start_ptep + max_nr;
> >       pte_t expected_pte, *ptep;
> >       bool writable;
> >       int nr;
> >
> > +     folio_start_pfn = folio_pfn(folio);
> > +     folio_end_pfn = folio_start_pfn + folio_nr_pages(folio);
> > +
> >       if (any_writable)
> >               *any_writable = false;
> >
> > @@ -1015,7 +1018,7 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
> >                * corner cases the next PFN might fall into a different
> >                * folio.
> >                */
> > -             if (pte_pfn(pte) >= folio_end_pfn)
> > +             if (pte_pfn(pte) >= folio_end_pfn || pte_pfn(pte) < folio_start_pfn)
> >                       break;
> >
> >               if (any_writable)
>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Feb. 27, 2024, 8:33 a.m. UTC | #3
On 27.02.24 09:23, Lance Yang wrote:
> Hey David,
> 
> Thanks for taking time to review!
> 
> On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 27.02.24 08:04, Lance Yang wrote:
>>> Previously, in folio_pte_batch(), only the upper boundary of the
>>> folio was checked using '>=' for comparison. This led to
>>> incorrect behavior when the next PFN fell below the lower boundary
>>> of the folio, especially in corner cases where the next PFN might
>>> fall into a different folio.
>>
>> Which commit does this fix?
>>
>> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
>> already in mm-stable, so we would need a Fixes: tag. Unless Ryan's
>> changes introduced a problem.
>>
>> BUT
>>
>> I don't see what is broken. :)
>>
>> Can you please give an example/reproducer?
> 
> For example1:
> 
> PTE0 is present for large folio1.
> PTE1 is present for large folio1.
> PTE2 is present for large folio1.
> PTE3 is present for large folio1.
> 
> folio_nr_pages(folio1) is 4.
> folio_nr_pages(folio2) is 4.
> 
> pte = *start_ptep = PTE0;
> max_nr = folio_nr_pages(folio2);
> 
> If folio_pfn(folio1) < folio_pfn(folio2),
> the return value of folio_pte_batch(folio2, start_ptep, pte, max_nr)
> will be 4 (actually it should be 0).
> 
> For example2:
> 
> PTE0 is present for large folio2.
> PTE1 is present for large folio1.
> PTE2 is present for large folio1.
> PTE3 is present for large folio1.
> 
> folio_nr_pages(folio1) is 4.
> folio_nr_pages(folio2) is 4.
> 
> pte = *start_ptep = PTE0;
> max_nr = folio_nr_pages(folio1);
> 

In both cases, start_ptep does not map the folio.

It's a BUG in your caller unless I am missing something important.


> If max_nr = 4, the return value of folio_pte_batch(folio1, start_ptep,
> pte, max_nr) will be 1 (actually it should be 0).
> 
> folio_pte_batch() will soon be exported, and IMO, these corner cases may need
> to be handled.

No, you should fix your caller.

The function cannot possibly do something reasonable if start_ptep does 
not map the folio.

nr = pte_batch_hint(start_ptep, pte);
...
ptep = start_ptep + nr; /* nr is >= 1 */
...
return min(ptep - start_ptep, max_nr); /* will return something > 0 */

Which would return > 0 for something that does not map that folio.
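
For reference, the precondition is naturally upheld when the caller derives
the folio from the very PTE it starts batching at, roughly like the existing
fork caller does (sketch from memory, local names abbreviated):

pte = ptep_get(start_ptep);
folio = vm_normal_folio(vma, addr, pte);
if (folio && folio_test_large(folio))
	/* start_ptep maps the folio by construction */
	nr = folio_pte_batch(folio, addr, start_ptep, pte, max_nr,
			     flags, &any_writable);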


I was trying to avoid official kernel docs for this internal helper, but
maybe we have to improve it now.
Lance Yang Feb. 27, 2024, 8:45 a.m. UTC | #4
On Tue, Feb 27, 2024 at 4:33 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 27.02.24 09:23, Lance Yang wrote:
> > Hey David,
> >
> > Thanks for taking time to review!
> >
> > On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 27.02.24 08:04, Lance Yang wrote:
> >>> Previously, in folio_pte_batch(), only the upper boundary of the
> >>> folio was checked using '>=' for comparison. This led to
> >>> incorrect behavior when the next PFN fell below the lower boundary
> >>> of the folio, especially in corner cases where the next PFN might
> >>> fall into a different folio.
> >>
> >> Which commit does this fix?
> >>
> >> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
> >> already in mm-stable, so we would need a Fixes: tag. Unless Ryan's
> >> changes introduced a problem.
> >>
> >> BUT
> >>
> >> I don't see what is broken. :)
> >>
> >> Can you please give an example/reproducer?
> >
> > For example1:
> >
> > PTE0 is present for large folio1.
> > PTE1 is present for large folio1.
> > PTE2 is present for large folio1.
> > PTE3 is present for large folio1.
> >
> > folio_nr_pages(folio1) is 4.
> > folio_nr_pages(folio2) is 4.
> >
> > pte = *start_ptep = PTE0;
> > max_nr = folio_nr_pages(folio2);
> >
> > If folio_pfn(folio1) < folio_pfn(folio2),
> > the return value of folio_pte_batch(folio2, start_ptep, pte, max_nr)
> > will be 4 (actually it should be 0).
> >
> > For example2:
> >
> > PTE0 is present for large folio2.
> > PTE1 is present for large folio1.
> > PTE2 is present for large folio1.
> > PTE3 is present for large folio1.
> >
> > folio_nr_pages(folio1) is 4.
> > folio_nr_pages(folio2) is 4.
> >
> > pte = *start_ptep = PTE0;
> > max_nr = folio_nr_pages(folio1);
> >
>
> In both cases, start_ptep does not map the folio.
>
> It's a BUG in your caller unless I am missing something important.

Sorry, I understand now.

Thanks for your clarification!
Lance

>
>
> > If max_nr = 4, the return value of folio_pte_batch(folio1, start_ptep,
> > pte, max_nr) will be 1 (actually it should be 0).
> >
> > folio_pte_batch() will soon be exported, and IMO, these corner cases may need
> > to be handled.
>
> No, you should fix your caller.
>
> The function cannot possibly do something reasonable if start_ptep does
> not map the folio.
>
> nr = pte_batch_hint(start_ptep, pte);
> ...
> ptep = start_ptep + nr; /* nr is >= 1 */
> ...
> return min(ptep - start_ptep, max_nr); /* will return something > 0 */
>
> Which would return > 0 for something that does not map that folio.
>
>
> I was trying to avoid official kernel docs for this internal helper, but
> maybe we have to improve it now.
>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Feb. 27, 2024, 8:46 a.m. UTC | #5
On 27.02.24 09:45, Lance Yang wrote:
> On Tue, Feb 27, 2024 at 4:33 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 27.02.24 09:23, Lance Yang wrote:
>>> Hey David,
>>>
>>> Thanks for taking time to review!
>>>
>>> On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 27.02.24 08:04, Lance Yang wrote:
>>>>> Previously, in folio_pte_batch(), only the upper boundary of the
>>>>> folio was checked using '>=' for comparison. This led to
>>>>> incorrect behavior when the next PFN fell below the lower boundary
>>>>> of the folio, especially in corner cases where the next PFN might
>>>>> fall into a different folio.
>>>>
>>>> Which commit does this fix?
>>>>
>>>> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
>>>> already in mm-stable, so we would need a Fixes: tag. Unless Ryan's
>>>> changes introduced a problem.
>>>>
>>>> BUT
>>>>
>>>> I don't see what is broken. :)
>>>>
>>>> Can you please give an example/reproducer?
>>>
>>> For example1:
>>>
>>> PTE0 is present for large folio1.
>>> PTE1 is present for large folio1.
>>> PTE2 is present for large folio1.
>>> PTE3 is present for large folio1.
>>>
>>> folio_nr_pages(folio1) is 4.
>>> folio_nr_pages(folio2) is 4.
>>>
>>> pte = *start_ptep = PTE0;
>>> max_nr = folio_nr_pages(folio2);
>>>
>>> If folio_pfn(folio1) < folio_pfn(folio2),
>>> the return value of folio_pte_batch(folio2, start_ptep, pte, max_nr)
>>> will be 4 (actually it should be 0).
>>>
>>> For example2:
>>>
>>> PTE0 is present for large folio2.
>>> PTE1 is present for large folio1.
>>> PTE2 is present for large folio1.
>>> PTE3 is present for large folio1.
>>>
>>> folio_nr_pages(folio1) is 4.
>>> folio_nr_pages(folio2) is 4.
>>>
>>> pte = *start_ptep = PTE0;
>>> max_nr = folio_nr_pages(folio1);
>>>
>>
>> In both cases, start_ptep does not map the folio.
>>
>> It's a BUG in your caller unless I am missing something important.
> 
> Sorry, I understand now.
> 
> Thanks for your clarification!

I'll post some kernel doc as a reply to Barry's export patch to clarify that.

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 642b4f2be523..e5291d1e8c37 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -986,12 +986,15 @@  static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
 		bool *any_writable)
 {
-	unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
+	unsigned long folio_start_pfn, folio_end_pfn;
 	const pte_t *end_ptep = start_ptep + max_nr;
 	pte_t expected_pte, *ptep;
 	bool writable;
 	int nr;
 
+	folio_start_pfn = folio_pfn(folio);
+	folio_end_pfn = folio_start_pfn + folio_nr_pages(folio);
+
 	if (any_writable)
 		*any_writable = false;
 
@@ -1015,7 +1018,7 @@  static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 		 * corner cases the next PFN might fall into a different
 		 * folio.
 		 */
-		if (pte_pfn(pte) >= folio_end_pfn)
+		if (pte_pfn(pte) >= folio_end_pfn || pte_pfn(pte) < folio_start_pfn)
 			break;
 
 		if (any_writable)