Message ID | 20240227070418.62292-1-ioworker0@gmail.com (mailing list archive) |
---|---|
State | New |
Series | [1/1] mm/memory: Fix boundary check for next PFN in folio_pte_batch() |
On 27.02.24 08:04, Lance Yang wrote:
> Previously, in folio_pte_batch(), only the upper boundary of the
> folio was checked using '>=' for comparison. This led to
> incorrect behavior when the next PFN exceeded the lower boundary
> of the folio, especially in corner cases where the next PFN might
> fall into a different folio.

Which commit does this fix?

The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
already in mm-stable, so we would need a Fixes: tag. Unless, Ryan's
changes introduced a problem.

BUT

I don't see what is broken. :)

Can you please give an example/reproducer?

We know that the first PTE maps the folio. By incrementing the PFN using
pte_next_pfn/pte_advance_pfn, we cannot suddenly get a lower PFN.

So how would pte_advance_pfn(folio_start_pfn + X) suddenly give us a PFN
lower than folio_start_pfn?

Note that we are not really concerned about any kind of
pte_advance_pfn() overflow that could generate PFN=0. I convinced myself
that this is something we don't have to worry about.


[I also thought about getting rid of the pte_pfn(pte) >= folio_end_pfn
and instead limiting end_ptep. But that requires more work before the
loop and feels more like a micro-optimization.]

>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>  mm/memory.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 642b4f2be523..e5291d1e8c37 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -986,12 +986,15 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>  		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
>  		bool *any_writable)
>  {
> -	unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
> +	unsigned long folio_start_pfn, folio_end_pfn;
>  	const pte_t *end_ptep = start_ptep + max_nr;
>  	pte_t expected_pte, *ptep;
>  	bool writable;
>  	int nr;
>
> +	folio_start_pfn = folio_pfn(folio);
> +	folio_end_pfn = folio_start_pfn + folio_nr_pages(folio);
> +
>  	if (any_writable)
>  		*any_writable = false;
>
> @@ -1015,7 +1018,7 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>  		 * corner cases the next PFN might fall into a different
>  		 * folio.
>  		 */
> -		if (pte_pfn(pte) >= folio_end_pfn)
> +		if (pte_pfn(pte) >= folio_end_pfn || pte_pfn(pte) < folio_start_pfn)
>  			break;
>
>  		if (any_writable)
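The bracketed aside about limiting end_ptep instead of testing pte_pfn(pte) >= folio_end_pfn on every iteration can be pictured with a small userspace model. This is only a sketch, not kernel code: `model_pte_t`, `struct model_folio` and `model_pte_batch_clamped()` are hypothetical names invented for illustration, a "PTE" is reduced to the PFN it maps, and flag handling is ignored. It assumes, as stated in the mail, that the first entry maps a page of the folio.

```c
#include <assert.h>

/* Userspace model (assumption): a "PTE" is just the PFN it maps. */
typedef unsigned long model_pte_t;

/* Hypothetical stand-in for a folio: PFN range [start_pfn, start_pfn + nr_pages). */
struct model_folio {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/*
 * Count consecutive entries that map consecutive PFNs of the folio.
 * Rather than comparing each PFN against the folio end inside the loop,
 * clamp max_nr once up front so the loop can never run past the folio.
 * Precondition: start_ptep[0] maps a page of the folio.
 */
static int model_pte_batch_clamped(const struct model_folio *folio,
				   const model_pte_t *start_ptep, int max_nr)
{
	unsigned long folio_end_pfn = folio->start_pfn + folio->nr_pages;
	unsigned long left;
	int nr;

	assert(max_nr >= 1);
	assert(start_ptep[0] >= folio->start_pfn && start_ptep[0] < folio_end_pfn);

	/* The "more work before the loop": pages left from the first PFN to the folio end. */
	left = folio_end_pfn - start_ptep[0];
	if ((unsigned long)max_nr > left)
		max_nr = (int)left;

	for (nr = 1; nr < max_nr; nr++) {
		if (start_ptep[nr] != start_ptep[0] + nr)
			break;
	}
	return nr;
}
```

In this model the per-iteration upper-bound test disappears, at the cost of the extra clamp before the loop, which matches the "micro-optimization" trade-off described in the mail.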
Hey David,

Thanks for taking time to review!

On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 27.02.24 08:04, Lance Yang wrote:
> > Previously, in folio_pte_batch(), only the upper boundary of the
> > folio was checked using '>=' for comparison. This led to
> > incorrect behavior when the next PFN exceeded the lower boundary
> > of the folio, especially in corner cases where the next PFN might
> > fall into a different folio.
>
> Which commit does this fix?
>
> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
> already in mm-stable, so we would need a Fixes: tag. Unless, Ryan's
> changes introduced a problem.
>
> BUT
>
> I don't see what is broken. :)
>
> Can you please give an example/reproducer?

For example1:

PTE0 is present for large folio1.
PTE1 is present for large folio1.
PTE2 is present for large folio1.
PTE3 is present for large folio1.

folio_nr_pages(folio1) is 4.
folio_nr_pages(folio2) is 4.

pte = *start_ptep = PTE0;
max_nr = folio_nr_pages(folio2);

If folio_pfn(folio1) < folio_pfn(folio2),
the return value of folio_pte_batch(folio2, start_ptep, pte, max_nr)
will be 4(Actually it should be 0).

For example2:

PTE0 is present for large folio2.
PTE1 is present for large folio1.
PTE2 is present for large folio1.
PTE3 is present for large folio1.

folio_nr_pages(folio1) is 4.
folio_nr_pages(folio2) is 4.

pte = *start_ptep = PTE0;
max_nr = folio_nr_pages(folio1);

If max_nr=4, the return value of folio_pte_batch(folio1, start_ptep,
pte, max_nr)
will be 1(Actually it should be 0).

folio_pte_batch() will soon be exported, and IMO, these corner cases may need
to be handled.

Thanks,
Lance

>
> We know that the first PTE maps the folio. By incrementing the PFN using
> pte_next_pfn/pte_advance_pfn, we cannot suddenly get a lower PFN.
>
> So how would pte_advance_pfn(folio_start_pfn + X) suddenly give us a PFN
> lower than folio_start_pfn?
On 27.02.24 09:23, Lance Yang wrote:
> Hey David,
>
> Thanks for taking time to review!
>
> On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 27.02.24 08:04, Lance Yang wrote:
>>> Previously, in folio_pte_batch(), only the upper boundary of the
>>> folio was checked using '>=' for comparison. This led to
>>> incorrect behavior when the next PFN exceeded the lower boundary
>>> of the folio, especially in corner cases where the next PFN might
>>> fall into a different folio.
>>
>> Which commit does this fix?
>>
>> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
>> already in mm-stable, so we would need a Fixes: tag. Unless, Ryan's
>> changes introduced a problem.
>>
>> BUT
>>
>> I don't see what is broken. :)
>>
>> Can you please give an example/reproducer?
>
> For example1:
>
> PTE0 is present for large folio1.
> PTE1 is present for large folio1.
> PTE2 is present for large folio1.
> PTE3 is present for large folio1.
>
> folio_nr_pages(folio1) is 4.
> folio_nr_pages(folio2) is 4.
>
> pte = *start_ptep = PTE0;
> max_nr = folio_nr_pages(folio2);
>
> If folio_pfn(folio1) < folio_pfn(folio2),
> the return value of folio_pte_batch(folio2, start_ptep, pte, max_nr)
> will be 4(Actually it should be 0).
>
> For example2:
>
> PTE0 is present for large folio2.
> PTE1 is present for large folio1.
> PTE2 is present for large folio1.
> PTE3 is present for large folio1.
>
> folio_nr_pages(folio1) is 4.
> folio_nr_pages(folio2) is 4.
>
> pte = *start_ptep = PTE0;
> max_nr = folio_nr_pages(folio1);
>

In both cases, start_ptep does not map the folio.

It's a BUG in your caller unless I am missing something important.

> If max_nr=4, the return value of folio_pte_batch(folio1, start_ptep,
> pte, max_nr)
> will be 1(Actually it should be 0).
>
> folio_pte_batch() will soon be exported, and IMO, these corner cases may need
> to be handled.

No, you should fix your caller.

The function cannot possibly do something reasonable if start_ptep does
not map the folio.

nr = pte_batch_hint(start_ptep, pte);
...
ptep = start_ptep + nr; /* nr is >= 1 */
...
return min(ptep - start_ptep, max_nr); /* will return something > 0 */

Which would return > 0 for something that does not map that folio.


I was trying to avoid official kernel docs for this internal helper,
maybe we have to improve it now.
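The point that the return value is always > 0 can be checked against the two examples with a small userspace model. This is a sketch under stated assumptions, not the kernel function: `model_pte_t`, `struct model_folio` and `model_folio_pte_batch()` are hypothetical names, a "PTE" is reduced to its PFN, the batch hint is assumed to be 1, and the PFN values 100 and 200 for folio1 and folio2 are made up. The `check_lower_bound` switch models the extra `< folio_start_pfn` test from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model (assumption): a "PTE" is just the PFN it maps. */
typedef unsigned long model_pte_t;

/* Hypothetical stand-in for a folio: PFN range [start_pfn, start_pfn + nr_pages). */
struct model_folio {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/*
 * Mirrors the shape of the loop outlined above: the first entry is taken
 * for granted (nr starts at 1), only the following entries are compared.
 */
static int model_folio_pte_batch(const struct model_folio *folio,
				 const model_pte_t *start_ptep, int max_nr,
				 bool check_lower_bound)
{
	unsigned long folio_start_pfn = folio->start_pfn;
	unsigned long folio_end_pfn = folio_start_pfn + folio->nr_pages;
	model_pte_t expected = start_ptep[0] + 1;
	int nr = 1; /* models a batch hint of 1: the start entry is never checked */

	assert(max_nr >= 1);

	while (nr < max_nr) {
		model_pte_t pte = start_ptep[nr];

		if (pte != expected)
			break;
		if (pte >= folio_end_pfn)
			break;
		/* The additional test proposed by the patch. */
		if (check_lower_bound && pte < folio_start_pfn)
			break;
		expected++;
		nr++;
	}
	return nr; /* always >= 1, with or without the lower-bound test */
}
```

With folio1 at PFN 100 and folio2 at PFN 200 (four pages each), example1 yields 4 without the lower-bound test and 1 with it, and example2 yields 1 either way. In this model the patch never produces the 0 that the examples ask for, because the first entry is counted before any comparison happens.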
On Tue, Feb 27, 2024 at 4:33 PM David Hildenbrand <david@redhat.com> wrote:
>
> In both cases, start_ptep does not map the folio.
>
> It's a BUG in your caller unless I am missing something important.

Sorry, I understood.

Thanks for your clarification!

Lance
On 27.02.24 09:45, Lance Yang wrote:
> Sorry, I understood.
>
> Thanks for your clarification!
I'll post some kernel doc as reply to Barry's export patch to clarify that.
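The conclusion of the thread, that the caller must guarantee the first PTE maps the folio, could be sketched in the same kind of userspace model. Everything here is hypothetical and for illustration only: `model_pte_maps_folio()`, `model_batch()` and `model_batch_checked()` are not kernel functions, and a "PTE" is again reduced to its PFN. In the kernel the precondition would more likely hold by construction (for instance when the folio is looked up from the PTE itself) than through an explicit range test.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model (assumption): a "PTE" is just the PFN it maps. */
typedef unsigned long model_pte_t;

/* Hypothetical stand-in for a folio: PFN range [start_pfn, start_pfn + nr_pages). */
struct model_folio {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Hypothetical helper: does this entry map a page of the folio? */
static bool model_pte_maps_folio(const struct model_folio *folio, model_pte_t pte)
{
	return pte >= folio->start_pfn &&
	       pte < folio->start_pfn + folio->nr_pages;
}

/* Batching helper with a documented precondition instead of a defensive check. */
static int model_batch(const struct model_folio *folio,
		       const model_pte_t *start_ptep, int max_nr)
{
	unsigned long folio_end_pfn = folio->start_pfn + folio->nr_pages;
	int nr;

	/* Precondition: the first entry maps the folio. */
	assert(model_pte_maps_folio(folio, start_ptep[0]));

	for (nr = 1; nr < max_nr; nr++) {
		if (start_ptep[nr] != start_ptep[0] + nr)
			break;
		if (start_ptep[nr] >= folio_end_pfn)
			break;
	}
	return nr;
}

/* Caller-side pattern: establish the precondition, report 0 if it does not hold. */
static int model_batch_checked(const struct model_folio *folio,
			       const model_pte_t *start_ptep, int max_nr)
{
	if (max_nr < 1 || !model_pte_maps_folio(folio, start_ptep[0]))
		return 0;
	return model_batch(folio, start_ptep, max_nr);
}
```

With the check on the caller side, both examples from earlier in the thread yield 0, while the batching helper itself stays free of a test it could not act on sensibly.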
diff --git a/mm/memory.c b/mm/memory.c
index 642b4f2be523..e5291d1e8c37 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -986,12 +986,15 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
 		bool *any_writable)
 {
-	unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
+	unsigned long folio_start_pfn, folio_end_pfn;
 	const pte_t *end_ptep = start_ptep + max_nr;
 	pte_t expected_pte, *ptep;
 	bool writable;
 	int nr;
 
+	folio_start_pfn = folio_pfn(folio);
+	folio_end_pfn = folio_start_pfn + folio_nr_pages(folio);
+
 	if (any_writable)
 		*any_writable = false;
 
@@ -1015,7 +1018,7 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 		 * corner cases the next PFN might fall into a different
 		 * folio.
 		 */
-		if (pte_pfn(pte) >= folio_end_pfn)
+		if (pte_pfn(pte) >= folio_end_pfn || pte_pfn(pte) < folio_start_pfn)
 			break;
 
 		if (any_writable)
Previously, in folio_pte_batch(), only the upper boundary of the
folio was checked using '>=' for comparison. This led to
incorrect behavior when the next PFN exceeded the lower boundary
of the folio, especially in corner cases where the next PFN might
fall into a different folio.

Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 mm/memory.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)