Message ID: 20240603140745.83880-1-ioworker0@gmail.com (mailing list archive)
State: New
Series: [v2,1/1] mm/mlock: implement folio_mlock_step() using folio_pte_batch()
On Mon, Jun 03, 2024 at 10:07:45PM +0800, Lance Yang wrote:
> +++ b/mm/mlock.c
> @@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
>  static inline unsigned int folio_mlock_step(struct folio *folio,
>  		pte_t *pte, unsigned long addr, unsigned long end)
>  {
> -	unsigned int count, i, nr = folio_nr_pages(folio);
> -	unsigned long pfn = folio_pfn(folio);
> +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> +	unsigned int count = (end - addr) >> PAGE_SHIFT;

This is a pre-existing bug, but ... what happens if you're on a 64-bit
system and you mlock() a range that is exactly 2^44 bytes? Seems to me
that count becomes 0. Why not use an unsigned long here and avoid the
problem entirely?

folio_pte_batch() also needs to take an unsigned long max_nr in that
case, because you aren't restricting it to folio_nr_pages().
On Mon, Jun 3, 2024 at 10:43 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Jun 03, 2024 at 10:07:45PM +0800, Lance Yang wrote:
> > +++ b/mm/mlock.c
> > @@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
> >  static inline unsigned int folio_mlock_step(struct folio *folio,
> >  		pte_t *pte, unsigned long addr, unsigned long end)
> >  {
> > -	unsigned int count, i, nr = folio_nr_pages(folio);
> > -	unsigned long pfn = folio_pfn(folio);
> > +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> > +	unsigned int count = (end - addr) >> PAGE_SHIFT;
>
> This is a pre-existing bug, but ... what happens if you're on a 64-bit
> system and you mlock() a range that is exactly 2^44 bytes? Seems to me
> that count becomes 0. Why not use an unsigned long here and avoid the
> problem entirely?

Good catch! Thanks for pointing that out!

Let's use an unsigned long here instead to avoid the problem entirely :)

Thanks,
Lance

>
> folio_pte_batch() also needs to take an unsigned long max_nr in that
> case, because you aren't restricting it to folio_nr_pages().
>
On 03.06.24 16:43, Matthew Wilcox wrote:
> On Mon, Jun 03, 2024 at 10:07:45PM +0800, Lance Yang wrote:
>> +++ b/mm/mlock.c
>> @@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
>>  static inline unsigned int folio_mlock_step(struct folio *folio,
>>  		pte_t *pte, unsigned long addr, unsigned long end)
>>  {
>> -	unsigned int count, i, nr = folio_nr_pages(folio);
>> -	unsigned long pfn = folio_pfn(folio);
>> +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>> +	unsigned int count = (end - addr) >> PAGE_SHIFT;
>
> This is a pre-existing bug, but ... what happens if you're on a 64-bit
> system and you mlock() a range that is exactly 2^44 bytes? Seems to me
> that count becomes 0. Why not use an unsigned long here and avoid the
> problem entirely?
>
> folio_pte_batch() also needs to take an unsigned long max_nr in that
> case, because you aren't restricting it to folio_nr_pages().

Yeah, likely we should also take a look at other folio_pte_batch() users
like copy_present_ptes() that pass the count as an int. Nothing should
really be broken, but we might not batch as much as we could, which is
unfortunate.
On Mon, Jun 03, 2024 at 04:56:05PM +0200, David Hildenbrand wrote:
> On 03.06.24 16:43, Matthew Wilcox wrote:
> > On Mon, Jun 03, 2024 at 10:07:45PM +0800, Lance Yang wrote:
> > > +++ b/mm/mlock.c
> > > @@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
> > >  static inline unsigned int folio_mlock_step(struct folio *folio,
> > >  		pte_t *pte, unsigned long addr, unsigned long end)
> > >  {
> > > -	unsigned int count, i, nr = folio_nr_pages(folio);
> > > -	unsigned long pfn = folio_pfn(folio);
> > > +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> > > +	unsigned int count = (end - addr) >> PAGE_SHIFT;
> >
> > This is a pre-existing bug, but ... what happens if you're on a 64-bit
> > system and you mlock() a range that is exactly 2^44 bytes? Seems to me
> > that count becomes 0. Why not use an unsigned long here and avoid the
> > problem entirely?
> >
> > folio_pte_batch() also needs to take an unsigned long max_nr in that
> > case, because you aren't restricting it to folio_nr_pages().
>
> Yeah, likely we should also take a look at other folio_pte_batch() users
> like copy_present_ptes() that pass the count as an int. Nothing should
> really be broken, but we might not batch as much as we could, which is
> unfortunate.

You did include:

	VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);

so at the least we have a userspace-triggerable warning.
On Mon, Jun 3, 2024 at 10:56 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 03.06.24 16:43, Matthew Wilcox wrote:
> > On Mon, Jun 03, 2024 at 10:07:45PM +0800, Lance Yang wrote:
> >> +++ b/mm/mlock.c
> >> @@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
> >>  static inline unsigned int folio_mlock_step(struct folio *folio,
> >>  		pte_t *pte, unsigned long addr, unsigned long end)
> >>  {
> >> -	unsigned int count, i, nr = folio_nr_pages(folio);
> >> -	unsigned long pfn = folio_pfn(folio);
> >> +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> >> +	unsigned int count = (end - addr) >> PAGE_SHIFT;
> >
> > This is a pre-existing bug, but ... what happens if you're on a 64-bit
> > system and you mlock() a range that is exactly 2^44 bytes? Seems to me
> > that count becomes 0. Why not use an unsigned long here and avoid the
> > problem entirely?
> >
> > folio_pte_batch() also needs to take an unsigned long max_nr in that
> > case, because you aren't restricting it to folio_nr_pages().
>
> Yeah, likely we should also take a look at other folio_pte_batch() users
> like copy_present_ptes() that pass the count as an int. Nothing should
> really be broken, but we might not batch as much as we could, which is
> unfortunate.

Could I change folio_pte_batch() to take an unsigned long max_nr?

Thanks,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>
On 03.06.24 17:01, Matthew Wilcox wrote:
> On Mon, Jun 03, 2024 at 04:56:05PM +0200, David Hildenbrand wrote:
>> On 03.06.24 16:43, Matthew Wilcox wrote:
>>> On Mon, Jun 03, 2024 at 10:07:45PM +0800, Lance Yang wrote:
>>>> +++ b/mm/mlock.c
>>>> @@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
>>>>  static inline unsigned int folio_mlock_step(struct folio *folio,
>>>>  		pte_t *pte, unsigned long addr, unsigned long end)
>>>>  {
>>>> -	unsigned int count, i, nr = folio_nr_pages(folio);
>>>> -	unsigned long pfn = folio_pfn(folio);
>>>> +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>>>> +	unsigned int count = (end - addr) >> PAGE_SHIFT;
>>>
>>> This is a pre-existing bug, but ... what happens if you're on a 64-bit
>>> system and you mlock() a range that is exactly 2^44 bytes? Seems to me
>>> that count becomes 0. Why not use an unsigned long here and avoid the
>>> problem entirely?
>>>
>>> folio_pte_batch() also needs to take an unsigned long max_nr in that
>>> case, because you aren't restricting it to folio_nr_pages().
>>
>> Yeah, likely we should also take a look at other folio_pte_batch() users
>> like copy_present_ptes() that pass the count as an int. Nothing should
>> really be broken, but we might not batch as much as we could, which is
>> unfortunate.
>
> You did include:
>
> 	VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
>
> so at the least we have a userspace-triggerable warning.

Yes, and max_nr == 0 would likely not be healthy to the system.

But

For copy_pte_range(), zap_pte_range() and the madvise users, we should
always have:
	next = pmd_addr_end(addr, end);

and use "next" as the actual "end" -- not the VMA end. So "end - addr" =
"next - addr" should never exceed a single PMD size.

mlock_pte_range() is also called from walk_page_range(), which uses
	next = pmd_addr_end(addr, end);

So likely exceeding PMD size is not possible here and all is working as
expected. Will double check later.
On 03.06.24 17:08, Lance Yang wrote:
> On Mon, Jun 3, 2024 at 10:56 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 03.06.24 16:43, Matthew Wilcox wrote:
>>> On Mon, Jun 03, 2024 at 10:07:45PM +0800, Lance Yang wrote:
>>>> +++ b/mm/mlock.c
>>>> @@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
>>>>  static inline unsigned int folio_mlock_step(struct folio *folio,
>>>>  		pte_t *pte, unsigned long addr, unsigned long end)
>>>>  {
>>>> -	unsigned int count, i, nr = folio_nr_pages(folio);
>>>> -	unsigned long pfn = folio_pfn(folio);
>>>> +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>>>> +	unsigned int count = (end - addr) >> PAGE_SHIFT;
>>>
>>> This is a pre-existing bug, but ... what happens if you're on a 64-bit
>>> system and you mlock() a range that is exactly 2^44 bytes? Seems to me
>>> that count becomes 0. Why not use an unsigned long here and avoid the
>>> problem entirely?
>>>
>>> folio_pte_batch() also needs to take an unsigned long max_nr in that
>>> case, because you aren't restricting it to folio_nr_pages().
>>
>> Yeah, likely we should also take a look at other folio_pte_batch() users
>> like copy_present_ptes() that pass the count as an int. Nothing should
>> really be broken, but we might not batch as much as we could, which is
>> unfortunate.
>
> Could I change folio_pte_batch() to take an unsigned long max_nr?

It might be more future proof; see my other mail, I think currently all
is fine, because "end" is not the end of the VMA but the end of the PMD.
Please double-check.
On Mon, Jun 3, 2024 at 11:26 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 03.06.24 17:01, Matthew Wilcox wrote:
> > On Mon, Jun 03, 2024 at 04:56:05PM +0200, David Hildenbrand wrote:
> >> On 03.06.24 16:43, Matthew Wilcox wrote:
> >>> On Mon, Jun 03, 2024 at 10:07:45PM +0800, Lance Yang wrote:
> >>>> +++ b/mm/mlock.c
> >>>> @@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
> >>>>  static inline unsigned int folio_mlock_step(struct folio *folio,
> >>>>  		pte_t *pte, unsigned long addr, unsigned long end)
> >>>>  {
> >>>> -	unsigned int count, i, nr = folio_nr_pages(folio);
> >>>> -	unsigned long pfn = folio_pfn(folio);
> >>>> +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> >>>> +	unsigned int count = (end - addr) >> PAGE_SHIFT;
> >>>
> >>> This is a pre-existing bug, but ... what happens if you're on a 64-bit
> >>> system and you mlock() a range that is exactly 2^44 bytes? Seems to me
> >>> that count becomes 0. Why not use an unsigned long here and avoid the
> >>> problem entirely?
> >>>
> >>> folio_pte_batch() also needs to take an unsigned long max_nr in that
> >>> case, because you aren't restricting it to folio_nr_pages().
> >>
> >> Yeah, likely we should also take a look at other folio_pte_batch() users
> >> like copy_present_ptes() that pass the count as an int. Nothing should
> >> really be broken, but we might not batch as much as we could, which is
> >> unfortunate.
> >
> > You did include:
> >
> > 	VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
> >
> > so at the least we have a userspace-triggerable warning.
>
> Yes, and max_nr == 0 would likely not be healthy to the system.
>
> But
>
> For copy_pte_range(), zap_pte_range() and the madvise users, we should
> always have:
> 	next = pmd_addr_end(addr, end);
>
> and use "next" as the actual "end" -- not the VMA end. So "end - addr" =
> "next - addr" should never exceed a single PMD size.
>
> mlock_pte_range() is also called from walk_page_range(), which uses
> 	next = pmd_addr_end(addr, end);
>
> So likely exceeding PMD size is not possible here and all is working as
> expected.

Thanks for clarifying!

I agree that currently all is fine, so perhaps we don't worry about that :)

>
> Will double check later.

I did a double-check and you're correct.

Thanks,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>
On Tue, Jun 4, 2024 at 3:46 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> On Mon, Jun 3, 2024 at 11:26 PM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 03.06.24 17:01, Matthew Wilcox wrote:
> > > On Mon, Jun 03, 2024 at 04:56:05PM +0200, David Hildenbrand wrote:
> > >> On 03.06.24 16:43, Matthew Wilcox wrote:
> > >>> On Mon, Jun 03, 2024 at 10:07:45PM +0800, Lance Yang wrote:
> > >>>> +++ b/mm/mlock.c
> > >>>> @@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
> > >>>>  static inline unsigned int folio_mlock_step(struct folio *folio,
> > >>>>  		pte_t *pte, unsigned long addr, unsigned long end)
> > >>>>  {
> > >>>> -	unsigned int count, i, nr = folio_nr_pages(folio);
> > >>>> -	unsigned long pfn = folio_pfn(folio);
> > >>>> +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> > >>>> +	unsigned int count = (end - addr) >> PAGE_SHIFT;
> > >>>
> > >>> This is a pre-existing bug, but ... what happens if you're on a 64-bit
> > >>> system and you mlock() a range that is exactly 2^44 bytes? Seems to me
> > >>> that count becomes 0. Why not use an unsigned long here and avoid the
> > >>> problem entirely?
> > >>>
> > >>> folio_pte_batch() also needs to take an unsigned long max_nr in that
> > >>> case, because you aren't restricting it to folio_nr_pages().
> > >>
> > >> Yeah, likely we should also take a look at other folio_pte_batch() users
> > >> like copy_present_ptes() that pass the count as an int. Nothing should
> > >> really be broken, but we might not batch as much as we could, which is
> > >> unfortunate.
> > >
> > > You did include:
> > >
> > > 	VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
> > >
> > > so at the least we have a userspace-triggerable warning.
> >
> > Yes, and max_nr == 0 would likely not be healthy to the system.
> >
> > But
> >
> > For copy_pte_range(), zap_pte_range() and the madvise users, we should
> > always have:
> > 	next = pmd_addr_end(addr, end);
> >
> > and use "next" as the actual "end" -- not the VMA end. So "end - addr" =
> > "next - addr" should never exceed a single PMD size.
> >
> > mlock_pte_range() is also called from walk_page_range(), which uses
> > 	next = pmd_addr_end(addr, end);
> >
> > So likely exceeding PMD size is not possible here and all is working as
> > expected.
>
> Thanks for clarifying!
>
> I agree that currently all is fine, so perhaps we don't worry about that :)

I agree with this point. These functions all scan PTEs under a single
PMD; any value exceeding the PTE entries of one PMD would be a bug in
the callers, not the callee.

>
> >
> > Will double check later.
>
> I did a double-check and you're correct.
>
> Thanks,
> Lance
>
> > --
> > Cheers,
> >
> > David / dhildenb
> >

Thanks
Barry
diff --git a/mm/mlock.c b/mm/mlock.c
index 30b51cdea89d..52d6e401ad67 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -307,26 +307,15 @@ void munlock_folio(struct folio *folio)
 static inline unsigned int folio_mlock_step(struct folio *folio,
 		pte_t *pte, unsigned long addr, unsigned long end)
 {
-	unsigned int count, i, nr = folio_nr_pages(folio);
-	unsigned long pfn = folio_pfn(folio);
+	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+	unsigned int count = (end - addr) >> PAGE_SHIFT;
 	pte_t ptent = ptep_get(pte);
 
 	if (!folio_test_large(folio))
 		return 1;
 
-	count = pfn + nr - pte_pfn(ptent);
-	count = min_t(unsigned int, count, (end - addr) >> PAGE_SHIFT);
-
-	for (i = 0; i < count; i++, pte++) {
-		pte_t entry = ptep_get(pte);
-
-		if (!pte_present(entry))
-			break;
-		if (pte_pfn(entry) - pfn >= nr)
-			break;
-	}
-
-	return i;
+	return folio_pte_batch(folio, addr, pte, ptent, count, fpb_flags, NULL,
+			NULL, NULL);
 }
 
 static inline bool allow_mlock_munlock(struct folio *folio,