Message ID | 20210528005029.88088-1-almasrymina@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY | expand |
On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry <almasrymina@google.com> wrote: > On UFFDIO_COPY, if we fail to copy the page contents while holding the > hugetlb_fault_mutex, we will drop the mutex and return to the caller > after allocating a page that consumed a reservation. In this case there > may be a fault that double consumes the reservation. To handle this, we > free the allocated page, fix the reservations, and allocate a temporary > hugetlb page and return that to the caller. When the caller does the > copy outside of the lock, we again check the cache, and allocate a page > consuming the reservation, and copy over the contents. > > Test: > Hacked the code locally such that resv_huge_pages underflows produce > a warning and the copy_huge_page_from_user() always fails, then: > > ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success > ./tools/testing/selftests/vm/userfaultfd hugetlb 10 > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > Both tests succeed and produce no warnings. After the > test runs number of free/resv hugepages is correct. Many conflicts here with material that is queued for 5.14-rc1. How serious is this problem? Is a -stable backport warranted? If we decide to get this into 5.13 (and perhaps -stable) then I can take a look at reworking all the 5.14 material on top. If not very serious then we could rework this on top of the already queued material.
On Mon, May 31, 2021 at 4:25 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry <almasrymina@google.com> wrote: > > > On UFFDIO_COPY, if we fail to copy the page contents while holding the > > hugetlb_fault_mutex, we will drop the mutex and return to the caller > > after allocating a page that consumed a reservation. In this case there > > may be a fault that double consumes the reservation. To handle this, we > > free the allocated page, fix the reservations, and allocate a temporary > > hugetlb page and return that to the caller. When the caller does the > > copy outside of the lock, we again check the cache, and allocate a page > > consuming the reservation, and copy over the contents. > > > > Test: > > Hacked the code locally such that resv_huge_pages underflows produce > > a warning and the copy_huge_page_from_user() always fails, then: > > > > ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 > > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > ./tools/testing/selftests/vm/userfaultfd hugetlb 10 > > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > > > Both tests succeed and produce no warnings. After the > > test runs number of free/resv hugepages is correct. > > Many conflicts here with material that is queued for 5.14-rc1. > > How serious is this problem? Is a -stable backport warranted? > I've sent 2 similar patches to the list: 1. "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages underflow on UFFDIO_COPY" This one is sent to -stable and linux-mm and is a fairly simple fix. 2. "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY" Which is this patch. It's a more complicated and not critical fix, so not targeted for -stable. It's only sent to linux-mm. > If we decide to get this into 5.13 (and perhaps -stable) then I can > take a look at reworking all the 5.14 material on top. If not very > serious then we could rework this on top of the already queued > material. > I assume given the above we want to rework this on top of the already queued material. I can upload a v5 that is rebased on top of your branch. Note that you have an earlier version of this fix in your branch, so really this patch will turn into a fix for that patch if I rebase it (I assume that's fine).
On Mon, 31 May 2021 17:11:52 -0700 Mina Almasry <almasrymina@google.com> wrote: > On Mon, May 31, 2021 at 4:25 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry <almasrymina@google.com> wrote: > > > > > On UFFDIO_COPY, if we fail to copy the page contents while holding the > > > hugetlb_fault_mutex, we will drop the mutex and return to the caller > > > after allocating a page that consumed a reservation. In this case there > > > may be a fault that double consumes the reservation. To handle this, we > > > free the allocated page, fix the reservations, and allocate a temporary > > > hugetlb page and return that to the caller. When the caller does the > > > copy outside of the lock, we again check the cache, and allocate a page > > > consuming the reservation, and copy over the contents. > > > > > > Test: > > > Hacked the code locally such that resv_huge_pages underflows produce > > > a warning and the copy_huge_page_from_user() always fails, then: > > > > > > ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 > > > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > > ./tools/testing/selftests/vm/userfaultfd hugetlb 10 > > > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > > > > > Both tests succeed and produce no warnings. After the > > > test runs number of free/resv hugepages is correct. > > > > Many conflicts here with material that is queued for 5.14-rc1. > > > > How serious is this problem? Is a -stable backport warranted? > > > > I've sent 2 similar patches to the list: > > 1. "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages underflow on UFFDIO_COPY" > > This one is sent to -stable and linux-mm and is a fairly simple fix. > > 2. "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY" Ah, OK, the title of the first patch was changed, which threw me off. I'd skipped "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages underflow on UFFDIO_COPY" because Mike's comments appeared to require a v5. I applied it and made Mike's changelog suggestions. Queued for 5.13 and -stable. And I queued "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY" for 5.14.
On Mon, May 31, 2021 at 5:36 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Mon, 31 May 2021 17:11:52 -0700 Mina Almasry <almasrymina@google.com> wrote: > > > On Mon, May 31, 2021 at 4:25 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry <almasrymina@google.com> wrote: > > > > > > > On UFFDIO_COPY, if we fail to copy the page contents while holding the > > > > hugetlb_fault_mutex, we will drop the mutex and return to the caller > > > > after allocating a page that consumed a reservation. In this case there > > > > may be a fault that double consumes the reservation. To handle this, we > > > > free the allocated page, fix the reservations, and allocate a temporary > > > > hugetlb page and return that to the caller. When the caller does the > > > > copy outside of the lock, we again check the cache, and allocate a page > > > > consuming the reservation, and copy over the contents. > > > > > > > > Test: > > > > Hacked the code locally such that resv_huge_pages underflows produce > > > > a warning and the copy_huge_page_from_user() always fails, then: > > > > > > > > ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 > > > > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > > > ./tools/testing/selftests/vm/userfaultfd hugetlb 10 > > > > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > > > > > > > Both tests succeed and produce no warnings. After the > > > > test runs number of free/resv hugepages is correct. > > > > > > Many conflicts here with material that is queued for 5.14-rc1. > > > > > > How serious is this problem? Is a -stable backport warranted? > > > > > > > I've sent 2 similar patches to the list: > > > > 1. "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages underflow on UFFDIO_COPY" > > > > This one is sent to -stable and linux-mm and is a fairly simple fix. > > > > 2. "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY" > > Ah, OK, the title of the first patch was changed, which threw me off. > > I'd skipped "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages > underflow on UFFDIO_COPY" because Mike's comments appeared to require a > v5. I applied it and made Mike's changelog suggestions. Queued for > 5.13 and -stable. > > And I queued "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages > underflow on UFFDIO_COPY" for 5.14. > > Awesome, thanks! And sorry for the confusion!
On 5/31/21 7:48 PM, Mina Almasry wrote: > On Mon, May 31, 2021 at 5:36 PM Andrew Morton <akpm@linux-foundation.org> wrote: >> On Mon, 31 May 2021 17:11:52 -0700 Mina Almasry <almasrymina@google.com> wrote: >>> On Mon, May 31, 2021 at 4:25 PM Andrew Morton <akpm@linux-foundation.org> wrote: >>>> On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry <almasrymina@google.com> wrote: >>> I've sent 2 similar patches to the list: >>> >>> 1. "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages underflow on UFFDIO_COPY" >>> >>> This one is sent to -stable and linux-mm and is a fairly simple fix. >>> >>> 2. "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY" >> >> Ah, OK, the title of the first patch was changed, which threw me off. >> >> I'd skipped "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages >> underflow on UFFDIO_COPY" because Mike's comments appeared to require a >> v5. I applied it and made Mike's changelog suggestions. Queued for >> 5.13 and -stable. >> >> And I queued "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages >> underflow on UFFDIO_COPY" for 5.14. >> >> > > Awesome, thanks! And sorry for the confusion! > Mina, does this patch depend on changes to restore_reserve_on_error()? I am still working on those changes. It may be a few days before I can have something finalized. If this does depend on restore_reserve_on_error as I suspect, perhaps we should send these together.
On Tue, Jun 1, 2021 at 10:09 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 5/31/21 7:48 PM, Mina Almasry wrote: > > On Mon, May 31, 2021 at 5:36 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >> On Mon, 31 May 2021 17:11:52 -0700 Mina Almasry <almasrymina@google.com> wrote: > >>> On Mon, May 31, 2021 at 4:25 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >>>> On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry <almasrymina@google.com> wrote: > >>> I've sent 2 similar patches to the list: > >>> > >>> 1. "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages underflow on UFFDIO_COPY" > >>> > >>> This one is sent to -stable and linux-mm and is a fairly simple fix. > >>> > >>> 2. "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY" > >> > >> Ah, OK, the title of the first patch was changed, which threw me off. > >> > >> I'd skipped "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages > >> underflow on UFFDIO_COPY" because Mike's comments appeared to require a > >> v5. I applied it and made Mike's changelog suggestions. Queued for > >> 5.13 and -stable. > >> > >> And I queued "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages > >> underflow on UFFDIO_COPY" for 5.14. > >> > >> > > > > Awesome, thanks! And sorry for the confusion! > > > > Mina, does this patch depend on changes to restore_reserve_on_error()? > Yes, this patch (and only this patch) depends on your changes for complete and correct functionality. I'm not sure what's the impact > I am still working on those changes. It may be a few days before I can > have something finalized. > > If this does depend on restore_reserve_on_error as I suspect, perhaps we > should send these together. I was thinking it's fine to have my fix in Andrew's tree a few days before yours, since the race is hard to reproduce and even if the race reproduces the userfaultfd tests still pass, so I don't see any disastrous consequences, but I'm happy to do whatever is appropriate here. > -- > Mike Kravetz
> > Mina, does this patch depend on changes to restore_reserve_on_error()? > > > > Yes, this patch (and only this patch) depends on your changes for > complete and correct functionality. I'm not sure what's the impact ...of missing your changes aside from the underflow. The userfaultfd tests still pass.
On 5/27/21 5:50 PM, Mina Almasry wrote: > diff --git a/include/linux/migrate.h b/include/linux/migrate.h > index 4bb4e519e3f5..4164c9ddd86e 100644 > --- a/include/linux/migrate.h > +++ b/include/linux/migrate.h > @@ -51,6 +51,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping, > struct page *newpage, struct page *page); > extern int migrate_page_move_mapping(struct address_space *mapping, > struct page *newpage, struct page *page, int extra_count); > +extern void migrate_copy_huge_page(struct page *dst, struct page *src); > #else > > static inline void putback_movable_pages(struct list_head *l) {} > @@ -77,6 +78,9 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, > return -ENOSYS; > } > > +static inline void migrate_copy_huge_page(struct page *dst, struct page *src) > +{ > +} > #endif /* CONFIG_MIGRATION */ > > #ifdef CONFIG_COMPACTION I am not insisting, but it might be better to make the copy routine available under the current name 'copy_huge_page'. Why? There is an existing migrate_page_copy() which not only copies the page contents, but also page state/metadata. People could get confused that 'migrate_page_copy' and 'migrate_copy_huge_page' do not have the same functionality. Of course, as soon as you look at the routines you can see the difference. Again, not necessary. Just something to consider. I suspect you changed the name to 'migrate_copy_huge_page' mostly because it resides in migrate.c? > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 76e2a6efc165..6072c9f82794 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -30,6 +30,7 @@ > #include <linux/numa.h> > #include <linux/llist.h> > #include <linux/cma.h> > +#include <linux/migrate.h> > > #include <asm/page.h> > #include <asm/pgalloc.h> > @@ -4905,20 +4906,17 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, > struct page **pagep) > { > bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE); > - struct address_space *mapping; > - pgoff_t idx; > + struct hstate *h = hstate_vma(dst_vma); > + struct address_space *mapping = dst_vma->vm_file->f_mapping; > + pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr); > unsigned long size; > int vm_shared = dst_vma->vm_flags & VM_SHARED; > - struct hstate *h = hstate_vma(dst_vma); > pte_t _dst_pte; > spinlock_t *ptl; > - int ret; > + int ret = -ENOMEM; > struct page *page; > int writable; > > - mapping = dst_vma->vm_file->f_mapping; > - idx = vma_hugecache_offset(h, dst_vma, dst_addr); > - > if (is_continue) { > ret = -EFAULT; > page = find_lock_page(mapping, idx); > @@ -4947,12 +4945,44 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, > /* fallback to copy_from_user outside mmap_lock */ > if (unlikely(ret)) { > ret = -ENOENT; > + /* Free the allocated page which may have > + * consumed a reservation. > + */ > + restore_reserve_on_error(h, dst_vma, dst_addr, page); > + put_page(page); > + > + /* Allocate a temporary page to hold the copied > + * contents. > + */ > + page = alloc_huge_page_vma(h, dst_vma, dst_addr); > + if (IS_ERR(page)) { In v3 of the patch, alloc_migrate_huge_page was used to allocate the temporary page and Dan Carpenter pointed out that the return value should just be checked for NULL. I believe the same still applies to alloc_huge_page_vma.
diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 4bb4e519e3f5..4164c9ddd86e 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -51,6 +51,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping, struct page *newpage, struct page *page); extern int migrate_page_move_mapping(struct address_space *mapping, struct page *newpage, struct page *page, int extra_count); +extern void migrate_copy_huge_page(struct page *dst, struct page *src); #else static inline void putback_movable_pages(struct list_head *l) {} @@ -77,6 +78,9 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, return -ENOSYS; } +static inline void migrate_copy_huge_page(struct page *dst, struct page *src) +{ +} #endif /* CONFIG_MIGRATION */ #ifdef CONFIG_COMPACTION diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 76e2a6efc165..6072c9f82794 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -30,6 +30,7 @@ #include <linux/numa.h> #include <linux/llist.h> #include <linux/cma.h> +#include <linux/migrate.h> #include <asm/page.h> #include <asm/pgalloc.h> @@ -4905,20 +4906,17 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, struct page **pagep) { bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE); - struct address_space *mapping; - pgoff_t idx; + struct hstate *h = hstate_vma(dst_vma); + struct address_space *mapping = dst_vma->vm_file->f_mapping; + pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr); unsigned long size; int vm_shared = dst_vma->vm_flags & VM_SHARED; - struct hstate *h = hstate_vma(dst_vma); pte_t _dst_pte; spinlock_t *ptl; - int ret; + int ret = -ENOMEM; struct page *page; int writable; - mapping = dst_vma->vm_file->f_mapping; - idx = vma_hugecache_offset(h, dst_vma, dst_addr); - if (is_continue) { ret = -EFAULT; page = find_lock_page(mapping, idx); @@ -4947,12 +4945,44 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, /* fallback to copy_from_user outside mmap_lock */ if (unlikely(ret)) { ret = -ENOENT; + /* Free the allocated page which may have + * consumed a reservation. + */ + restore_reserve_on_error(h, dst_vma, dst_addr, page); + put_page(page); + + /* Allocate a temporary page to hold the copied + * contents. + */ + page = alloc_huge_page_vma(h, dst_vma, dst_addr); + if (IS_ERR(page)) { + ret = -ENOMEM; + goto out; + } *pagep = page; - /* don't free the page */ + /* Set the outparam pagep and return to the caller to + * copy the contents outside the lock. Don't free the + * page. + */ goto out; } } else { - page = *pagep; + if (vm_shared && + hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) { + put_page(*pagep); + ret = -EEXIST; + *pagep = NULL; + goto out; + } + + page = alloc_huge_page(dst_vma, dst_addr, 0); + if (IS_ERR(page)) { + ret = -ENOMEM; + *pagep = NULL; + goto out; + } + migrate_copy_huge_page(page, *pagep); + put_page(*pagep); *pagep = NULL; } diff --git a/mm/migrate.c b/mm/migrate.c index b234c3f3acb7..3bfe1f7d127d 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -550,7 +550,7 @@ static void __copy_gigantic_page(struct page *dst, struct page *src, } } -static void copy_huge_page(struct page *dst, struct page *src) +void migrate_copy_huge_page(struct page *dst, struct page *src) { int i; int nr_pages; @@ -652,7 +652,7 @@ EXPORT_SYMBOL(migrate_page_states); void migrate_page_copy(struct page *newpage, struct page *page) { if (PageHuge(page) || PageTransHuge(page)) - copy_huge_page(newpage, page); + migrate_copy_huge_page(newpage, page); else copy_highpage(newpage, page); diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 2d6a3a36f6ce..e13a0492b7ba 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -346,54 +346,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, out_unlock: mmap_read_unlock(dst_mm); out: - if (page) { - /* - * We encountered an error and are about to free a newly - * allocated huge page. - * - * Reservation handling is very subtle, and is different for - * private and shared mappings. See the routine - * restore_reserve_on_error for details. Unfortunately, we - * can not call restore_reserve_on_error now as it would - * require holding mmap_lock. - * - * If a reservation for the page existed in the reservation - * map of a private mapping, the map was modified to indicate - * the reservation was consumed when the page was allocated. - * We clear the HPageRestoreRsvCnt flag now so that the global - * reserve count will not be incremented in free_huge_page. - * The reservation map will still indicate the reservation - * was consumed and possibly prevent later page allocation. - * This is better than leaking a global reservation. If no - * reservation existed, it is still safe to clear - * HPageRestoreRsvCnt as no adjustments to reservation counts - * were made during allocation. - * - * The reservation map for shared mappings indicates which - * pages have reservations. When a huge page is allocated - * for an address with a reservation, no change is made to - * the reserve map. In this case HPageRestoreRsvCnt will be - * set to indicate that the global reservation count should be - * incremented when the page is freed. This is the desired - * behavior. However, when a huge page is allocated for an - * address without a reservation a reservation entry is added - * to the reservation map, and HPageRestoreRsvCnt will not be - * set. When the page is freed, the global reserve count will - * NOT be incremented and it will appear as though we have - * leaked reserved page. In this case, set HPageRestoreRsvCnt - * so that the global reserve count will be incremented to - * match the reservation map entry which was created. - * - * Note that vm_alloc_shared is based on the flags of the vma - * for which the page was originally allocated. dst_vma could - * be different or NULL on error. - */ - if (vm_alloc_shared) - SetHPageRestoreRsvCnt(page); - else - ClearHPageRestoreRsvCnt(page); + if (page) put_page(page); - } BUG_ON(copied < 0); BUG_ON(err > 0); BUG_ON(!copied && !err);
On UFFDIO_COPY, if we fail to copy the page contents while holding the hugetlb_fault_mutex, we will drop the mutex and return to the caller after allocating a page that consumed a reservation. In this case there may be a fault that double consumes the reservation. To handle this, we free the allocated page, fix the reservations, and allocate a temporary hugetlb page and return that to the caller. When the caller does the copy outside of the lock, we again check the cache, and allocate a page consuming the reservation, and copy over the contents. Test: Hacked the code locally such that resv_huge_pages underflows produce a warning and the copy_huge_page_from_user() always fails, then: ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success Both tests succeed and produce no warnings. After the test runs number of free/resv hugepages is correct. Signed-off-by: Mina Almasry <almasrymina@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: linux-mm@kvack.org Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org --- include/linux/migrate.h | 4 ++++ mm/hugetlb.c | 48 +++++++++++++++++++++++++++++++++-------- mm/migrate.c | 4 ++-- mm/userfaultfd.c | 48 +---------------------------------------- 4 files changed, 46 insertions(+), 58 deletions(-) -- 2.32.0.rc0.204.g9fa02ecfa5-goog