Message ID | 20210525233134.246444-1-mike.kravetz@oracle.com (mailing list archive) |
---|---|
Headers | show |
Series | Track reserve map changes to restore on error | expand |
On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > Here is a modification to the reservation tracking for fixup on errors. > It is a more general change, but should work for the hugetlb_mcopy_pte_atomic > case as well. > > Perhaps use this as a prerequisite for your fix(es)? Pretty sure this > will eliminate the need for the call to hugetlb_unreserve_pages. Hi Mike, It seems like someone is fixing a bug, right? Maybe a link should be placed in the cover letter so that someone can know what issue we are facing. Thanks. > > Mike Kravetz (2): > hugetlb: rename HPageRestoreReserve flag to HPageRestoreRsvCnt > hugetlb: add new hugetlb specific flag HPG_restore_rsv_map > > fs/hugetlbfs/inode.c | 3 +- > include/linux/hugetlb.h | 17 +++++-- > mm/hugetlb.c | 108 ++++++++++++++++++++++++++++------------ > mm/userfaultfd.c | 14 +++--- > 4 files changed, 99 insertions(+), 43 deletions(-) > > -- > 2.31.1 >
On 5/25/21 8:19 PM, Muchun Song wrote: > On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >> >> Here is a modification to the reservation tracking for fixup on errors. >> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic >> case as well. >> >> Perhaps use this as a prerequisite for your fix(es)? Pretty sure this >> will eliminate the need for the call to hugetlb_unreserve_pages. > > Hi Mike, > > It seems like someone is fixing a bug, right? Maybe a link should be > placed in the cover letter so that someone can know what issue > we are facing. > Thanks Muchun, I wanted to first see if these patches would work in the code Mina is modifying. If this works for Mina, then a more formal patch and request for inclusion will be sent. I believe this issue has existed since the introduction of hugetlb reservations in v2.6.18. Since the bug only shows up when we take error paths, the issue may not have been observed. Mina found a similar issue in an error path which could also expose this issue.
On Wed, May 26, 2021 at 10:17 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 5/25/21 8:19 PM, Muchun Song wrote: > > On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > >> > >> Here is a modification to the reservation tracking for fixup on errors. > >> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic > >> case as well. > >> > >> Perhaps use this as a prerequisite for your fix(es)? Pretty sure this > >> will eliminate the need for the call to hugetlb_unreserve_pages. > > > > Hi Mike, > > > > It seems like someone is fixing a bug, right? Maybe a link should be > > placed in the cover letter so that someone can know what issue > > we are facing. > > > > Thanks Muchun, > > I wanted to first see if these patches would work in the code Mina is > modifying. If this works for Mina, then a more formal patch and request > for inclusion will be sent. > So a quick test: I apply my patche and yours on top of linus/master, and I remove the hugetlb_unreserve_pages() call that triggered this conversation, and run the userfaultfd test, resv_huge_pages underflows again, so it seems on the surface this doesn't quite work as is. Not quite sure what to do off the top of my head. I think I will try to debug why the 3 patches don't work together and I will fix either your patch or mine. I haven't taken a deep look yet; I just ran a quick test. > I believe this issue has existed since the introduction of hugetlb > reservations in v2.6.18. Since the bug only shows up when we take error > paths, the issue may not have been observed. Mina found a similar issue > in an error path which could also expose this issue. > -- > Mike Kravetz
On Wed, May 26, 2021 at 4:19 PM Mina Almasry <almasrymina@google.com> wrote: > > On Wed, May 26, 2021 at 10:17 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > > > On 5/25/21 8:19 PM, Muchun Song wrote: > > > On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > >> > > >> Here is a modification to the reservation tracking for fixup on errors. > > >> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic > > >> case as well. > > >> > > >> Perhaps use this as a prerequisite for your fix(es)? Pretty sure this > > >> will eliminate the need for the call to hugetlb_unreserve_pages. > > > > > > Hi Mike, > > > > > > It seems like someone is fixing a bug, right? Maybe a link should be > > > placed in the cover letter so that someone can know what issue > > > we are facing. > > > > > > > Thanks Muchun, > > > > I wanted to first see if these patches would work in the code Mina is > > modifying. If this works for Mina, then a more formal patch and request > > for inclusion will be sent. > > > > So a quick test: I apply my patche and yours on top of linus/master, > and I remove the hugetlb_unreserve_pages() call that triggered this > conversation, and run the userfaultfd test, resv_huge_pages underflows > again, so it seems on the surface this doesn't quite work as is. > > Not quite sure what to do off the top of my head. I think I will try > to debug why the 3 patches don't work together and I will fix either > your patch or mine. I haven't taken a deep look yet; I just ran a > quick test. > Ok found the issue. With the setup I described above, the hugetlb_shared test case passes: ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success The non-shared test case is the one that underflows: ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success I've debugged a bit, and this messy hunk 'fixes' the underflow with the non-shared case. (Sorry for the messiness). @@ -2329,17 +2340,14 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, */ SetHPageRestoreRsvCnt(page); } else { - rc = vma_needs_reservation(h, vma, address); - if (rc < 0) - /* - * See above comment about rare out of - * memory condition. - */ - SetHPageRestoreRsvCnt(page); - else if (rc) - vma_add_reservation(h, vma, address); - else - vma_end_reservation(h, vma, address); + resv = inode_resv_map(vma->vm_file->f_mapping->host); + if (resv) { + int chg = region_del(resv, idx, idx+1); + VM_BUG_ON(chg); + } The reason being is that on page allocation we region_add() an entry into the resv_map regardless of whether this is a shared mapping or not (vma_needs_reservation() + vma_commit_reservation(), which amounts to region_add() at the end of the day). To unroll back this change on error, we need to region_del() the region_add(). The code removed above doesn't end up calling region_del(), because vma_needs_reservation() returns 0, because region_chg() sees there is an entry in the resv_map, and returns 0. The VM_BUG_ON() is just because I'm not sure how to handle that error. > > I believe this issue has existed since the introduction of hugetlb > > reservations in v2.6.18. Since the bug only shows up when we take error > > paths, the issue may not have been observed. Mina found a similar issue > > in an error path which could also expose this issue. > > -- > > Mike Kravetz
On 5/26/21 7:48 PM, Mina Almasry wrote: > On Wed, May 26, 2021 at 4:19 PM Mina Almasry <almasrymina@google.com> wrote: >> >> On Wed, May 26, 2021 at 10:17 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >>> >>> On 5/25/21 8:19 PM, Muchun Song wrote: >>>> On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >>>>> >>>>> Here is a modification to the reservation tracking for fixup on errors. >>>>> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic >>>>> case as well. >>>>> >>>>> Perhaps use this as a prerequisite for your fix(es)? Pretty sure this >>>>> will eliminate the need for the call to hugetlb_unreserve_pages. >>>> >>>> Hi Mike, >>>> >>>> It seems like someone is fixing a bug, right? Maybe a link should be >>>> placed in the cover letter so that someone can know what issue >>>> we are facing. >>>> >>> >>> Thanks Muchun, >>> >>> I wanted to first see if these patches would work in the code Mina is >>> modifying. If this works for Mina, then a more formal patch and request >>> for inclusion will be sent. >>> >> >> So a quick test: I apply my patche and yours on top of linus/master, >> and I remove the hugetlb_unreserve_pages() call that triggered this >> conversation, and run the userfaultfd test, resv_huge_pages underflows >> again, so it seems on the surface this doesn't quite work as is. >> >> Not quite sure what to do off the top of my head. I think I will try >> to debug why the 3 patches don't work together and I will fix either >> your patch or mine. I haven't taken a deep look yet; I just ran a >> quick test. >> > > Ok found the issue. With the setup I described above, the > hugetlb_shared test case passes: > > ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 > /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > The non-shared test case is the one that underflows: > > ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 > /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > I've debugged a bit, and this messy hunk 'fixes' the underflow with > the non-shared case. (Sorry for the messiness). > > @@ -2329,17 +2340,14 @@ void restore_reserve_on_error(struct hstate > *h, struct vm_area_struct *vma, > */ > SetHPageRestoreRsvCnt(page); > } else { > - rc = vma_needs_reservation(h, vma, address); > - if (rc < 0) > - /* > - * See above comment about rare out of > - * memory condition. > - */ > - SetHPageRestoreRsvCnt(page); > - else if (rc) > - vma_add_reservation(h, vma, address); > - else > - vma_end_reservation(h, vma, address); > + resv = inode_resv_map(vma->vm_file->f_mapping->host); > + if (resv) { > + int chg = region_del(resv, idx, idx+1); > + VM_BUG_ON(chg); > + } > > The reason being is that on page allocation we region_add() an entry > into the resv_map regardless of whether this is a shared mapping or > not (vma_needs_reservation() + vma_commit_reservation(), which amounts > to region_add() at the end of the day). > > To unroll back this change on error, we need to region_del() the region_add(). > > The code removed above doesn't end up calling region_del(), because > vma_needs_reservation() returns 0, because region_chg() sees there is > an entry in the resv_map, and returns 0. > > The VM_BUG_ON() is just because I'm not sure how to handle that error. > Thanks Mina! Yes, that new block of code in restore_reserve_on_error is incorrect for the private mapping case. Since alloc_huge_page does the region_add for both shared and private mappings, it seems we should just do the region_del for both. I'll update this patch to fix this and take your other comments into account.