Message ID | 20190328234704.27083-1-mike.kravetz@oracle.com (mailing list archive)
---|---
Series | A couple hugetlbfs fixes
On Thu, Mar 28, 2019 at 04:47:02PM -0700, Mike Kravetz wrote:
> I stumbled on these two hugetlbfs issues while looking at other things:
> - The 'restore reserve' functionality at page free time should not
>   be adjusting subpool counts.
> - A BUG can be triggered (not easily) due to temporarily mapping a
>   page before doing a COW.
>
> Both are described in detail in the commit messages of the patches.
> I would appreciate comments from Davidlohr Bueso, as one patch is
> directly related to code he added in commit 8382d914ebf7.
>
> I did not cc stable, as the first problem has been around since reserves
> were added to hugetlbfs and nobody has noticed. The second is very hard
> to hit/reproduce.
>
> v2 - Update the definition and all callers of hugetlb_fault_mutex_hash,
>      as the arguments mm and vma are no longer used or necessary.
>
> Mike Kravetz (2):
>   hugetlbfs: on restore reserve error path retain subpool reservation
>   hugetlb: use same fault hash key for shared and private mappings
>
>  fs/hugetlbfs/inode.c    |  7 ++-----
>  include/linux/hugetlb.h |  4 +---
>  mm/hugetlb.c            | 43 +++++++++++++++++++++--------------------
>  mm/userfaultfd.c        |  3 +--
>  4 files changed, 26 insertions(+), 31 deletions(-)

Both fixes look fine to me.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
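As a rough illustration of the idea behind patch 2: after the change, hugetlb_fault_mutex_hash is keyed only on the mapping and the file index, so a shared mapping and a private mapping of the same file offset hash to the same fault mutex and serialize against each other. The sketch below models that keying in a self-contained way; the function name, table size, and mixing function are mine for illustration, not the kernel's actual implementation.

```c
#include <stdint.h>

/* Hypothetical simplification: the fault mutex bucket depends only on
 * (mapping, index). Nothing about the VMA (shared vs private) or the
 * mm enters the key, so both kinds of mapping of the same file page
 * pick the same mutex. The multiplicative mix is illustrative only. */
#define NUM_FAULT_MUTEXES 64

static uint32_t fault_mutex_hash(const void *mapping, unsigned long index)
{
	uintptr_t key = (uintptr_t)mapping ^ (index * 2654435761u);

	return (uint32_t)(key % NUM_FAULT_MUTEXES);
}
```

With this keying, a fault on a private mapping and a truncate/hole-punch on the backing file contend on the same mutex whenever they target the same (mapping, index), which is exactly the serialization the patch relies on.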
On Thu, 28 Mar 2019, Mike Kravetz wrote:
> - A BUG can be triggered (not easily) due to temporarily mapping a
>   page before doing a COW.

But you actually _have_ seen it? Do you have the traces? I ask
not because of the patches per se, but because it would be nice
to have a real snippet in the changelog for patch 2.

Thanks,
Davidlohr
On 4/8/19 12:48 PM, Davidlohr Bueso wrote:
> On Thu, 28 Mar 2019, Mike Kravetz wrote:
>
>> - A BUG can be triggered (not easily) due to temporarily mapping a
>>   page before doing a COW.
>
> But you actually _have_ seen it? Do you have the traces? I ask
> not because of the patches per se, but because it would be nice
> to have a real snippet in the changelog for patch 2.

Yes, I actually saw this problem. It happened while I was debugging and
testing some patches for hugetlb migration.

The BUG I hit was in unaccount_page_cache_page():
VM_BUG_ON_PAGE(page_mapped(page), page). The stack trace was something
like:

unaccount_page_cache_page
__delete_from_page_cache
delete_from_page_cache
remove_huge_page
remove_inode_hugepages
hugetlbfs_punch_hole
hugetlbfs_fallocate

When I hit that, it took me a while to figure out how it could happen,
i.e. how could a page be mapped at that point in remove_inode_hugepages?
It checks page_mapped() and we are holding the fault mutex. With some
additional debug code (strategic udelays), I could hit the issue on a
somewhat regular basis, and I verified that another thread was in the
hugetlb_no_page/hugetlb_cow path for the same page at the same time.

Unfortunately, I did not save the traces. I am trying to recreate them
now. However, my test system was recently updated, and it might take a
little time to recreate.
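The race described above can be modeled in userspace. In this sketch (all names are mine, not the kernel's), the fault path briefly sets a "mapped" flag before clearing it again, standing in for hugetlb_no_page temporarily mapping the page before the COW, while the hole-punch path checks the flag, standing in for the page_mapped() check in remove_inode_hugepages. Because both paths take the *same* mutex, which is what keying the hash on (mapping, index) for both shared and private mappings guarantees, the checker can never observe the transient state; under the old scheme the two paths could pick different mutexes and the check could fire.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t fault_mutex = PTHREAD_MUTEX_INITIALIZER;
static int page_mapped_flag;   /* stands in for page_mapped(page)      */
static int transient_seen;     /* would be the VM_BUG_ON_PAGE firing   */

/* Models hugetlb_no_page temporarily mapping the page before the COW. */
static void *fault_path(void *arg)
{
	(void)arg;
	for (int i = 0; i < 200; i++) {
		pthread_mutex_lock(&fault_mutex);
		page_mapped_flag = 1;  /* transient mapping            */
		usleep(10);            /* widen the window, like the
					* strategic udelays above      */
		page_mapped_flag = 0;  /* unmapped again               */
		pthread_mutex_unlock(&fault_mutex);
	}
	return NULL;
}

/* Models remove_inode_hugepages checking page_mapped under the mutex. */
static void *truncate_path(void *arg)
{
	(void)arg;
	for (int i = 0; i < 200; i++) {
		pthread_mutex_lock(&fault_mutex);
		if (page_mapped_flag)
			transient_seen = 1;  /* would be the BUG     */
		pthread_mutex_unlock(&fault_mutex);
	}
	return NULL;
}

/* Returns 0 if the checker never saw the transient mapping. */
int run_demo(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, fault_path, NULL);
	pthread_create(&b, NULL, truncate_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return transient_seen;
}
```

Compile with -pthread. If you replace the single fault_mutex with two different mutexes for the two threads, mimicking the old per-(mm, vma, address) keying for private mappings, the checker can observe the window and transient_seen can become 1.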