Message ID | 20221021192805.366ad573@imladris.surriel.com (mailing list archive) |
---|---|
State | New |
Series | [v2] mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs |
On 10/21/22 19:28, Rik van Riel wrote:
> A common use case for hugetlbfs is for the application to create
> memory pools backed by huge pages, which then get handed over to
> some malloc library (eg. jemalloc) for further management.
>
> That malloc library may be doing MADV_DONTNEED calls on memory
> that is no longer needed, expecting those calls to happen on
> PAGE_SIZE boundaries.
>
> However, currently the MADV_DONTNEED code rounds up any such
> requests to HPAGE_PMD_SIZE boundaries. This leads to undesired
> outcomes when jemalloc expects a 4kB MADV_DONTNEED, but 2MB of
> memory gets zeroed out instead.
>
> Use of pre-built shared libraries means that user code does not
> always know the page size of every memory arena in use.
>
> Avoid unexpected data loss with MADV_DONTNEED by rounding up
> only to PAGE_SIZE (in do_madvise), and rounding down to huge
> page granularity.
>
> That way programs will only get as much memory zeroed out as
> they requested.
>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@kernel.org
> Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")

I do hate changing behavior, but in hindsight this is the right behavior,
especially since the old behavior can cause unexpected data loss.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
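For illustration, here is a minimal userspace sketch of the failure mode described above. The 2MB huge page size and the use of MAP_HUGETLB (rather than a file on a hugetlbfs mount, as in the jemalloc scenario) are assumptions made for the example:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define POOL_LEN (2UL << 20) /* one 2MB huge page, assumed for this sketch */

int main(void)
{
	/* Back the pool with a huge page; MAP_HUGETLB avoids needing a
	 * hugetlbfs mount for this example. */
	char *pool = mmap(NULL, POOL_LEN, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (pool == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(pool, 0xaa, POOL_LEN);

	/* A malloc library that thinks in 4kB pages "frees" one page.
	 * Before the fix, the kernel rounded the 4kB up to the 2MB huge
	 * page size and zeroed the whole pool; with the fix, the range
	 * rounds down to zero huge pages and no data is lost. */
	if (madvise(pool, 4096, MADV_DONTNEED))
		perror("madvise");

	/* Prints 0xaa with the fix, 0 without it. */
	printf("pool[1MB] = %#x\n", pool[1 << 20] & 0xff);
	return 0;
}
```

Running this requires huge pages to be reserved first (e.g. via the vm.nr_hugepages sysctl).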
```diff
diff --git a/mm/madvise.c b/mm/madvise.c
index 2baa93ca2310..c7105ec6d08c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -813,7 +813,14 @@ static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
 	if (start & ~huge_page_mask(hstate_vma(vma)))
 		return false;
 
-	*end = ALIGN(*end, huge_page_size(hstate_vma(vma)));
+	/*
+	 * Madvise callers expect the length to be rounded up to PAGE_SIZE
+	 * boundaries, and may be unaware that this VMA uses huge pages.
+	 * Avoid unexpected data loss by rounding down the number of
+	 * huge pages freed.
+	 */
+	*end = ALIGN_DOWN(*end, huge_page_size(hstate_vma(vma)));
+
 	return true;
 }
 
@@ -828,6 +835,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 	if (!madvise_dontneed_free_valid_vma(vma, start, &end, behavior))
 		return -EINVAL;
 
+	if (start == end)
+		return 0;
+
 	if (!userfaultfd_remove(vma, start, end)) {
 		*prev = NULL; /* mmap_lock has been dropped, prev is stale */
```
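To see why the new `start == end` check matters, consider the rounding arithmetic. Below is a standalone sketch using the classic power-of-two forms of the macros (reimplemented here so the example compiles on its own; the kernel has its own ALIGN()/ALIGN_DOWN() definitions). With a 2MB huge page, a 4kB request that used to round up to the full huge page now rounds down to an empty range:

```c
#include <stdio.h>

/* Classic power-of-two rounding, matching the semantics the patch
 * relies on; reimplemented here for a self-contained example. */
#define ALIGN(x, a)      (((x) + (a) - 1) & ~((a) - 1))
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

int main(void)
{
	unsigned long huge = 2UL << 20;        /* 2MB huge page size */
	unsigned long start = 0, end = 4096;   /* a 4kB MADV_DONTNEED */

	/* Old behavior: end rounds up, the whole 2MB page is zeroed. */
	printf("ALIGN:      [%lu, %lu)\n", start, ALIGN(end, huge));

	/* New behavior: end rounds down to start, so the range is empty
	 * and madvise_dontneed_free() returns 0 without touching memory. */
	printf("ALIGN_DOWN: [%lu, %lu)\n", start, ALIGN_DOWN(end, huge));
	return 0;
}
```

This prints `[0, 2097152)` for the old rounding and `[0, 0)` for the new one, which is exactly the case the early `return 0` in madvise_dontneed_free() handles.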