Message ID | 20181203200850.6460-4-mike.kravetz@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | hugetlbfs: use i_mmap_rwsem for better synchronization | expand |
Hi, [This is an automated email] This commit has been processed because it contains a "Fixes:" tag, fixing commit: ebed4bfc8da8 [PATCH] hugetlb: fix absurd HugePages_Rsvd. The bot has tested the following trees: v4.19.6, v4.14.85, v4.9.142, v4.4.166, v3.18.128. v4.19.6: Build OK! v4.14.85: Failed to apply! Possible dependencies: 285b8dcaacfc ("mm, hugetlbfs: pass fault address to no page handler") 786e86da629d ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") v4.9.142: Failed to apply! Possible dependencies: 1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook") 29f3ad7d8380 ("fs: Provide function to unmap metadata for a range of blocks") 334fd34d76f2 ("vfs: Add page_cache_seek_hole_data helper") 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges") 7868a2087ec1 ("mm/hugetlb: add size parameter to huge_pte_offset()") 786e86da629d ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") 7fc9e4722435 ("fs: Introduce filemap_range_has_page()") 82b0f8c39a38 ("mm: join struct fault_env and vm_fault") 8bea80520750 ("mm/hugetlb.c: use huge_pte_lock instead of opencoding the lock") 953c66c2b22a ("mm: THP page cache support for ppc64") d72dc8a25afc ("mm: make pagevec_lookup() update index") fd60775aea80 ("mm, thp: avoid unlikely branches for split_huge_pmd") v4.4.166: Failed to apply! Possible dependencies: 0070e28d97e7 ("radix_tree: loop based on shift count, not height") 00f47b581105 ("radix-tree: rewrite radix_tree_tag_clear") 0e749e54244e ("dax: increase granularity of dax_clear_blocks() operations") 1366c37ed84b ("radix tree test harness") 29f3ad7d8380 ("fs: Provide function to unmap metadata for a range of blocks") 334fd34d76f2 ("vfs: Add page_cache_seek_hole_data helper") 339e6353046d ("radix_tree: tag all internal tree nodes as indirect pointers") 4aae8d1c051e ("mm/hugetlbfs: unmap pages if page fault raced with hole punch") 52db400fcd50 ("pmem, dax: clean up clear_pmem()") 72e2936c04f7 ("mm: remove unnecessary condition in remove_inode_hugepages") 7fc9e4722435 ("fs: Introduce filemap_range_has_page()") 83929372f629 ("filemap: prepare find and delete operations for huge pages") ac401cc78242 ("dax: New fault locking") b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()") d604c324524b ("radix-tree: introduce radix_tree_replace_clear_tags()") d72dc8a25afc ("mm: make pagevec_lookup() update index") e4b274915863 ("DAX: move RADIX_DAX_ definitions to dax.c") e61452365372 ("radix_tree: add support for multi-order entries") f9fe48bece3a ("dax: support dirty DAX entries in radix tree") v3.18.128: Failed to apply! Possible dependencies: 1817889e3b2c ("mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes") 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools") 1dd308a7b49d ("mm/hugetlb: document the reserve map/region tracking routines") 5e9113731a3c ("mm/hugetlb: add cache of descriptors to resv_map for region_add") 83cde9e8ba95 ("mm: use new helper functions around the i_mmap_mutex") b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages") c672c7f29f2f ("mm/hugetlb: expose hugetlb fault mutex for use by fallocate") cf3ad20bfead ("mm/hugetlb: compute/return the number of regions added by region_add()") feba16e25a57 ("mm/hugetlb: add region_del() to delete a specific range of entries") How should we proceed with this patch? -- Thanks, Sasha
On 12/4/18 1:38 AM, Mike Kravetz wrote: > After expanding i_mmap_rwsem use for better shared pmd and page fault/ > truncation synchronization, remove code that is no longer necessary. > > Cc: <stable@vger.kernel.org> > Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd") > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> > --- > fs/hugetlbfs/inode.c | 46 +++++++++++++++----------------------------- > mm/hugetlb.c | 21 ++++++++++---------- > 2 files changed, 25 insertions(+), 42 deletions(-) > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c > index 3244147fc42b..a9c00c6ef80d 100644 > --- a/fs/hugetlbfs/inode.c > +++ b/fs/hugetlbfs/inode.c > @@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end) > * truncation is indicated by end of range being LLONG_MAX > * In this case, we first scan the range and release found pages. > * After releasing pages, hugetlb_unreserve_pages cleans up region/reserv > - * maps and global counts. Page faults can not race with truncation > - * in this routine. hugetlb_no_page() prevents page faults in the > - * truncated range. It checks i_size before allocation, and again after > - * with the page table lock for the page held. The same lock must be > - * acquired to unmap a page. > + * maps and global counts. > * hole punch is indicated if end is not LLONG_MAX > * In the hole punch case we scan the range and release found pages. > * Only when releasing a page is the associated region/reserv map > * deleted. The region/reserv map for ranges without associated > - * pages are not modified. Page faults can race with hole punch. > - * This is indicated if we find a mapped page. > + * pages are not modified. > + * > + * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent > + * races with page faults. Should this patch be merged to the previous one? Because the changes to callers are done in the previous patch. > + * > * Note: If the passed end of range value is beyond the end of file, but > * not LLONG_MAX this routine still performs a hole punch operation. > */ > @@ -423,32 +422,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, > > for (i = 0; i < pagevec_count(&pvec); ++i) { > -aneesh
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 3244147fc42b..a9c00c6ef80d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end) * truncation is indicated by end of range being LLONG_MAX * In this case, we first scan the range and release found pages. * After releasing pages, hugetlb_unreserve_pages cleans up region/reserv - * maps and global counts. Page faults can not race with truncation - * in this routine. hugetlb_no_page() prevents page faults in the - * truncated range. It checks i_size before allocation, and again after - * with the page table lock for the page held. The same lock must be - * acquired to unmap a page. + * maps and global counts. * hole punch is indicated if end is not LLONG_MAX * In the hole punch case we scan the range and release found pages. * Only when releasing a page is the associated region/reserv map * deleted. The region/reserv map for ranges without associated - * pages are not modified. Page faults can race with hole punch. - * This is indicated if we find a mapped page. + * pages are not modified. + * + * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent + * races with page faults. + * * Note: If the passed end of range value is beyond the end of file, but * not LLONG_MAX this routine still performs a hole punch operation. */ @@ -423,32 +422,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, for (i = 0; i < pagevec_count(&pvec); ++i) { struct page *page = pvec.pages[i]; - u32 hash; index = page->index; - hash = hugetlb_fault_mutex_hash(h, current->mm, - &pseudo_vma, - mapping, index, 0); - mutex_lock(&hugetlb_fault_mutex_table[hash]); - /* - * If page is mapped, it was faulted in after being - * unmapped in caller. Unmap (again) now after taking - * the fault mutex. The mutex will prevent faults - * until we finish removing the page. - * - * This race can only happen in the hole punch case. - * Getting here in a truncate operation is a bug. + * A mapped page is impossible as callers should unmap + * all references before calling. And, i_mmap_rwsem + * prevents the creation of additional mappings. */ - if (unlikely(page_mapped(page))) { - BUG_ON(truncate_op); - - i_mmap_lock_write(mapping); - hugetlb_vmdelete_list(&mapping->i_mmap, - index * pages_per_huge_page(h), - (index + 1) * pages_per_huge_page(h)); - i_mmap_unlock_write(mapping); - } + VM_BUG_ON(page_mapped(page)); lock_page(page); /* @@ -470,7 +451,6 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, } unlock_page(page); - mutex_unlock(&hugetlb_fault_mutex_table[hash]); } huge_pagevec_release(&pvec); cond_resched(); @@ -624,7 +604,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, /* addr is the offset within the file (zero based) */ addr = index * hpage_size; - /* mutex taken here, fault path and hole punch */ + /* + * fault mutex taken here, protects against fault path + * and hole punch. inode_lock previously taken protects + * against truncation. + */ hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping, index, addr); mutex_lock(&hugetlb_fault_mutex_table[hash]); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 362601b69c56..89e1a253a40b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3760,16 +3760,16 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } /* - * Use page lock to guard against racing truncation - * before we get page_table_lock. + * We can not race with truncation due to holding i_mmap_rwsem. + * Check once here for faults beyond end of file. */ + size = i_size_read(mapping->host) >> huge_page_shift(h); + if (idx >= size) + goto out; + retry: page = find_lock_page(mapping, idx); if (!page) { - size = i_size_read(mapping->host) >> huge_page_shift(h); - if (idx >= size) - goto out; - /* * Check for page in userfault range */ @@ -3859,9 +3859,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } ptl = huge_pte_lock(h, mm, ptep); - size = i_size_read(mapping->host) >> huge_page_shift(h); - if (idx >= size) - goto backout; ret = 0; if (!huge_pte_none(huge_ptep_get(ptep))) @@ -3964,8 +3961,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, /* * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold - * until finished with ptep. This prevents huge_pmd_unshare from - * being called elsewhere and making the ptep no longer valid. + * until finished with ptep. This serves two purposes: + * 1) It prevents huge_pmd_unshare from being called elsewhere + * and making the ptep no longer valid. + * 2) It synchronizes us with file truncation. * * ptep could have already be assigned via huge_pte_offset. That * is OK, as huge_pte_alloc will return the same value unless
After expanding i_mmap_rwsem use for better shared pmd and page fault/ truncation synchronization, remove code that is no longer necessary. Cc: <stable@vger.kernel.org> Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- fs/hugetlbfs/inode.c | 46 +++++++++++++++----------------------------- mm/hugetlb.c | 21 ++++++++++---------- 2 files changed, 25 insertions(+), 42 deletions(-)