
[v3,0/2] hugetlbfs: use i_mmap_rwsem for better synchronization

Message ID 20181222223013.22193-1-mike.kravetz@oracle.com
Series hugetlbfs: use i_mmap_rwsem for better synchronization

Message

Mike Kravetz Dec. 22, 2018, 10:30 p.m. UTC
There are two primary issues addressed here:
1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become
   invalid via a call to huge_pmd_unshare by another thread.
2) hugetlbfs page faults can race with truncation, causing invalid global
   reserve counts and state.
Both issues are addressed by expanding the use of i_mmap_rwsem.

These issues have existed for a long time.  They can be reproduced with a
test program that causes page fault/truncation races.  For simple mappings,
this results in a negative HugePages_Rsvd count.  If the race involves
mappings that contain shared pmds, we can hit "BUG at
fs/hugetlbfs/inode.c:444!" or an Oops as the result of an invalid memory
reference.

v2 -> v3
  Incorporated suggestions from Kirill.  i_mmap_rwsem is now held for the
  duration of the copy in copy_hugetlb_page_range, and is also taken in
  hugetlbfs_evict_inode for consistency with other callers.  The remaining
  changes are to documentation and comments.
v1 -> v2
  Combined patches 2 and 3 of the v1 series, as suggested by Aneesh.  No
  other changes were made.
These patches are a follow-up to the RFC at
  http://lkml.kernel.org/r/20181024045053.1467-1-mike.kravetz@oracle.com
  Comments made by Naoya on the RFC have been addressed.

Mike Kravetz (2):
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race

 fs/hugetlbfs/inode.c | 61 +++++++++++++++-----------------
 mm/hugetlb.c         | 84 +++++++++++++++++++++++++++++++-------------
 mm/memory-failure.c  | 14 +++++++-
 mm/migrate.c         | 13 ++++++-
 mm/rmap.c            |  4 +++
 mm/userfaultfd.c     | 11 ++++--
 6 files changed, 125 insertions(+), 62 deletions(-)

Comments

Kirill A. Shutemov Dec. 24, 2018, 10:13 a.m. UTC
On Sat, Dec 22, 2018 at 02:30:11PM -0800, Mike Kravetz wrote:
> [...]

Looks good to me.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>