[RFC,v2,0/2] mm: fix races due to deferred TLB flushes

Message ID 20201225092529.3228466-1-namit@vmware.com (mailing list archive)

Nadav Amit Dec. 25, 2020, 9:25 a.m. UTC
From: Nadav Amit <namit@vmware.com>

This patch set went from v1 to RFC v2 because the discussion on how to
resolve the recently found races due to deferred TLB flushes is still
ongoing. These patches are sent for reference only for now, and can be
applied later if no better solution is adopted.

In a nutshell, write-protecting PTEs with deferred TLB flushes was mostly
performed while holding mmap_lock for write. This prevented concurrent
page-fault handler invocations from mistakenly assuming that a page is
write-protected when in fact, due to the deferred TLB flush, another CPU
could still write to the page. Such a write can cause memory corruption
if it takes place after the page was copied (in cow_user_page()) and
before the PTE was flushed (by wp_page_copy()).
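
The interleaving described above can be traced with a toy,
single-threaded model. This is only a sketch under stated assumptions,
not kernel code: struct sim, cpu0_store() and simulate_race() are
hypothetical names, the "page" is 8 bytes, and the fault_waits_for_flush
flag merely stands in for the serialization that holding mmap_lock for
write is described as providing.

```c
#include <assert.h>
#include <string.h>

enum { TOY_PAGE_SIZE = 8 };

struct pte { int pfn; int writable; };
struct tlb_entry { int valid; int pfn; int writable; };

struct sim {
	char pages[2][TOY_PAGE_SIZE];	/* page 0: original, page 1: COW copy */
	struct pte pte;			/* the single PTE of the toy mapping */
	struct tlb_entry tlb_cpu0;	/* CPU0's cached translation */
};

/* CPU0 stores one byte. Returns 1 if the store completed, 0 if it faulted. */
static int cpu0_store(struct sim *s, int off, char val)
{
	assert(off >= 0 && off < TOY_PAGE_SIZE);
	if (!s->tlb_cpu0.valid) {
		/* TLB miss: reload the translation from the page table. */
		s->tlb_cpu0.valid = 1;
		s->tlb_cpu0.pfn = s->pte.pfn;
		s->tlb_cpu0.writable = s->pte.writable;
	}
	if (!s->tlb_cpu0.writable)
		return 0;
	s->pages[s->tlb_cpu0.pfn][off] = val;
	return 1;
}

/*
 * Returns 1 if CPU0's write is lost (memory corruption), 0 otherwise.
 * fault_waits_for_flush != 0 models the page-fault handler being unable
 * to run until the deferred TLB flush has been performed.
 */
int simulate_race(int fault_waits_for_flush)
{
	struct sim s;
	int done;

	memset(&s, 0, sizeof(s));
	s.pte = (struct pte){ .pfn = 0, .writable = 1 };
	s.tlb_cpu0 = (struct tlb_entry){ .valid = 1, .pfn = 0, .writable = 1 };

	/* Write-protect the PTE; the TLB flush is deferred. */
	s.pte.writable = 0;
	if (fault_waits_for_flush)
		s.tlb_cpu0.valid = 0;	/* flush happens before any fault runs */

	/* Fault handler on another CPU sees a write-protected PTE and copies. */
	memcpy(s.pages[1], s.pages[0], TOY_PAGE_SIZE);

	/* CPU0 writes, possibly through its stale writable TLB entry. */
	done = cpu0_store(&s, 0, 'X');

	/* Fault handler installs the copy and flushes the old translation. */
	s.pte = (struct pte){ .pfn = 1, .writable = 1 };
	s.tlb_cpu0.valid = 0;

	/* If CPU0's store faulted, it is retried against the new mapping. */
	if (!done)
		cpu0_store(&s, 0, 'X');

	return s.pages[s.pte.pfn][0] != 'X';
}
```

In this model, simulate_race(0) returns 1: the store lands in the old
page after the copy was taken, so the newly mapped page never sees it.
simulate_race(1) returns 0, because the store faults and is retried
against the new mapping.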

However, the userfaultfd and soft-dirty mechanisms did not take
mmap_lock for write, but only for read, which made such races possible.
Since commit 09854ba94c6a ("mm: do_wp_page() simplification") these
races became more likely to take place as non-COW'd pages are more
likely to be COW'd instead of being reused. Both of the races that
these patches are intended to resolve were produced on v5.10.

To avoid the performance overhead, some alternative solutions that do not
require acquiring mmap_lock for write were proposed, specifically for
userfaultfd. So far, no better solution that can be backported has been
proposed for the soft-dirty case.

v1->RFCv2:
- Better (i.e., correct) description of the userfaultfd buggy case [Yu]
- Patch for the soft-dirty case

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>

Nadav Amit (2):
  mm/userfaultfd: fix memory corruption due to writeprotect
  fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup

 fs/proc/task_mmu.c | 27 +++++++++++++--------------
 mm/mprotect.c      |  3 ++-
 mm/userfaultfd.c   | 15 +++++++++++++--
 3 files changed, 28 insertions(+), 17 deletions(-)