diff mbox series

[v3] mm/gup: disallow GUP writing to file-backed mappings by default

Message ID 23c19e27ef0745f6d3125976e047ee0da62569d4.1682406295.git.lstoakes@gmail.com (mailing list archive)
State New
Headers show
Series [v3] mm/gup: disallow GUP writing to file-backed mappings by default | expand

Commit Message

Lorenzo Stoakes April 25, 2023, 7:14 a.m. UTC
GUP does not correctly implement write-notify semantics, nor does it
guarantee that the underlying pages are correctly dirtied, which could lead
to a kernel oops or data corruption when writing to file-backed mappings.

This is only relevant when the mappings are file-backed and the underlying
file system requires folio dirty tracking. File systems which do not, such
as shmem or hugetlb, are not at risk and therefore can be written to
without issue.

Unfortunately this limitation of GUP has been present for some time and
requires future rework of the GUP API in order to provide correct write
access to such mappings.

In the meantime, we add a check for the most broken GUP case -
FOLL_LONGTERM - which really under no circumstances can safely access
dirty-tracked file mappings.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
v3:
- Rebased on latest mm-unstable as of 24th April 2023.
- Explicitly check whether file system requires folio dirtying. Note that
  vma_wants_writenotify() could not be used directly as it is very much focused
  on determining if the PTE r/w should be set (e.g. assuming private mapping
  does not require it as already set, soft dirty considerations).
- Tested code against shmem and hugetlb mappings - confirmed that these are not
  disallowed by the check.
- Eliminate FOLL_ALLOW_BROKEN_FILE_MAPPING flag and instead perform check only
  for FOLL_LONGTERM pins.
- As a result, limit check to internal GUP code.

v2:
- Add accidentally excluded ptrace_access_vm() use of
  FOLL_ALLOW_BROKEN_FILE_MAPPING.
- Tweak commit message.
https://lore.kernel.org/all/c8ee7e02d3d4f50bb3e40855c53bda39eec85b7d.1682321768.git.lstoakes@gmail.com/

v1:
https://lore.kernel.org/all/f86dc089b460c80805e321747b0898fd1efe93d7.1682168199.git.lstoakes@gmail.com/

 mm/gup.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

--
2.40.0

Comments

Kirill A. Shutemov April 25, 2023, 10:11 a.m. UTC | #1
On Tue, Apr 25, 2023 at 08:14:14AM +0100, Lorenzo Stoakes wrote:
> GUP does not correctly implement write-notify semantics, nor does it
> guarantee that the underlying pages are correctly dirtied, which could lead
> to a kernel oops or data corruption when writing to file-backed mappings.
> 
> This is only relevant when the mappings are file-backed and the underlying
> file system requires folio dirty tracking. File systems which do not, such
> as shmem or hugetlb, are not at risk and therefore can be written to
> without issue.
> 
> Unfortunately this limitation of GUP has been present for some time and
> requires future rework of the GUP API in order to provide correct write
> access to such mappings.
> 
> In the meantime, we add a check for the most broken GUP case -
> FOLL_LONGTERM - which really under no circumstances can safely access
> dirty-tracked file mappings.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> ---
> v3:
> - Rebased on latest mm-unstable as of 24th April 2023.
> - Explicitly check whether file system requires folio dirtying. Note that
>   vma_wants_writenotify() could not be used directly as it is very much focused
>   on determining if the PTE r/w should be set (e.g. assuming private mapping
>   does not require it as already set, soft dirty considerations).

Hm. Okay. Have you considered having a common base for your case and
vma_wants_writenotify()? Code duplication doesn't look good.
Lorenzo Stoakes April 25, 2023, 4:45 p.m. UTC | #2
On Tue, Apr 25, 2023 at 01:11:53PM +0300, Kirill A . Shutemov wrote:
> On Tue, Apr 25, 2023 at 08:14:14AM +0100, Lorenzo Stoakes wrote:
> > GUP does not correctly implement write-notify semantics, nor does it
> > guarantee that the underlying pages are correctly dirtied, which could lead
> > to a kernel oops or data corruption when writing to file-backed mappings.
> >
> > This is only relevant when the mappings are file-backed and the underlying
> > file system requires folio dirty tracking. File systems which do not, such
> > as shmem or hugetlb, are not at risk and therefore can be written to
> > without issue.
> >
> > Unfortunately this limitation of GUP has been present for some time and
> > requires future rework of the GUP API in order to provide correct write
> > access to such mappings.
> >
> > In the meantime, we add a check for the most broken GUP case -
> > FOLL_LONGTERM - which really under no circumstances can safely access
> > dirty-tracked file mappings.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> > ---
> > v3:
> > - Rebased on latest mm-unstable as of 24th April 2023.
> > - Explicitly check whether file system requires folio dirtying. Note that
> >   vma_wants_writenotify() could not be used directly as it is very much focused
> >   on determining if the PTE r/w should be set (e.g. assuming private mapping
> >   does not require it as already set, soft dirty considerations).
>
> Hm. Okay. Have you considered having a common base for your case and
> vma_wants_writenotify()? Code duplication doesn't look good.
>

I did and I actually started implementing something for the same reason,
however I wondered whether it was worth it for essentially 3 clauses that
are shared between the two.

On second thoughts, it is painful to have this duplicated, so let me take
another look.

>
> --
>   Kiryl Shutsemau / Kirill A. Shutemov
diff mbox series

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 1f72a717232b..f77810ee8a9f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -4,6 +4,7 @@ 
 #include <linux/err.h>
 #include <linux/spinlock.h>

+#include <linux/backing-dev.h>
 #include <linux/mm.h>
 #include <linux/memremap.h>
 #include <linux/pagemap.h>
@@ -959,16 +960,58 @@  static int faultin_page(struct vm_area_struct *vma,
 	return 0;
 }

+/*
+ * Writing to file-backed mappings which require folio dirty tracking using GUP
+ * is a fundamentally broken operation as kernel write access to GUP mappings
+ * may not adhere to the semantics expected by a file system.
+ */
+static inline bool can_write_file_mapping(struct vm_area_struct *vma,
+					  unsigned long gup_flags)
+{
+	const struct vm_operations_struct *vm_ops = vma->vm_ops;
+	const struct file *file = vma->vm_file;
+
+	/* If we aren't pinning then no problematic write can occur. */
+	if (!(gup_flags & (FOLL_GET | FOLL_PIN)))
+		return true;
+
+	/* We limit this check to the most egregious case - a long term pin. */
+	if (!(gup_flags & FOLL_LONGTERM))
+		return true;
+
+	/*
+	 * If the underlying file system needs to be notified of writes, then it
+	 * requires correct dirty tracking which we cannot guarantee.
+	 */
+	if (vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite))
+		return false;
+
+	/*
+	 * If no pfn_mkwrite() is specified, no dirty tracking is required for a
+	 * PFN map.
+	 */
+	if (vma->vm_flags & VM_PFNMAP)
+		return true;
+
+	/*
+	 * Even if the file system does not require write notification, if its
+	 * underlying mapping can writeback, dirty tracking will be required.
+	 */
+	return !file || !file->f_mapping ||
+		!mapping_can_writeback(file->f_mapping);
+}
+
 static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 {
 	vm_flags_t vm_flags = vma->vm_flags;
 	int write = (gup_flags & FOLL_WRITE);
 	int foreign = (gup_flags & FOLL_REMOTE);
+	bool vma_anon = vma_is_anonymous(vma);

 	if (vm_flags & (VM_IO | VM_PFNMAP))
 		return -EFAULT;

-	if (gup_flags & FOLL_ANON && !vma_is_anonymous(vma))
+	if ((gup_flags & FOLL_ANON) && !vma_anon)
 		return -EFAULT;

 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
@@ -978,6 +1021,9 @@  static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 		return -EFAULT;

 	if (write) {
+		if (!vma_anon && !can_write_file_mapping(vma, gup_flags))
+			return -EFAULT;
+
 		if (!(vm_flags & VM_WRITE)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;