[next,2/3] shmem: Fix data loss when folio truncated

Message ID 24d53dac-d58d-6bb9-82af-c472922e4a31@google.com (mailing list archive)
State New, archived
Series [next,1/3] truncate,shmem: Fix data loss when hole punched in folio

Commit Message

Hugh Dickins Jan. 3, 2022, 1:34 a.m. UTC
xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
three of those _require_odirect, enabled by a shmem_direct_IO() stub),
but still fail even with the partial_end fix.

generic/098 output mismatch shows actual data loss:
    --- tests/generic/098.out
    +++ /home/hughd/xfstests/results//generic/098.out.bad
    @@ -4,9 +4,7 @@
     wrote 32768/32768 bytes at offset 262144
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     File content after remount:
    -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
    -*
    -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ...

The problem here is that shmem_getpage(,,,SGP_READ) intentionally
supplies NULL page beyond EOF, and truncation and eviction intentionally
lower i_size before shmem_undo_range() is called: so a whole folio got
truncated instead of being treated partially.

That could be solved by adding yet another SGP_mode to select the
required behaviour, but it's cleaner just to handle cache and then swap
in shmem_get_folio() - renamed here to shmem_get_partial_folio(), given
an easier interface, and moved next to its sole user, shmem_undo_range().

We certainly do not want to read data back from swap when evicting an
inode: i_size preset to 0 still ensures that.  Nor do we want to zero
folio data when evicting: truncate_inode_partial_folio()'s check for
length == folio_size(folio) already ensures that.

Fixes: 8842c9c23524 ("truncate,shmem: Handle truncates that split large folios")
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/shmem.c |   39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)
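
The lookup difference described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: every model_* name is hypothetical, the page cache is reduced to a single folio, and the swap path that the real shmem_get_partial_folio() falls back to is left out entirely. It shows that a lookup which refuses anything at or beyond i_size misses a large folio straddling the new EOF, while a cache-first lookup still finds it.

```c
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4KiB pages */

/* Hypothetical stand-in for a (possibly large) folio in the page cache. */
struct model_folio {
	unsigned long index;	/* first page index covered */
	unsigned long nr_pages;	/* number of pages covered */
};

/* Hypothetical stand-in for an inode with at most one cached folio. */
struct model_inode {
	unsigned long long i_size;	/* file size in bytes */
	struct model_folio *folio;	/* the only cached folio, or NULL */
};

/* Plain cache lookup: return the folio covering index, ignoring i_size. */
static inline struct model_folio *
model_cache_lookup(const struct model_inode *inode, unsigned long index)
{
	struct model_folio *folio = inode->folio;

	if (folio && index >= folio->index &&
	    index - folio->index < folio->nr_pages)
		return folio;
	return NULL;
}

/* Models the SGP_READ behaviour described above: NULL beyond EOF. */
static inline struct model_folio *
model_lookup_sgp_read(const struct model_inode *inode, unsigned long index)
{
	if (((unsigned long long)index << MODEL_PAGE_SHIFT) >= inode->i_size)
		return NULL;
	return model_cache_lookup(inode, index);
}

/* Models the cache-first lookup of the fix (swap fallback omitted). */
static inline struct model_folio *
model_get_partial_folio(const struct model_inode *inode, unsigned long index)
{
	return model_cache_lookup(inode, index);
}
```

With one 512-page folio at index 0 and i_size already lowered to 262144, looking up index 64 (262144 >> 12) gives NULL from model_lookup_sgp_read() but the folio from model_get_partial_folio(). That is the case in which the caller needs the folio in hand in order to treat it partially.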

Comments

Matthew Wilcox Jan. 7, 2022, 3:53 p.m. UTC | #1
On Sun, Jan 02, 2022 at 05:34:05PM -0800, Hugh Dickins wrote:
> xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
> three of those _require_odirect, enabled by a shmem_direct_IO() stub),
> but still fail even with the partial_end fix.
> 
> generic/098 output mismatch shows actual data loss:
>     --- tests/generic/098.out
>     +++ /home/hughd/xfstests/results//generic/098.out.bad
>     @@ -4,9 +4,7 @@
>      wrote 32768/32768 bytes at offset 262144
>      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>      File content after remount:
>     -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
>     -*
>     -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     ...

generic/098 is passing for me ;-(  I'm using 'always' for THPs.
I'll have to try harder.  Regardless, I think your fix is good ...

> +static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)

Love the better calling convention.

> +	folio = __filemap_get_folio(inode->i_mapping, index,
> +					FGP_ENTRY | FGP_LOCK, 0);
> +	if (!folio || !xa_is_value(folio))
> +		return folio;

That first '!folio' is redundant.  xa_is_value(NULL) is false.
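
A userspace sketch of why the NULL check is redundant, assuming the usual XArray convention that value entries are tagged by setting the low bit of the pointer. The model_* and early_return_* names are hypothetical stand-ins, not the kernel's xarray.h, and the sketch also assumes that NULL converts to integer zero, as it does on the platforms Linux supports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a value entry: shift the integer up and set the low bit. */
static inline void *model_xa_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* Model of the value-entry test: only the low bit is examined. */
static inline bool model_xa_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* The early-return condition as posted in the patch. */
static inline bool early_return_with_null_check(const void *entry)
{
	return !entry || !model_xa_is_value(entry);
}

/* The same condition with the '!folio' test dropped. */
static inline bool early_return_without_null_check(const void *entry)
{
	return !model_xa_is_value(entry);
}
```

Under these assumptions the two conditions agree for NULL, for an ordinary aligned pointer, and for a value entry, so dropping the '!folio' test would not change which path is taken.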
Hugh Dickins Jan. 8, 2022, 5:12 p.m. UTC | #2
On Fri, 7 Jan 2022, Matthew Wilcox wrote:
> On Sun, Jan 02, 2022 at 05:34:05PM -0800, Hugh Dickins wrote:
> > xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
> > three of those _require_odirect, enabled by a shmem_direct_IO() stub),
> > but still fail even with the partial_end fix.
> > 
> > generic/098 output mismatch shows actual data loss:
> >     --- tests/generic/098.out
> >     +++ /home/hughd/xfstests/results//generic/098.out.bad
> >     @@ -4,9 +4,7 @@
> >      wrote 32768/32768 bytes at offset 262144
> >      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >      File content after remount:
> >     -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
> >     -*
> >     -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >     +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >     ...
> 
> generic/098 is passing for me ;-(  I'm using 'always' for THPs.
> I'll have to try harder.  Regardless, I think your fix is good ...

Worrying that the test behaves differently.  Your 'always':
you have '-o huge=always' in the exported TMPFS_MOUNT_OPTIONS?
That should be enough, but I admit to belt and braces by also
echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled

Hugh
Matthew Wilcox Jan. 8, 2022, 9:25 p.m. UTC | #3
On Sat, Jan 08, 2022 at 09:12:08AM -0800, Hugh Dickins wrote:
> On Fri, 7 Jan 2022, Matthew Wilcox wrote:
> > On Sun, Jan 02, 2022 at 05:34:05PM -0800, Hugh Dickins wrote:
> > > xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
> > > three of those _require_odirect, enabled by a shmem_direct_IO() stub),
> > > but still fail even with the partial_end fix.
> > > 
> > > generic/098 output mismatch shows actual data loss:
> > >     --- tests/generic/098.out
> > >     +++ /home/hughd/xfstests/results//generic/098.out.bad
> > >     @@ -4,9 +4,7 @@
> > >      wrote 32768/32768 bytes at offset 262144
> > >      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > >      File content after remount:
> > >     -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
> > >     -*
> > >     -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >     +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >     ...
> > 
> > generic/098 is passing for me ;-(  I'm using 'always' for THPs.
> > I'll have to try harder.  Regardless, I think your fix is good ...
> 
> Worrying that the test behaves differently.  Your 'always':
> you have '-o huge=always' in the exported TMPFS_MOUNT_OPTIONS?
> That should be enough, but I admit to belt and braces by also
> echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled

Ah, I hadn't done TMPFS_MOUNT_OPTIONS, just the
    echo always >/sys/kernel/mm/transparent_hugepage/shmem_enabled

Adding TMPFS_MOUNT_OPTIONS and retrying with what I originally posted
reproduces the bug.  Retrying with the current for-next branch doesn't.
So now I can confirm that there was a bug and your patch fixed it.
And maybe I can avoid introducing more bugs of this nature in the future.

Patch

--- hughd1/mm/shmem.c
+++ hughd2/mm/shmem.c
@@ -151,19 +151,6 @@  int shmem_getpage(struct inode *inode, p
 		mapping_gfp_mask(inode->i_mapping), NULL, NULL, NULL);
 }
 
-static int shmem_get_folio(struct inode *inode, pgoff_t index,
-		struct folio **foliop, enum sgp_type sgp)
-{
-	struct page *page = NULL;
-	int ret = shmem_getpage(inode, index, &page, sgp);
-
-	if (page)
-		*foliop = page_folio(page);
-	else
-		*foliop = NULL;
-	return ret;
-}
-
 static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
 {
 	return sb->s_fs_info;
@@ -894,6 +881,28 @@  void shmem_unlock_mapping(struct address
 	}
 }
 
+static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
+{
+	struct folio *folio;
+	struct page *page;
+
+	/*
+	 * At first avoid shmem_getpage(,,,SGP_READ): that fails
+	 * beyond i_size, and reports fallocated pages as holes.
+	 */
+	folio = __filemap_get_folio(inode->i_mapping, index,
+					FGP_ENTRY | FGP_LOCK, 0);
+	if (!folio || !xa_is_value(folio))
+		return folio;
+	/*
+	 * But read a page back from swap if any of it is within i_size
+	 * (although in some cases this is just a waste of time).
+	 */
+	page = NULL;
+	shmem_getpage(inode, index, &page, SGP_READ);
+	return page ? page_folio(page) : NULL;
+}
+
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -948,7 +957,7 @@  static void shmem_undo_range(struct inod
 	}
 
 	same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
-	shmem_get_folio(inode, lstart >> PAGE_SHIFT, &folio, SGP_READ);
+	folio = shmem_get_partial_folio(inode, lstart >> PAGE_SHIFT);
 	if (folio) {
 		same_folio = lend < folio_pos(folio) + folio_size(folio);
 		folio_mark_dirty(folio);
@@ -963,7 +972,7 @@  static void shmem_undo_range(struct inod
 	}
 
 	if (!same_folio)
-		shmem_get_folio(inode, lend >> PAGE_SHIFT, &folio, SGP_READ);
+		folio = shmem_get_partial_folio(inode, lend >> PAGE_SHIFT);
 	if (folio) {
 		folio_mark_dirty(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend))