Message ID | 24d53dac-d58d-6bb9-82af-c472922e4a31@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [next,1/3] truncate,shmem: Fix data loss when hole punched in folio |
On Sun, Jan 02, 2022 at 05:34:05PM -0800, Hugh Dickins wrote:
> xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
> three of those _require_odirect, enabled by a shmem_direct_IO() stub),
> but still fail even with the partial_end fix.
>
> generic/098 output mismatch shows actual data loss:
>     --- tests/generic/098.out
>     +++ /home/hughd/xfstests/results//generic/098.out.bad
>     @@ -4,9 +4,7 @@
>      wrote 32768/32768 bytes at offset 262144
>      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>      File content after remount:
>     -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
>     -*
>     -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     ...

generic/098 is passing for me ;-( I'm using 'always' for THPs.
I'll have to try harder. Regardless, I think your fix is good ...

> +static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)

Love the better calling convention.

> +	folio = __filemap_get_folio(inode->i_mapping, index,
> +					FGP_ENTRY | FGP_LOCK, 0);
> +	if (!folio || !xa_is_value(folio))
> +		return folio;

That first '!folio' is redundant. xa_is_value(NULL) is false.
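The redundancy noted above can be checked in userspace. What follows is only a sketch, assuming the XArray encodes value entries by setting the low bit of the pointer; the `model_` helpers are hypothetical stand-ins written for this illustration, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace models of the XArray value-entry helpers.
 * Assumption: a value entry is an integer shifted left with bit 0 set. */
static void *model_xa_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

static bool model_xa_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Early-return condition as posted in the patch. */
static bool early_return_as_posted(const void *folio)
{
	return !folio || !model_xa_is_value(folio);
}

/* The same condition with the '!folio' test dropped. */
static bool early_return_simplified(const void *folio)
{
	return !model_xa_is_value(folio);
}

/* Compare both conditions for NULL, an aligned pointer and a value entry. */
void check_conditions_agree(void)
{
	static long fake_folio;	/* aligned object, so bit 0 is clear */
	const void *cases[] = { NULL, &fake_folio, model_xa_mk_value(42) };

	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		assert(early_return_as_posted(cases[i]) ==
		       early_return_simplified(cases[i]));
}
```

Under that assumption NULL has bit 0 clear, so both forms return early for it and the extra test changes nothing.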
On Fri, 7 Jan 2022, Matthew Wilcox wrote:
> On Sun, Jan 02, 2022 at 05:34:05PM -0800, Hugh Dickins wrote:
> > xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
> > three of those _require_odirect, enabled by a shmem_direct_IO() stub),
> > but still fail even with the partial_end fix.
> >
> > generic/098 output mismatch shows actual data loss:
> >     --- tests/generic/098.out
> >     +++ /home/hughd/xfstests/results//generic/098.out.bad
> >     @@ -4,9 +4,7 @@
> >      wrote 32768/32768 bytes at offset 262144
> >      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >      File content after remount:
> >     -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
> >     -*
> >     -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >     +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >     ...
>
> generic/098 is passing for me ;-( I'm using 'always' for THPs.
> I'll have to try harder. Regardless, I think your fix is good ...

Worrying that the test behaves differently. Your 'always':
you have '-o huge=always' in the exported TMPFS_MOUNT_OPTIONS?
That should be enough, but I admit to belt and braces by also
echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled

Hugh
On Sat, Jan 08, 2022 at 09:12:08AM -0800, Hugh Dickins wrote:
> On Fri, 7 Jan 2022, Matthew Wilcox wrote:
> > On Sun, Jan 02, 2022 at 05:34:05PM -0800, Hugh Dickins wrote:
> > > xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
> > > three of those _require_odirect, enabled by a shmem_direct_IO() stub),
> > > but still fail even with the partial_end fix.
> > >
> > > generic/098 output mismatch shows actual data loss:
> > >     --- tests/generic/098.out
> > >     +++ /home/hughd/xfstests/results//generic/098.out.bad
> > >     @@ -4,9 +4,7 @@
> > >      wrote 32768/32768 bytes at offset 262144
> > >      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > >      File content after remount:
> > >     -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
> > >     -*
> > >     -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >     +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >     ...
> >
> > generic/098 is passing for me ;-( I'm using 'always' for THPs.
> > I'll have to try harder. Regardless, I think your fix is good ...
>
> Worrying that the test behaves differently. Your 'always':
> you have '-o huge=always' in the exported TMPFS_MOUNT_OPTIONS?
> That should be enough, but I admit to belt and braces by also
> echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled

Ah, I hadn't done TMPFS_MOUNT_OPTIONS, just the
echo always >/sys/kernel/mm/transparent_hugepage/shmem_enabled

Adding TMPFS_MOUNT_OPTIONS and retrying with what I originally posted
reproduces the bug. Retrying with the current for-next branch doesn't.
So now I can confirm that there was a bug and your patch fixed it.
And maybe I can avoid introducing more bugs of this nature in the future.
--- hughd1/mm/shmem.c
+++ hughd2/mm/shmem.c
@@ -151,19 +151,6 @@ int shmem_getpage(struct inode *inode, p
 		mapping_gfp_mask(inode->i_mapping), NULL, NULL, NULL);
 }
 
-static int shmem_get_folio(struct inode *inode, pgoff_t index,
-		struct folio **foliop, enum sgp_type sgp)
-{
-	struct page *page = NULL;
-	int ret = shmem_getpage(inode, index, &page, sgp);
-
-	if (page)
-		*foliop = page_folio(page);
-	else
-		*foliop = NULL;
-	return ret;
-}
-
 static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
 {
 	return sb->s_fs_info;
@@ -894,6 +881,28 @@ void shmem_unlock_mapping(struct address
 	}
 }
 
+static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
+{
+	struct folio *folio;
+	struct page *page;
+
+	/*
+	 * At first avoid shmem_getpage(,,,SGP_READ): that fails
+	 * beyond i_size, and reports fallocated pages as holes.
+	 */
+	folio = __filemap_get_folio(inode->i_mapping, index,
+					FGP_ENTRY | FGP_LOCK, 0);
+	if (!folio || !xa_is_value(folio))
+		return folio;
+	/*
+	 * But read a page back from swap if any of it is within i_size
+	 * (although in some cases this is just a waste of time).
+	 */
+	page = NULL;
+	shmem_getpage(inode, index, &page, SGP_READ);
+	return page ? page_folio(page) : NULL;
+}
+
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -948,7 +957,7 @@ static void shmem_undo_range(struct inod
 	}
 
 	same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
-	shmem_get_folio(inode, lstart >> PAGE_SHIFT, &folio, SGP_READ);
+	folio = shmem_get_partial_folio(inode, lstart >> PAGE_SHIFT);
 	if (folio) {
 		same_folio = lend < folio_pos(folio) + folio_size(folio);
 		folio_mark_dirty(folio);
@@ -963,7 +972,7 @@ static void shmem_undo_range(struct inod
 	}
 
 	if (!same_folio)
-		shmem_get_folio(inode, lend >> PAGE_SHIFT, &folio, SGP_READ);
+		folio = shmem_get_partial_folio(inode, lend >> PAGE_SHIFT);
 	if (folio) {
 		folio_mark_dirty(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend))
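The lookup order in shmem_get_partial_folio() in the diff above can be modelled in userspace. This is a rough sketch, not kernel code: the `model_` names, the three-state slot and the 4 KiB page size are assumptions made for illustration, and the SGP_READ model only captures the "nothing at or beyond i_size" rule described in the patch comment.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical state of one page cache slot. */
enum model_slot { MODEL_EMPTY, MODEL_IN_CACHE, MODEL_IN_SWAP };

/* Model of a shmem_getpage(,,,SGP_READ) lookup: nothing is returned
 * when the index lies at or beyond i_size. */
bool model_sgp_read_finds(enum model_slot slot, unsigned long index,
			  long long i_size)
{
	if (((long long)index << MODEL_PAGE_SHIFT) >= i_size)
		return false;
	return slot != MODEL_EMPTY;
}

/* Old behaviour: every lookup goes through SGP_READ. */
bool model_old_lookup(enum model_slot slot, unsigned long index,
		      long long i_size)
{
	return model_sgp_read_finds(slot, index, i_size);
}

/* New behaviour: take a cached folio regardless of i_size, and fall
 * back to SGP_READ only when the slot holds a swap entry. */
bool model_new_lookup(enum model_slot slot, unsigned long index,
		      long long i_size)
{
	if (slot == MODEL_IN_CACHE)
		return true;
	if (slot == MODEL_EMPTY)
		return false;
	return model_sgp_read_finds(slot, index, i_size);
}

/* Truncating to 128 KiB: lstart >> PAGE_SHIFT is index 32, which sits
 * exactly at the already lowered i_size. */
void model_lookup_selfcheck(void)
{
	assert(!model_old_lookup(MODEL_IN_CACHE, 32, 131072));
	assert(model_new_lookup(MODEL_IN_CACHE, 32, 131072));
}
```

In this model the old lookup misses the cached folio once i_size has been lowered to a page-aligned lstart, while the new lookup still finds it; with i_size preset to 0, a swap entry is never read back.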
xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
three of those _require_odirect, enabled by a shmem_direct_IO() stub),
but still fail even with the partial_end fix.

generic/098 output mismatch shows actual data loss:
    --- tests/generic/098.out
    +++ /home/hughd/xfstests/results//generic/098.out.bad
    @@ -4,9 +4,7 @@
     wrote 32768/32768 bytes at offset 262144
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     File content after remount:
    -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
    -*
    -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ...

The problem here is that shmem_getpage(,,,SGP_READ) intentionally
supplies NULL page beyond EOF, and truncation and eviction intentionally
lower i_size before shmem_undo_range() is called: so a whole folio got
truncated instead of being treated partially.

That could be solved by adding yet another SGP_mode to select the
required behaviour, but it's cleaner just to handle cache and then swap
in shmem_get_folio() - renamed here to shmem_get_partial_folio(), given
an easier interface, and moved next to its sole user, shmem_undo_range().

We certainly do not want to read data back from swap when evicting an
inode: i_size preset to 0 still ensures that. Nor do we want to zero
folio data when evicting: truncate_inode_partial_folio()'s check for
length == folio_size(folio) already ensures that.

Fixes: 8842c9c23524 ("truncate,shmem: Handle truncates that split large folios")
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/shmem.c | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)
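The "length == folio_size(folio)" check mentioned above can be illustrated with a little arithmetic. The sketch below is a userspace model only, assuming truncate_inode_partial_folio() clamps the [lstart, lend] byte range to the folio and treats a range covering the whole folio as a full removal; `model_partial_range()`, `struct model_range` and `MODEL_LEND_EOF` are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: "truncate to end of file" is modelled as the largest offset. */
#define MODEL_LEND_EOF LLONG_MAX

/* Byte range inside one folio that would be zeroed. */
struct model_range {
	long long offset;	/* first affected byte, relative to the folio */
	long long length;	/* number of affected bytes */
};

/* Clamp [lstart, lend] (inclusive) to a folio at byte position pos with
 * the given size. Returns true when the whole folio is covered, in
 * which case it would be removed rather than partially zeroed. */
bool model_partial_range(long long pos, long long size, long long lstart,
			 long long lend, struct model_range *out)
{
	out->offset = pos < lstart ? lstart - pos : 0;
	if (pos + size <= lend)
		out->length = size - out->offset;
	else
		out->length = lend + 1 - pos - out->offset;
	return out->length == size;
}

/* A 2 MiB folio at position 0, truncated to 128 KiB as in generic/098:
 * only the tail is affected, so the first 128 KiB of data must survive. */
void model_range_selfcheck(void)
{
	struct model_range r;

	assert(!model_partial_range(0, 2097152, 131072, MODEL_LEND_EOF, &r));
	assert(r.offset == 131072 && r.length == 1966080);
}
```

With lstart at 0, as on eviction, the modelled range covers the whole folio and the function returns true, which corresponds to the case where no zeroing is wanted.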